Tenzai Onboarding Guide

⚡ Prerequisites

Tool	Purpose	Installation
Docker Desktop	Containers, local k8s	docker.com
mise	Tool version management	`brew install mise`
AWS CLI v2	AWS SSO authentication	`brew install awscli`
GitHub CLI	Workflow triggers, auth	`brew install gh`
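
To confirm the tools in the table are on your PATH before continuing, something like the following can help. This is a minimal sketch: `missing_tools` is a hypothetical helper name, not something shipped in a Tenzai repo, and it only checks that each binary exists, not which version is installed.

```shell
# Print the name of every tool that is NOT found on PATH, one per line.
# missing_tools is a hypothetical helper, not part of any Tenzai repo.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Binary names for the prerequisites listed in the table above.
missing=$(missing_tools docker mise aws gh)
if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
else
    echo "All prerequisites found."
fi
```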

What is mise?

mise (pronounced "meez") is a polyglot tool version manager and task runner. It replaces tools like nvm, pyenv, rbenv, and make.

Why we use mise

Consistent tooling: Everyone uses the same versions of node, terraform, kubectl, etc.
Project-specific versions: Each repo defines its tool versions in mise.toml
Task runner: Common commands such as `mise run all` and `mise run test` are defined per project
Automatic activation: Tools are available when you cd into the project

Install mise

# Install
brew install mise

# Add to shell (zsh)
echo 'eval "$(mise activate zsh)"' >> ~/.zshrc
source ~/.zshrc

# For bash
echo 'eval "$(mise activate bash)"' >> ~/.bashrc
source ~/.bashrc

☁️ AWS Setup

Initial SSO Configuration

The first run of mise run all or mise run agent in the tenzai repo auto-configures AWS SSO. For manual setup:

aws configure sso
# SSO start URL: https://tenzai.awsapps.com/start/#
# SSO region: eu-north-1
# Default region: eu-central-1
# Output format: json

AWS Start Page

Before diving into local development, verify you have AWS access:

Open https://tenzai.awsapps.com/start/#
Log in with your Tenzai credentials
You should see the available AWS accounts (dev, staging, prod)
Click "dev" → "Management console" to verify access
If you can't get in, ask your manager to grant you AWS permissions

Available Profiles

Profile	Purpose
`dev`	Development/sandbox (default)
`staging`	Staging environment
`prod`	Production (use with caution)

Daily Login

The easiest way to log in to AWS is through mise:

# This will open browser for SSO login and decrypt secrets
cd ~/projects/tenzai
mise run secrets

Alternatively, use AWS CLI directly:

aws sso login --profile dev

Sessions expire after ~8 hours. If you get authentication errors, just run the login command again.
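
If you would rather check the session before logging in again, a small wrapper along these lines can do it. It is a sketch under assumptions: `ensure_aws_login` is a hypothetical helper, and it treats any failure of `aws sts get-caller-identity` as an expired or missing session.

```shell
# Re-authenticate only when the cached SSO session is no longer valid.
# ensure_aws_login is a hypothetical helper; profile names come from the
# Available Profiles table.
ensure_aws_login() {
    profile=${1:-dev}
    # sts get-caller-identity is a cheap call that fails without valid credentials.
    if aws sts get-caller-identity --profile "$profile" >/dev/null 2>&1; then
        echo "AWS session for '$profile' is still valid"
    else
        echo "AWS session for '$profile' is missing or expired, logging in"
        aws sso login --profile "$profile"
    fi
}
```

Calling `ensure_aws_login` with no argument checks the `dev` profile; pass `staging` or `prod` to check those instead.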

📦 Repository Overview

Repo	Purpose	Clone URL
tenzai	Main platform (API, UI, Agent)	`git@github.com:TenzaiLtd/tenzai.git`
evaluation	Benchmarking & evaluation	`git@github.com:TenzaiLtd/evaluation.git`
labs	Vulnerable applications	`git@github.com:TenzaiLtd/labs.git`

Clone All Repos

# Create the projects directory if it does not exist yet
mkdir -p ~/projects && cd ~/projects
git clone git@github.com:TenzaiLtd/tenzai.git
git clone git@github.com:TenzaiLtd/evaluation.git
git clone git@github.com:TenzaiLtd/labs.git

🚀 Tenzai Platform

Location: ~/projects/tenzai

Get Started

cd ~/projects/tenzai
git checkout main && git pull

# Trust and install tools
mise trust
mise install

# Start full platform
mise run all

Directory Structure

Core Services (the main components)

Directory	Purpose
`agent/`	AI security testing agent - the brain that performs automated pentesting. Contains master agents, sub-agents, and phase-based workflow logic
`platform/`	FastAPI backend server - REST API, GraphQL, webhooks, and business logic
`ui/`	Angular frontend application (pnpm/nx workspace)

Agent Toolboxes (containers used by the agent)

Directory	Purpose
`hackbox/`	Containerized security tools (nmap, sqlmap, ffuf, nuclei, etc.) exposed via HTTP API
`browserbox/`	Playwright-based browser automation container for web interaction during security testing
`proxybox/`	OWASP ZAP proxy container for intercepting and analyzing HTTP traffic

Infrastructure

Directory	Purpose
`k8s/`	Kubernetes Helm charts (tenzai-chart, otel-collector-chart)
`infra/`	Terraform infrastructure code and SOPS-encrypted secrets for all environments

Supporting Components

Directory	Purpose
`common/`	Shared Python utilities, database models, and SDK used across services
`cli/`	Command-line interface and TUI (terminal UI) for interacting with the platform
`agent-job-watcher/`	Kopf-based K8s controller that watches agent job events and updates the platform
`lambda/`	AWS Lambda functions for async operations
`integration-tests/`	End-to-end integration test suite
`tests/`	Unit tests for all Python components
`scripts/`	Utility scripts for development and operations

Config Files

File	Purpose
`Tiltfile`	Tilt configuration for local development orchestration
`mise.toml`	Tool versions and task definitions

Local Development with Tilt

What is Tilt?

Tilt is a toolkit for local Kubernetes development. It watches your source code, automatically rebuilds containers, and updates your cluster in real-time. Think of it as "hot reload for Kubernetes."

Key benefits

Automatic rebuilds when you save files
Unified dashboard showing all services
Log aggregation across containers
One command to spin up the entire platform

📚 Docs: https://docs.tilt.dev/

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        Local Development Environment                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                 │
│  │      UI      │     │   Platform   │     │    Agent     │                 │
│  │   Angular    │────▶│   FastAPI    │────▶│    Python    │                 │
│  │    :4200     │     │    :8000     │     │              │                 │
│  └──────────────┘     └──────┬───────┘     └──────┬───────┘                 │
│                              │                    │                         │
│                              ▼                    ▼                         │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                 │
│  │  PostgreSQL  │     │  LocalStack  │     │   Hackbox    │                 │
│  │    :5432     │     │    :4566     │     │  Sec Tools   │                 │
│  └──────────────┘     └──────────────┘     └──────────────┘                 │
│                              │                                              │
│                              ▼                                              │
│                       ┌──────────────┐                                      │
│                       │  S3 / SQS /  │                                      │
│                       │  SNS (mock)  │                                      │
│                       └──────────────┘                                      │
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                      k3d Cluster (tenzai-local)                      │   │
│  │                       Registry: localhost:5001                       │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

What is LocalStack?

LocalStack emulates AWS services locally so you don't need real AWS resources during development:

S3: File storage for reports and artifacts
SQS: Message queues for async job processing
SNS: Notifications and event publishing
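
To inspect those emulated services from the terminal, the AWS CLI can be pointed at LocalStack with `--endpoint-url`. The wrapper below is a sketch: `aws_local` is a hypothetical name, and the dummy credentials and region are assumptions (LocalStack does not validate credentials by default).

```shell
# Run an AWS CLI command against LocalStack instead of real AWS.
# aws_local is a hypothetical wrapper; the dummy credentials and the region
# are assumptions, not values taken from the tenzai repo.
LOCALSTACK_URL=${LOCALSTACK_URL:-http://localhost:4566}

aws_local() {
    (
        # Subshell keeps the dummy credentials out of your real environment.
        export AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test
        aws --endpoint-url "$LOCALSTACK_URL" --region eu-central-1 "$@"
    )
}
```

For example, `aws_local s3 ls`, `aws_local sqs list-queues`, and `aws_local sns list-topics` list what the local platform has created in each service.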

Running the Platform

mise run all

This command:

Syncs Python dependencies (uv sync)
Decrypts SOPS secrets
Creates k3d cluster tenzai-local with local registry
Deploys all services via Helm chart
Sets up LocalStack for AWS emulation
Runs database migrations
Starts the UI dev server

Local Endpoints

Service	URL
UI	http://localhost:4200
API	http://localhost:8000
API Docs	http://localhost:8000/docs
GraphQL	http://localhost:8000/graphql
LocalStack	http://localhost:4566
PostgreSQL	localhost:5432
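
A quick way to see which of the HTTP endpoints above are responding is sketched below. `check_endpoint` and `check_local_platform` are hypothetical helpers, and the LocalStack health path `/_localstack/health` is an assumption that depends on the LocalStack version in use. PostgreSQL is not an HTTP service, so it is left out; `mise run local-db:psql` covers that.

```shell
# Report whether a local HTTP endpoint responds. Hypothetical helper.
check_endpoint() {
    name=$1
    url=$2
    # -f: treat HTTP errors (>= 400) as failures; --max-time: don't hang on a dead port.
    if curl -fsS -o /dev/null --max-time 3 "$url" 2>/dev/null; then
        echo "up: $name ($url)"
    else
        echo "down: $name ($url)"
    fi
}

# Check the HTTP endpoints from the Local Endpoints table.
check_local_platform() {
    check_endpoint "UI" "http://localhost:4200"
    check_endpoint "API docs" "http://localhost:8000/docs"
    # Health path is an assumption; older LocalStack versions may differ.
    check_endpoint "LocalStack" "http://localhost:4566/_localstack/health"
}
```

Run `check_local_platform` once `mise run all` has settled.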

Common mise Tasks

mise run all           # Full platform via Tilt
mise run agent         # Run agent standalone
mise run lint          # Lint codebase
mise run test          # Unit tests
mise run secrets       # Edit encrypted secrets (SOPS)
mise run local-db:psql # Open psql shell
mise run local-db:reset # Reset DB + seed
mise run jwt           # Get JWT for API testing
mise run k9s:local     # k9s for local cluster
mise run clean         # Complete cleanup

CI/CD Pipeline

The tenzai repo uses GitHub Actions for continuous integration and deployment.

CI Gates (Pull Requests)

Gate	Description
Lint	Ruff (Python) + ESLint (TypeScript) code style checks
Type Check	mypy (Python) + TypeScript compiler
Unit Tests	pytest for Python, Jest for TypeScript
Integration Tests	End-to-end tests against a test cluster
Build	Docker image builds for all services
Security Scan	Dependency vulnerability scanning

PR Environments

When you open a PR, a preview environment is automatically deployed:

URL Pattern: https://pr{number}.dev.tenzai.io/ (e.g., https://pr693.dev.tenzai.io/)
Lifetime: Active while PR is open, destroyed on merge/close
Purpose: Test changes in isolation before merging to main
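
The URL pattern can be turned into a small helper so you don't have to type it by hand. This is a sketch; `pr_env_url` is a hypothetical name.

```shell
# Build the preview URL for a PR number, following the URL pattern above.
# pr_env_url is a hypothetical helper.
pr_env_url() {
    case $1 in
        ''|*[!0-9]*)
            # Reject empty or non-numeric input.
            echo "usage: pr_env_url <pr-number>" >&2
            return 1
            ;;
    esac
    echo "https://pr$1.dev.tenzai.io/"
}
```

From a checked-out PR branch, `pr_env_url "$(gh pr view --json number --jq .number)"` should print the preview URL for the current PR, assuming the GitHub CLI is authenticated.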

Deployment Flow

PR Created → CI Checks → PR Environment → Review → Merge

↓

Deploy to Dev → Deploy to Staging → Deploy to Prod

📊 Evaluation System

Location: ~/projects/evaluation

What is the Evaluation System?

The evaluation repo is the benchmarking infrastructure that measures the performance of the Tenzai security agent. It answers: "How well does our agent find vulnerabilities compared to ground truth?"

The evaluation flow:

Deploy a vulnerable lab (from the labs repo) to a GKE cluster
Run the Tenzai agent against the deployed lab
Compare agent findings against known vulnerabilities
Score the results using LLM-based judges
Report metrics to Slack, BigQuery, and dashboards

This allows us to track agent improvements over time and catch regressions.
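
To make the "compare against ground truth" step concrete, here is a deliberately simplified sketch. It matches findings by exact ID, which is not how the evaluator works (the evaluator uses LLM-based judges); the one-ID-per-line file format and the `score_findings` name are assumptions made for illustration.

```shell
# Count true positives, false positives, and false negatives by exact ID match.
# Simplification: the real evaluator uses LLM-based judges, not string equality.
# score_findings and the one-ID-per-line file format are hypothetical.
score_findings() {
    truth_file=$1
    findings_file=$2
    tmp_dir=$(mktemp -d)
    # comm requires sorted input.
    sort -u "$truth_file" > "$tmp_dir/truth"
    sort -u "$findings_file" > "$tmp_dir/found"
    tp=$(comm -12 "$tmp_dir/truth" "$tmp_dir/found" | wc -l | tr -d ' ')  # in both
    fn=$(comm -23 "$tmp_dir/truth" "$tmp_dir/found" | wc -l | tr -d ' ')  # missed by the agent
    fp=$(comm -13 "$tmp_dir/truth" "$tmp_dir/found" | wc -l | tr -d ' ')  # not in ground truth
    rm -rf "$tmp_dir"
    echo "tp=$tp fp=$fp fn=$fn"
}
```

Tracking these three counts per lab over time is the kind of signal that makes a regression visible.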

Get Started

cd ~/projects/evaluation
git checkout main && git pull
uv sync

Directory Structure

evaluation/
├── .github/
│   ├── actions/       # Composite actions (deploy-lab, eval, run-tenzai)
│   └── workflows/     # GitHub Actions workflows
├── cli/               # Dojo CLI
├── dojo-sdk/          # Python SDK for deployments
├── dojo-web/          # Web UI (React)
├── evaluator/         # LLM-based evaluation engine
├── experiments/       # Benchmark orchestration
└── suite-files/       # Lab suite configurations (YAML)

Dojo

Dojo is our lab deployment and management system for deploying vulnerable applications to test the agent.

What Dojo Does

Deploys labs to GKE clusters with proper networking and secrets
Triggers Tenzai agent runs against deployed labs
Tracks deployment status and manages lifecycle (auto-teardown)
Supports individual lab and batch suite deployments

Dojo CLI

# Install globally
uv tool install --from "git+https://github.com/TenzaiLtd/evaluation.git#subdirectory=cli" dojo --force

# Run
uvx dojo

Features: Browse labs, deploy with custom configs, select agent/LLM, view active deployments, destroy labs.

Dojo Web UI

Web interface at https://tenzailtd.github.io/evaluation/ — GitHub OAuth, visual lab browser, real-time deployment tracking.

GitHub Actions Workflows

Workflow	Purpose	Trigger
`run-labs.yml`	Manual lab deployments	workflow_dispatch
`run-sanity.yml`	Nightly exploit validation	Schedule
`run-labs-nightly-tenzai.yml`	Nightly benchmarks	Schedule
`evaluator-ci.yml`	Evaluator tests/lint	PRs
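
The `workflow_dispatch` workflow can be started from the terminal with the GitHub CLI. The sketch below assumes nothing about the workflow's inputs, since they are not listed in this guide; `trigger_run_labs` is a hypothetical helper.

```shell
# Trigger the manual lab-deployment workflow from the terminal.
# trigger_run_labs is a hypothetical helper. The workflow's inputs are not
# listed in this guide, so none are passed; add them with `-f name=value`.
trigger_run_labs() {
    ref=${1:-main}
    gh workflow run run-labs.yml --repo TenzaiLtd/evaluation --ref "$ref"
}
```

After triggering, `gh run list --workflow run-labs.yml --repo TenzaiLtd/evaluation --limit 5` shows the most recent runs.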

Evaluator

LLM-based system that compares agent findings against ground truth:

cd evaluator
uv sync --dev
uv run pytest tests/

🧪 Labs Repository

Location: ~/projects/labs

What is the Labs Repo?

Contains the source code and infrastructure for all vulnerable applications used in evaluation — the targets that the Tenzai agent scans.

Lab Types

Type	Description	Count
T-Bench (Tenzai-built)	Complex applications spanning realistic customer tech stacks	26 apps
X-Bench (xBow)	Single-page apps with capture-the-flag style vulnerabilities	104 labs
OSS Labs (open source)	Real vulnerable apps (Juice Shop, DVWP, OpenCart, Zabbix)	~10 apps
CVE-Bench	Labs reproducing specific CVEs (not actively used yet)	Various

Get Started

cd ~/projects/labs
git checkout main && git pull

Directory Structure

labs/
├── tbench/                     # T-Bench labs (app-001 ... app-023b)
├── xben-001-24/ ... xben-104-24/  # X-Bench labs
├── cve-bench/                  # CVE reproduction labs
├── dvwp/                       # Damn Vulnerable WordPress
├── juiceshop/                  # OWASP Juice Shop
├── opencart/                   # OpenCart e-commerce
└── zabbix/                     # Zabbix monitoring

Local Development

cd ~/projects/labs/tbench/app-001

# Create env configuration
mkdir -p ~/.config/tbench
cat > ~/.config/tbench/.env << EOF
NAMESPACE_PREFIX=local
LOCAL_K8S_CONTEXT=docker-desktop
EOF
ln -s ~/.config/tbench/.env .env

# Deploy locally
just run

# Access at http://localhost:300XX (the port differs per lab; 30010 is used here)
curl http://localhost:30010/api/health

# Stop
just stop
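
A lab can take a moment to come up after `just run`, so polling the health endpoint is often more convenient than retrying `curl` by hand. This is a sketch: `wait_for_url` is a hypothetical helper, and the retry count and delay are arbitrary defaults.

```shell
# Poll a URL until it responds or the attempts run out.
# wait_for_url is a hypothetical helper; 30 attempts and a 2s delay are
# arbitrary defaults.
wait_for_url() {
    url=$1
    attempts=${2:-30}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null; then
            echo "ready after $i attempt(s): $url"
            return 0
        fi
        sleep 2
        i=$((i + 1))
    done
    echo "gave up after $attempts attempts: $url" >&2
    return 1
}
```

For the example lab above: `wait_for_url http://localhost:30010/api/health`.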

Remote Deployment (GKE)

# Authenticate
gcloud auth login
gcloud container clusters get-credentials lab-cluster --region=us-central1

# Deploy (commit changes first!)
just deploy

# Get endpoint info
just describe

# Destroy
just destroy

ZeroTier Network Access

Labs on GKE use internal LoadBalancers. To access from your machine:

Install ZeroTier: brew install zerotier-one
Join network: sudo zerotier-cli join <network-id> (get from 1Password web)
Approve your device in ZeroTier admin console
Lab endpoints accessible via internal IPs

🌐 Environments

Environment Overview

Environment	Purpose	Deployment
dev	Development and testing	Auto-deploy from main
staging	Pre-production validation	Manual promotion
prod	Production	Manual promotion

🔧 Dev Environment

🎭 Staging Environment

🚀 Production Environment

⚡ Quick Reference

Daily Workflow

# 1. Login to AWS (opens browser)
cd ~/projects/tenzai
mise run secrets

# 2. Start tenzai platform
git pull
mise run all

# 3. Access
open http://localhost:4200  # UI
open http://localhost:8000/docs  # API

Common Commands

Task	Command
Start full platform	`mise run all`
Run agent standalone	`mise run agent -U https://target.com`
Deploy a lab	`uvx dojo`
Run evaluator tests	`cd evaluator && uv run pytest tests/`
Local lab deployment	`cd labs/tbench/app-XXX && just run`
View K8s dashboard	`mise run k9s:local`
Get API JWT	`mise run jwt`
