Benchmarks

T-Bench is Tenzai's benchmarking infrastructure for evaluating AI security testing capabilities across applications of varying complexity.

Its goal is to provide a standardized, repeatable way to evaluate the effectiveness of AI security scanners and penetration testing agents.

Benchmark Categories

The labs repository contains three tiers of security benchmarks:

| Category | Count | Description | Complexity |
|---|---|---|---|
| T-Bench (Medium Apps) | 26 apps | Custom-built realistic business applications | Medium-High |
| OSS (Large Apps) | ~8 apps | Real-world open-source software with known CVEs | High |
| CTF Challenges (Small) | 108+ labs | Focused single-vulnerability challenges | Low-Medium |

1. T-Bench Applications (Medium Complexity)

Custom-built, medium-complexity business applications simulating real enterprise software.

| Metric | Value |
|---|---|
| Total Apps | 26 (app-001 to app-023b) |
| Avg Endpoints/App | 23.7 |
| Tech Diversity | 8 backend languages, 4 databases |

Example Apps


2. OSS Applications (Large/Complex)

Real-world open-source applications with documented CVEs and vulnerabilities.

| App | Description | Findings |
|---|---|---|
| Zabbix 6.4/7.0 | Enterprise monitoring platform (PHP + PostgreSQL) | 4 (SQLi, priv escalation) |
| OpenFire 3.6.0 | XMPP instant messaging server (Java) | 12 (auth bypass, XSS, SQLi) |
| OpenCart | E-commerce platform (PHP) | 7 (path traversal, code injection) |
| DVWP | Damn Vulnerable WordPress | 4 (code injection, path traversal) |
| OrangeHRM | HR management system | Known CVEs |

Additional benchmarks include:


3. CTF Challenges (Small/Focused)

Single-vulnerability challenges for focused testing.

XBEN (104 challenges)

WSCOIL (4 challenges)

Additional small challenges include:


Grounded Reports

Each lab with known vulnerabilities includes a grounded-report.md file that documents the exact vulnerabilities and the step-by-step procedure for exploiting each one.
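As a quick sanity check on that convention, the sketch below lists lab directories that contain no grounded-report.md. The directory layout (one directory per lab under a common root) and the function name `labs_missing_reports` are assumptions made for illustration, not something the labs repository is known to provide. Labs without known vulnerabilities may legitimately appear in the result.

```python
import tempfile
from pathlib import Path


def labs_missing_reports(labs_root: Path, report_name: str = "grounded-report.md") -> list[str]:
    """Return the names of lab directories under labs_root that contain no report file.

    Assumes (hypothetically) that each immediate subdirectory of labs_root is one lab.
    The report may live at any depth inside the lab directory.
    """
    missing = []
    for lab_dir in sorted(p for p in labs_root.iterdir() if p.is_dir()):
        # rglob searches the lab directory recursively for the report file name.
        if next(lab_dir.rglob(report_name), None) is None:
            missing.append(lab_dir.name)
    return missing


# Demo on a throwaway directory tree; the lab names here are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app-001").mkdir()
    (root / "app-001" / "grounded-report.md").write_text("Vulnerability 1\n")
    (root / "app-002").mkdir()
    demo_missing = labs_missing_reports(root)

print(demo_missing)  # ['app-002']
```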

Purpose

Grounded reports serve as:

Markdown Format (grounded-report.md)

Vulnerability N

Title: Descriptive name of the vulnerability

CWE: CWE-XXX (or CWE-XXX OR CWE-YYY for alternative classifications)

Name: Official CWE name

Severity:

Endpoint: METHOD /path/to/endpoint

Assets: Prerequisites needed (credentials, session cookies, IDs)

Description: Technical explanation of the vulnerability root cause.

Impact: Business and security consequences of exploitation.

Reproduction:

  1. Step:
    • Action: Exact HTTP request or command with full payload
    • Entity: What component is being attacked
    • Required Assets: What you need for this step
    • Derived Assets: What you gain from this step
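
To make the template concrete, here is a minimal Python sketch that reads one vulnerability entry into a small data structure. It assumes the fields appear as plain `Key: value` lines, as shown above; real report files may carry extra Markdown markup. The class names, `parse_vulnerability`, `matches_finding`, and the sample entry are all invented for illustration. In particular, the matching rule (same endpoint plus any one of the listed CWE alternatives) is an assumption, not T-Bench's documented scoring logic.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str = ""
    entity: str = ""
    required_assets: str = ""
    derived_assets: str = ""


@dataclass
class Vulnerability:
    title: str = ""
    cwes: list[str] = field(default_factory=list)  # alternatives joined by OR
    name: str = ""
    severity: str = ""
    method: str = ""
    path: str = ""
    assets: str = ""
    description: str = ""
    impact: str = ""
    steps: list[Step] = field(default_factory=list)


TOP_FIELDS = {"title", "name", "severity", "assets", "description", "impact"}
STEP_FIELDS = {"action", "entity", "required assets", "derived assets"}


def parse_vulnerability(text: str) -> Vulnerability:
    """Parse one 'Vulnerability N' entry written as 'Key: value' lines."""
    vuln = Vulnerability()
    for raw in text.splitlines():
        line = raw.strip().lstrip("•-* ").strip()  # drop bullets and indentation
        key, sep, value = line.partition(":")      # split on the first colon only
        if not sep:
            continue
        key, value = key.strip(" *").lower(), value.strip(" *")
        if key == "cwe":
            vuln.cwes = re.findall(r"CWE-\d+", value)
        elif key == "endpoint":
            method, _, path = value.partition(" ")
            vuln.method, vuln.path = method.upper(), path.strip()
        elif key in TOP_FIELDS:
            setattr(vuln, key, value)
        elif key.endswith("step"):                 # e.g. "1. Step:"
            vuln.steps.append(Step())
        elif key in STEP_FIELDS and vuln.steps:
            setattr(vuln.steps[-1], key.replace(" ", "_"), value)
    return vuln


def matches_finding(vuln: Vulnerability, cwe: str, method: str, path: str) -> bool:
    """Hypothetical rule: endpoint must match and the CWE must be a listed alternative."""
    return cwe in vuln.cwes and method.upper() == vuln.method and path == vuln.path


# Invented sample entry, not taken from any real lab.
SAMPLE = """\
Title: SQL injection in order search
CWE: CWE-89 OR CWE-564
Severity: High
Endpoint: GET /api/orders/search
Reproduction:
  1. Step:
    • Action: GET /api/orders/search?q=' OR 1=1--
    • Entity: Order search API
    • Required Assets: Session cookie
    • Derived Assets: Order records of other users
"""

vuln = parse_vulnerability(SAMPLE)
print(vuln.cwes, vuln.method, vuln.path, len(vuln.steps))
```

Keeping the CWE field as a list means a scanner finding reported under either alternative classification can be compared with a simple membership test.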

JSON Format (grounded-report.json)

Machine-readable version containing:


Infrastructure

| Feature | Implementation |
|---|---|
| Deployment | Kubernetes (local Docker Desktop/minikube + remote GKE) |
| Management | Dojo CLI in TenzaiLtd/evaluation repo |
| Access | ZeroTier VPN for remote GKE endpoints |
| Secrets | SOPS + GCP KMS encryption |

Deploying a lab? See Lab Bringup for step-by-step instructions on running labs locally and on GKE.