T-Bench is Tenzai's benchmarking infrastructure for evaluating AI security testing capabilities across applications of varying complexity.
The goal of T-Bench is to provide a standardized and repeatable way to evaluate the effectiveness of AI security scanners and penetration testing agents.
The labs repository contains three tiers of security benchmarks:
| Category | Count | Description | Complexity |
|---|---|---|---|
| T-Bench (Medium Apps) | 26 apps | Custom-built realistic business applications | Medium-High |
| OSS (Large Apps) | ~8 apps | Real-world open-source software with known CVEs | High |
| CTF Challenges (Small) | 108+ labs | Focused single-vulnerability challenges | Low-Medium |
The T-Bench tier consists of custom-built, medium-complexity business applications that simulate real enterprise software.
| Metric | Value |
|---|---|
| Total Apps | 26 (app-001 to app-023b) |
| Avg Endpoints/App | 23.7 |
| Tech Diversity | 8 backend languages, 4 databases |
The OSS tier consists of real-world open-source applications with documented CVEs and other known vulnerabilities.
| App | Description | Findings |
|---|---|---|
| Zabbix 6.4/7.0 | Enterprise monitoring platform (PHP + PostgreSQL) | 4 (SQLi, privilege escalation) |
| Openfire 3.6.0 | XMPP instant messaging server (Java) | 12 (auth bypass, XSS, SQLi) |
| OpenCart | E-commerce platform (PHP) | 7 (path traversal, code injection) |
| DVWP | Damn Vulnerable WordPress | 4 (code injection, path traversal) |
| OrangeHRM | HR management system | Known CVEs |
Additional benchmarks include:
The CTF tier consists of small, single-vulnerability challenges for focused testing.
Additional small challenges include:
Each lab with known vulnerabilities includes a `grounded-report.md` file that documents the exact vulnerabilities and the step-by-step procedure for exploiting each one.
Grounded reports serve as:
`grounded-report.md` format:

- **Title:** Descriptive name of the vulnerability
- **CWE:** CWE-XXX (or CWE-XXX OR CWE-YYY for alternative classifications)
- **Name:** Official CWE name
- **Severity:**
- **Endpoint:** `METHOD /path/to/endpoint`
- **Assets:** Prerequisites needed (credentials, session cookies, IDs)
- **Description:** Technical explanation of the vulnerability's root cause
- **Impact:** Business and security consequences of exploitation
- **Reproduction:**
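Because every finding uses the same fixed set of fields, the Markdown report can be parsed mechanically. The sketch below is a hypothetical parser, not part of the labs repository: it assumes each field starts on its own line as `Field: value` (optionally wrapped in Markdown list/bold markup), treats any other non-empty line as a continuation of the previous field (for example, numbered reproduction steps), and splits the `CWE` field into its alternative classifications. The names `GroundedFinding`, `parse_finding`, and `parse_cwes` are illustrative only.

```python
import re
from dataclasses import dataclass, field

# Field names from the grounded-report.md template; any other line is treated
# as a continuation of the previous field (e.g. numbered reproduction steps).
FIELDS = ("Title", "CWE", "Name", "Severity", "Endpoint",
          "Assets", "Description", "Impact", "Reproduction")
FIELD_LINE = re.compile(r"^(\w+):\s*(.*)$")


@dataclass
class GroundedFinding:
    """Hypothetical in-memory form of one grounded finding."""
    title: str = ""
    cwes: list[str] = field(default_factory=list)  # one entry per alternative CWE
    cwe_name: str = ""
    severity: str = ""
    method: str = ""
    path: str = ""
    assets: str = ""
    description: str = ""
    impact: str = ""
    reproduction: str = ""


def parse_cwes(value: str) -> list[str]:
    """Extract every CWE ID, so 'CWE-89 OR CWE-564' yields both alternatives."""
    return re.findall(r"CWE-\d+", value)


def parse_finding(text: str) -> GroundedFinding:
    """Parse one finding written as 'Field: value' lines."""
    raw: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        # Tolerate Markdown list/bold markup such as '- **Title:** ...'.
        clean = line.strip().lstrip("-* ").replace("**", "")
        m = FIELD_LINE.match(clean)
        if m and m.group(1) in FIELDS:
            current = m.group(1)
            raw[current] = m.group(2).strip()
        elif current is not None and clean:
            # Multi-line values (e.g. reproduction steps) are joined with newlines.
            raw[current] = (raw[current] + "\n" + clean).strip()
    # 'METHOD /path/to/endpoint' -> ('METHOD', '/path/to/endpoint')
    method, _, path = raw.get("Endpoint", "").partition(" ")
    return GroundedFinding(
        title=raw.get("Title", ""),
        cwes=parse_cwes(raw.get("CWE", "")),
        cwe_name=raw.get("Name", ""),
        severity=raw.get("Severity", ""),
        method=method.upper(),
        path=path.strip(),
        assets=raw.get("Assets", ""),
        description=raw.get("Description", ""),
        impact=raw.get("Impact", ""),
        reproduction=raw.get("Reproduction", ""),
    )
```

Splitting `CWE` into a list rather than keeping the raw string makes it easy to accept any of the alternative classifications when comparing a scanner's output against the report.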
`grounded-report.json` format: a machine-readable version containing:
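A machine-readable report makes it possible to score a scanner's output automatically. The sketch below shows one possible scoring rule and is not T-Bench's documented method: it assumes a hypothetical JSON layout in which each finding has an `"endpoint"` string (`"METHOD /path"`) and a `"cwe"` list of acceptable CWE IDs, and it counts a reported finding as correct when the endpoint matches and at least one CWE overlaps. The function names are illustrative.

```python
import json

# Hypothetical JSON layout (assumption, not the real grounded-report.json schema):
# a list of findings, each with an "endpoint" ("METHOD /path") and a "cwe" list.


def endpoint_key(endpoint: str) -> tuple[str, str]:
    """Normalize 'get /api/x/' and 'GET /api/x' to the same key."""
    method, _, path = endpoint.strip().partition(" ")
    return method.upper(), path.strip().rstrip("/") or "/"


def matches(reported: dict, grounded: dict) -> bool:
    """Same endpoint and at least one CWE in common."""
    return (endpoint_key(reported["endpoint"]) == endpoint_key(grounded["endpoint"])
            and bool(set(reported["cwe"]) & set(grounded["cwe"])))


def score(reported: list[dict], grounded: list[dict]) -> dict[str, float]:
    """Greedy one-to-one matching; duplicate reports count as false positives."""
    used: set[int] = set()  # indices of grounded findings already matched
    for r in reported:
        for i, g in enumerate(grounded):
            if i not in used and matches(r, g):
                used.add(i)
                break
    true_positives = len(used)
    return {
        "precision": true_positives / len(reported) if reported else 0.0,
        "recall": true_positives / len(grounded) if grounded else 0.0,
    }


if __name__ == "__main__":
    grounded = json.loads(
        '[{"endpoint": "GET /api/search", "cwe": ["CWE-89", "CWE-564"]},'
        ' {"endpoint": "POST /login", "cwe": ["CWE-287"]}]'
    )
    reported = [
        {"endpoint": "get /api/search/", "cwe": ["CWE-89"]},
        {"endpoint": "GET /profile", "cwe": ["CWE-79"]},
    ]
    print(score(reported, grounded))  # {'precision': 0.5, 'recall': 0.5}
```

Matching on CWE overlap rather than exact equality is what lets the `CWE-XXX OR CWE-YYY` alternatives in a grounded report accept either classification from the scanner.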
The labs run on the following infrastructure:

| Feature | Implementation |
|---|---|
| Deployment | Kubernetes (local Docker Desktop/minikube + remote GKE) |
| Management | Dojo CLI in TenzaiLtd/evaluation repo |
| Access | ZeroTier VPN for remote GKE endpoints |
| Secrets | SOPS + GCP KMS encryption |