
Bonzai Leaderboard

Internal tool for tracking agent performance across benchmarks, comparing runs, and analyzing lab statistics.

🌐 Public URL

https://leaderboard.tenzai.io/

What It Does


Architecture

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ Svelte UI       │────▶│ Flask API        │────▶│ BigQuery         │
│ (TypeScript)    │     │ (Python)         │     │ + GCS            │
└─────────────────┘     └──────────────────┘     └──────────────────┘

| Component | Tech Stack | Purpose |
|-----------|------------|---------|
| Frontend | Svelte 5, TypeScript, Vite | Leaderboard tables, charts, comparison views |
| Backend | Flask, Python 3.13+ | API endpoints, data aggregation |
| Data | BigQuery, GCS | Agent run artifacts, metrics, reports |
| Auth | IAP | Google Workspace SSO |

Key Concepts

Benchmarks

| Benchmark | Description |
|-----------|-------------|
| JINGLE | Jingle benchmark (specific lab set) |
| OSS | Open-source vulnerable apps (Juice Shop, DVWP, etc.) |
| TBENCH | Tenzai-built complex applications |
| Custom | User-defined lab selections |

Data Pipeline

Agent runs are stored in GCS with Hive-style partitioning:

gs://tenzai-agent-run-artifacts/
└── year=*/month=*/day=*/hour=*/run_id=*/agent_id=*/
    ├── run_config_fixed.json   # Agent config
    ├── report.json             # Detailed findings
    └── agent-summary.json      # Run summary

BigQuery external tables mount this data. Scheduled queries refresh native tables hourly.
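
To make the partition layout concrete, here is a minimal Python sketch that splits an artifact path into its partition fields. The `RunArtifact` type, the `parse_artifact_path` helper, and the example run and agent IDs are hypothetical and not taken from the leaderboard repo. Treating the partition hour as UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RunArtifact(NamedTuple):
    """Fields recovered from one Hive-partitioned artifact path (hypothetical type)."""
    partition_hour: datetime
    run_id: str
    agent_id: str
    filename: str


def parse_artifact_path(path: str) -> RunArtifact:
    """Split a path like year=*/month=*/.../agent_id=*/file.json into its fields."""
    # Drop the scheme and bucket name if a full gs:// URI was passed.
    if path.startswith("gs://"):
        path = path[len("gs://"):].split("/", 1)[1]
    # Everything before the last segment is a key=value partition.
    *partitions, filename = path.strip("/").split("/")
    fields = dict(part.split("=", 1) for part in partitions)
    hour = datetime(
        int(fields["year"]),
        int(fields["month"]),
        int(fields["day"]),
        int(fields["hour"]),
        tzinfo=timezone.utc,  # assumption: partitions are written in UTC
    )
    return RunArtifact(hour, fields["run_id"], fields["agent_id"], filename)


# Hypothetical example path; the IDs are made up for illustration.
example = parse_artifact_path(
    "gs://tenzai-agent-run-artifacts/year=2025/month=06/day=03/hour=14/"
    "run_id=run-123/agent_id=agent-a/report.json"
)
```

Because each path segment is a `key=value` pair, the same split logic works regardless of the order in which the partitions appear.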


Local Development

Prerequisites

Quick Start

# Clone the repo
git clone git@github.com:TenzaiLtd/leaderboard.git
cd leaderboard

# Setup UI (installs Node.js 22 and dependencies)
just setup-ui

# Run backend (Terminal 1)
just dev

# Run UI with hot reload (Terminal 2)
just ui

URLs

Common Commands

| Command | Description |
|---------|-------------|
| just dev | Run Flask backend |
| just ui | Run Svelte UI dev server |
| just build-ui | Build UI for production |
| just docker-run | Build and run in Docker |
| just deploy | Deploy to Cloud Run |

API Endpoints

| Endpoint | Description |
|----------|-------------|
| /api/leaderboard | Main leaderboard data (agent rankings) |
| /api/leaderboard/benchmarks | Available benchmark configurations |
| /api/leaderboard/run/<run_id>/details | Detailed metrics for a run |
| /api/leaderboard/compare | Compare two runs |
| /api/labs | Lab-level statistics |
| /api/labs/<lab_name>/timeseries | Historical success rate |
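
As a rough illustration of how the parameterized routes above could be called from a script, the sketch below fills in path parameters and fetches JSON using only the standard library. The helper names are hypothetical, and the base URL assumes the local backend listens on Flask's default port 5000, which this page does not confirm. Query parameters and response shapes are not documented here, so none are shown. The public deployment sits behind IAP and would need authenticated requests, which this sketch does not cover.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: `just dev` serves the API on Flask's default port.
BASE_URL = "http://localhost:5000"


def endpoint_url(template: str, base_url: str = BASE_URL, **params: str) -> str:
    """Fill a route template such as '/api/labs/<lab_name>/timeseries'."""
    path = template
    for name, value in params.items():
        # Percent-encode each value so it stays within a single path segment.
        path = path.replace(f"<{name}>", quote(value, safe=""))
    if "<" in path:
        raise ValueError(f"unfilled path parameter in {path!r}")
    return base_url.rstrip("/") + path


def get_json(url: str):
    """GET a URL and decode the body as JSON (assumes the API returns JSON)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Hypothetical run ID, for illustration only.
details_url = endpoint_url("/api/leaderboard/run/<run_id>/details", run_id="run-123")
```

With the backend running locally, `get_json(details_url)` would return the decoded response for that run, if the run exists.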

GCP Resources

| Resource | Value |
|----------|-------|
| Project | annular-fold-460418-r3 |
| Dataset | evaluation_reports |
| GCS Bucket | gs://tenzai-agent-run-artifacts |
| Cloud Run Region | europe-west1 |

Repository

URL: https://github.com/TenzaiLtd/leaderboard

| Path | Purpose |
|------|---------|
| backend/ | Flask API, BigQuery queries, data logic |
| ui/ | Svelte frontend |
| static/ | Built UI assets |
| justfile | Task runner commands |

Troubleshooting

GCP Auth Errors

gcloud auth login
gcloud auth application-default login
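
If auth errors persist after running the commands above, it can help to confirm that an Application Default Credentials file actually exists. The sketch below is a hypothetical diagnostic, not part of the repo. It checks the `GOOGLE_APPLICATION_CREDENTIALS` variable first and then the default gcloud location. It ignores a custom `CLOUDSDK_CONFIG` directory, so treat a "missing" result as a hint rather than proof.

```python
import os
from pathlib import Path
from typing import Mapping

ADC_FILENAME = "application_default_credentials.json"


def adc_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return where Application Default Credentials are expected to live."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicit credentials file takes precedence over the gcloud default.
        return Path(explicit)
    if os.name == "nt":
        # On Windows, gcloud keeps its config under %APPDATA%\gcloud.
        return Path(env["APPDATA"]) / "gcloud" / ADC_FILENAME
    return Path.home() / ".config" / "gcloud" / ADC_FILENAME


if __name__ == "__main__":
    path = adc_path()
    status = "found" if path.is_file() else "missing"
    print(f"ADC file {status}: {path}")
```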

Data Not Showing

Node Version Issues

cd ui
nvm install
nvm use