Performance Benchmarks

Atmosphere runs JMH micro-benchmarks on every push to main that touches performance-sensitive code, plus weekly scheduled runs. Results are tracked over time to detect regressions automatically.

What We Measure

CheckpointStore

save, load, fork, list throughput at 100 / 1K / 10K snapshots. Validates that durable workflow state doesn’t become a bottleneck.

Broadcaster Dispatch

Broadcast-to-N-subscribers throughput at 1 / 10 / 100 / 1K subscribers. Measures the per-subscriber dispatch overhead in DefaultBroadcaster.

Coordinator Fan-Out

Parallel agent dispatch latency at 2 / 4 / 8 / 16 agents. Validates virtual-thread-based fleet coordination scales linearly.

AI Interceptor Chain

Pre/post-process chain traversal at 0 / 1 / 4 / 16 / 64 interceptors. Measures guardrail + memory + cost-filter overhead per prompt.

AgentRuntime Resolver

Cached resolution path (warm) for the ServiceLoader-based runtime lookup. Validates the hot path is effectively zero-cost.

VT Pinning Check

JFR-based scan for jdk.VirtualThreadPinned events under benchmark load. Proves no synchronized blocks pin virtual threads to carriers.

Live Results

The chart below loads the latest benchmark data from CI. Each data point is a push to main or a weekly scheduled run on GitHub Actions (ubuntu-latest, JDK 21 Temurin).

Loading benchmark data from CI…

Running Locally

# Build the benchmarks module (profile-gated)
./mvnw -Pperf -pl modules/benchmarks -am package -DskipTests

# Run all JMH micro-benchmarks
scripts/benchmarks/run-jmh.sh

# Run a single benchmark
java -jar modules/benchmarks/target/benchmarks.jar CheckpointStoreBenchmark

# Virtual thread pinning check
scripts/benchmarks/vt-pinning-check.sh

# Streaming load test (server must be running separately)
scripts/benchmarks/run-streaming-load.sh --clients 1000 --messages 50

CI Regression Detection

The Benchmarks workflow runs automatically on:

Pushes to main touching modules/cpr/, modules/ai/, modules/checkpoint/, modules/coordinator/, or modules/benchmarks/
Weekly schedule (Monday 6am UTC)
Manual dispatch

Regression threshold: 15%. PRs that degrade any benchmark by more than 15% receive an automatic comment.

Methodology

Hardware: GitHub Actions ubuntu-latest runners (2-4 vCPU, 7-16 GB RAM)
JDK: Temurin 21 LTS
JMH settings (CI): 1 fork, 3 warmup iterations, 5 measurement iterations
JMH settings (local): 2 forks, 5 warmup iterations, 10 measurement iterations (default)
Numbers reflect relative performance on shared CI hardware — use for regression tracking, not absolute claims

Performance Benchmarks

Performance Benchmarks

What We Measure

Live Results

Running Locally

CI Regression Detection

Methodology

See Also