Benchmarks
Real numbers from real workloads. Methodology below the tables — you can rerun any of these against your own account and our numbers will reproduce within the bounds stated.
The point is not to win every line. The point is to make the comparison honest. Some lines we win on; some we lose on; the bill, in joules, tells you the truth either way.
Inference, mixed-traffic week
A real customer's anonymised week of traffic on their LLM SaaS: ~2.96M requests across L0/L1/L2/L3 tiers. The same prompts were replayed against all four providers; judge-model scoring put the quality Δ within noise.
| Provider | Strategy | Spend | p50 ms | p99 ms | Quality |
|---|---|---|---|---|---|
| OpenAI | auto-route (gpt-4o-mini / gpt-4o split) | $54,200 | 620 | 3,400 | 0.84 |
| Anthropic | Claude Haiku 4.5 pinned | $48,700 | 510 | 2,800 | 0.85 |
| AWS Bedrock | Llama 3.3 70B pinned | $41,300 | 740 | 4,100 | 0.83 |
| Joule Cloud | model: "auto" | $28,900 | 510 | 2,900 | 0.83 |
Quality is judge-model score on a 1,200-prompt held-out eval set; scale 0–1. The +0.01 / −0.01 spread is at noise level for this evaluator. Reproduce at /tutorials/build-a-chatbot by pointing the same prompt set at four providers.
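The spend column is easier to compare once it is normalised per request. The sketch below is a minimal illustration in Python that uses only the figures from the table and the approximate 2.96M request count; the names are ours for illustration and it is not part of any Joule Cloud tooling.

```python
# Figures copied from the table above; the request count is approximate (~2.96M).
REQUESTS = 2_960_000

# provider -> (weekly spend in USD, judge-model quality score)
rows = {
    "OpenAI": (54_200, 0.84),
    "Anthropic": (48_700, 0.85),
    "AWS Bedrock": (41_300, 0.83),
    "Joule Cloud": (28_900, 0.83),
}


def cost_per_1k(spend_usd: float, requests: int = REQUESTS) -> float:
    """Spend per 1,000 requests, in USD."""
    return spend_usd / requests * 1_000


per_1k = {name: round(cost_per_1k(spend), 2) for name, (spend, _) in rows.items()}

# Spread between the best and worst quality score in the table.
scores = [quality for _, quality in rows.values()]
quality_spread = round(max(scores) - min(scores), 2)

for name, usd in per_1k.items():
    print(f"{name:12s} ${usd:.2f} per 1k requests")
print(f"quality spread across providers: {quality_spread}")
```

Because the request count is approximate, treat the per-1k figures as indicative rather than exact.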
Object storage, 80 TB region-pinned
One EU health-tech's analytics corpus — mostly Parquet shards, occasionally hot reads. Pricing assumes 100% of reads come from same-region compute.
| Provider | Storage | Egress & req fees | Compute & analytics | Total |
|---|---|---|---|---|
| AWS S3 + EMR + Athena | $1,840 | $1,420 | $7,840 | $11,100 |
| GCP GCS + Dataproc + BigQuery | $1,750 | $1,180 | $7,600 | $10,530 |
| Azure Blob + Synapse | $1,920 | $1,340 | $8,210 | $11,470 |
| Joule Cloud | $1,280 | $0 | $6,140 | $7,420 |
All four runs were configured for hourly analytical scans over the full 80 TB plus ~200 GB/day of hot reads from a same-region compute workload. The "egress & req fees" line on Joule Cloud is zero because in-mesh transfer is free; the analytics line reflects measured joules during the run.
Container compute, "always-on small service"
A typical 1 vCPU / 1 GB Node.js HTTP server, 30 req/s sustained, scale-to-zero idle policy not viable because traffic is steady.
| Provider | Instance type | Monthly | Egress (~100 GB) | Total |
|---|---|---|---|---|
| AWS EC2 (t3.micro) | 2 vCPU / 1 GB | $7.50 | $8.10 | $15.60 |
| GCP Cloud Run | 1 vCPU / 1 GB | $12.40 | $5.00 | $17.40 |
| Fly.io shared-cpu-1x | 1 vCPU / 1 GB | $5.10 | $3.20 | $8.30 |
| Cloudflare Workers | n/a (per-request) | $5.00 | $0.00 | $5.00 |
| Joule Cloud Compute | 1 vCPU / 1 GB | $3.80 | $0.00 | $3.80 |
The Joule line is monthly joule consumption at our published rate; CPU runs at ~40% steady-state, so the bill reflects an average draw of ~5 W for the month. Cloudflare Workers wins at the lowest tier; for any workload that needs request-scoped CPU time approaching seconds, Joule's per-joule line scales linearly while Workers' per-request CPU-time caps bite.
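To see what a ~5 W average means in billing units, the arithmetic can be sketched as follows. This assumes a 30-day month and a constant average draw. The per-megajoule figure at the end is back-derived from the $3.80 table entry for illustration only; it is not the published rate.

```python
SECONDS_PER_DAY = 24 * 3600


def monthly_energy_joules(avg_watts: float, days: int = 30) -> float:
    """Energy in joules for a constant average power draw over `days` days."""
    return avg_watts * days * SECONDS_PER_DAY


def joules_to_kwh(joules: float) -> float:
    """1 kWh = 3.6 MJ."""
    return joules / 3.6e6


energy_j = monthly_energy_joules(5.0)   # ~5 W average, from the text above
energy_kwh = joules_to_kwh(energy_j)

# Back-derived from the $3.80 monthly line; an illustration, NOT a published rate.
implied_usd_per_mj = 3.80 / (energy_j / 1e6)

print(f"{energy_j / 1e6:.2f} MJ  =  {energy_kwh:.1f} kWh")
print(f"implied rate: ${implied_usd_per_mj:.3f} per MJ")
```

Under these assumptions the month comes to 12.96 MJ, or 3.6 kWh. Because the bill is a straight multiple of joules, doubling the average draw doubles the line.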
Functions, "warm path"
A 256 MB serverless handler, 1.5M invocations/month, average 80 ms warm execution. Cold starts at ~2% rate.
| Provider | Pricing dimension | Compute | Requests | Total |
|---|---|---|---|---|
| AWS Lambda | GB-second + request | $0.32 | $0.30 | $0.62 |
| GCP Cloud Functions | GB-second + request | $0.36 | $0.30 | $0.66 |
| Vercel Functions | VFC + request | $1.20 | $0.00 | $1.20 |
| Cloudflare Workers | request only | $0.00 | $0.50 | $0.50 |
| Joule Cloud Functions | joules consumed | $0.46 | $0.00 | $0.46 |
For high-volume sub-100ms handlers, Cloudflare Workers is still the lowest-price line. We come out ahead at the CPU-heavy end (image processing, model inference inline). For pure HTTP-glue functions, run the numbers against Workers first.
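If you want to run the numbers for your own handler, the workload above can be reduced to GB-seconds and to a cost per million invocations. A minimal sketch, assuming the common convention that 256 MB counts as 0.25 GB; the totals are copied from the table and no provider rates are assumed.

```python
def gb_seconds(invocations: int, avg_ms: float, memory_mb: int) -> float:
    """Billable GB-seconds: invocations x duration (s) x memory (GB).

    Assumes 1 GB = 1024 MB, so 256 MB counts as 0.25 GB.
    """
    return invocations * (avg_ms / 1000.0) * (memory_mb / 1024.0)


def usd_per_million(total_usd: float, invocations: int) -> float:
    """Normalise a monthly total to USD per million invocations."""
    return total_usd / (invocations / 1_000_000)


INVOCATIONS = 1_500_000
usage = gb_seconds(INVOCATIONS, avg_ms=80, memory_mb=256)

# Monthly totals copied from the table above.
totals = {
    "AWS Lambda": 0.62,
    "GCP Cloud Functions": 0.66,
    "Vercel Functions": 1.20,
    "Cloudflare Workers": 0.50,
    "Joule Cloud Functions": 0.46,
}
per_million = {
    name: round(usd_per_million(total, INVOCATIONS), 3)
    for name, total in totals.items()
}

print(f"{usage:,.0f} GB-seconds per month")
for name, usd in per_million.items():
    print(f"{name:22s} ${usd:.3f} per million invocations")
```

To model your own handler, swap in your invocation count, duration, and memory size. Duration-based lines grow with execution time, while a request-only line does not.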
Energy per token (Inference)
The metric the rest of the industry doesn't publish. Joules consumed at the silicon per input token processed and per output token produced, for a representative inference call.
| Model | Silicon | J / token (input) | J / token (output) | Source |
|---|---|---|---|---|
| Llama 3.3 70B | H100 SXM5 | 0.0028 | 0.0034 | measured |
| Llama 3.1 405B | H100 SXM5 (4-way TP) | 0.012 | 0.014 | measured |
| DeepSeek-V3 | H200 | 0.0061 | 0.0072 | measured |
| Mixtral 8x22B | L40S | 0.0021 | 0.0025 | measured |
| Qwen 2.5 72B | H100 SXM5 | 0.0029 | 0.0035 | measured |
| Claude Haiku 4.5 | (licensor silicon) | ~0.002 | ~0.003 | derived from billing |
| FLUX schnell (image) | H100 SXM5 | n/a | ~110 J / image | measured |
All measured numbers come from NVML on production GPUs during real inference, integrated over the request and bookkept against the chip's idle floor. See Measuring energy at the silicon for methodology, including which numbers we derived vs measured.
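One plausible reading of "integrated over the request and bookkept against the chip's idle floor" is a time integral of power draw above idle. The sketch below shows that reading on made-up sample values. It is not our production pipeline: the function names, the sampling interval, and the clamping rule are all assumptions. In practice the samples could come from periodic NVML power reads (for example pynvml's `nvmlDeviceGetPowerUsage`, which reports milliwatts).

```python
from typing import Sequence, Tuple


def energy_above_idle(
    samples: Sequence[Tuple[float, float]], idle_watts: float
) -> float:
    """Trapezoidal integral of (power - idle floor) over time, in joules.

    samples: (timestamp_seconds, power_watts) pairs in ascending time order.
    Readings below the idle floor are clamped to zero (an assumption).
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        a = max(p0 - idle_watts, 0.0)
        b = max(p1 - idle_watts, 0.0)
        total += 0.5 * (a + b) * (t1 - t0)
    return total


def joules_per_token(energy_j: float, tokens: int) -> float:
    """Attribute a request's energy evenly across its tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_j / tokens


# Made-up samples for illustration: a 1.5 s request on a GPU idling at 100 W.
samples = [(0.0, 100.0), (0.5, 300.0), (1.0, 300.0), (1.5, 100.0)]
request_energy = energy_above_idle(samples, idle_watts=100.0)
print(f"{request_energy:.1f} J above idle")
```

Batched serving complicates attribution, since several requests share one GPU at the same moment. See the methodology post for how measured and derived numbers are separated.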
Carbon per million requests
At the typical mid-tier traffic mix, scaled per million inference requests, with the workload region-pinned to each region in turn. The clean-grid reference is eu-fi.
| Region | Grid intensity (gCO₂/kWh) | kJ / M-req | kg CO₂eq / M-req |
|---|---|---|---|
| eu-fi (Helsinki) | 41 | 312 | 0.0036 |
| eu-de (Frankfurt) | 335 | 312 | 0.0290 |
| eu-fr (Paris) | 54 | 312 | 0.0047 |
| us-east (Ashburn) | 402 | 312 | 0.0349 |
| us-west (Hillsboro) | 198 | 312 | 0.0172 |
Grid intensities are 30-day rolling averages from Electricity Maps. The same workload, region-pinned to eu-fi instead of us-east, emits ~10× less carbon. The router's region = "auto" setting picks region by this exact math, with latency as a secondary score.
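The conversion behind the last column is energy in kWh multiplied by grid intensity. The sketch below reproduces it from the table's figures, matching the published column to within rounding of the kJ value. The `pick_region` helper is only one way to read "latency as a secondary score" (a tie-breaker), and the latency values are made-up placeholders, not measurements.

```python
KJ_PER_KWH = 3600.0


def kg_co2_per_million(kj_per_million: float, grid_g_per_kwh: float) -> float:
    """kg CO2eq per million requests: kJ -> kWh, times gCO2/kWh, grams -> kg."""
    kwh = kj_per_million / KJ_PER_KWH
    return kwh * grid_g_per_kwh / 1000.0


# Grid intensities (gCO2/kWh) and energy (kJ per million requests) from the table.
grid = {"eu-fi": 41, "eu-de": 335, "eu-fr": 54, "us-east": 402, "us-west": 198}
KJ_PER_M_REQ = 312

emissions = {region: kg_co2_per_million(KJ_PER_M_REQ, g) for region, g in grid.items()}


def pick_region(emissions: dict, latency_ms: dict) -> str:
    """Lowest carbon first; latency only breaks ties (an assumed interpretation)."""
    return min(emissions, key=lambda r: (emissions[r], latency_ms[r]))


# Hypothetical latencies in ms, for illustration only.
latency_ms = {"eu-fi": 38, "eu-de": 22, "eu-fr": 25, "us-east": 95, "us-west": 140}

for region, kg in emissions.items():
    print(f"{region:8s} {kg:.4f} kg CO2eq / M-req")
print("ratio us-east / eu-fi:", round(emissions["us-east"] / emissions["eu-fi"], 1))
print("picked:", pick_region(emissions, latency_ms))
```

Since energy per request is the same in every region, the ratio between two regions is just the ratio of their grid intensities. That is where the ~10× figure comes from.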
Methodology
- Inference quality: a held-out 1,200-prompt eval set scored by an independent judge model. The scoring methodology is public and reproducible. We do not select prompts that favour our routing.
- Energy measurement: NVML on NVIDIA silicon, RAPL on x86 hosts, IOReport on Apple Silicon. Per-request, integrated over the call duration. Receipts attached to every measured data point.
- Pricing comparisons: list prices as of 2026-06-23. We do not assume any committed-spend discounts on either side — for high-volume committed customers, the picture shifts.
- Reproducibility: each table here can be reproduced against your own account. Run `jc evals run --prompts ./your-set.jsonl --candidates joule-auto,openai-auto` to confirm.
- What we don't measure: network energy outside the data centre (the upstream user's connection), client-device energy. See the measurement methodology blog post for the full scope statement.
Caveats & honest losses
- For pure HTTP-glue Functions, Cloudflare Workers is still cheaper at small scale. We come out ahead once compute starts dominating.
- For raw, committed-spend EC2 reserved instances at the 3-year tier, AWS comes out cheaper than us on the pure-compute line. We come out ahead on the bundle (compute + egress + receipts).
- For some specialty models (OpenAI's o1-class reasoning, Anthropic's long-context Claude Opus), we don't have a price-comparable open-weight equivalent. The bench above uses what we host.
- In 2026 Q3, we expect the inference J/token to drop ~30% as speculative decoding lands at the gateway.
Update cadence
This page is regenerated weekly. Last updated: 2026-06-23. Subscribe to the changelog for material updates; for full transparency, the raw measurement data is available on request to [email protected].