Benchmarks

Real numbers from real workloads. Methodology below the tables — you can rerun any of these against your own account and our numbers will reproduce within the bounds stated.

The point is not to win every line. The point is to make the comparison honest. Some lines we win on; some we lose on; the bill, in joules, tells you the truth either way.

Inference, mixed-traffic week

A real customer's anonymised week of traffic on their LLM SaaS: ~2.96M requests across L0/L1/L2/L3 tiers. The same prompts were replayed against four providers; judge-model scoring showed the quality difference between them stayed within noise.

| Provider | Strategy | Spend | p50 (ms) | p99 (ms) | Quality |
|---|---|---|---|---|---|
| OpenAI | auto-route (gpt-4o-mini / gpt-4o split) | $54,200 | 620 | 3,400 | 0.84 |
| Anthropic | Claude Haiku 4.5 pinned | $48,700 | 510 | 2,800 | 0.85 |
| AWS Bedrock | Llama 3.3 70B pinned | $41,300 | 740 | 4,100 | 0.83 |
| Joule Cloud | model: "auto" | $28,900 | 510 | 2,900 | 0.83 |

Quality is judge-model score on a 1,200-prompt held-out eval set; scale 0–1. The +0.01 / −0.01 spread is at noise level for this evaluator. Reproduce at /tutorials/build-a-chatbot by pointing the same prompt set at four providers.
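If you replay the prompt set yourself, the p50 and p99 columns can be recomputed from the per-request latencies you log. The following is a minimal sketch, assuming one latency in milliseconds per request. The function name and the sample values are hypothetical, and nearest-rank is only one of several percentile conventions; the page does not state which one was used for the table.

```python
import math

def percentile_nearest_rank(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # 1-based rank; multiply before dividing to limit float rounding.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical replay log (ms), not data from the benchmark.
sample_ms = [480, 495, 510, 510, 530, 620, 700, 900, 1500, 2900]
p50 = percentile_nearest_rank(sample_ms, 50)
p99 = percentile_nearest_rank(sample_ms, 99)
print(p50, p99)
```

With only a few samples the p99 collapses to the maximum, so a meaningful p99 needs at least a few hundred requests per provider.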

Object storage, 80 TB region-pinned

One EU health-tech's analytics corpus — mostly Parquet shards, occasionally hot reads. Pricing assumes 100% of reads come from same-region compute.

| Provider | Storage | Egress & req fees | Compute & analytics | Total |
|---|---|---|---|---|
| AWS S3 + EMR + Athena | $1,840 | $1,420 | $7,840 | $11,100 |
| GCP GCS + Dataproc + BigQuery | $1,750 | $1,180 | $7,600 | $10,530 |
| Azure Blob + Synapse | $1,920 | $1,340 | $8,210 | $11,470 |
| Joule Cloud | $1,280 | $0 | $6,140 | $7,420 |

All four runs were configured for hourly analytical scans over the full 80 TB plus ~200 GB/day of hot reads from a same-region compute workload. The "egress & req fees" line on Joule Cloud is zero because in-mesh transfer is free; the analytics line reflects measured joules during the run.

Container compute, "always-on small service"

A typical 1 vCPU / 1 GB Node.js HTTP server, 30 req/s sustained, scale-to-zero idle policy not viable because traffic is steady.

| Provider | Instance type | Monthly | Egress (~100 GB) | Total |
|---|---|---|---|---|
| AWS EC2 (t3.micro) | 2 vCPU / 1 GB | $7.50 | $8.10 | $15.60 |
| GCP Cloud Run | 1 vCPU / 1 GB | $12.40 | $5.00 | $17.40 |
| Fly.io shared-cpu-1x | 1 vCPU / 1 GB | $5.10 | $3.20 | $8.30 |
| Cloudflare Workers | n/a (per-request) | $5.00 | $0.00 | $5.00 |
| Joule Cloud Compute | 1 vCPU / 1 GB | $3.80 | $0.00 | $3.80 |

The Joule line is monthly joule consumption at our published rate; the CPU runs at ~40% steady-state, so the bill reflects an average draw of ~5 W over the month. Cloudflare Workers wins at the lowest tier; for any workload that needs request-scoped CPU time approaching seconds, Joule's per-joule line scales linearly while Workers' wall-clock CPU caps bite.
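To make the step from average power to billed energy concrete, here is a rough sketch of the arithmetic. It assumes a 30-day month and takes the ~5 W average and the $3.80 total from this section; the function name is hypothetical, and the per-megajoule figure is only what the table implies under these assumptions, not the published rate.

```python
SECONDS_PER_DAY = 86_400
JOULES_PER_KWH = 3_600_000

def monthly_energy_joules(avg_power_w, days=30):
    """Energy in joules for a constant average power over `days` days (1 W = 1 J/s)."""
    return avg_power_w * days * SECONDS_PER_DAY

energy_j = monthly_energy_joules(5)       # ~5 W average, from the text above
energy_kwh = energy_j / JOULES_PER_KWH    # same energy expressed in kWh

# Implied price per megajoule if the whole $3.80 line is energy (an assumption).
implied_usd_per_mj = 3.80 / (energy_j / 1_000_000)

print(f"{energy_j / 1e6:.2f} MJ, {energy_kwh:.1f} kWh, ~${implied_usd_per_mj:.3f}/MJ")
```

Under these assumptions the month comes to about 12.96 MJ, or 3.6 kWh.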

Functions, "warm path"

A 256 MB serverless handler, 1.5M invocations/month, average 80 ms warm execution. Cold starts at ~2% rate.

| Provider | Pricing dimension | Compute | Requests | Total |
|---|---|---|---|---|
| AWS Lambda | GB-second + request | $0.32 | $0.30 | $0.62 |
| GCP Cloud Functions | GB-second + request | $0.36 | $0.30 | $0.66 |
| Vercel Functions | VFC + request | $1.20 | $0.00 | $1.20 |
| Cloudflare Workers | request only | $0.00 | $0.50 | $0.50 |
| Joule Cloud Functions | joules consumed | $0.46 | $0.00 | $0.46 |

For high-volume sub-100 ms handlers, Cloudflare Workers is still the lowest-price line. We come out ahead at the CPU-heavy end (image processing, inline model inference). For pure HTTP-glue functions, run the numbers against Workers first.
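For the providers billed in GB-seconds, the billable quantity for this workload can be worked out from the stated parameters. This is a minimal sketch, assuming 1 GB = 1024 MB and ignoring the ~2% of invocations that are cold starts. The function name is hypothetical, and the per-GB-second rates are simply what the Compute column implies, not quoted list prices.

```python
def gb_seconds(invocations, avg_duration_ms, memory_mb):
    """Billable GB-seconds: invocations x duration (s) x memory (GB).
    Multiplies first and divides once to avoid float rounding noise."""
    return invocations * avg_duration_ms * memory_mb / (1000 * 1024)

# Workload from this section: 256 MB, 1.5M invocations/month, 80 ms warm.
usage = gb_seconds(1_500_000, 80, 256)

# Rates implied by the Compute column of the table (not list prices).
compute_usd = {"AWS Lambda": 0.32, "GCP Cloud Functions": 0.36}
implied_rate = {name: usd / usage for name, usd in compute_usd.items()}

print(f"{usage:,.0f} GB-s")
for name, rate in implied_rate.items():
    print(f"{name}: ~${rate:.2e} per GB-s")
```

To rerun the comparison for your own handler, change the three workload parameters and multiply by the rate on your current bill.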

Energy per token (Inference)

The metric the rest of the industry doesn't publish. Joules consumed at the silicon to produce one output token, for a representative inference call.

| Model | Silicon | J / token (input) | J / token (output) | Source |
|---|---|---|---|---|
| Llama 3.3 70B | H100 SXM5 | 0.0028 | 0.0034 | measured |
| Llama 3.3 405B | H100 SXM5 (4-way TP) | 0.012 | 0.014 | measured |
| DeepSeek-V3 | H200 | 0.0061 | 0.0072 | measured |
| Mixtral 8x22B | L40S | 0.0021 | 0.0025 | measured |
| Qwen 2.5 72B | H100 SXM5 | 0.0029 | 0.0035 | measured |
| Claude Haiku 4.5 | (licensor silicon) | ~0.002 | ~0.003 | derived from billing |
| FLUX schnell (image) | H100 SXM5 | n/a | ~110 J / image | measured |

All measured numbers come from NVML on production GPUs during real inference, integrated over the request and bookkept against the chip's idle floor. See Measuring energy at the silicon for methodology, including which numbers we derived vs measured.
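One way to read "integrated over the request and bookkept against the chip's idle floor" is a trapezoidal integral of the sampled power, counting only the draw above idle. The sketch below follows that reading. It is an assumption about the method, not the production pipeline; the function names and sample values are hypothetical. NVML itself reports power in milliwatts (`nvmlDeviceGetPowerUsage`), so real samples would need converting to watts first.

```python
def energy_above_idle_joules(samples, idle_floor_w):
    """Integrate power above the idle floor over time (trapezoid rule).

    samples: list of (timestamp_s, power_w) tuples in increasing time order.
    Returns joules attributed to the request under this reading.
    """
    if len(samples) < 2:
        raise ValueError("need at least two power samples")
    total_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Clamp at zero so a sample below the idle floor adds no negative energy.
        above0 = max(p0 - idle_floor_w, 0.0)
        above1 = max(p1 - idle_floor_w, 0.0)
        total_j += 0.5 * (above0 + above1) * dt
    return total_j

def joules_per_token(energy_j, tokens):
    """Average energy per token for one request."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_j / tokens

# Hypothetical trace: (seconds, watts) with a 100 W idle floor.
trace = [(0.0, 100.0), (0.5, 400.0), (1.0, 400.0), (1.5, 100.0)]
request_j = energy_above_idle_joules(trace, idle_floor_w=100.0)
print(request_j, joules_per_token(request_j, tokens=100))
```

On a GPU that batches several requests at once, the energy would also have to be split between concurrent requests; this sketch does not attempt that.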

Carbon per million requests

The same workload at the typical mid-tier traffic mix, scaled per million inference requests and region-pinned to each region in turn; eu-fi is the clean-grid reference.

| Region | Grid intensity (gCO₂/kWh) | kJ / M-req | kg CO₂eq / M-req |
|---|---|---|---|
| eu-fi (Helsinki) | 41 | 312 | 0.0036 |
| eu-de (Frankfurt) | 335 | 312 | 0.0290 |
| eu-fr (Paris) | 54 | 312 | 0.0047 |
| us-east (Ashburn) | 402 | 312 | 0.0349 |
| us-west (Hillsboro) | 198 | 312 | 0.0172 |

Grid intensities are 30-day rolling averages from Electricity Maps. The same workload, region-pinned to eu-fi instead of us-east, emits ~10× less carbon. The router's region = "auto" setting picks region by this exact math, with latency as a secondary score.
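The carbon column follows from the other two: convert kJ to kWh (1 kWh = 3,600 kJ), multiply by grid intensity, and convert grams to kilograms. The sketch below reproduces that arithmetic and shows one possible reading of "carbon first, latency as a secondary score" as a lexicographic sort. The function names and latency figures are hypothetical, and the real router's scoring is not specified here.

```python
KJ_PER_KWH = 3600.0

# Grid intensity in gCO2/kWh, copied from the table above.
GRID_G_PER_KWH = {"eu-fi": 41, "eu-de": 335, "eu-fr": 54, "us-east": 402, "us-west": 198}

def kg_co2_per_million(kj_per_mreq, grid_g_per_kwh):
    """kg CO2eq per million requests from energy (kJ) and grid intensity (g/kWh)."""
    kwh = kj_per_mreq / KJ_PER_KWH
    return kwh * grid_g_per_kwh / 1000.0

def pick_region(grid, latency_ms):
    """Lowest grid intensity wins; latency only breaks ties (an assumed policy)."""
    return min(grid, key=lambda r: (grid[r], latency_ms.get(r, float("inf"))))

carbon = {r: kg_co2_per_million(312, g) for r, g in GRID_G_PER_KWH.items()}
ratio = carbon["us-east"] / carbon["eu-fi"]

# Hypothetical client-to-region latencies in ms.
latency_ms = {"eu-fi": 120, "eu-de": 95, "eu-fr": 100, "us-east": 30, "us-west": 70}
best = pick_region(GRID_G_PER_KWH, latency_ms)

print({r: round(v, 4) for r, v in carbon.items()}, round(ratio, 1), best)
```

From the rounded inputs in the table, us-east comes out at about 0.0348 rather than 0.0349, which is consistent with the published column being computed from unrounded intensities. The us-east to eu-fi ratio is about 9.8, matching the ~10× figure.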

Methodology

Caveats & honest losses

Update cadence

This page is regenerated weekly. Last updated: 2026-06-23. Subscribe to the changelog for material updates; for full transparency, the raw measurement data is available on request to [email protected].