Benchmarks

Real numbers from real workloads. Methodology below the tables — you can rerun any of these against your own account and our numbers will reproduce within the bounds stated.

The point is not to win every line. The point is to make the comparison honest. Some lines we win on; some we lose on; the bill, in joules, tells you the truth either way.

Inference, mixed-traffic week

A real customer's anonymised week of traffic on their LLM SaaS: ~2.96M requests across L0/L1/L2/L3 tiers. The same prompts were replayed against all four providers, and judge-model scoring put the quality difference within noise.

Provider	Strategy	Spend	p50 ms	p99 ms	Quality
OpenAI	auto-route (gpt-4o-mini / gpt-4o split)	$54,200	620	3,400	0.84
Anthropic	Claude Haiku 4.5 pinned	$48,700	510	2,800	0.85
AWS Bedrock	Llama 3.3 70B pinned	$41,300	740	4,100	0.83
Joule Cloud	`model: "auto"`	$28,900	510	2,900	0.83

Quality is the judge-model score on a 1,200-prompt held-out eval set, on a 0–1 scale. The +0.01 / −0.01 spread is at noise level for this evaluator. To reproduce, follow /tutorials/build-a-chatbot and point the same prompt set at all four providers.
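The spend column reduces to a relative-saving calculation. Below is a minimal sketch that uses only the Spend figures from the table above. The names `SPEND_USD`, `relative_saving` and `savings_vs` are illustrative and are not part of any Joule Cloud tooling.

```python
# Spend figures for the mixed-traffic week, copied from the inference table (USD).
SPEND_USD = {
    "OpenAI": 54_200,
    "Anthropic": 48_700,
    "AWS Bedrock": 41_300,
    "Joule Cloud": 28_900,
}


def relative_saving(baseline: float, ours: float) -> float:
    """Fraction of the baseline spend saved by the cheaper line (0.25 = 25% less)."""
    return 1.0 - ours / baseline


def savings_vs(reference: str, spend: dict[str, float]) -> dict[str, float]:
    """Saving of `reference` relative to every other provider, as a rounded percentage."""
    ours = spend[reference]
    return {
        name: round(100 * relative_saving(cost, ours), 1)
        for name, cost in spend.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, pct in savings_vs("Joule Cloud", SPEND_USD).items():
        print(f"vs {name}: {pct}% lower spend")
```

Run as a script, it prints the Joule Cloud line's saving against each provider on list prices: about 47% against OpenAI, 41% against Anthropic and 30% against AWS Bedrock.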

Object storage, 80 TB region-pinned

One EU health-tech company's analytics corpus: mostly Parquet shards, with occasional hot reads. Pricing assumes 100% of reads come from same-region compute.

Provider	Storage	Egress & req fees	Compute & analytics	Total
AWS S3 + EMR + Athena	$1,840	$1,420	$7,840	$11,100
GCP GCS + Dataproc + BigQuery	$1,750	$1,180	$7,600	$10,530
Azure Blob + Synapse	$1,920	$1,340	$8,210	$11,470
Joule Cloud	$1,280	$0	$6,140	$7,420

All four runs were configured for hourly analytical scans over the full 80 TB, plus ~200 GB/day of hot reads from a same-region compute workload. The Joule Cloud "egress & req fees" cell is zero because in-mesh transfer is free; its compute & analytics cell reflects joules measured during the run.

Container compute, "always-on small service"

A typical 1 vCPU / 1 GB Node.js HTTP server handling 30 req/s sustained. A scale-to-zero idle policy isn't viable because traffic is steady.

Provider	Instance type	Monthly	Egress (~100 GB)	Total
AWS EC2 (t3.micro)	2 vCPU / 1 GB	$7.50	$8.10	$15.60
GCP Cloud Run	1 vCPU / 1 GB	$12.40	$5.00	$17.40
Fly.io shared-cpu-1x	1 vCPU / 1 GB	$5.10	$3.20	$8.30
Cloudflare Workers	n/a (per-request)	$5.00	$0.00	$5.00
Joule Cloud Compute	1 vCPU / 1 GB	$3.80	$0.00	$3.80

The Joule line is the month's joule consumption billed at our published rate. The CPU runs at ~40% steady-state, so the bill reflects an average draw of ~5 W over the month. Cloudflare Workers is the nearest competitor at this tier; for any workload that needs request-scoped CPU time approaching seconds, Joule's per-joule line scales linearly while Workers' per-request CPU-time limits start to bite.
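The ~5 W average and the $3.80 bill are linked by unit conversion. Below is a minimal sketch that assumes a 730-hour month (365 × 24 / 12) and treats both figures as exact, even though the page rounds them. The per-kWh rate it prints is arithmetic derived from those two figures, not a published price. All names are illustrative.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 h; an averaging assumption, not a billing rule
JOULES_PER_KWH = 3.6e6


def monthly_energy_kwh(avg_watts: float, hours: float = HOURS_PER_MONTH) -> float:
    """Energy drawn over the month at a constant average power, in kWh."""
    return avg_watts * hours / 1000.0


def implied_rate_per_kwh(bill_usd: float, energy_kwh: float) -> float:
    """USD per kWh implied by a bill and the energy it covers."""
    return bill_usd / energy_kwh


if __name__ == "__main__":
    kwh = monthly_energy_kwh(5.0)  # ~5 W average from the paragraph above
    print(f"{kwh:.2f} kWh = {kwh * JOULES_PER_KWH / 1e6:.1f} MJ")
    print(f"implied rate: ${implied_rate_per_kwh(3.80, kwh):.2f}/kWh")
```

At ~5 W this works out to roughly 3.65 kWh (about 13 MJ) for the month.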

Functions, "warm path"

A 256 MB serverless handler, 1.5M invocations/month, with an average warm execution of 80 ms. Cold starts occur on ~2% of invocations.

Provider	Pricing dimension	Compute	Requests	Total
AWS Lambda	GB-second + request	$0.32	$0.30	$0.62
GCP Cloud Functions	GB-second + request	$0.36	$0.30	$0.66
Vercel Functions	VFC + request	$1.20	$0.00	$1.20
Cloudflare Workers	request only	$0.00	$0.50	$0.50
Joule Cloud Functions	joules consumed	$0.46	$0.00	$0.46

In this run, Cloudflare Workers ($0.50) lands within 4 cents of us ($0.46). For lighter, high-volume sub-100 ms handlers, it is still the lowest-price line. We come out ahead at the CPU-heavy end (image processing, inline model inference). For pure HTTP-glue functions, run the numbers against Workers first.
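The "GB-second + request" pricing dimension multiplies allocated memory by billed duration, then adds a per-request fee. Below is a minimal sketch of that arithmetic for this workload. The per-GB-second and per-request rates are parameters because the page does not list them. The sketch treats 256 MB as 0.25 GB and ignores free tiers, each provider's rounding of billed duration, and the extra duration of the ~2% cold starts. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionWorkload:
    invocations: int       # per month
    avg_duration_s: float  # average billed duration per invocation, seconds
    memory_gb: float       # allocated memory, GB


def gb_seconds(w: FunctionWorkload) -> float:
    """Memory-time product billed under a GB-second pricing dimension."""
    return w.invocations * w.avg_duration_s * w.memory_gb


def gb_second_bill(w: FunctionWorkload, usd_per_gb_s: float, usd_per_million_req: float) -> float:
    """Compute cost plus request cost; both rates are caller-supplied assumptions."""
    compute = gb_seconds(w) * usd_per_gb_s
    requests = w.invocations / 1e6 * usd_per_million_req
    return compute + requests


# The benchmark workload: 256 MB, 1.5M invocations, 80 ms average warm execution.
WARM_PATH = FunctionWorkload(invocations=1_500_000, avg_duration_s=0.080, memory_gb=0.25)

if __name__ == "__main__":
    print(f"{gb_seconds(WARM_PATH):,.0f} GB-s per month")
```

This workload comes to 30,000 GB-s a month. Under the table's "request only" dimension the compute term is absent. Under "joules consumed", measured energy takes its place.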

Energy per token (Inference)

The metric the rest of the industry doesn't publish: joules consumed at the silicon per input token and per output token for a representative inference call (per image for the image model).

Model	Silicon	J / token (input)	J / token (output)	Source
Llama 3.3 70B	H100 SXM5	0.0028	0.0034	measured
Llama 3.1 405B	H100 SXM5 (4-way TP)	0.012	0.014	measured
DeepSeek-V3	H200	0.0061	0.0072	measured
Mixtral 8x22B	L40S	0.0021	0.0025	measured
Qwen 2.5 72B	H100 SXM5	0.0029	0.0035	measured
Claude Haiku 4.5	(licensor silicon)	~0.002	~0.003	derived from billing
FLUX schnell (image)	H100 SXM5	n/a	~110 J / image	measured

All measured numbers come from NVML on production GPUs during real inference, integrated over each request and bookkept against the chip's idle floor. See "Measuring energy at the silicon" for the methodology, including which numbers are derived and which are measured.
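Below is a minimal sketch of the per-request integration described above. It assumes "bookkept against the idle floor" means subtracting a constant idle power before integrating. Power samples are (timestamp in seconds, watts) pairs, such as those collected by polling NVML's power readout (which reports milliwatts, so divide by 1,000). The sampling and idle-floor values are illustrative inputs, not our production pipeline. The sketch does not split energy between requests batched on the same GPU, and a separate input/output J/token needs the prefill and decode phases measured separately. For a tensor-parallel row such as the 4-way entry, the per-GPU results would be summed.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp_s, power_w)


def net_energy_j(samples: Sequence[Sample], idle_floor_w: float) -> float:
    """Trapezoidal integral of (power - idle floor) over the request, in joules.

    Each sample's power above the floor is clamped at zero, so sampling noise
    below the idle floor cannot produce negative energy.
    """
    if len(samples) < 2:
        return 0.0
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be in time order")
        above0 = max(p0 - idle_floor_w, 0.0)
        above1 = max(p1 - idle_floor_w, 0.0)
        total += 0.5 * (above0 + above1) * (t1 - t0)
    return total


def joules_per_token(samples: Sequence[Sample], idle_floor_w: float, tokens: int) -> float:
    """Net energy for the window divided by the tokens attributed to it."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return net_energy_j(samples, idle_floor_w) / tokens
```

For example, samples of 400 W, 600 W and 600 W taken one second apart, with a 100 W idle floor, integrate to 900 J, which is 3 J/token if that window produced 300 tokens.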

Carbon per million requests

Emissions for the typical mid-tier traffic mix, scaled per million inference requests, with the same workload region-pinned to each region in turn. eu-fi is the clean-grid reference.

Region	Grid intensity (gCO₂/kWh)	kJ / M-req	kg CO₂eq / M-req
eu-fi (Helsinki)	41	312	0.0036
eu-de (Frankfurt)	335	312	0.0290
eu-fr (Paris)	54	312	0.0047
us-east (Ashburn)	402	312	0.0348
us-west (Hillsboro)	198	312	0.0172

Grid intensities are 30-day rolling averages from Electricity Maps. The same workload, region-pinned to eu-fi instead of us-east, emits ~10× less carbon. The router's `region = "auto"` setting picks a region by this exact math, with latency as a secondary score.
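The kg CO₂eq column is energy multiplied by grid intensity. Below is a minimal sketch that recomputes it from the table's kJ and grid-intensity inputs. `pick_region` is a hypothetical illustration of "carbon first, latency second" ordering. It is not the router's actual scoring code, and the latencies passed to it are caller-supplied.

```python
KJ_PER_KWH = 3600.0

# (grid intensity in gCO2/kWh, energy in kJ per million requests), from the table above.
REGIONS = {
    "eu-fi": (41, 312),
    "eu-de": (335, 312),
    "eu-fr": (54, 312),
    "us-east": (402, 312),
    "us-west": (198, 312),
}


def kg_co2_per_m_req(grid_g_per_kwh: float, kj_per_m_req: float) -> float:
    """kJ -> kWh, multiply by gCO2/kWh, then convert grams to kilograms."""
    return kj_per_m_req / KJ_PER_KWH * grid_g_per_kwh / 1000.0


def pick_region(latency_ms: dict[str, float]) -> str:
    """Hypothetical ordering: lowest carbon (at the table's 4-decimal precision)
    first, lower latency as the tie-breaker. Keys must be names in REGIONS."""
    return min(
        latency_ms,
        key=lambda r: (round(kg_co2_per_m_req(*REGIONS[r]), 4), latency_ms[r]),
    )


if __name__ == "__main__":
    for region, (grid, kj) in REGIONS.items():
        print(f"{region}: {kg_co2_per_m_req(grid, kj):.4f} kg CO2eq / M-req")
```

Because the kJ column is identical across regions, the us-east to eu-fi ratio is simply 402 / 41 ≈ 9.8. That is where the ~10× figure comes from.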

Methodology

Inference quality: a held-out 1,200-prompt eval set scored by an independent judge model. The scoring methodology is public and reproducible. We do not select prompts that favour our routing.
Energy measurement: NVML on NVIDIA silicon, RAPL on x86 hosts, IOReport on Apple Silicon. Per-request, integrated over the call duration. Receipts attached to every measured data point.
Pricing comparisons: list prices as of 2026-06-23. We do not assume committed-spend discounts for any provider, including ourselves; for high-volume committed customers, the picture shifts.
Reproducibility: each table here can be reproduced against your own account. For the inference comparison, run `jc evals run --prompts ./your-set.jsonl --candidates joule-auto,openai-auto` to confirm.
What we don't measure: network energy outside the data centre (such as the end user's upstream connection) and client-device energy. See the measurement methodology blog post for the full scope statement.

Caveats & honest losses

For pure HTTP-glue Functions, Cloudflare Workers is still cheaper at small scale. We come out ahead once compute starts dominating.
For raw, committed-spend EC2 reserved instances at the 3-year tier, AWS comes out cheaper than us on the pure-compute line. We come out ahead on the bundle (compute + egress + receipts).
For some specialty models (OpenAI's o1-class reasoning, Anthropic's long-context Claude Opus), we don't have a price-comparable open-weight equivalent. The bench above uses what we host.
In 2026 Q3, we expect inference J/token to drop ~30% as speculative decoding lands at the gateway.

Update cadence

This page is regenerated weekly. Last updated: 2026-06-23. Subscribe to the changelog for material updates. For full transparency, the raw measurement data is available on request by email.