What is a joule, here
A joule is one watt-second — the SI unit of energy. On Joule Cloud, the joule is the unit of account. You ship work; we measure the energy that work actually consumed at the silicon; the bill is the sum of those measurements.
You can't optimize what you can't see. You can't price honestly what you can't measure. The joule is what physics charges for computation.
How we measure it
Every request and every workload run is wrapped by a thin middleware layer that reads the silicon's own hardware counters at request start and end. The deltas are the joules.
| Silicon | Counter | Precision |
|---|---|---|
| Intel / AMD x86 | Intel RAPL (per-socket package + DRAM) | ~5% |
| NVIDIA GPU | NVML (per-device power.draw, integrated) | ~10% |
| Apple Silicon | IOReport (CPU + GPU + Neural Engine) | ~3% |
| Anything else | Util × published TDP, declared as derived | ~20-30% |
Which method we used appears on every response in the X-Energy-Method header, and on the receipt as energy.method + energy.method_confidence. An auditor can always tell whether a number came from the chip itself or was derived from utilization.
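The three measurement styles in the table reduce to simple arithmetic. The sketch below is illustrative only and is not Joule Cloud's middleware: the function names are hypothetical, and the counter format (a cumulative microjoule counter with a wraparound limit) is an assumption modeled on how Linux exposes RAPL through powercap.

```python
from typing import Sequence, Tuple


def counter_delta_joules(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Energy between two reads of a cumulative microjoule counter.

    Assumes the counter wrapped at most once between the two reads.
    """
    delta_uj = end_uj - start_uj
    if delta_uj < 0:  # counter wrapped around max_range_uj
        delta_uj += max_range_uj
    return delta_uj / 1_000_000


def integrate_power_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal integral of (timestamp_seconds, watts) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def derived_joules(utilization: float, tdp_watts: float, seconds: float) -> float:
    """Fallback estimate: utilization (0..1) x published TDP x duration."""
    return utilization * tdp_watts * seconds


print(counter_delta_joules(1_000_000, 4_500_000, 2**32))                   # 3.5
print(integrate_power_joules([(0.0, 100.0), (1.0, 300.0), (2.0, 300.0)]))  # 500.0
print(derived_joules(0.5, 200.0, 10.0))                                    # 1000.0
```

The first function fits an energy counter (RAPL-style), the second fits a power reading that has to be sampled and integrated (NVML-style), and the third is the utilization-based fallback, which is why it carries the widest error band.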
Why joules instead of GPU-hours or tokens
Token-based and hour-based billing both ignore what the hardware actually did. A 30-second request on an H100 occupies 1/120 of an hour's reservation, yet the energy it consumes depends on what the chip was doing: at the card's full 700 W that is about 21 kJ (roughly 6 watt-hours), and far less if the GPU sat mostly idle. A token-priced call that internally runs reasoning may consume 100× the energy of a token-priced call that hits the cache. The customer pays the same in both cases. We don't think they should.
Joules make billing track physics:
- Idle is near-zero. A waiting container draws idle-floor watts, not reservation watts.
- Inefficient code shows up. A quadratic loop on a million rows costs more in joules than the linear version, even if both finish in the same wall-clock time.
- Architecture matters. An H100 running a matmul at 700 W is billed 700 J for every second it runs. A Graviton running the same matmul at 90 W is billed 90 J per second. The router puts your call on the chip whose energy profile matches your call's needs; see Routing & placement.
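The architecture bullet is the relation energy = power × time. A minimal sketch, assuming each chip draws the quoted wattage steadily for the same 30 seconds; the duration and helper names are illustrative, not a benchmark.

```python
def joules(power_watts: float, seconds: float) -> float:
    """Energy in joules for a constant power draw over a duration."""
    return power_watts * seconds


def watt_hours(energy_joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return energy_joules / 3600.0


DURATION_S = 30.0  # assumed duration, for illustration only

for chip, watts in [("H100", 700.0), ("Graviton", 90.0)]:
    energy = joules(watts, DURATION_S)
    print(f"{chip}: {energy:.0f} J ({watt_hours(energy):.2f} Wh)")
# H100: 21000 J (5.83 Wh)
# Graviton: 2700 J (0.75 Wh)
```

In practice the two chips would not take the same time to finish the same matmul, which is exactly why the comparison has to be made in joules rather than in watts or in seconds.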
How the joule appears in your code
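Every API response carries the joules used in an X-Energy-Joules response header. How you reach response headers depends on the SDK. In the official OpenAI Python SDK (v1 and later), call the method through with_raw_response, read the headers from the raw response, then call parse() to get the usual completion object:
raw = client.chat.completions.with_raw_response.create(model="auto", messages=[...])
print(raw.headers["x-energy-joules"])  # e.g. "0.31"
r = raw.parse()  # the regular ChatCompletion object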
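To keep a running total across calls, the header value can be parsed and summed. A sketch, assuming the header carries a plain decimal string as in the example above; the helper name and the choice to return None for a missing or malformed header are illustrative, not part of the documented API.

```python
from typing import Mapping, Optional


def joules_from_headers(headers: Mapping[str, str]) -> Optional[float]:
    """Return the X-Energy-Joules value as a float, or None if absent or malformed."""
    # Header names are case-insensitive, but a plain dict is not, so normalize keys.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("x-energy-joules")
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


total = 0.0
for headers in [{"X-Energy-Joules": "0.31"}, {"x-energy-joules": "1.20"}, {}]:
    value = joules_from_headers(headers)
    if value is not None:
        total += value
print(f"{total:.2f} J")  # 1.51 J
```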
For batch workloads, query the per-workload joule total from the API:
curl "https://api.greenjoules.cloud/v1/usage/workload/<id>" \
  -H "Authorization: Bearer jc_…"
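The same request from Python, sketched with the standard library only. The response schema is not described here, so the example assumes a JSON body and stops at decoding it without naming any fields; build_usage_request and fetch_usage are illustrative names.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.greenjoules.cloud/v1"


def build_usage_request(workload_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for a workload's usage record."""
    url = f"{API_BASE}/usage/workload/{quote(workload_id, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_usage(workload_id: str, api_key: str) -> dict:
    """Send the request and decode the body, assuming it is a JSON object."""
    with urllib.request.urlopen(build_usage_request(workload_id, api_key)) as resp:
        return json.load(resp)
```

Splitting the request construction from the network call keeps the URL and header logic easy to check without touching the API.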
What a joule is worth
Rough reference points for joules at different scales:
| Quantity | About |
|---|---|
| 1 J | One second of a small LED bulb |
| 100 J | A mid-size reasoning request on H100 |
| 1 kJ | A small image generation |
| 1 MJ | Roughly 12 hours of a 25 W laptop |
| 1 GJ | A long fine-tuning job on multiple GPUs |
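The rows above are easier to sanity-check with a couple of unit conversions. A small sketch; the helper names are illustrative.

```python
def to_kwh(energy_joules: float) -> float:
    """Joules to kilowatt-hours (1 kWh = 3.6 MJ)."""
    return energy_joules / 3.6e6


def hours_at(energy_joules: float, power_watts: float) -> float:
    """Hours a constant load of power_watts takes to use energy_joules."""
    return energy_joules / power_watts / 3600.0


def format_joules(energy_joules: float) -> str:
    """Render a joule count with an SI prefix, e.g. 1500 -> '1.5 kJ'."""
    for factor, unit in [(1e9, "GJ"), (1e6, "MJ"), (1e3, "kJ")]:
        if abs(energy_joules) >= factor:
            return f"{energy_joules / factor:g} {unit}"
    return f"{energy_joules:g} J"


print(format_joules(1e6), "=", round(to_kwh(1e6), 3), "kWh")  # 1 MJ = 0.278 kWh
print(round(hours_at(1e6, 25.0), 1), "hours at 25 W")         # 11.1 hours at 25 W
```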
Next
To see how we make sure your joules don't cost more than they have to, read Routing & placement. To see what the per-request record looks like, read Energy receipts.