Case study · published v1.0 (placeholder; customer name redacted pending sign-off)

An LLM SaaS company moving 40M inference calls / month off OpenAI

The before

The migration

One sprint. Two lines changed in their inference client (base URL + token). Behind a feature flag, ramped 5% → 25% → 100% over six days. Their internal evals (a fixed test set of 1,200 prompts scored by a held-out judge model) showed quality parity within the noise on the model: "auto" setting.

The after

MetricBeforeAfter
Monthly bill$54,200$28,900
p50 latency620 ms510 ms
p99 latency3.4 s2.9 s
Quality (judge-model score)0.840.83
Per-call carbon attributionnoyes

What the customer said

"The unlock wasn't the cost, although the cost was great. The unlock was getting the per-call joules so our pricing model could be defensible to our own customers. We can now charge per AI feature based on what it actually costs us to run."

What we learned