Inference
OpenAI-compatible inference. Same routes, same shapes, same SDKs — pointed at api.greenjoules.cloud/v1. The router picks the cheapest capable silicon per call; the bill is the joules.
What you can call
| Route | Maps to |
|---|---|
POST /v1/chat/completions | Chat completions, OpenAI shape |
POST /v1/embeddings | Embeddings |
POST /v1/audio/transcriptions | Speech-to-text (Whisper-compat) |
POST /v1/audio/speech | Text-to-speech |
POST /v1/images/generations | Image generation (SDXL / FLUX where licensed) |
GET /v1/models | Live model list + per-model joule cost orientation |
How the energy is measured
For each request: hardware counters at start and end, delta × PUE = total joules. NVML for NVIDIA, RAPL for x86 hosts, IOReport for Apple Silicon. Method + confidence land on every response in X-Energy-Method and on the receipt as energy.method_confidence. See What is a joule? for details.
Streaming
Standard SSE. The final energy total arrives in the X-Energy-Joules response header (sent on the final chunk's trailers, or via the closing event metadata; both are exposed by the OpenAI Python and Node SDKs).
Pricing orientation
| Class | Example | Typical energy |
|---|---|---|
| L0 | embedding lookup, cache hit | ~0.01 J |
| L1 | short summarize, classify | ~0.05 J |
| L2 | RAG, mid-context | ~0.3 J |
| L3 | long-context reasoning | ~6 J |
See Pricing for joule-to-dollar conversion.
Start
See the Quickstart for a working code example, or Migrate from OpenAI if you already have an OpenAI integration.