Inference

OpenAI-compatible inference. Same routes, same shapes, same SDKs — pointed at api.greenjoules.cloud/v1. The router picks the cheapest capable silicon per call; the bill is the joules.

What you can call

Route	Maps to
`POST /v1/chat/completions`	Chat completions, OpenAI shape
`POST /v1/embeddings`	Embeddings
`POST /v1/audio/transcriptions`	Speech-to-text (Whisper-compat)
`POST /v1/audio/speech`	Text-to-speech
`POST /v1/images/generations`	Image generation (SDXL / FLUX where licensed)
`GET /v1/models`	Live model list + per-model joule cost orientation

How the energy is measured

For each request: hardware counters at start and end, delta × PUE = total joules. NVML for NVIDIA, RAPL for x86 hosts, IOReport for Apple Silicon. Method + confidence land on every response in X-Energy-Method and on the receipt as energy.method_confidence. See What is a joule? for details.

Streaming

Standard SSE. The final energy total arrives in the X-Energy-Joules response header (sent on the final chunk's trailers, or via the closing event metadata; both are exposed by the OpenAI Python and Node SDKs).

Pricing orientation

Class	Example	Typical energy
L0	embedding lookup, cache hit	~0.01 J
L1	short summarize, classify	~0.05 J
L2	RAG, mid-context	~0.3 J
L3	long-context reasoning	~6 J

See Pricing for joule-to-dollar conversion.

Start

See the Quickstart for a working code example, or Migrate from OpenAI if you already have an OpenAI integration.