Inference

OpenAI-compatible inference. Same routes, same shapes, same SDKs — pointed at api.greenjoules.cloud/v1. The router picks the cheapest capable silicon per call; the bill is the joules.

What you can call

RouteMaps to
POST /v1/chat/completionsChat completions, OpenAI shape
POST /v1/embeddingsEmbeddings
POST /v1/audio/transcriptionsSpeech-to-text (Whisper-compat)
POST /v1/audio/speechText-to-speech
POST /v1/images/generationsImage generation (SDXL / FLUX where licensed)
GET /v1/modelsLive model list + per-model joule cost orientation

How the energy is measured

For each request: hardware counters at start and end, delta × PUE = total joules. NVML for NVIDIA, RAPL for x86 hosts, IOReport for Apple Silicon. Method + confidence land on every response in X-Energy-Method and on the receipt as energy.method_confidence. See What is a joule? for details.

Streaming

Standard SSE. The final energy total arrives in the X-Energy-Joules response header (sent on the final chunk's trailers, or via the closing event metadata; both are exposed by the OpenAI Python and Node SDKs).

Pricing orientation

ClassExampleTypical energy
L0embedding lookup, cache hit~0.01 J
L1short summarize, classify~0.05 J
L2RAG, mid-context~0.3 J
L3long-context reasoning~6 J

See Pricing for joule-to-dollar conversion.

Start

See the Quickstart for a working code example, or Migrate from OpenAI if you already have an OpenAI integration.