Migrate from OpenAI

Joule Cloud's inference API is a drop-in replacement for OpenAI's. Two client settings change: the base_url and the api_key. Each request also needs a new model name, because OpenAI models are not served here. The rest of your integration keeps working: same routes, same request shapes, same response shapes.

The diff

from openai import OpenAI

- client = OpenAI()  # reads OPENAI_API_KEY from env
+ client = OpenAI(
+     base_url="https://api.greenjoules.cloud/v1",
+     api_key="jc_…"  # from portal.greenjoules.cloud
+ )

  r = client.chat.completions.create(
-     model="gpt-4o-mini",
+     model="auto",  # or pin: "llama-3.3-70b-instruct"
      messages=[{"role":"user","content":"hi"}],
  )

That's it. The response object has the same fields. r.choices[0].message.content works. Streaming works. Tool use works. JSON mode works.

What's the same

Capability	Joule Cloud
Chat completions	`POST /v1/chat/completions` — identical shape
Streaming (SSE)	Set `stream: true`, parse `data:` lines, same delta shape
Tool / function calling	`tools` array on request, `tool_calls` on response — identical
JSON mode	`response_format: {"type": "json_object"}`
Embeddings	`POST /v1/embeddings`
Vision (image input)	`content` array with `image_url` parts
Model listing	`GET /v1/models`
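
To make the streaming row concrete, here is a minimal sketch of parsing the `data:` lines by hand. It assumes the stream follows OpenAI's chunk format, where each event is a JSON chunk carrying `choices[0].delta` and the stream ends with `data: [DONE]`. The helper name `iter_delta_text` and the sample lines are illustrative, not part of any SDK or captured from a real response.

```python
import json
from typing import Iterable, Iterator


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel in the OpenAI format
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, for illustration only.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_delta_text(sample)))
```

If you already use an OpenAI SDK with `stream=True`, the SDK does this parsing for you; the sketch only matters for hand-rolled HTTP clients.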

What's different

Capability	How it differs
Model names	OpenAI models are not on the mesh. Use `model: "auto"` for the cheapest capable choice, or pin one of: `llama-3.3-70b-instruct`, `llama-3.3-405b-instruct`, `mixtral-8x22b`, `qwen2.5-72b`, `deepseek-v3`, `claude-haiku-4.5` (where licensed), `gpt-oss-120b`.
Energy header	Every response includes `X-Energy-Joules` plus the sibling headers `X-Carbon-mg`, `X-Routed-To`, `X-Tier`, and `X-Receipt-Id`. OpenAI responses carry no equivalent.
Pricing model	Billed in joules consumed, not per-token. See Pricing for orientation.
Rate limits	No per-minute token caps. Your only ceiling is your account balance (or an explicit per-workload `energy_budget`).
Region / data residency	You can pin a jurisdiction via request header `X-Region: eu-fi`. Default behaviour is "cheapest capable" worldwide.
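
As a rough sketch of the region row, the snippet below builds (but does not send) a chat completions request that pins a jurisdiction with the `X-Region` header. It uses only the standard library. The function name `build_chat_request` and the key `jc_example` are placeholders for illustration.

```python
import json
import urllib.request
from typing import Optional

JC_CHAT_URL = "https://api.greenjoules.cloud/v1/chat/completions"


def build_chat_request(
    api_key: str, prompt: str, region: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request; add X-Region only when a region is given."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if region is not None:
        headers["X-Region"] = region  # omit the header to keep the default routing
    body = json.dumps(
        {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(JC_CHAT_URL, data=body, headers=headers, method="POST")


# "jc_example" is a placeholder, not a real token.
req = build_chat_request("jc_example", "hi", region="eu-fi")
print(dict(req.header_items()))
```

With the openai Python SDK, the same header can be sent per call through the `extra_headers` argument of `create`.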

Migration checklist

Sign up at portal.greenjoules.cloud and put $5 on file.
Mint a token in the portal under Tokens → New token. Scope to inference-only if you don't need the full surface.
Swap your client config — base_url and api_key as above.
Pick a model. Start with "auto" and let the router classify each request. Switch to a pinned model if you need deterministic behaviour.
Run your existing test suite against the new endpoint. Most regressions you'll find are around model-specific output style, not API shape.
Capture the joule headers in your observability stack. X-Energy-Joules, X-Tier, X-Routed-To — these are the data you don't have today.
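
The last checklist step could look something like the sketch below: pull the joule headers out of whatever header mapping your HTTP client exposes and flatten them into one record for your logging or metrics pipeline. The helper name `joule_fields` is hypothetical, header values are kept as raw strings because their exact format is not specified here, and the sample values are made up.

```python
import json
from typing import Dict, Mapping

# Header names taken from the "What's different" table, lowercased for matching.
JOULE_HEADERS = (
    "x-energy-joules",
    "x-carbon-mg",
    "x-routed-to",
    "x-tier",
    "x-receipt-id",
)


def joule_fields(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the joule headers that are present, keyed by lowercase name."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in JOULE_HEADERS if name in lowered}


# Placeholder values for illustration; not real API output.
example_headers = {
    "Content-Type": "application/json",
    "X-Energy-Joules": "12.5",
    "X-Tier": "example-tier",
    "X-Routed-To": "example-node",
}
print(json.dumps(joule_fields(example_headers), sort_keys=True))
```

Matching on lowercase names keeps the helper independent of how a given HTTP library normalises header casing.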

Patterns by SDK

Python (openai)

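import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)

# with_raw_response exposes the HTTP headers; parse() returns the usual completion
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
r = raw.parse()
joules = raw.headers.get("x-energy-joules")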

Node (openai npm)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.greenjoules.cloud/v1",
  apiKey: process.env.JC_API_KEY,
});

const { data: r, response } = await client.chat.completions
  .create({ model: "auto", messages: [{ role: "user", content: "hi" }] })
  .withResponse();
// r is the usual completion (r._request_id, etc.); response is the raw HTTP response
const joules = response.headers.get("x-energy-joules");

Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const jc = createOpenAI({
  baseURL: "https://api.greenjoules.cloud/v1",
  apiKey: process.env.JC_API_KEY,
});

const { text } = await generateText({
  model: jc("auto"),
  prompt: "hi",
});

LangChain (Python)

import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
    model="auto",
)

curl

curl https://api.greenjoules.cloud/v1/chat/completions \
  -H "Authorization: Bearer $JC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'

Rollback

Both endpoints can run in parallel. Most teams keep the OpenAI client as a fallback for a week or two, behind a feature flag, then retire it. To roll back, change the base_url and api_key back and restore the OpenAI model name. The rest of the application code does not need to change.
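
One way to keep both endpoints behind a feature flag is sketched below. The flag name `USE_JOULE_CLOUD` and the function `client_settings` are assumptions made for this example; the OpenAI values are the library's default base URL and the model from the diff above.

```python
import os
from typing import Dict, Mapping


def client_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Choose endpoint, key, and model from a flag (USE_JOULE_CLOUD is hypothetical)."""
    if env.get("USE_JOULE_CLOUD", "0") == "1":
        return {
            "base_url": "https://api.greenjoules.cloud/v1",
            "api_key": env["JC_API_KEY"],
            "model": "auto",
        }
    # Fallback: the original OpenAI configuration.
    return {
        "base_url": "https://api.openai.com/v1",
        "api_key": env["OPENAI_API_KEY"],
        "model": "gpt-4o-mini",
    }


# Placeholder keys for illustration only.
fake_env = {
    "USE_JOULE_CLOUD": "1",
    "JC_API_KEY": "jc_example",
    "OPENAI_API_KEY": "sk-example",
}
settings = client_settings(fake_env)
print(settings["base_url"], settings["model"])
```

Pass `base_url` and `api_key` to the `OpenAI(...)` constructor and `model` to each `create` call; flipping the flag is then the whole rollback.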

Common gotchas

Implicit model defaults. If your code omits the model parameter and relies on a default being filled in for it (for example, a wrapper library that falls back to gpt-3.5-turbo), set model explicitly. Joule Cloud requires it.
Hard-coded model names in prompts. Prompt templates that say "Respond as GPT-4 would" still work, but the response style will be the chosen model's, not GPT-4's.
OpenAI-specific tool helpers. openai.beta.assistants.* isn't implemented — the assistants API is OpenAI-specific. Use direct chat completions for agentic flows.
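Error shapes. We return the same OpenAI error envelope, so your existing except openai.RateLimitError handlers keep working for 429 responses. The one difference: when you are out of budget you'll see 402 Insufficient Balance instead of 429. The openai Python SDK has no dedicated exception class for 402, so it surfaces as a generic openai.APIStatusError; check its status_code if you want to handle it separately.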
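
Because an exhausted balance arrives as 402 rather than 429, retry logic written for rate limits may need one extra branch. The sketch below shows one possible policy; the function name `next_action` and the policy itself (retry on 429 and 5xx, never retry on 402) are assumptions for illustration, not documented Joule Cloud behaviour.

```python
def next_action(status_code: int) -> str:
    """Map an HTTP status to a coarse action (illustrative policy only)."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 402:
        # Out of balance: retrying cannot help until the account is topped up.
        return "top_up"
    if status_code == 429 or status_code >= 500:
        # Transient conditions: back off and try again.
        return "retry"
    return "fail"


for code in (200, 402, 429, 503, 400):
    print(code, next_action(code))
```

With the openai Python SDK, the status code is available as `status_code` on a caught `openai.APIStatusError`.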

For the full API surface, see the API reference. To understand what your bill is going to look like, see Pricing and What is a joule.