Migrate from OpenAI
Joule Cloud's inference API is a drop-in replacement for OpenAI's. Two client settings change, base_url and api_key, and you swap the model name. The rest of your integration keeps working: same routes, same request shapes, same response shapes.
The diff
  from openai import OpenAI
- client = OpenAI()  # reads OPENAI_API_KEY from env
+ client = OpenAI(
+     base_url="https://api.greenjoules.cloud/v1",
+     api_key="jc_…"  # from portal.greenjoules.cloud
+ )
  r = client.chat.completions.create(
-     model="gpt-4o-mini",
+     model="auto",  # or pin: "llama-3.3-70b-instruct"
      messages=[{"role": "user", "content": "hi"}],
  )
That's it. The response object has the same fields. r.choices[0].message.content works. Streaming works. Tool use works. JSON mode works.
What's the same
| Capability | Joule Cloud |
|---|---|
| Chat completions | POST /v1/chat/completions — identical shape |
| Streaming (SSE) | Set stream: true, parse data: lines, same delta shape |
| Tool / function calling | tools array on request, tool_calls on response — identical |
| JSON mode | response_format: {"type": "json_object"} |
| Embeddings | POST /v1/embeddings |
| Vision (image input) | content array with image_url parts |
| Model listing | GET /v1/models |
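
The streaming row above can be sketched as a small parser. This is a minimal illustration using only the standard library, not Joule Cloud's reference client: the function name `iter_content_deltas` is made up for this example, and it assumes the stream ends with OpenAI's `data: [DONE]` sentinel, which follows from the "same delta shape" claim but is not spelled out on this page.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE chat completion stream."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumption: the stream is terminated by OpenAI's "[DONE]" sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the OpenAI chunk shape, for illustration only.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))
```

If you use the openai SDK with stream=True you do not need this; the SDK does the same parsing for you. The sketch is mainly useful for hand-rolled HTTP clients.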
What's different
| Capability | How it differs |
|---|---|
| Model names | OpenAI models are not on the mesh. Use model: "auto" for the cheapest capable choice, or pin one of: llama-3.3-70b-instruct, llama-3.3-405b-instruct, mixtral-8x22b, qwen2.5-72b, deepseek-v3, claude-haiku-4.5 (where licensed), gpt-oss-120b. |
| Energy headers | Every response includes X-Energy-Joules plus the sibling headers X-Carbon-mg, X-Routed-To, X-Tier, and X-Receipt-Id. OpenAI responses carry no equivalent. |
| Pricing model | Billed in joules consumed, not per-token. See Pricing for orientation. |
| Rate limits | No per-minute token caps. Your only ceiling is your account balance (or an explicit per-workload energy_budget). |
| Region / data residency | You can pin a jurisdiction via request header X-Region: eu-fi. Default behaviour is "cheapest capable" worldwide. |
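
To make the region row concrete, here is a hedged sketch of a request that pins a jurisdiction. It uses only the standard library and mirrors the Bearer auth and JSON body from the curl example on this page; the helper name `build_chat_request` is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.greenjoules.cloud/v1"


def build_chat_request(api_key: str, prompt: str, model: str = "auto",
                       region: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a chat completion request, optionally region-pinned."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if region is not None:
        # Pin the jurisdiction; leaving the header out keeps the worldwide default.
        headers["X-Region"] = region
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )


req = build_chat_request("jc_example", "hi", region="eu-fi")
print(req.get_method(), req.full_url)
```

Sending it is a matter of passing `req` to `urllib.request.urlopen`. With the openai Python SDK, the same header can be attached per call through the `extra_headers` argument.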
Migration checklist
- Sign up at portal.greenjoules.cloud and put $5 on file.
- Mint a token in the portal under Tokens → New token. Scope to inference-only if you don't need the full surface.
- Swap your client config: base_url and api_key as above.
- Pick a model. Start with "auto" and let the router classify. Switch to a pinned model if you need deterministic behaviour.
- Run your existing test suite against the new endpoint. Most regressions you'll find are around model-specific output style, not API shape.
- Capture the joule headers in your observability stack. X-Energy-Joules, X-Tier, X-Routed-To: these are the data you don't have today.
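For the last checklist item, one possible way to turn the response headers into a record for your logging or metrics pipeline is sketched below. The function name `receipt_from_headers` and the output key names are invented for this example, and it assumes X-Energy-Joules carries a plain decimal number; this page does not specify the header value formats, so the raw string is kept when parsing fails.

```python
from typing import Mapping, Optional

RECEIPT_HEADERS = (
    "X-Energy-Joules", "X-Carbon-mg", "X-Routed-To", "X-Tier", "X-Receipt-Id",
)


def receipt_from_headers(headers: Mapping[str, str]) -> dict:
    """Collect the Joule Cloud receipt headers into a flat dict for logging."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    receipt = {}
    for name in RECEIPT_HEADERS:
        # "X-Energy-Joules" becomes "energy_joules"; absent headers become None.
        key = name.lower()[2:].replace("-", "_")
        receipt[key] = lowered.get(name.lower())
    # Assumption: the joule count is a decimal number; keep the raw value otherwise.
    joules: Optional[str] = receipt["energy_joules"]
    if joules is not None:
        try:
            receipt["energy_joules"] = float(joules)
        except ValueError:
            pass
    return receipt


# Header values below are made up for illustration.
print(receipt_from_headers({"x-energy-joules": "12.5", "X-Tier": "example-tier"}))
```

Missing headers come back as None rather than raising, so the same function can sit in front of the OpenAI fallback path without special-casing.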
Patterns by SDK
Python (openai)
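import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
)
# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)
r = raw.parse()  # the usual ChatCompletion object
joules = raw.headers.get("x-energy-joules")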
Node (openai npm)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.greenjoules.cloud/v1",
  apiKey: process.env.JC_API_KEY,
});
// .withResponse() returns the parsed body plus the raw HTTP response
const { data: r, response } = await client.chat.completions
  .create({ model: "auto", messages: [{ role: "user", content: "hi" }] })
  .withResponse();
// r._request_id, etc.
// energy is on the underlying response object
const joules = response.headers.get("x-energy-joules");
Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
const jc = createOpenAI({
  baseURL: "https://api.greenjoules.cloud/v1",
  apiKey: process.env.JC_API_KEY,
});
const { text } = await generateText({
  // .chat() targets /v1/chat/completions explicitly
  model: jc.chat("auto"),
  prompt: "hi",
});
LangChain (Python)
import os
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    base_url="https://api.greenjoules.cloud/v1",
    api_key=os.environ["JC_API_KEY"],
    model="auto",
)
curl
curl https://api.greenjoules.cloud/v1/chat/completions \
-H "Authorization: Bearer $JC_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'
Rollback
Both endpoints can run in parallel. Most teams keep the OpenAI client as a fallback for a week or two, behind a feature flag, then retire it. To roll back, change the base_url, api_key, and model name back. The rest of the application code does not need to change.
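A feature-flagged fallback could look roughly like the sketch below. The flag name `USE_JOULE_CLOUD` and the helper `client_settings` are hypothetical; `JC_API_KEY`, `OPENAI_API_KEY`, and the model names come from the examples on this page, and `https://api.openai.com/v1` is the OpenAI SDK's default base URL. The model travels with the endpoint because OpenAI model names are not available on Joule Cloud.

```python
import os
from typing import Mapping


def client_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Pick endpoint, key, and model from a single (hypothetical) feature flag."""
    # Assumption: the flag is an environment variable; absent means "use OpenAI".
    if env.get("USE_JOULE_CLOUD", "0") == "1":
        return {
            "base_url": "https://api.greenjoules.cloud/v1",
            "api_key": env["JC_API_KEY"],
            "model": "auto",
        }
    return {
        "base_url": "https://api.openai.com/v1",
        "api_key": env["OPENAI_API_KEY"],
        "model": "gpt-4o-mini",
    }


# Example with an explicit mapping instead of the real environment.
settings = client_settings({"USE_JOULE_CLOUD": "1", "JC_API_KEY": "jc_example"})
print(settings["base_url"], settings["model"])
```

The base_url and api_key entries can be passed straight to the OpenAI constructor, and the model entry to each create call, so flipping the flag is the whole rollback.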
Common gotchas
- Implicit model defaults. If your code relies on a missing model parameter defaulting to gpt-3.5-turbo server-side, set model explicitly. Joule Cloud requires it.
- Hard-coded model names in prompts. Prompt templates that say "Respond as GPT-4 would" still work, but the response style will be the chosen model's, not GPT-4's.
- OpenAI-specific tool helpers. openai.beta.assistants.* isn't implemented; the assistants API is OpenAI-specific. Use direct chat completions for agentic flows.
- Error shapes. We return the same OpenAI error envelope. Your existing except openai.RateLimitError handlers continue to work, but running out of budget returns 402 Insufficient Balance instead of 429, so handle that status separately.
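The 402 versus 429 distinction in the last item matters mostly for retry logic. The sketch below shows one way to encode it, assuming your error handler can see the HTTP status code; `should_retry` is an illustrative name, and the treatment of 5xx responses as retryable is a common convention rather than something this page specifies.

```python
def should_retry(status_code: int) -> bool:
    """Decide whether repeating the same request could plausibly succeed."""
    if status_code == 402:
        # Out of budget: retrying cannot help until the balance
        # (or the workload's energy_budget) is raised.
        return False
    if status_code == 429 or 500 <= status_code < 600:
        # Transient conditions: back off and try again.
        return True
    # Other statuses (success or client errors) are not retried.
    return False


for code in (402, 429, 503, 400):
    print(code, should_retry(code))
```

A retry loop that treats 402 like 429 would spin until the balance is topped up, so it is worth alerting on 402 instead of backing off.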
Next
For the full API surface, see the API reference. To understand what your bill is going to look like, see Pricing and What is a joule.