Routing & placement
The job of the router is simple to state: send each request to the cheapest capable silicon currently available on the mesh. Doing it well is the actual engineering. This page explains how it works so you can predict where your workloads land.
Inference: cost tiers
Each inference request is classified at the gateway into one of four cost tiers before model selection:
| Tier | What it covers | Typical energy | Where it can run |
|---|---|---|---|
| L0 — lookup | cache hits, key-value reads, tiny embeddings | ~0.01 J | any node, including ARM edge |
| L1 — extraction | short summarization, classification, NER | ~0.05 J | CPU / small GPU |
| L2 — aggregation | RAG queries, mid-context summarization | ~0.3 J | mid-tier GPU (L4, A10, L40S) |
| L3 — reasoning | long-context reasoning, code gen, planning | ~6 J | top-tier GPU (H100, H200, B200, MI300) |
Classification is fast (sub-millisecond) and conservative — if there's any doubt, the router upgrades the tier rather than risking a quality regression. You can read the tier the router picked from the X-Tier response header.
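The "upgrade when in doubt" rule can be sketched as follows. This is only an illustration: the classifier's real inputs are not described on this page, so the `confidence` score, the `min_confidence` threshold, and the choice to move up exactly one tier are all assumptions.

```python
# Tier names as listed in the table above, cheapest first.
TIERS = ["L0", "L1", "L2", "L3"]


def pick_tier(estimated: str, confidence: float, min_confidence: float = 0.9) -> str:
    """Return the estimated tier, or the next tier up when confidence is low.

    `confidence` and `min_confidence` are hypothetical; they stand in for
    whatever signal the gateway uses to decide it is "in doubt".
    """
    index = TIERS.index(estimated)
    if confidence < min_confidence and index < len(TIERS) - 1:
        index += 1  # conservative: prefer a more capable tier over a quality regression
    return TIERS[index]


print(pick_tier("L1", 0.95))  # confident estimate: stays at L1
print(pick_tier("L1", 0.60))  # doubtful estimate: bumped to L2
print(pick_tier("L3", 0.60))  # already the top tier: stays at L3
```

The cost of this design is that some requests run on more expensive silicon than they strictly need, which is why the X-Tier header is worth checking for workloads you expected to be cheap.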
What "cheapest capable" means
For a given tier, the router maintains a live ranking of every node in the mesh that can serve it, scored on:
- Per-token energy (joules per token) — the dominant term
- Local grid carbon intensity (gCO₂/kWh, hourly)
- Operator PUE — published per data centre
- Queue depth — nodes with backed-up queues drop in the ranking
- Network latency — if your account specifies a region preference
The top-ranked node gets the request. If that node fails a health check mid-flight, the request fails over to the next-ranked node automatically.
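A minimal sketch of this ranking and failover, assuming a simple weighted sum. The `Node` fields, the `WEIGHTS` values, and the normalisation by the largest value in the candidate set are hypothetical; the only property taken from the list above is that per-token energy is the dominant term and that latency counts only when a region preference is set.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    joules_per_token: float
    grid_gco2_per_kwh: float
    pue: float
    queue_depth: int
    latency_ms: float
    healthy: bool = True


# Hypothetical weights: energy dominates, the rest are illustrative.
WEIGHTS = {"energy": 0.5, "carbon": 0.2, "pue": 0.1, "queue": 0.1, "latency": 0.1}


def rank(nodes: list[Node], region_preference: bool = False) -> list[Node]:
    """Return nodes sorted best-first (lowest cost score first)."""

    def column(attr: str) -> list[float]:
        values = [getattr(n, attr) for n in nodes]
        top = max(values) or 1.0  # avoid dividing by zero when all values are 0
        return [v / top for v in values]

    energy = column("joules_per_token")
    carbon = column("grid_gco2_per_kwh")
    pue = column("pue")
    queue = column("queue_depth")
    # Latency only counts when the account has a region preference.
    latency = column("latency_ms") if region_preference else [0.0] * len(nodes)

    scores = [
        WEIGHTS["energy"] * e + WEIGHTS["carbon"] * c + WEIGHTS["pue"] * p
        + WEIGHTS["queue"] * q + WEIGHTS["latency"] * l
        for e, c, p, q, l in zip(energy, carbon, pue, queue, latency)
    ]
    order = sorted(range(len(nodes)), key=lambda i: scores[i])
    return [nodes[i] for i in order]


def route(ranked: list[Node]) -> Node:
    """Pick the first healthy node, falling through the ranking on failure."""
    for node in ranked:
        if node.healthy:
            return node
    raise RuntimeError("no healthy node available")


# Made-up example nodes, not real measurements.
node_a = Node("node-a", 0.002, 40.0, 1.1, 2, 30.0)
node_b = Node("node-b", 0.004, 400.0, 1.4, 0, 10.0)
ranked = rank([node_b, node_a])
print(route(ranked).name)  # node-a: lower energy and carbon outweigh its queue

node_a.healthy = False
print(route(ranked).name)  # node-b: failover to the next-ranked node
```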
Workloads: carbon-aware placement
For container and function workloads (not inference), placement is decided at deploy time and then re-evaluated periodically; the workload may migrate if the scheduler finds a meaningfully better node. Set the placement strategy in invisible.hcl:
```hcl
workload "site" {
  image  = "nginx:alpine"
  region = "auto"     # cheapest cleanest grid available
  # region = "eu"     # constrain to EU jurisdictions
  # region = "eu-fi"  # pin to Finland
}
```
With region = "auto", the carbon-aware scheduler picks a node with low grid carbon intensity at deploy time. A batch job started in Virginia may finish in Helsinki if Nordic wind drops the energy cost meaningfully.
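One way to picture "meaningfully better" is a relative-improvement threshold on grid carbon intensity. This is a sketch under assumptions: the scheduler's actual criterion is not documented here, `min_improvement` is a hypothetical parameter, and the intensity figures in the example are made up.

```python
def should_migrate(
    current_gco2_per_kwh: float,
    candidate_gco2_per_kwh: float,
    min_improvement: float = 0.2,
) -> bool:
    """Return True when the candidate grid is cleaner by at least `min_improvement`.

    `min_improvement` is a fraction of the current intensity (0.2 means 20%).
    It is a hypothetical knob that keeps workloads from bouncing between
    nodes over small differences.
    """
    if current_gco2_per_kwh <= 0:
        return False  # nothing to improve on
    saving = (current_gco2_per_kwh - candidate_gco2_per_kwh) / current_gco2_per_kwh
    return saving >= min_improvement


# Illustrative numbers only.
print(should_migrate(350.0, 60.0))   # True: large drop in carbon intensity
print(should_migrate(350.0, 330.0))  # False: too small to justify a move
```

A threshold like this trades some carbon savings for stability, since every migration has its own cost.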
What you can control
| What | How |
|---|---|
| Pin a specific model | model: "llama-3.3-70b-instruct" instead of "auto" |
| Constrain to a region or jurisdiction | region = "eu-fi" in invisible.hcl |
| Set an energy ceiling per workload | energy_budget: "10 kJ/day" |
| Override tier classification | X-Force-Tier: L3 request header (audit-logged) |
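To get a feel for what an energy ceiling allows, you can divide the budget by the typical per-request energy from the cost-tier table. The sketch below does that arithmetic. The budget string format is taken from the "10 kJ/day" example above, but the parsing rules (`J`, `kJ`, `MJ` per day, separated from the number by a space) are an assumption, and real requests will deviate from the typical figures.

```python
# Typical energy per request, in joules, from the cost-tier table.
TYPICAL_JOULES = {"L0": 0.01, "L1": 0.05, "L2": 0.3, "L3": 6.0}

# Hypothetical unit table; only "kJ/day" appears in the documentation.
UNIT_SCALE = {"J/day": 1.0, "kJ/day": 1e3, "MJ/day": 1e6}


def parse_budget_joules(budget: str) -> float:
    """Convert a string such as '10 kJ/day' into joules per day."""
    amount, unit = budget.split()
    return float(amount) * UNIT_SCALE[unit]


def requests_within_budget(budget: str, tier: str) -> int:
    """Rough count of typical requests of one tier that fit in a daily budget."""
    return int(parse_budget_joules(budget) / TYPICAL_JOULES[tier])


print(requests_within_budget("10 kJ/day", "L3"))  # 1666 typical L3 requests
print(requests_within_budget("10 kJ/day", "L2"))  # 33333 typical L2 requests
```

This is also a reason to use the X-Force-Tier override sparingly: at the typical figures, one L3 request costs as much as twenty L2 requests.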
How to verify the routing
Every response carries the routing decision in headers:
```text
X-Tier: L2
X-Routed-To: nebius/eu-helsinki/h100-sxm5
X-Routing-Reason: cheapest-capable
```
The same routing-decision metadata lives on the receipt for that request. If a request lands somewhere unexpected, the receipt explains why.
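If you want to log or assert on routing decisions in your own client, a small helper along these lines may be enough. It takes a plain dictionary of response headers, so it works with any HTTP library. Splitting X-Routed-To into provider, region, and hardware is an assumption based on the single example value shown above, not a documented format.

```python
def parse_routing(headers: dict[str, str]) -> dict[str, str]:
    """Extract the routing decision from a dictionary of response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Assumed layout: provider/region/hardware (inferred from one example).
    provider, region, hardware = lowered["x-routed-to"].split("/")

    return {
        "tier": lowered["x-tier"],
        "provider": provider,
        "region": region,
        "hardware": hardware,
        "reason": lowered["x-routing-reason"],
    }


decision = parse_routing({
    "X-Tier": "L2",
    "X-Routed-To": "nebius/eu-helsinki/h100-sxm5",
    "X-Routing-Reason": "cheapest-capable",
})
print(decision["tier"], decision["region"])  # L2 eu-helsinki
```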
Next
To see what the per-request record contains, read Energy receipts. To see the underlying joule measurement, read What is a joule, here.