Routing & placement

The job of the router is simple to state: send each request to the cheapest capable silicon currently available on the mesh. Doing it well is the actual engineering. This page explains how it works so you can predict where your workloads land.

Inference: cost tiers

Each inference request is classified at the gateway into one of four cost tiers before model selection:

Tier	What it covers	Typical energy	Where it can run
L0 — lookup	cache hits, key-value reads, tiny embeddings	~0.01 J	any node, including ARM edge
L1 — extraction	short summarization, classification, NER	~0.05 J	CPU / small GPU
L2 — aggregation	RAG queries, mid-context summarization	~0.3 J	mid-tier GPU (L4, A10, L40S)
L3 — reasoning	long-context reasoning, code gen, planning	~6 J	top-tier GPU (H100, H200, B200, MI300)

Classification is fast (sub-millisecond) and conservative — if there's any doubt, the router upgrades the tier rather than risking a quality regression. You can read the tier the router picked from the X-Tier response header.
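One way to picture the conservative rule is "route to the highest tier the classifier cannot rule out." The Python sketch below assumes the classifier emits a probability per tier. `Tier`, `pick_tier`, and the 5% `DOUBT_THRESHOLD` are hypothetical names and values for illustration, not the gateway's actual implementation.

```python
from enum import IntEnum

class Tier(IntEnum):
    L0 = 0  # lookup
    L1 = 1  # extraction
    L2 = 2  # aggregation
    L3 = 3  # reasoning

# Hypothetical: any tier with at least this probability counts as "doubt".
DOUBT_THRESHOLD = 0.05

def pick_tier(probs: dict[Tier, float]) -> Tier:
    """Return the highest tier the classifier cannot rule out."""
    plausible = [tier for tier, p in probs.items() if p >= DOUBT_THRESHOLD]
    if not plausible:
        # Nothing clears the bar: fall back to the most capable tier.
        return Tier.L3
    return max(plausible)

print(pick_tier({Tier.L1: 0.90, Tier.L2: 0.08, Tier.L3: 0.02}).name)  # L2
```

Here an 8% chance that the request needs aggregation is enough to route it at L2 rather than L1.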

What "cheapest capable" means

For a given tier, the router maintains a live ranking of every node in the mesh that can serve it, scored on:

Per-token energy (joules per token) — the dominant term
Local grid carbon intensity (gCO₂/kWh, hourly)
Operator PUE — published per data centre
Queue depth — nodes with backed-up queues drop in the ranking
Network latency — if your account specifies a region preference
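As a rough illustration, the sketch below folds per-token energy, PUE, and grid intensity into grams of CO₂ per token (joules × PUE ÷ 3,600,000 J/kWh × gCO₂/kWh). It then penalises queue depth and, only when a region preference is set, latency. The `Node` fields, the penalty weights, and the node figures in the usage lines are illustrative assumptions; the router's real weighting is not given on this page.

```python
from dataclasses import dataclass

J_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ

@dataclass
class Node:
    name: str
    joules_per_token: float  # accelerator energy per token
    pue: float               # operator-published PUE
    grid_g_per_kwh: float    # current hourly grid carbon intensity
    queue_depth: int
    latency_ms: float

def carbon_per_token_g(n: Node) -> float:
    """Grams of CO2 per token: facility energy (J x PUE) in kWh x grid intensity."""
    return n.joules_per_token * n.pue / J_PER_KWH * n.grid_g_per_kwh

def score(n: Node, region_pref: bool = False) -> float:
    """Lower is better. Penalty weights are illustrative assumptions."""
    s = carbon_per_token_g(n)
    s *= 1 + 0.1 * n.queue_depth      # backed-up queues sink in the ranking
    if region_pref:
        s *= 1 + n.latency_ms / 100   # latency counts only with a region preference
    return s

def rank(nodes: list[Node], region_pref: bool = False) -> list[Node]:
    """Best (lowest-score) node first."""
    return sorted(nodes, key=lambda n: score(n, region_pref))

# Illustrative figures only, not measured data.
fi = Node("fi-h100", joules_per_token=0.5, pue=1.1, grid_g_per_kwh=50, queue_depth=0, latency_ms=40)
va = Node("va-h100", joules_per_token=0.5, pue=1.3, grid_g_per_kwh=350, queue_depth=0, latency_ms=5)
print([n.name for n in rank([va, fi])])  # ['fi-h100', 'va-h100']
```

With identical hardware, the cleaner grid and lower PUE put the Finnish node first; a deep enough queue there would flip the order.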

The top-ranked node gets the request. If it fails its health check mid-flight, the request automatically fails over to the next-ranked node.
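Failover amounts to walking the ranking in order. This sketch assumes a hypothetical `send(node)` callable that raises a hypothetical `NodeUnhealthy` exception when a node fails its health check; the real transport and retry semantics (for example, what happens to partial output) are not specified on this page.

```python
from typing import Callable, Sequence

class NodeUnhealthy(Exception):
    """Hypothetical: raised when a node fails its health check mid-request."""

def route_with_failover(ranked: Sequence[str],
                        send: Callable[[str], str]) -> tuple[str, str]:
    """Try nodes in rank order; return (node, response) from the first that answers."""
    failures = []
    for node in ranked:
        try:
            return node, send(node)
        except NodeUnhealthy as exc:
            failures.append(f"{node}: {exc}")  # fall through to the next-ranked node
    raise RuntimeError("all candidate nodes failed: " + "; ".join(failures))

def fake_send(node: str) -> str:
    """Stand-in transport: the top-ranked node fails mid-flight."""
    if node == "fi-h100":
        raise NodeUnhealthy("health check failed")
    return f"ok from {node}"

print(route_with_failover(["fi-h100", "se-h100", "va-h100"], fake_send))
# ('se-h100', 'ok from se-h100')
```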

Workloads: carbon-aware placement

For container and function workloads (not inference), placement is decided at deploy time and re-evaluated periodically afterwards; the workload may migrate if the scheduler finds a meaningfully better node. Set the placement strategy in invisible.hcl:

workload "site" {
  image  = "nginx:alpine"
  region = "auto"   # cheapest, cleanest grid available
  # region = "eu"    # constrain to EU jurisdictions
  # region = "eu-fi" # pin to Finland
}

With `region = "auto"`, the carbon-aware scheduler places the workload on a node with low grid carbon intensity at deploy time, then keeps re-evaluating that choice. A batch job started in Virginia may therefore finish in Helsinki if Nordic wind meaningfully lowers the energy cost there.
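"Meaningfully better" implies some hysteresis, so workloads don't bounce between nodes on small fluctuations. The sketch below shows one common way to express that, assuming emissions are compared in grams of CO₂ per hour and that each migration has a one-off cost. `should_migrate`, the 25% threshold, and all numbers are hypothetical, not the scheduler's documented policy.

```python
# Hypothetical: a candidate must cut emissions by at least this fraction.
MIGRATION_THRESHOLD = 0.25

def should_migrate(current_g_per_h: float, candidate_g_per_h: float,
                   migration_cost_g: float, hours_remaining: float) -> bool:
    """Move only when the candidate is clearly better AND the move pays for itself."""
    if candidate_g_per_h > current_g_per_h * (1 - MIGRATION_THRESHOLD):
        return False  # improvement too small: stay put to avoid flapping
    savings_g = (current_g_per_h - candidate_g_per_h) * hours_remaining
    return savings_g > migration_cost_g

# Illustrative: Virginia at 120 g/h vs Helsinki at 30 g/h, 3 h left, 50 g to move.
print(should_migrate(120, 30, migration_cost_g=50, hours_remaining=3))  # True
```

With these illustrative figures the projected saving is 270 g, which outweighs the 50 g cost of moving; a candidate at 100 g/h would be rejected by the threshold alone.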

What you can control

What	How
Pin a specific model	`model: "llama-3.3-70b-instruct"` instead of `"auto"`
Constrain to a region or jurisdiction	`region = "eu-fi"` in `invisible.hcl`
Set an energy ceiling per workload	`energy_budget: "10 kJ/day"`
Override tier classification	`X-Force-Tier: L3` request header (audit-logged)
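A budget such as `"10 kJ/day"` means 10,000 joules per day. This sketch parses that string form and checks measured usage against it. The accepted units (J, kJ, MJ) and the `parse_daily_budget` and `over_budget` helpers are assumptions; what the platform does once a budget is exhausted is not described on this page.

```python
import re

# Assumed grammar: "<number> <J|kJ|MJ>/day", e.g. "10 kJ/day".
_UNITS = {"J": 1.0, "kJ": 1e3, "MJ": 1e6}
_BUDGET_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(J|kJ|MJ)/day\s*")

def parse_daily_budget(text: str) -> float:
    """Return a daily energy budget in joules."""
    m = _BUDGET_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised energy budget: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]

def over_budget(budget: str, joules_used_today: float) -> bool:
    """True once today's measured energy exceeds the budget."""
    return joules_used_today > parse_daily_budget(budget)

print(parse_daily_budget("10 kJ/day"))   # 10000.0
print(over_budget("10 kJ/day", 12_500))  # True
```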

How to verify the routing

Every response carries the routing decision in headers:

X-Tier: L2
X-Routed-To: nebius/eu-helsinki/h100-sxm5
X-Routing-Reason: cheapest-capable

The same routing-decision metadata lives on the receipt for that request. If a request lands somewhere unexpected, the receipt explains why.
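To check routing programmatically, you can split the headers into their parts. This sketch assumes `X-Routed-To` always has the `provider/location/hardware` shape seen in the example headers; `Route` and `parse_routing` are hypothetical helpers. A case-insensitive mapping such as the `headers` attribute of a `requests` response can be passed in directly.

```python
from collections.abc import Mapping
from typing import NamedTuple

class Route(NamedTuple):
    tier: str
    provider: str
    location: str
    hardware: str
    reason: str

def parse_routing(headers: Mapping[str, str]) -> Route:
    """Assumes X-Routed-To is 'provider/location/hardware'."""
    provider, location, hardware = headers["X-Routed-To"].split("/", 2)
    return Route(
        tier=headers["X-Tier"],
        provider=provider,
        location=location,
        hardware=hardware,
        reason=headers["X-Routing-Reason"],
    )

example = {
    "X-Tier": "L2",
    "X-Routed-To": "nebius/eu-helsinki/h100-sxm5",
    "X-Routing-Reason": "cheapest-capable",
}
route = parse_routing(example)
print(route.provider, route.location, route.hardware)  # nebius eu-helsinki h100-sxm5
```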