"Book a flight to Oslo." Embedding similarity finds search_flights and get_airports. It misses process_payment: booking a flight inherently requires payment, but "book a flight" and "process payment" share almost no tokens and sit far apart in embedding space.
Single-stage embedding routers are fast and good enough for direct matches. They fail on indirect dependencies — tool chains where step two is implied but not stated. That is exactly where agent workflows live.
Where embeddings stop
Local embeddings compare two independent vectors: the routing query and each tool description. Cosine similarity is symmetric — it measures lexical and semantic overlap, not causal structure.
- Direct match: "weather in Oslo" → get_weather ✓
- Sibling confusion: "search files" vs "search database" have similar embeddings, so the wrong one is easily chosen
- Indirect chain: "generate invoice" → needs lookup_customer first, which is not obvious from embeddings alone
- Negative selection: "send email" should not pull in get_weather, but embeddings cannot express exclusion well
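The direct-match and indirect-chain cases can be reproduced with a toy example. This is a minimal sketch, not Orqen's implementation: the three-dimensional vectors are made up to stand in for real embeddings (axes roughly meaning travel, money, weather), and `top_k` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, NOT real embeddings. Axes: (travel, money, weather).
TOOL_VECTORS = {
    "search_flights": [0.9, 0.1, 0.0],
    "get_airports": [0.8, 0.0, 0.1],
    "process_payment": [0.1, 0.9, 0.0],
    "get_weather": [0.1, 0.0, 0.9],
}

def top_k(query_vec, k):
    """Rank tools by cosine similarity to the query and keep the best k."""
    ranked = sorted(
        TOOL_VECTORS.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Assumed vector for "Book a flight to Oslo": almost entirely "travel".
query = [1.0, 0.1, 0.0]
candidates = top_k(query, 2)
print(candidates)
```

With these assumed vectors the two travel tools win and process_payment never makes the cut, even though the task needs it. Nothing in a similarity score encodes "this tool is required after that one".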
Stage 1 alone works for narrow catalogs and first-turn queries. At 50+ tools with multi-step intents, you need a second pass that reads query and candidate together — cross-attention, not cosine distance.
Indirect tool dependencies
| User intent | Stage 1 picks | Also needed |
|---|---|---|
| Book a flight | search_flights | process_payment, select_seat |
| Generate invoice | create_invoice | lookup_customer, get_line_items |
| Deploy to prod | run_deploy | check_approval, run_tests |
Multi-turn context helps Stage 1 (see multi-turn routing), but it cannot infer domain chains that never appear in the conversation text. Stage 2 closes that gap.
Stage 1: fast recall
Orqen runs semantic routing inside its own infrastructure — no extra API keys on your side for the default path. It embeds:
- Multi-turn routing context (system domain, recent user messages, tools already called — not just the last message)
- Each tool's description, parameter docs, and optional x-orqen-examples when present
Stage 1 narrows large catalogs to a short candidate list quickly. Tool schemas are cached by content hash — static MCP catalogs are not re-processed every turn.
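Caching by content hash can be sketched as follows. This illustrates the general technique only; how Orqen canonicalizes schemas and which hash it uses are not stated here, so the JSON canonicalization, SHA-256, and the `EmbeddingCache` class are all assumptions.

```python
import hashlib
import json

def schema_hash(schema: dict) -> str:
    """Hash a tool schema by content, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class EmbeddingCache:
    """Hypothetical cache: embeds a schema only the first time its content is seen."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # any callable that turns a schema into a vector
        self._store = {}
        self.misses = 0

    def get(self, schema: dict):
        key = schema_hash(schema)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(schema)
        return self._store[key]

# Stand-in embedder for the example; a real one would call an embedding model.
cache = EmbeddingCache(embed_fn=lambda schema: [float(len(schema["description"]))])

tool = {"name": "get_weather", "description": "Current weather for a city"}
same_tool_reordered = {"description": "Current weather for a city", "name": "get_weather"}

cache.get(tool)
cache.get(same_tool_reordered)  # same content, so no second embedding call
print(cache.misses)
```

Keying on content rather than on tool name means an edited description invalidates its own entry automatically, while an unchanged catalog costs nothing on later turns.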
Session hints and intent scoring adjust ranks before candidates pass to Stage 2. Orqen measures recall@K after the model responds — if routing misses too often, session recovery widens the window (see recall misses).
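One common way to compute recall@K for routing is the fraction of tools the model actually called that were present in the top K candidates. The sketch below assumes that definition; Orqen's exact formula is not given here, and `recall_at_k` is a hypothetical helper.

```python
from typing import Iterable, Sequence

def recall_at_k(ranked_tools: Sequence[str], tools_called: Iterable[str], k: int) -> float:
    """Fraction of called tools that appeared in the top-k routed candidates."""
    needed = set(tools_called)
    if not needed:
        return 1.0  # assumption: no tool calls means nothing could be missed
    offered = set(ranked_tools[:k])
    return len(needed & offered) / len(needed)

ranked = ["search_flights", "get_airports", "get_weather", "process_payment"]
called = ["search_flights", "process_payment"]

print(recall_at_k(ranked, called, k=2))  # 0.5
print(recall_at_k(ranked, called, k=4))  # 1.0
```

In this example a window of two hides process_payment, and widening the window to four recovers it, which is the kind of adjustment session recovery makes when misses accumulate.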
Stage 2: reranking pass
For large catalogs and multi-step intents, Orqen can run a second pass that reads the query and each Stage 1 candidate together — cross-attention, not cosine distance. It sees "book flight" and "process_payment: charges card for booking" in the same context window.
# Stage 1 — fast semantic recall
# Embed routing context + tool descriptions
# Narrow a large catalog to a short candidate list quickly
# Strength: fast, cheap, good on direct matches
# Weakness: "book flight" and "process_payment" sit far apart in embedding space
#
# Stage 2 — optional reranking pass
# Reads query and candidate tools together
# Better at indirect dependencies, negative selection, domain chains
# Falls back to Stage 1 if unavailable or slow; the request still succeeds
Stage 2 uses Orqen's internal infrastructure, not your provider key. It is invisible to the agent; only the final pruned tool list changes.
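The shape of a reranking pass can be sketched as a function that scores each (query, candidate) pair jointly instead of comparing two precomputed vectors. Everything below is illustrative: `rerank` and `toy_pair_score` are hypothetical names, and the scorer is a crude keyword stand-in for a real cross-encoder, used only so the example runs without a model.

```python
from typing import Callable, Dict, List

def rerank(
    query: str,
    candidates: List[str],
    descriptions: Dict[str, str],
    score_pair: Callable[[str, str], float],
    keep: int,
) -> List[str]:
    """Score every (query, tool text) pair together and keep the best `keep` tools."""
    scored = [
        (score_pair(query, f"{name}: {descriptions[name]}"), name)
        for name in candidates
    ]
    # Python's sort is stable, so ties keep their Stage 1 order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:keep]]

def toy_pair_score(query: str, tool_text: str) -> float:
    """Stand-in scorer: counts query words whose 4-letter prefix occurs in the tool text.
    A real reranker would run a model over the concatenated pair instead."""
    text = tool_text.lower()
    return float(sum(1 for word in query.lower().split() if word[:4] in text))

descriptions = {
    "search_flights": "find flights between airports",
    "get_weather": "current weather for a city",
    "process_payment": "charges card for booking",
}
stage1 = ["search_flights", "get_weather", "process_payment"]

print(rerank("book flight", stage1, descriptions, toy_pair_score, keep=2))
```

Because the scorer reads "book flight" next to "charges card for booking", process_payment moves ahead of get_weather here. The interface is the point of the sketch: the second pass needs the query and the candidate text in the same call.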
Latency tradeoff: a reranking pass adds overhead on top of the upstream LLM call. Orqen enables it only when tool count and session context justify the cost — not on every request.
Fail-open by design
Stage 2 is an enhancement, not a dependency. If anything fails — timeout, provider outage, missing internal capacity — the pipeline returns Stage 1 rankings unchanged. The request still succeeds; your agent never sees a routing error.
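A fail-open wrapper around an optional second pass might look like the sketch below. It shows the pattern, not Orqen's code: `route_fail_open`, `rerank_fn`, the timeout values, and the thread pool size are assumptions chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)  # assumed size, for illustration only

def route_fail_open(stage1_ranked, rerank_fn, timeout_s):
    """Return reranked tools, or the Stage 1 ranking unchanged if reranking fails or is slow."""
    future = _POOL.submit(rerank_fn, list(stage1_ranked))
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers timeouts and any error raised by the reranker.
        # Note: a timed-out worker thread is not killed; it finishes in the background.
        future.cancel()
        return list(stage1_ranked)

stage1 = ["search_flights", "get_airports", "process_payment"]

def slow_reranker(tools):
    time.sleep(0.3)  # simulates a reranker that misses the latency budget
    return list(reversed(tools))

def broken_reranker(tools):
    raise RuntimeError("provider outage")

print(route_fail_open(stage1, slow_reranker, timeout_s=0.05))
print(route_fail_open(stage1, broken_reranker, timeout_s=0.05))
```

Both calls print the Stage 1 list unchanged. The caller never has to handle a routing exception, so the cost of a failed second pass is limited to the timeout it waited for.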
When reranking helps
Orqen enables the second pass when:
- The catalog is large enough that Stage 1 alone leaves ambiguous chains
- Your plan tier includes advanced routing features
- The reranker is available and within latency budget
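The conditions above amount to a gating check before the second pass runs. A minimal sketch, assuming made-up field names and a threshold of 50 tools borrowed from the "50+ tools" figure earlier in this post; the actual criteria and values Orqen uses are not specified here.

```python
from dataclasses import dataclass

MIN_TOOLS_FOR_RERANK = 50  # hypothetical threshold, not a documented Orqen setting

@dataclass
class RoutingContext:
    """Hypothetical inputs to the gating decision."""
    tool_count: int
    plan_has_advanced_routing: bool
    reranker_available: bool
    expected_rerank_ms: float
    latency_budget_ms: float

def should_rerank(ctx: RoutingContext) -> bool:
    """Run Stage 2 only when every condition holds; otherwise Stage 1 stands alone."""
    return (
        ctx.tool_count >= MIN_TOOLS_FOR_RERANK
        and ctx.plan_has_advanced_routing
        and ctx.reranker_available
        and ctx.expected_rerank_ms <= ctx.latency_budget_ms
    )

large_catalog = RoutingContext(120, True, True, expected_rerank_ms=40, latency_budget_ms=100)
small_catalog = RoutingContext(12, True, True, expected_rerank_ms=40, latency_budget_ms=100)

print(should_rerank(large_catalog))  # True
print(should_rerank(small_catalog))  # False
```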
Dashboard traces show whether reranking ran and how long routing took. Better tool descriptions improve both stages — see the routing quality checklist.
See two-stage routing live
- Sign up for Orqen Pro with a 30+ tool catalog containing chained workflows.
- Send multi-step intents ("book flight for two passengers").
- Compare tools_out — payment and booking tools should co-appear when Stage 2 is active.
- Check recall@K on chained tasks vs single-tool queries.
Next step: Sign up free · MCP tool sprawl · Introducing Orqen