Turn 12: "Show me invoice 8842." The agent calls get_invoice, returns the PDF metadata. Turn 13: "Now update it." Four words — zero nouns, zero tool names. A router that only reads the last message sees a vague edit request and forwards search_files instead of update_invoice.
Multi-turn agents fail routing on follow-ups constantly. The user's latest message is anaphoric — it refers to entities and actions established turns ago. Single-message routing treats every turn like a fresh chat.
The follow-up trap
Common follow-ups that break last-message routers:
- "Now update it" — refers to the object from the prior tool result
- "Send that to Sarah" — refers to content the assistant just produced
- "Try again with the prod API" — modifies parameters from turn 8
- "Same but for Q2" — repeats a workflow with one changed constraint
Embeddings of "now update it" alone sit near generic CRUD tools, write endpoints, and file editors. Without conversation context, cosine similarity picks the wrong sibling — especially in 50-tool MCP stacks where half the catalog contains "update" in the description.
The model often gets it right — if it has the tools. Follow-up failures are frequently a routing problem, not a reasoning problem. The model knows it needs update_invoice; the router never sent it.
Last-message routing fails
Naive tool routers extract only the final user message and embed it against tool descriptions. That works for first-turn queries ("What's the weather in Oslo?") and fails for everything that depends on session state.
| Signal | Last message only | Multi-turn context |
|---|---|---|
| Entity reference ("it") | Missing | Resolved from prior turns |
| Domain (billing vs files) | Ambiguous | From system prompt |
| Tool chain continuity | Missing | From already_called tools |
| User intent trajectory | One snapshot | Last 3 user messages |
When recall misses happen on follow-ups, see recall misses and session recovery for how Orqen widens K on the next turn.
Multi-turn routing context
Orqen's routing query is built from the conversation — not from the last user message alone:
# Multi-turn routing context combines:
#
# System prompt excerpt → domain ("You are a billing assistant...")
# Recent user messages → trajectory, not just "now update it"
# Recently called tools → session is already in an invoice workflow
#
# The router scores tools against this combined signal — not the last
# message alone.The embedder scores tools against this combined string. The final user message still dominates, but system domain, prior user intent, and recent tool usage disambiguate follow-ups.
Including recently called tools serves two purposes: it tells the router which capability area the session is in, and it helps the agent progress instead of looping on the same lookup tool.
Session-aware hints
Embedding similarity produces a base score per tool. Orqen also applies session-aware boosts before ranking:
- Tools the model already called in this session get boosted — if
get_invoiceran on turn 12, sibling tools likeupdate_invoiceare more likely to co-rank. - Tools from the previous turn's pruned set get boosted so the forwarded subset stays stable — improving provider prefix-cache hit rates.
- When caching is active, the stability boost is stronger so routing does not reshuffle tools every turn and invalidate the cache.
Low-confidence widening
For ambiguous or high-stakes turns, Orqen can run additional intent analysis and widen the routing window when confidence is low — before the model sees a too-narrow tool list.
That pass adds latency and runs only when the optimization plan calls for it (large catalog, high context pressure, or active session recovery). It prevents catastrophic misroutes on vague follow-ups — but it is not a substitute for clear tool descriptions on your side.
Cache-stable routing
Multi-turn routing and caching interact. If turn 12 forwards [get_invoice, list_invoices, search_customers] and turn 13 forwards a completely different triple because "now update it" embedded differently, the provider cache misses and you pay full input rates again.
Session hints and cache-stable routing keep subsets consistent when cache_control is present, while still allowing new tools to enter when the multi-turn context clearly demands it. Details in context caching.
Test with follow-up queries
Reproduce the follow-up trap in five minutes:
- Pick a two-turn workflow — lookup then modify — with 20+ tools in the catalog.
- Route through Orqen and send turn 1 (specific query).
- Send turn 2 with a vague follow-up ("now update it").
- Compare tools_out on turn 2 — multi-turn context should include the update tool, not just search tools.
Next step: Sign up free · Sessions docs · Tool description checklist