Technical//4 MIN READ

What Agent Optimization Should Log (No Prompts)

Debug savings, recall, and latency without storing prompts. Orqen logs structured metadata per request — counts, timings, plan decisions, and honest savings math.


Orqen Team

orqen.app

Your optimization layer saves 60% on tokens — but when recall drops or latency spikes, engineers ask for the prompt. Storing full prompts in logs is a privacy liability and a storage cost. You need enough signal to debug routing, compression, and billing without a conversation archive.

Orqen logs structured metadata per request: counts, timings, tool names, plan decisions, and savings math. Not message bodies. Not system prompts. Not tool result payloads.

Debug without storing prompts

Production agent debugging usually needs answers to five questions:

  1. Did routing forward the right tools? → recall@K
  2. Which optimization stages ran? → optimization_plan trace
  3. Where did latency go? → latency_breakdown
  4. How much did we actually save? → honest token accounting
  5. Did provider caching help? → provider_cached_tokens

All five are answerable from metadata. None require storing the user's message or the model's completion text in Orqen's database.

Privacy by design. Offline routing evaluation operates on metadata — candidate tool names, selected variants, called tools — not raw user prompts.

What Orqen logs per request

Each chat completion creates one structured log row — the billing spine and feedback loop:

# What Orqen logs per request (no raw prompts):
#
# Routing quality:  tools in/out, tools called, recall@K
# Savings:          tokens saved, compression techniques used
# Caching:          provider cached tokens, whether Orqen injected markers
# Latency:          total ms + per-stage breakdown
# Session:          session_id for multi-turn grouping
| Field group  | Example fields                        | Use                  |
|--------------|---------------------------------------|----------------------|
| Tool routing | tools_in → tools_out, recall_at_k     | Routing quality      |
| Compression  | compression_tokens_saved, techniques  | Payload shrink       |
| Caching      | provider_cached_tokens, cache_injected | Provider cache hits |
| Latency      | latency_ms, latency_breakdown         | Pipeline vs upstream |
| Session      | session_id, tools_called              | Multi-turn grouping  |
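To make the shape of such a row concrete, here is a minimal sketch in Python. The field names follow the ones listed above; the types, the defaults, the sample values, and the class name `RequestLog` are assumptions for illustration, not Orqen's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional


@dataclass
class RequestLog:
    """Hypothetical metadata-only log row; types and defaults are assumptions."""
    session_id: Optional[str]        # optional header sent by the agent
    tools_in: int                    # tools offered by the agent (assumed to be a count)
    tools_out: int                   # tools forwarded after routing
    tools_called: List[str]          # tool names only, never arguments or results
    recall_at_k: Optional[float]     # None when no tools were called
    compression_tokens_saved: int
    techniques: List[str]
    provider_cached_tokens: int
    cache_injected: bool
    latency_ms: float
    latency_breakdown: Dict[str, float] = field(default_factory=dict)


# Sample values are made up for the example.
row = RequestLog(
    session_id="sess-123",
    tools_in=40,
    tools_out=6,
    tools_called=["search_docs"],
    recall_at_k=1.0,
    compression_tokens_saved=1800,
    techniques=["dedup", "summarization"],
    provider_cached_tokens=512,
    cache_injected=False,
    latency_ms=940.0,
    latency_breakdown={"routing": 12.0, "compression": 8.0, "upstream": 920.0},
)
record = asdict(row)  # plain dict, ready to serialize
```

Note that the record has no slot for message bodies, system prompts, or tool result payloads, so they cannot end up in the row by accident.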

latency_breakdown stages

Total latency splits into pipeline time vs upstream provider time. The breakdown stored on each log includes routing, compression, dedup, validation, and upstream — plus a slimmed optimization plan trace showing which stages ran and why.

  • Routing — embedding, intent scoring, optional reranking (see two-stage routing).
  • Compression — tool results, dedup, summarization, telegraphic passes.
  • Upstream — provider LLM call (usually the largest slice).

When pipeline time exceeds the plan's latency budget, the log flags it for dashboard filtering — without logging message content.
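The pipeline-versus-upstream split and the budget flag can be sketched as follows. The stage keys mirror the stages named above; the function name `split_latency`, the `budget_ms` parameter, and the sample timings are hypothetical.

```python
from typing import Dict


def split_latency(breakdown: Dict[str, float], budget_ms: float) -> Dict[str, object]:
    """Separate upstream provider time from pipeline time and flag budget overruns."""
    upstream_ms = breakdown.get("upstream", 0.0)
    # Everything that is not the provider call counts as pipeline time.
    pipeline_ms = sum(ms for stage, ms in breakdown.items() if stage != "upstream")
    return {
        "pipeline_ms": pipeline_ms,
        "upstream_ms": upstream_ms,
        "over_budget": pipeline_ms > budget_ms,
    }


# Sample timings are made up for the example.
summary = split_latency(
    {"routing": 12.0, "compression": 8.0, "dedup": 2.0, "validation": 3.0, "upstream": 900.0},
    budget_ms=20.0,
)
```

With these sample numbers the pipeline stages add up to 25 ms against a 20 ms budget, so the row would be flagged even though upstream dominates total latency.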

Optimization plan trace

Each request gets a single optimization plan that coordinates routing, compression, history tiers, and validation. Its trace answers "why did Orqen run conversation summary but not telegraphic compression?"

  • Context pressure: how full the window is relative to the model limit (see context window post).
  • Stage flags: which optimization stages were enabled for this turn.
  • Recovery state: whether session recovery widened routing after a miss (see recall misses).
  • Human-readable reasons: short strings explaining plan decisions for dashboard and log aggregation.
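As an illustration of how a trace like this could answer the "summary but not telegraphic" question, here is a small sketch. The thresholds (0.5 and 0.8), the stage names, and the `PlanTrace` and `build_plan_trace` names are assumptions chosen for the example, not Orqen's actual policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanTrace:
    context_pressure: float          # prompt tokens / model context limit
    stage_flags: Dict[str, bool]     # which stages were enabled this turn
    recovery_widened: bool           # routing widened after a recall miss
    reasons: List[str] = field(default_factory=list)


def build_plan_trace(prompt_tokens: int, model_limit: int,
                     recovery_widened: bool = False) -> PlanTrace:
    pressure = prompt_tokens / model_limit
    # Hypothetical thresholds: summarize at half-full, go telegraphic near the limit.
    thresholds = {"conversation_summary": 0.5, "telegraphic": 0.8}
    flags = {stage: pressure >= limit for stage, limit in thresholds.items()}
    reasons = [
        f"{stage} {'on' if flags[stage] else 'off'}: pressure {pressure:.2f} vs {limit:.2f}"
        for stage, limit in thresholds.items()
    ]
    return PlanTrace(pressure, flags, recovery_widened, reasons)


trace = build_plan_trace(prompt_tokens=60_000, model_limit=100_000)
```

The reasons are built from numbers and stage names only, so they can be aggregated in a dashboard without carrying any conversation text.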

recall@K and routing metadata

After each tool-using response, Orqen computes:

recall@K = |tools_called ∩ tools_forwarded| / |tools_called|

recall@K is NULL when no tools were called: the denominator is zero, so the request is not a failure, just not scored.
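The formula translates directly into a few lines of Python. This is a sketch of the definition above, with `None` standing in for NULL; the function name and the sample tool names are illustrative.

```python
from typing import Iterable, Optional


def recall_at_k(tools_called: Iterable[str],
                tools_forwarded: Iterable[str]) -> Optional[float]:
    """|tools_called ∩ tools_forwarded| / |tools_called|, or None if nothing was called."""
    called = set(tools_called)
    if not called:
        return None  # not scored, rather than scored as a miss
    return len(called & set(tools_forwarded)) / len(called)


# The model called two tools, but routing forwarded only one of them.
score = recall_at_k(["search_docs", "send_email"], ["search_docs", "get_calendar"])
```

Only tool names go in, so the score can be recomputed offline from logged metadata.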

routing_trace stores candidate names and policy version for offline comparison — enough to tune routing without the user query text that produced the scores.

Honest cache accounting

Orqen tracks provider cached tokens from upstream usage responses. When compression would overlap tokens the provider already serves from cache at a discount, Orqen subtracts the overlap from reported savings. The exception is when Orqen itself injected the cache markers: in that case both the compression savings and the cache savings are credited.
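A minimal sketch of that rule is below. How the overlap is measured is not described here, so `cache_overlap_tokens` is taken as a given input; that parameter and the function name are assumptions for illustration.

```python
def reported_savings(compression_tokens_saved: int,
                     cache_overlap_tokens: int,
                     cache_injected: bool) -> int:
    """Savings to report after accounting for tokens the provider already cached."""
    if cache_injected:
        # Orqen placed the cache markers, so compression savings are not reduced.
        return compression_tokens_saved
    # Never subtract more than was saved, and never subtract a negative overlap.
    overlap = max(0, min(cache_overlap_tokens, compression_tokens_saved))
    return compression_tokens_saved - overlap
```

For example, 1,000 compressed tokens with a 300-token overlap would be reported as 700 saved when the provider cached them on its own, and as 1,000 when the cache markers were injected by the optimization layer.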

Full caching behavior in Context Caching: The LLM Cost Lever Most Agents Skip.

What is not stored

Orqen's request logs do not include:

  • User message content
  • Assistant completion text
  • System prompts
  • Tool result payloads
  • Full tool JSON schemas (schemas are deduplicated separately by hash in tool_schemas for analysis — not tied to request content)

Tool schemas are analyzed once per hash for Routing Quality scores — the analysis reads description text from the schema definition your agent registered, not from per-request logs. See tool description checklist.
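Deduplicating schemas by hash can be sketched like this. The choice of SHA-256 over canonical JSON, the in-memory `tool_schemas` dict, and the function names are assumptions; the post only states that schemas are stored once per hash.

```python
import hashlib
import json
from typing import Dict

tool_schemas: Dict[str, dict] = {}  # hash -> schema, stored once


def schema_hash(schema: dict) -> str:
    """Hash a schema by its canonical JSON form so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register_schema(schema: dict) -> str:
    """Store the schema under its hash if unseen; return the hash either way."""
    digest = schema_hash(schema)
    tool_schemas.setdefault(digest, schema)
    return digest


h1 = register_schema({"name": "search_docs", "description": "Search the docs."})
h2 = register_schema({"description": "Search the docs.", "name": "search_docs"})
```

A request log row would then only need to reference the hash, and the schema analysis runs once no matter how many requests use that tool.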

Session grouping uses a session_id header your agent optionally sends — Orqen correlates turns without storing conversation text.

Open the dashboard

  1. Sign up for Orqen and send a few agent requests.
  2. Usage tab — tokens saved, tools in/out, recall@K, cached tokens.
  3. Sessions view — per-session waterfall from latency_breakdown.
  4. Routing Quality — schema scores independent of prompt logs.

For local offline analysis, the Orqen CLI can replay routing metadata from exported logs — still no prompt bodies required.

Tagged: observability, privacy, agent-optimization, logging, dashboard

Orqen Team

We build the optimization layer for tool-heavy LLM agents. Our goal is to make agent costs predictable as your tool set grows.
