What Agent Optimization Should Log (No Prompts)

Your optimization layer saves 60% on tokens — but when recall drops or latency spikes, engineers ask for the prompt. Storing full prompts in logs is a privacy liability and a storage cost. You need enough signal to debug routing, compression, and billing without a conversation archive.

Orqen logs structured metadata per request: counts, timings, tool names, plan decisions, and savings math. Not message bodies. Not system prompts. Not tool result payloads.

Debug without storing prompts

Production agent debugging usually needs answers to five questions:

Did routing forward the right tools? → recall@K
Which optimization stages ran? → optimization_plan trace
Where did latency go? → latency_breakdown
How much did we actually save? → honest token accounting
Did provider caching help? → provider_cached_tokens

All five are answerable from metadata. None require storing the user's message or the model's completion text in Orqen's database.

Privacy by design. Offline routing evaluation operates on metadata — candidate tool names, selected variants, called tools — not raw user prompts.

What Orqen logs per request

Each chat completion creates one structured log row, which serves as both the billing record and the input to the routing feedback loop:

# What Orqen logs per request (no raw prompts):
#
# Routing quality:  tools in/out, tools called, recall@K
# Savings:          tokens saved, compression techniques used
# Caching:          provider cached tokens, whether Orqen injected markers
# Latency:          total ms + per-stage breakdown
# Session:          session_id for multi-turn grouping

Field group	Example fields	Use
Tool routing	tools_in → tools_out, tools_called, recall_at_k	Routing quality
Compression	compression_tokens_saved, techniques	Payload shrink
Caching	provider_cached_tokens, cache_injected	Provider cache hits
Latency	latency_ms, latency_breakdown	Pipeline vs upstream
Session	session_id	Multi-turn grouping
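
As a rough illustration, a row like this could be modeled as a Python dataclass. The field names come from the table above; the class name `RequestLog`, the types, the defaults, and the choice to store `tools_in` and `tools_out` as counts are assumptions for this sketch, not Orqen's actual schema.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class RequestLog:
    """Hypothetical per-request log row: metadata only, no message bodies."""
    tools_in: int                         # tools offered by the agent (assumed to be a count)
    tools_out: int                        # tools forwarded after routing (assumed to be a count)
    tools_called: list[str] = field(default_factory=list)   # tool names only
    recall_at_k: Optional[float] = None   # None when no tools were called
    compression_tokens_saved: int = 0
    techniques: list[str] = field(default_factory=list)     # compression techniques used
    provider_cached_tokens: int = 0
    cache_injected: bool = False          # whether Orqen injected cache markers
    latency_ms: int = 0
    latency_breakdown: dict[str, int] = field(default_factory=dict)  # stage -> ms
    session_id: Optional[str] = None


# Illustrative names that would indicate prompt content leaking into the row.
FORBIDDEN = {"messages", "system_prompt", "completion", "tool_results"}
LOGGED_FIELDS = {f.name for f in fields(RequestLog)}

row = RequestLog(tools_in=40, tools_out=8)
```

A check such as `LOGGED_FIELDS.isdisjoint(FORBIDDEN)` in a unit test is one way to keep content fields from creeping into the schema later.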

latency_breakdown stages

Total latency splits into pipeline time vs upstream provider time. The breakdown stored on each log includes routing, compression, dedup, validation, and upstream — plus a slimmed optimization plan trace showing which stages ran and why.

Routing — embedding, intent scoring, optional reranking (see two-stage routing).
Compression — tool results, dedup, summarization, telegraphic passes.
Upstream — provider LLM call (usually the largest slice).

When pipeline time exceeds the plan's latency budget, the log flags it for dashboard filtering — without logging message content.
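
The pipeline-versus-upstream split and the budget flag can be sketched as follows. The stage names follow the breakdown described above; the function name `split_latency`, the `budget_ms` parameter, and the output keys are hypothetical, and the numbers are made up for illustration.

```python
def split_latency(breakdown: dict[str, int], budget_ms: int) -> dict:
    """Split a per-stage breakdown into pipeline vs upstream time.

    `breakdown` maps stage name -> milliseconds. Every stage except
    "upstream" is counted as pipeline time.
    """
    upstream_ms = breakdown.get("upstream", 0)
    pipeline_ms = sum(ms for stage, ms in breakdown.items() if stage != "upstream")
    return {
        "pipeline_ms": pipeline_ms,
        "upstream_ms": upstream_ms,
        "latency_ms": pipeline_ms + upstream_ms,
        "over_budget": pipeline_ms > budget_ms,  # a flag only; no content is involved
    }


example = split_latency(
    {"routing": 18, "compression": 25, "dedup": 4, "validation": 3, "upstream": 900},
    budget_ms=40,
)
```

Here the pipeline stages sum to 50 ms against a 40 ms budget, so `over_budget` is `True` even though upstream time dominates the total.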

Optimization plan trace

Each request gets a single optimization plan that coordinates routing, compression, history tiers, and validation. Its trace answers "why did Orqen run conversation summary but not telegraphic compression?"

Context pressure — how full the window is relative to the model limit (see context window post).
Stage flags — which optimization stages were enabled for this turn.
Recovery state — whether session recovery widened routing after a miss (see recall misses).
Human-readable reasons — short strings explaining plan decisions for dashboard and log aggregation.
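
The four trace components listed above might be represented like this. It is a sketch under stated assumptions: the `PlanTrace` class, the `build_trace` helper, the stage names, and the 0.5 and 0.8 pressure thresholds are all invented for illustration and do not describe Orqen's actual planning rules.

```python
from dataclasses import dataclass, field


@dataclass
class PlanTrace:
    """Hypothetical slimmed plan trace stored alongside a log row."""
    context_pressure: float                         # used tokens / model limit
    stages: dict[str, bool] = field(default_factory=dict)
    recovery_widened: bool = False
    reasons: list[str] = field(default_factory=list)


def build_trace(used_tokens: int, model_limit: int, recovering: bool) -> PlanTrace:
    pressure = used_tokens / model_limit
    trace = PlanTrace(context_pressure=round(pressure, 2), recovery_widened=recovering)
    # Illustrative thresholds only: summarize at moderate pressure,
    # add telegraphic compression only when the window is nearly full.
    trace.stages["conversation_summary"] = pressure >= 0.5
    trace.stages["telegraphic"] = pressure >= 0.8
    for stage, enabled in trace.stages.items():
        verb = "enabled" if enabled else "skipped"
        trace.reasons.append(f"{stage} {verb}: pressure={trace.context_pressure}")
    if recovering:
        trace.reasons.append("routing widened: recovery after recall miss")
    return trace


trace = build_trace(used_tokens=76_800, model_limit=128_000, recovering=False)
```

With these made-up thresholds, a window at 60% of the limit enables conversation summary and skips telegraphic compression, and the reason strings record that decision without any message text.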

recall@K and routing metadata

After each tool-using response, Orqen computes:

recall@K = |tools_called ∩ tools_forwarded| / |tools_called|

recall@K is NULL when no tools were called: the request is simply not scored, which is not a failure.
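
The formula translates directly into a few lines of Python. The function name and the use of `None` to stand in for NULL are assumptions of this sketch; the inputs are tool names only.

```python
from typing import Optional


def recall_at_k(tools_called: list[str], tools_forwarded: list[str]) -> Optional[float]:
    """|tools_called ∩ tools_forwarded| / |tools_called|, or None if nothing was called."""
    called = set(tools_called)
    if not called:
        return None  # not scored, rather than counted as a failure
    return len(called & set(tools_forwarded)) / len(called)


score = recall_at_k(["search", "send_email"], ["search", "calendar"])
```

In this example the model called two tools but only `search` had been forwarded, so `score` is 0.5.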

routing_trace stores candidate names and policy version for offline comparison — enough to tune routing without the user query text that produced the scores.

Honest cache accounting

Orqen tracks provider cached tokens from upstream usage responses. When compression would overlap tokens the provider already serves from cache at a discount, Orqen subtracts the overlap from reported savings — unless Orqen itself injected the cache markers, in which case both levers are credited honestly.
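
The subtraction rule can be sketched as below. This is a simplification under assumptions: the function name and parameters are hypothetical, the overlap is taken as a known input, and tokens are counted one-for-one rather than weighted by the provider's cache discount.

```python
def reported_savings(
    compression_tokens_saved: int,
    overlap_with_cached: int,
    cache_injected: bool,
) -> int:
    """Tokens credited to compression after accounting for provider caching.

    `overlap_with_cached` is the number of compressed-away tokens the provider
    would have served from its cache anyway (assumed to be known here).
    """
    if cache_injected:
        # Orqen placed the cache markers, so both levers are credited.
        return compression_tokens_saved
    # Never subtract more than compression actually saved.
    overlap = min(overlap_with_cached, compression_tokens_saved)
    return compression_tokens_saved - overlap
```

For example, 1,200 tokens saved with a 300-token overlap is reported as 900 when the provider cached on its own, and as 1,200 when Orqen injected the markers.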

Full caching behavior is covered in Context Caching: The LLM Cost Lever Most Agents Skip.

What is not stored

Orqen's request logs do not include:

User message content
Assistant completion text
System prompts
Tool result payloads
Full tool JSON schemas (schemas are deduplicated separately by hash in tool_schemas for analysis — not tied to request content)

Tool schemas are analyzed once per hash for Routing Quality scores — the analysis reads description text from the schema definition your agent registered, not from per-request logs. See tool description checklist.

Session grouping uses a session_id header your agent optionally sends — Orqen correlates turns without storing conversation text.
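
Grouping turns by session needs nothing beyond the identifier itself. A minimal sketch, assuming log rows are available as dictionaries with a `session_id` key (the row shape and the helper name are assumptions):

```python
from collections import defaultdict


def group_by_session(rows: list[dict]) -> dict[str, list[dict]]:
    """Group metadata rows by session_id; rows without one are skipped."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        sid = row.get("session_id")
        if sid is not None:
            sessions[sid].append(row)
    return dict(sessions)


rows = [
    {"session_id": "s1", "latency_ms": 950},
    {"session_id": "s2", "latency_ms": 700},
    {"session_id": "s1", "latency_ms": 820},
    {"session_id": None, "latency_ms": 640},  # agent sent no session header
]
grouped = group_by_session(rows)
```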

Open the dashboard

Sign up for Orqen and send a few agent requests.
Usage tab — tokens saved, tools in/out, recall@K, cached tokens.
Sessions view — per-session waterfall from latency_breakdown.
Routing Quality — schema scores independent of prompt logs.

For local offline analysis, the Orqen CLI can replay routing metadata from exported logs — still no prompt bodies required.

Next step: Sign up free · Dashboard setup · Introducing Orqen
