Concepts

How it works

Buzo sits outside your agent's request path. It captures what your agent sees (retrieval) and what it says (generation), and correlates the two server-side to produce actionable findings.

The flow

Each agent request can produce up to two events at the SDK:

  1. Retrieval trace — the query the user asked and the vector IDs + scores your retriever returned. Sent to /v1/retrieval-traces.
  2. Generation trace (optional, opt-in) — the LLM output, model name, and token usage. Sent to /v1/generation-traces.

Both events carry LangChain run-tree identifiers that let Buzo correlate them server-side — the retriever and the LLM call in the same chain invocation share a common parent run, which is what the server matches on.
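As a rough illustration of that correlation, the sketch below groups events by a shared parent run. The event shapes, the field names other than runId and parentRunId, and the function correlateByParentRun are assumptions made for this example; they are not Buzo's wire format or server implementation.

```typescript
// Hypothetical event shapes for illustration only. Note that no vector
// content or document text is included, only IDs and scores.
interface RetrievalTrace {
  kind: "retrieval";
  runId: string;
  parentRunId: string;
  query: string;
  hits: { vectorId: string; score: number }[];
}

interface GenerationTrace {
  kind: "generation";
  runId: string;
  parentRunId: string;
  model: string;
  output: string;
}

interface CorrelatedInvocation {
  parentRunId: string;
  retrievals: RetrievalTrace[];
  generations: GenerationTrace[];
}

// Group events that share a parent run, i.e. the same chain invocation.
function correlateByParentRun(
  events: (RetrievalTrace | GenerationTrace)[],
): CorrelatedInvocation[] {
  const byParent = new Map<string, CorrelatedInvocation>();
  for (const event of events) {
    let group = byParent.get(event.parentRunId);
    if (!group) {
      group = { parentRunId: event.parentRunId, retrievals: [], generations: [] };
      byParent.set(event.parentRunId, group);
    }
    if (event.kind === "retrieval") {
      group.retrievals.push(event);
    } else {
      group.generations.push(event);
    }
  }
  return Array.from(byParent.values());
}
```

A group with retrievals but no generations is what you would expect when generation capture is left off.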

What gets captured

Event | Payload | Default
Retrieval | Query text or hash, returned vector IDs + scores, k, latency, embedding model id, metadata | Captured on every instrumented retriever call
Generation | Output text (optional redaction), model, prompt/completion tokens, latency | Off; opt-in via outputCapture

Vector content is never sent. Buzo already has it from its own scans. The SDK only ships IDs, scores, and metadata. Less bandwidth, less PII surface.

What doesn't get captured

  • System prompts — never sent.
  • Assistant tool calls — never sent.
  • Vector content or document text from your store — already known to Buzo.
  • User identities beyond what you explicitly attach via the metadata field.

Retrieval ↔ generation correlation

When outputCapture is enabled, Buzo joins each generation event with the retrieval events it shares a LangChain run tree with, using runId and parentRunId. The vector content Buzo already stores is matched against the captured LLM output using a tiered matcher that combines verbatim-substring detection with n-gram similarity for paraphrase.

A hit on a quarantined vector emits a CITED_FLAGGED alert: proof that the flagged content reached a real user, not just the model's context.
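To make the two tiers concrete, here is a minimal sketch of a matcher that tries a verbatim substring check first and falls back to word n-gram overlap. It only illustrates the general technique: the function names, the n-gram size of 3, the 0.5 threshold, and the choice of containment as the similarity measure are all assumptions, since the actual matcher's parameters are not documented here.

```typescript
type MatchTier = "verbatim" | "paraphrase" | "none";

// Lowercase and collapse whitespace so trivial formatting differences
// do not defeat the comparison. Punctuation is left untouched.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Set of word n-grams in the normalized text.
function wordNgrams(text: string, n: number): Set<string> {
  const words = normalize(text).split(" ").filter((w) => w.length > 0);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

// Fraction of the vector's n-grams that also occur in the LLM output.
function ngramContainment(vectorText: string, output: string, n: number): number {
  const vectorGrams = wordNgrams(vectorText, n);
  if (vectorGrams.size === 0) return 0;
  const outputGrams = wordNgrams(output, n);
  let shared = 0;
  vectorGrams.forEach((gram) => {
    if (outputGrams.has(gram)) shared++;
  });
  return shared / vectorGrams.size;
}

// Tier 1: verbatim substring. Tier 2: n-gram overlap above an assumed threshold.
function matchTier(
  vectorText: string,
  output: string,
  n: number = 3,
  threshold: number = 0.5,
): MatchTier {
  const needle = normalize(vectorText);
  if (needle.length > 0 && normalize(output).includes(needle)) return "verbatim";
  return ngramContainment(vectorText, output, n) >= threshold ? "paraphrase" : "none";
}
```

Checking the cheap exact tier first means the fuzzier comparison only runs when the text was not copied word for word.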

What the alerts mean

Alert | Triggered when | Strength
SERVED_FLAGGED | A retrieval returns a vector currently marked QUARANTINED. | Exposure
CITED_FLAGGED | The content of a quarantined vector actually appears in the LLM's final answer. | Evidence
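The two trigger conditions can be read as a small decision rule, sketched below. This is an illustration under assumptions, not Buzo's server logic: the ACTIVE status, the Alert shape, and the function alertsForInvocation are hypothetical, and emitting both alerts when a quarantined vector is both served and cited is a choice made for this example.

```typescript
// Only QUARANTINED and the two alert names come from the table above;
// ACTIVE and the Alert shape are assumed for illustration.
type VectorStatus = "ACTIVE" | "QUARANTINED";
type AlertType = "SERVED_FLAGGED" | "CITED_FLAGGED";

interface Alert {
  type: AlertType;
  strength: "Exposure" | "Evidence";
  vectorId: string;
}

function alertsForInvocation(
  retrievedIds: string[],
  statusById: Map<string, VectorStatus>,
  citedIds: Set<string>,
): Alert[] {
  const alerts: Alert[] = [];
  for (const vectorId of retrievedIds) {
    // Vectors that are not quarantined never raise an alert.
    if (statusById.get(vectorId) !== "QUARANTINED") continue;
    // The retriever returned a quarantined vector: exposure.
    alerts.push({ type: "SERVED_FLAGGED", strength: "Exposure", vectorId });
    // Its content also matched the final answer: evidence.
    if (citedIds.has(vectorId)) {
      alerts.push({ type: "CITED_FLAGGED", strength: "Evidence", vectorId });
    }
  }
  return alerts;
}
```

Here citedIds stands for the set of vectors whose stored content matched the captured output, so it is only populated when outputCapture is enabled.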

Operational guarantees

  • Never in your agent's request path. Out-of-band, fire-and-forget POST sent after the original retrieval has already returned.
  • Never throws to your code. All errors are caught and forwarded to your configured logger.
  • Bounded memory. Ring buffer drops oldest events under load — no unbounded growth, no back-pressure on your process.
  • Circuit breaker. Repeated failures open a short-lived circuit so your process never retries into an outage.
  • Edge-runtime aware. Call buzo.flush() inside ctx.waitUntil(...) for guaranteed delivery on Cloudflare Workers, Vercel Edge, and Next.js Edge.
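The bounded-memory and circuit-breaker guarantees follow two well-known patterns, sketched below in generic form. This is not the SDK's internal code: the class names, the capacity, the failure threshold, and the cooldown are all assumptions chosen for illustration.

```typescript
// Fixed-capacity ring buffer: when full, a push overwrites the oldest item.
class DropOldestBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private head = 0; // index of the oldest item
  private size = 0;
  dropped = 0; // how many items were overwritten

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity).fill(undefined);
  }

  push(item: T): void {
    const tail = (this.head + this.size) % this.capacity;
    this.slots[tail] = item;
    if (this.size === this.capacity) {
      // Buffer was full: the write landed on the oldest slot, so advance head.
      this.head = (this.head + 1) % this.capacity;
      this.dropped++;
    } else {
      this.size++;
    }
  }

  // Return buffered items oldest-first and empty the buffer.
  drain(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.slots[(this.head + i) % this.capacity] as T);
    }
    this.slots.fill(undefined);
    this.head = 0;
    this.size = 0;
    return out;
  }
}

// Opens after `threshold` consecutive failures and blocks sends until
// `cooldownMs` has elapsed. The clock is injectable for testing.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  canSend(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt < this.cooldownMs) return false;
    // Cooldown elapsed: close the circuit and allow sending again.
    this.openedAt = null;
    this.consecutiveFailures = 0;
    return true;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.threshold) this.openedAt = this.now();
  }
}
```

Dropping the oldest events keeps memory flat at the cost of losing some telemetry under sustained load, which fits a component that must never slow down the host process.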