How it works
Buzo sits outside your agent's request path. It captures what your agent sees (retrieval) and what it says (generation), and correlates the two server-side to produce actionable findings.
The flow
Every agent request produces two potential events at the SDK:
- Retrieval trace: the query the user asked and the vector IDs + scores your retriever returned. Sent to /v1/retrieval-traces.
- Generation trace (optional, opt-in): the LLM output, model name, and token usage. Sent to /v1/generation-traces.
Both events carry LangChain run-tree identifiers that let Buzo correlate them server-side — the retriever and the LLM call in the same chain invocation share a common parent run, which is what the server matches on.
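To make the parent-run matching concrete, here is a minimal sketch of how two event streams could be joined on a shared parent run. The interfaces, field names (other than runId and parentRunId), and the correlate function are illustrative assumptions, not Buzo's wire format or server code.

```typescript
// Hypothetical event shapes. Only runId and parentRunId come from the text;
// every other field name here is an assumption made for illustration.
interface RetrievalTrace {
  runId: string;
  parentRunId: string;
  query: string;
  results: { vectorId: string; score: number }[];
}

interface GenerationTrace {
  runId: string;
  parentRunId: string;
  model: string;
  output: string;
}

interface CorrelatedInvocation {
  parentRunId: string;
  retrievals: RetrievalTrace[];
  generation: GenerationTrace;
}

// Group retrievals by parent run, then attach each generation to the
// retrievals that share its parent run.
function correlate(
  retrievals: RetrievalTrace[],
  generations: GenerationTrace[],
): CorrelatedInvocation[] {
  const byParent = new Map<string, RetrievalTrace[]>();
  for (const r of retrievals) {
    const siblings = byParent.get(r.parentRunId) ?? [];
    siblings.push(r);
    byParent.set(r.parentRunId, siblings);
  }
  return generations.map((g) => ({
    parentRunId: g.parentRunId,
    retrievals: byParent.get(g.parentRunId) ?? [],
    generation: g,
  }));
}
```

A generation with no sibling retrieval still yields a result with an empty retrievals array, so unmatched events are visible to the caller instead of being silently dropped.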
What gets captured
| Event | Payload | Default |
|---|---|---|
| Retrieval | Query text or hash, returned vector IDs + scores, k, latency, embedding model id, metadata | Captured on every instrumented retriever call |
| Generation | Output text (optional redaction), model, prompt/completion tokens, latency | Off — opt-in via outputCapture |
Vector content is never sent. Buzo already has it from its own scans. The SDK only ships IDs, scores, and metadata. Less bandwidth, less PII surface.
What doesn't get captured
- System prompts — never sent.
- Assistant tool calls — never sent.
- Vector content or document text from your store — already known to Buzo.
- User identities beyond what you explicitly attach via the metadata field.
Retrieval ↔ generation correlation
When outputCapture is enabled, Buzo joins each generation event with the retrieval events it shares a LangChain run tree with, using runId and parentRunId. The vector content Buzo already stores is matched against the captured LLM output using a tiered matcher that combines verbatim-substring detection with n-gram similarity for paraphrase.
A hit on a quarantined vector emits a CITED_FLAGGED alert: proof the flagged content reached a real user, not just the context.
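The tiered idea can be sketched as follows: try an exact substring check first, and fall back to n-gram overlap only when that fails. This is an illustrative approximation, not Buzo's matcher. The word-trigram containment measure, the 0.5 threshold, and all function names are assumptions.

```typescript
type MatchTier = "verbatim" | "paraphrase" | "none";

// Lowercase and collapse anything that is not a-z or 0-9 into single spaces.
// ASCII-only on purpose to keep the sketch short.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Set of word n-grams in the normalized text.
function wordNgrams(text: string, n: number): Set<string> {
  const words = normalize(text).split(" ").filter((w) => w.length > 0);
  const grams = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) {
    grams.add(words.slice(i, i + n).join(" "));
  }
  return grams;
}

// Fraction of the vector's n-grams that also occur in the output.
function ngramContainment(vectorText: string, output: string, n: number): number {
  const source = wordNgrams(vectorText, n);
  if (source.size === 0) return 0;
  const target = wordNgrams(output, n);
  let shared = 0;
  for (const g of source) {
    if (target.has(g)) shared++;
  }
  return shared / source.size;
}

// Tier 1: verbatim substring. Tier 2: n-gram overlap above a threshold.
function matchTier(
  vectorText: string,
  output: string,
  threshold = 0.5, // assumed cut-off, chosen for illustration
  n = 3,
): MatchTier {
  const v = normalize(vectorText);
  if (v.length > 0 && normalize(output).includes(v)) return "verbatim";
  return ngramContainment(vectorText, output, n) >= threshold ? "paraphrase" : "none";
}
```

Running the cheap substring check first means the costlier n-gram comparison is only paid for outputs that do not quote the vector directly.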
What the alerts mean
| Alert | Triggered when | Strength |
|---|---|---|
| SERVED_FLAGGED | A retrieval returns a vector currently marked QUARANTINED. | Exposure |
| CITED_FLAGGED | The content of a quarantined vector actually appears in the LLM's final answer. | Evidence |
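
The two alert conditions in the table can be read as a small decision rule. The sketch below is one hypothetical way to express it. The alertsFor function and its inputs are assumptions, as is the choice to emit SERVED_FLAGGED alongside CITED_FLAGGED for the same vector.

```typescript
type AlertKind = "SERVED_FLAGGED" | "CITED_FLAGGED";

interface Alert {
  kind: AlertKind;
  vectorId: string;
}

// returnedVectorIds: IDs from one retrieval trace.
// quarantined: IDs currently marked QUARANTINED.
// citedVectorIds: IDs whose content was matched in the captured LLM output.
function alertsFor(
  returnedVectorIds: string[],
  quarantined: Set<string>,
  citedVectorIds: Set<string>,
): Alert[] {
  const alerts: Alert[] = [];
  for (const id of returnedVectorIds) {
    if (!quarantined.has(id)) continue; // only quarantined vectors raise alerts
    alerts.push({ kind: "SERVED_FLAGGED", vectorId: id }); // exposure
    if (citedVectorIds.has(id)) {
      alerts.push({ kind: "CITED_FLAGGED", vectorId: id }); // evidence
    }
  }
  return alerts;
}
```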
Operational guarantees
- Never in your agent's request path. Out-of-band, fire-and-forget POST after the original retrieval already returned.
- Never throws to your code. All errors are caught and forwarded to your configured logger.
- Bounded memory. Ring buffer drops oldest events under load: no unbounded growth, no back-pressure on your process.
- Circuit breaker. Repeated failures open a short-lived circuit so your process never retries into an outage.
- Edge-runtime aware. Call buzo.flush() inside ctx.waitUntil(...) for guaranteed delivery on Cloudflare Workers, Vercel Edge, and Next.js Edge.
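
For readers who want a feel for the bounded-memory and circuit-breaker guarantees, here is a generic sketch of both patterns. It is not the SDK's source. The class names, the capacity, the failure limit, and the cooldown are assumptions, and the breaker is simplified (no half-open state).

```typescript
// Fixed-capacity queue that overwrites the oldest entry once full.
class RingBuffer<T> {
  private readonly capacity: number;
  private items: (T | undefined)[];
  private head = 0; // index of the oldest item
  private count = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be >= 1");
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const tail = (this.head + this.count) % this.capacity;
    this.items[tail] = item;
    if (this.count === this.capacity) {
      this.head = (this.head + 1) % this.capacity; // oldest item was overwritten
    } else {
      this.count++;
    }
  }

  // Return buffered items oldest-first and empty the buffer.
  drain(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.head + i) % this.capacity] as T);
    }
    this.items = new Array<T | undefined>(this.capacity);
    this.head = 0;
    this.count = 0;
    return out;
  }

  get size(): number {
    return this.count;
  }
}

// Opens after maxFailures consecutive failures and stays open for cooldownMs.
class CircuitBreaker {
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private failures = 0;
  private openedAt: number | null = null;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // The caller passes the current time so the logic stays deterministic.
  canSend(now: number): boolean {
    if (this.openedAt === null) return true;
    if (now - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown elapsed: close and allow sending again
      this.failures = 0;
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(now: number): void {
    this.failures++;
    if (this.failures >= this.maxFailures) this.openedAt = now;
  }
}
```

Dropping the oldest events keeps memory constant under load at the cost of losing some traces, which fits a telemetry path that must never slow down the host process.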