Capture modes
Three independent dials control what leaves the SDK: queryCapture for retrieval queries, outputCapture for LLM generations, and resultsCapture for retrieved-vector content. Queries default to plaintext because that is where the analytical value is. Generations default to off and results default to ids-only, because both commonly carry data that should only egress deliberately.
queryCapture
The user's raw query text is the single most useful signal for grouping findings by topic, surfacing common retrieval failure modes, and attributing retrieval health over time.
| Mode | What ships | When to use |
|---|---|---|
| plaintext (default) | The full query text, UTF-8. | Default. Maximum analytical value. The customer is responsible for PII handling under the DPA with Buzo. |
| hash | SHA-256 hex of the query. No plaintext ever leaves the SDK. | Regulated deployments where queries may carry direct identifiers. Buzo can still detect repeated queries and join with reads, but cannot see the text. |
| redact | Query text with customer-supplied redactPatterns replaced in-place (e.g. emails → `<EMAIL>`). | Most queries are safe but some patterns must be scrubbed. The regexes run client-side before any network I/O. |
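
To make the three modes in the table concrete, here is a minimal sketch of what each one would produce for a given query. It is not the SDK's implementation: `captureQuery`, `QueryCapture`, and `RedactPattern` are hypothetical names introduced for illustration, and the only facts taken from this page are that hash mode ships a SHA-256 hex digest and that redact mode applies the configured regexes client-side.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical types for illustration only; the SDK's internals are not shown on this page.
type QueryCapture = 'plaintext' | 'hash' | 'redact'

interface RedactPattern {
  pattern: RegExp
  replacement: string
}

// Returns the string that would leave the process for the given mode.
function captureQuery(
  query: string,
  mode: QueryCapture,
  redactPatterns: RedactPattern[] = [],
): string {
  switch (mode) {
    case 'plaintext':
      return query
    case 'hash':
      // SHA-256 over the UTF-8 bytes, hex-encoded (64 hex characters).
      return createHash('sha256').update(query, 'utf8').digest('hex')
    case 'redact':
      // Apply each pattern in order; global regexes replace every match.
      return redactPatterns.reduce(
        (text, { pattern, replacement }) => text.replace(pattern, replacement),
        query,
      )
  }
}
```

Because the digest is deterministic, two identical queries produce the same hash, which is what allows repeated queries to be detected without the text itself being visible.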

    new Buzo({
      apiKey: process.env.BUZO_API_KEY!,
      queryCapture: 'redact',
      // Patterns run client-side, before any network I/O.
      redactPatterns: [
        { pattern: /[\w.-]+@[\w.-]+\.\w+/g, replacement: '<EMAIL>' }, // email addresses
        { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: '<SSN>' }, // NNN-NN-NNNN identifiers
      ],
    })

outputCapture v0.2+
LLM outputs are richer and more sensitive than queries — they often echo back parts of the user's message or include PII the user authored. Output capture is opt-in on purpose.
| Mode | What ships | When to use |
|---|---|---|
| off (default) | Nothing. The LangChain handleLLMStart/handleLLMEnd hooks short-circuit: no map allocation, no timing tracked. | Default. Pick this unless you have actively decided to ship generations to Buzo. |
| redacted | Output text with outputRedactPatterns replaced before egress. Separate from redactPatterns. | You want CITED_FLAGGED attribution but must scrub known PII patterns (emails, SSNs, card numbers, etc.) on the way out. |
| plaintext | The full LLM generation text. | Maximum signal for citation matching. Reserve for environments with a DPA in place and customer-approved handling. |
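
The table notes that outputRedactPatterns is separate from redactPatterns. The sketch below illustrates that separation under stated assumptions: `CaptureOptions` and `captureOutput` are hypothetical, the option names are the ones used on this page, and the card-number regex is only an example pattern, not one the SDK ships.

```typescript
// Hypothetical option shape, mirroring only the option names used on this page.
interface RedactPattern {
  pattern: RegExp
  replacement: string
}

interface CaptureOptions {
  redactPatterns?: RedactPattern[] // applies to queries only
  outputCapture?: 'off' | 'redacted' | 'plaintext'
  outputRedactPatterns?: RedactPattern[] // applies to LLM output only
}

// Returns the text that would ship for a generation, or undefined when capture is off.
function captureOutput(output: string, opts: CaptureOptions): string | undefined {
  const mode = opts.outputCapture ?? 'off'
  if (mode === 'off') return undefined
  if (mode === 'plaintext') return output
  // 'redacted': only outputRedactPatterns are consulted, never redactPatterns.
  return (opts.outputRedactPatterns ?? []).reduce(
    (text, { pattern, replacement }) => text.replace(pattern, replacement),
    output,
  )
}

const options: CaptureOptions = {
  outputCapture: 'redacted',
  outputRedactPatterns: [
    { pattern: /[\w.-]+@[\w.-]+\.\w+/g, replacement: '<EMAIL>' },
    // Example pattern: 16 digits in groups of four, optionally separated by a space or hyphen.
    { pattern: /\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/g, replacement: '<CARD>' },
  ],
}
```

In this model, a deployment that sets only redactPatterns and then switches outputCapture to redacted would ship generations unscrubbed, so the output patterns need to be configured explicitly.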
… redacted instead.

resultsCapture v0.4+
Controls whether each retrieved vector's pageContent is shipped alongside its id and score. Defaults to ids-only — the pre-0.4 wire format — so upgrading the SDK does not change what leaves the network.
Citation matching relies on comparing retrieved content against the LLM output. With ids-only, Buzo can only match vectors that have a server-side content_snapshot from a prior scan. Opting in to plaintext or redacted makes every retrieved vector matchable, including those never scanned by Buzo.
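
The matchability rule in the paragraph above can be modelled in a few lines. This is a simplified sketch, not Buzo's server-side logic: `matchableIds`, `RetrievedResult`, and the `scannedIds` set (standing in for vectors that have a content_snapshot) are hypothetical.

```typescript
type ResultsCapture = 'ids-only' | 'redacted' | 'plaintext'

interface RetrievedResult {
  id: string
  score: number
  content?: string
}

// Hypothetical model: which retrieved ids could be compared against the LLM output.
function matchableIds(
  results: RetrievedResult[],
  mode: ResultsCapture,
  scannedIds: Set<string>, // ids assumed to have a server-side content_snapshot
): string[] {
  return results
    .filter((r) => {
      // A prior scan makes a vector matchable in every mode.
      if (scannedIds.has(r.id)) return true
      // Otherwise content must have been shipped with the result.
      return mode !== 'ids-only' && r.content !== undefined
    })
    .map((r) => r.id)
}
```

With three retrieved vectors of which only one has been scanned, ids-only leaves one matchable vector, while plaintext or redacted makes all three matchable.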
| Mode | What ships | When to use |
|---|---|---|
| ids-only (default) | Only { id, score } per result. Identical to pre-0.4 behaviour. | Default. Zero change to wire payload. Citation matching works for the scanned subset of the corpus. |
| redacted | Each result's content with resultsRedactPatterns replaced before egress. | Retrieved chunks may echo user-authored data (prior messages, support tickets). Scrub known patterns on the way out. |
| plaintext | Full content per retrieved vector, UTF-8. | Maximum signal for citation matching across the entire corpus. Reserve for environments with a DPA in place. |
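
When shipping content in redacted or plaintext mode, keep in mind the 16 KB server-side cap on content that this section describes: oversized chunks are rejected, not truncated. The helper below is a minimal sketch of retriever-side truncation. `truncateUtf8` is a hypothetical function, and reading "16 KB" as 16 × 1024 UTF-8 bytes is an assumption worth confirming before relying on it.

```typescript
// Assumption: "16 KB" is interpreted as 16 * 1024 bytes of UTF-8.
const MAX_CONTENT_BYTES = 16 * 1024

// Truncates text so its UTF-8 encoding fits in maxBytes,
// without splitting a multi-byte character.
function truncateUtf8(text: string, maxBytes: number = MAX_CONTENT_BYTES): string {
  const bytes = new TextEncoder().encode(text)
  if (bytes.length <= maxBytes) return text

  let end = maxBytes
  // bytes[end] is the first excluded byte. If it is a continuation byte
  // (10xxxxxx), the cut would split a character, so back up to a boundary.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--

  return new TextDecoder().decode(bytes.subarray(0, end))
}
```

Applying a helper like this to each chunk's content before it reaches the SDK keeps oversized chunks from being rejected, at the cost of losing the tail of the text.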

    new Buzo({
      apiKey: process.env.BUZO_API_KEY!,
      resultsCapture: 'redacted',
      // Applied to each result's content before egress.
      resultsRedactPatterns: [
        { pattern: /[\w.-]+@[\w.-]+\.\w+/g, replacement: '<EMAIL>' }, // email addresses
        { pattern: /\b\d{16}\b/g, replacement: '<CC>' }, // 16 contiguous digits
      ],
    })

content is capped server-side at 16 KB. Oversized chunks are rejected rather than truncated, so truncate retriever-side if your chunks exceed that limit.

What the SDK does with each mode
- Capture is synchronous. Redaction and hashing happen before the event enters the buffer, so a subsequent `flush()` is guaranteed to include it.
- Modes are independent. You can run `queryCapture: 'plaintext'` with `outputCapture: 'redacted'` and `resultsCapture: 'ids-only'`, or any combination.
- Disabled entirely. Set `disabled: true` in tests or local dev: no network I/O, no buffer growth, and capture modes are irrelevant.
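
The ordering guarantee and the disabled switch described in the list above can be pictured with a toy recorder. This is a hypothetical miniature, not the SDK's actual buffer: `MiniRecorder`, its `transform` option (standing in for hashing or redaction), and the array returned by `flush()` are all illustrative.

```typescript
// Hypothetical miniature of the behaviour described above; not the SDK's implementation.
interface MiniOptions {
  disabled?: boolean
  transform: (query: string) => string // stands in for hash / redact
}

class MiniRecorder {
  private buffer: string[] = []
  private readonly opts: MiniOptions

  constructor(opts: MiniOptions) {
    this.opts = opts
  }

  // Transforms synchronously, then buffers; the raw query is never stored.
  record(query: string): void {
    if (this.opts.disabled) return // disabled: no buffer growth
    this.buffer.push(this.opts.transform(query))
  }

  // Drains the buffer. A real client would send the batch over the network;
  // here it is simply returned so the contents can be inspected.
  flush(): string[] {
    const batch = this.buffer
    this.buffer = []
    return batch
  }
}
```

Because `record` finishes its transform before returning, any `flush()` called afterwards sees the already-transformed event, which mirrors the first guarantee in the list.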