Observability Sources

Langfuse

If your agent is already instrumented with Langfuse, connect it to Buzo and we pull retrievals + generations directly from your Langfuse project. No buzo-sdk install, no code changes.

How it works

Buzo stores your Langfuse API keys (secret key encrypted at rest) and runs a background worker that polls your Langfuse project on a short cadence. For each new trace, Buzo:

  • Emits a retrieval event for every SPAN that looks like a retriever call — name matches retriever/search/vectorstore, or output is an array of documents in the LangChain shape.
  • Emits a generation event for every Langfuse GENERATION observation, reading model + tokens + output.
  • Uses the Langfuse trace.id as the correlation anchor: a retrieval and a generation that share a trace are correlated in Buzo's matcher, so CITED_FLAGGED works out of the box without you stamping any ids.

Retrieved-content capture. When your retriever spans include each document's pageContent (the default when the LangChain Langfuse callback is installed), Buzo ingests it alongside the id and score. The semantics are the same as the SDK's resultsCapture: 'plaintext' mode, capped at 16 KB per item. This extends citation matching to every retrieved vector, not just the subset Buzo has a content_snapshot for. If your spans only emit ids, Buzo still ingests normally and falls back to the stored snapshot.
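
The span classification described above can be pictured roughly as follows. This is a minimal sketch of the heuristic as stated in this section, not Buzo's actual matcher: the SpanLike type, the function names, and the exact regular expression are assumptions made for illustration.

```typescript
// Hypothetical, simplified view of a Langfuse observation. Only the fields
// this sketch needs are modeled.
interface SpanLike {
  type: 'SPAN' | 'GENERATION' | 'EVENT'
  name: string
  output?: unknown
}

// Assumed name pattern, based on the names listed in this section.
const RETRIEVER_NAME = /retriever|search|vectorstore/i

// A LangChain-shaped document carries a string `pageContent` field.
function isLangChainDocument(value: unknown): boolean {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { pageContent?: unknown }).pageContent === 'string'
  )
}

// True when a SPAN either has a retriever-like name or outputs a non-empty
// array made up entirely of LangChain-shaped documents.
function looksLikeRetrieval(span: SpanLike): boolean {
  if (span.type !== 'SPAN') return false
  if (RETRIEVER_NAME.test(span.name)) return true
  return (
    Array.isArray(span.output) &&
    span.output.length > 0 &&
    span.output.every(isLangChainDocument)
  )
}
```

Under these assumptions, a span named VectorStoreRetriever qualifies by name alone, while a span with an unrelated name qualifies only if its output is an array of documents.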

Setup

  1. Install the Langfuse callback handler in your agent

    If you're on LangChain, follow Langfuse's official docs. Buzo doesn't replace the Langfuse integration — it consumes from it.

    LangChain + Langfuse
    import { CallbackHandler } from 'langfuse-langchain'

    // Callback handler that reports LangChain runs to your Langfuse project.
    const langfuse = new CallbackHandler({
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      baseUrl: 'https://cloud.langfuse.com',
    })

    // `retriever` is your existing LangChain retriever; pass the handler on each call.
    const docs = await retriever.invoke('how do I reset my password?', {
      callbacks: [langfuse],
    })

  2. Create a Langfuse API key with read access

    In your Langfuse project: Settings → API Keys → Create new key. Buzo needs both the public and secret keys. Read access is enough; we never write to your Langfuse project.

  3. Connect the project to Buzo

    From the Buzo dashboard, go to Settings → Observability sources → Langfuse. Paste the public key, secret key, and host (default: https://cloud.langfuse.com). Pick the Buzo collection that traces should default to.

    Buzo starts polling within minutes and advances its own watermark per connection — nothing on your side to schedule or maintain.
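
    Conceptually, a per-connection watermark works like the sketch below. It illustrates the general polling pattern only and is not Buzo's worker: TraceSummary, PollStep, and advanceWatermark are hypothetical names, and the traces are passed in directly so the example assumes nothing about the Langfuse API.

```typescript
// Hypothetical minimal trace record: an id plus an ISO 8601 timestamp.
interface TraceSummary {
  id: string
  timestamp: string // e.g. "2024-05-01T12:00:00Z"
}

interface PollStep {
  fresh: TraceSummary[] // traces newer than the watermark, oldest first
  watermark: string // value to persist for the next poll
}

// Keep only traces strictly newer than the stored watermark, order them
// oldest first, and move the watermark to the newest one seen.
function advanceWatermark(traces: TraceSummary[], watermark: string): PollStep {
  const since = Date.parse(watermark)
  const fresh = traces
    .filter((t) => Date.parse(t.timestamp) > since)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp))
  const last = fresh[fresh.length - 1]
  // With nothing new, the watermark stays where it was.
  return { fresh, watermark: last ? last.timestamp : watermark }
}
```

    Because the watermark is stored per connection, each poll in this model only has to look at traces that arrived after the previous one.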

Mapping traces to Buzo collections

Most customers have a single Buzo collection per Langfuse project — in that case you just pick the default and you're done. If you need per-trace routing (e.g. a product KB and a support KB in one Langfuse project), configure a tag rule when you create the connection:

tag_mapping example
{
  "metadataKey": "collection",
  "values": {
    "support-kb": "00000000-0000-0000-0000-000000000001",
    "product-kb": "00000000-0000-0000-0000-000000000002"
  }
}

Buzo then reads trace.metadata.collection (or whichever key you nominated as metadataKey) and routes the event to the mapped collection. If no rule matches, the event falls back to the default collection.
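
The routing rule can be sketched as a small lookup. This is an illustration under stated assumptions rather than Buzo's implementation: resolveCollection and the TagMapping type are hypothetical names, and the sketch assumes only string metadata values are eligible for matching.

```typescript
// Shape of the tag_mapping rule shown above.
interface TagMapping {
  metadataKey: string
  values: Record<string, string>
}

// Return the collection id mapped to the trace's metadata value, or the
// connection's default when there is no rule or no match.
function resolveCollection(
  metadata: Record<string, unknown> | undefined,
  mapping: TagMapping | undefined,
  defaultCollectionId: string,
): string {
  if (!mapping || !metadata) return defaultCollectionId
  const tag = metadata[mapping.metadataKey]
  if (typeof tag !== 'string') return defaultCollectionId
  // Own-property check avoids matching inherited keys such as "toString".
  const hit = Object.prototype.hasOwnProperty.call(mapping.values, tag)
    ? mapping.values[tag]
    : undefined
  return hit ?? defaultCollectionId
}
```

With the example rule above, a trace whose metadata.collection is "support-kb" resolves to the first collection id, and a trace with no collection key resolves to the default.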

What gets synced

Buzo field                                        | Langfuse source
--------------------------------------------------+----------------------------------------------------------------
retrieval_traces.query.text                       | span.input.query / span.input.question / raw string
retrieval_traces.results[].id / .score            | span.output[i]: LangChain documents or Pinecone matches. ID falls back to metadata.id / _id / vector_id.
retrieval_traces.results[].content                | pageContent / page_content / content / text from the document. Trimmed to 16 KB per item. Omitted when the span doesn't carry it.
retrieval_traces.latencyMs                        | span.endTime − span.startTime
retrieval_traces.parentQueryId                    | trace.id (the correlation anchor)
generation_traces.output.text                     | observation.output (type: GENERATION)
generation_traces.model                           | observation.model
generation_traces.promptTokens / completionTokens | observation.usageDetails or observation.usage
generation_traces.runId                           | observation.id
generation_traces.parentRunId                     | trace.id (matches retrieval anchor)
Event metadata                                    | trace.userId, sessionId, release, tags, scalar fields of trace.metadata
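
To make the retrieval rows of the mapping concrete, here is a minimal sketch of how a single span.output item and a span.input could be normalized. It is not Buzo's code: the function names are hypothetical, the sketch assumes the fallback ids live under the item's metadata and that score is a top-level number, and it treats "16 KB" as 16,384 UTF-8 bytes.

```typescript
const MAX_CONTENT_BYTES = 16 * 1024 // assumed reading of "16 KB per item"

type Dict = Record<string, unknown>

interface RetrievalResult {
  id?: string
  score?: number
  content?: string
}

// First non-empty string among the candidates, if any.
function firstString(...candidates: unknown[]): string | undefined {
  for (const c of candidates) {
    if (typeof c === 'string' && c.length > 0) return c
  }
  return undefined
}

// Trim to at most maxBytes of UTF-8 without splitting a character.
function trimToUtf8Bytes(text: string, maxBytes: number): string {
  let bytes = 0
  let out = ''
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0
    const size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4
    if (bytes + size > maxBytes) break
    bytes += size
    out += ch
  }
  return out
}

// query.text: span.input.query, then span.input.question, then a raw string.
function toQueryText(input: unknown): string | undefined {
  if (typeof input === 'string') return input
  if (typeof input === 'object' && input !== null) {
    const obj = input as Dict
    return firstString(obj.query, obj.question)
  }
  return undefined
}

// results[]: id (with metadata fallbacks), score, and capped content.
function toResult(item: unknown): RetrievalResult {
  const doc = (typeof item === 'object' && item !== null ? item : {}) as Dict
  const meta = (typeof doc.metadata === 'object' && doc.metadata !== null
    ? doc.metadata
    : {}) as Dict
  const id = firstString(doc.id, meta.id, meta._id, meta.vector_id)
  const content = firstString(doc.pageContent, doc.page_content, doc.content, doc.text)
  const result: RetrievalResult = {}
  if (id !== undefined) result.id = id
  if (typeof doc.score === 'number') result.score = doc.score
  // Content is omitted entirely when the item does not carry it.
  if (content !== undefined) result.content = trimToUtf8Bytes(content, MAX_CONTENT_BYTES)
  return result
}
```

In this sketch an id-only Pinecone-style match yields a result with no content key at all, which corresponds to the "omitted when the span doesn't carry it" case in the table.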

What we don't sync

  • System prompts — Langfuse captures them on generations; Buzo skips them because we don't need them to attribute citations.
  • Langfuse scores — not ingested yet, but we're planning to use them as extra signal for Buzo's classifier feedback loop.
  • Full observation.metadata — only scalar keys from trace.metadata are attached to Buzo events. Nested / array metadata on individual observations is dropped.

Troubleshooting

  • Retrievals show up but results array is empty. Your retriever spans aren't emitting documents in a shape Buzo recognizes. Either use the LangChain Langfuse callback (which emits pageContent + metadata) or explicitly set span.metadata.buzo_kind = 'retrieval' and ensure span.output is an array with id + score.
  • Generations have no tokens. Check that your Langfuse instrumentation populates usageDetails (newer path) or usage (legacy). Buzo reads both.
  • Connection shows “last_error: 401”. Langfuse rejected the stored credentials, most often because the secret key was rotated or revoked on Langfuse's side. Reconnect with a fresh key from Settings → Observability sources.

Credentials. Your Langfuse secret key is encrypted at rest with the same KMS pipeline Buzo uses for vector store credentials. Buzo never writes to your Langfuse project; the connection uses read-only endpoints.
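
For the empty-results case under Troubleshooting, a custom retriever that does not go through the LangChain callback could shape its span as sketched below. This is a hedged example: buildRetrievalSpan and the Match type are hypothetical helpers, the span name and the { query } input shape are arbitrary choices, and the only parts taken from this page are metadata.buzo_kind = 'retrieval' and an output array of items with id and score.

```typescript
interface Match {
  id: string
  score: number
}

interface RetrievalSpanBody {
  name: string
  input: { query: string }
  output: Match[]
  metadata: { buzo_kind: 'retrieval' }
}

// Build a span body with the explicit retrieval marker and an output array
// that carries only id and score for each match.
function buildRetrievalSpan(query: string, matches: Match[]): RetrievalSpanBody {
  return {
    name: 'retriever', // assumed name; any span name could be used here
    input: { query },
    output: matches.map(({ id, score }) => ({ id, score })),
    metadata: { buzo_kind: 'retrieval' },
  }
}

const body = buildRetrievalSpan('how do I reset my password?', [
  { id: 'vec-1', score: 0.91 },
  { id: 'vec-2', score: 0.84 },
])
```

The resulting body would then be handed to whatever span-creation call your Langfuse SDK version provides; check the SDK documentation for the exact method.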