REST API

Overview

The Buzo REST API is a thin HTTP layer over the same ingest path the SDK uses. It exposes the two endpoints needed to ship retrieval and generation events to Buzo — nothing else is publicly accessible over REST today.

When to use the REST API

Most customers should use buzo-sdk — it handles batching, retries, circuit breaking, and the LangChain run-tree correlation for you. Reach for the REST API directly when one of the following applies:

  • Non-Node stacks. The SDK is TypeScript / Node only, so from Python, Go, Rust, Java, .NET, or PHP you call the REST API directly. This is the most common case.
  • Edge runtimes with strict bundle limits. Where shipping buzo-sdk plus @langchain/core is too heavy, a raw fetch() call is lighter.
  • Historical backfill. Replaying months of retrieval logs from a warehouse (Snowflake, BigQuery, S3). Ingest is idempotent on clientEventId, so retries never double-count.
  • Custom ingest pipelines. A Kafka consumer, Temporal workflow, CLI tool, or any other process that isn't the typical live agent request.
  • Self-hosted or proxy setups where you want to inspect the wire format before anything leaves your network.

Not available over REST today: reading back traces, querying collections or findings, triggering scans, and managing quarantine actions. Those live behind the dashboard's JWT auth and are not part of this public API.
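
For the historical backfill case above, idempotency only helps if every replay of a row produces the same clientEventId. One way to get that is to derive the ID from the row's identity in the warehouse rather than generating it at send time. The sketch below is illustrative only: it assumes clientEventId accepts any stable unique string (this page does not specify its format), and the namespace URL, function name, and table/key arguments are hypothetical.

```python
import uuid

# Hypothetical namespace. Any fixed UUID works, as long as it never changes
# between backfill runs; changing it would change every derived ID.
BACKFILL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/buzo-backfill")


def client_event_id(source_table: str, row_key: str) -> str:
    """Derive a stable clientEventId from a warehouse row's identity.

    The same (source_table, row_key) pair always maps to the same ID, so a
    replayed row is recognised as a duplicate instead of being counted twice.
    """
    return str(uuid.uuid5(BACKFILL_NAMESPACE, f"{source_table}:{row_key}"))
```

Calling client_event_id("retrieval_logs", "2024-03-01/000123") on two separate runs returns the same string, which is what makes a crashed or restarted backfill safe to re-run.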

Base URL

https://api.buzo.ai

Endpoints

Method  Path                    Purpose
POST    /v1/retrieval-traces    Ingest a batch of retrieval events (up to 100 per batch).
POST    /v1/generation-traces   Ingest a batch of LLM generation events (up to 50 per batch).
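
Because each endpoint caps the number of events per batch, a client with more events than that has to split them before sending. A minimal sketch of that splitting, using the per-endpoint limits from the table above; the helper name and the MAX_BATCH mapping are illustrative and not part of any Buzo library:

```python
from typing import Iterator, List, Sequence

# Per-endpoint batch caps, taken from the endpoint table.
MAX_BATCH = {
    "/v1/retrieval-traces": 100,
    "/v1/generation-traces": 50,
}


def batches(events: Sequence[dict], path: str) -> Iterator[List[dict]]:
    """Yield consecutive slices of `events` no larger than the cap for `path`."""
    size = MAX_BATCH[path]
    for start in range(0, len(events), size):
        yield list(events[start:start + size])
```

With 250 retrieval events this yields three batches of 100, 100, and 50. The body limit is a separate constraint, so batches with large payloads may still need to be made smaller.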

Conventions

  • All requests and responses are JSON (application/json).
  • All requests require a Bearer API key. Keys start with ak_live_.
  • Successful writes return 200 OK with a summary body. 4xx errors return a JSON body of the form { "detail": "…" }. 5xx errors are transient and should be retried with exponential backoff.
  • Both ingest endpoints are idempotent on (organization_id, clientEventId). Re-submitting the same clientEventId is a no-op and counts as a duplicate.
  • Vector content is optional. Retrieval events accept IDs and scores by default; passing the retrieved content per item (v0.4+, opt-in via resultsCapture) extends citation matching to vectors Buzo has not scanned. See Capture modes.
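
The conventions above translate into a small amount of request setup. The following sketch builds (but does not send) an ingest request using only the Python standard library. The base URL, Bearer scheme, ak_live_ prefix, and JSON content type come from this page; the function name is hypothetical, and the JSON body is passed in by the caller because the request schema is not described in this section.

```python
import json
import urllib.request

BASE_URL = "https://api.buzo.ai"


def build_ingest_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for an ingest endpoint without sending it.

    `payload` is whatever JSON body the endpoint expects; its shape is not
    defined here and is left to the caller.
    """
    if not api_key.startswith("ak_live_"):
        raise ValueError("API keys are expected to start with 'ak_live_'")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would perform the call. Building the request separately makes it easy to log or inspect the wire format first, which suits the proxy and self-hosted cases.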

Rate limits

Endpoint                Requests / min  Max batch size  Body limit
/v1/retrieval-traces    600             100 events      5 MB
/v1/generation-traces   300             50 events       10 MB

The request rate and batch size for generation are tighter because output payloads are larger. If you hit a rate limit, the response is 429 Too Many Requests with a Retry-After header.
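
Taken together with the conventions earlier on this page, the retry rules are: honor Retry-After on 429, back off exponentially on 5xx, and do not retry other 4xx responses. A minimal sketch of that decision as a pure function follows. The function name, the base and cap defaults, and the use of random jitter are assumptions for illustration. It also assumes Retry-After carries a number of seconds; if the header is missing or is in HTTP-date form, the sketch falls back to exponential backoff.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry is pointless.

    `attempt` is 0 for the first retry. `headers` is looked up with the exact
    key "Retry-After", so pass a case-insensitive mapping if your HTTP client
    does not normalise header names.
    """
    backoff = min(cap, base * (2 ** attempt))
    if status == 429:
        retry_after = headers.get("Retry-After", "").strip()
        if retry_after.isdigit():
            # Server-specified wait, in seconds.
            return float(retry_after)
        return backoff
    if 500 <= status <= 599:
        # Random jitter spreads out retries from many clients.
        return random.uniform(0.0, backoff)
    # 2xx needs no retry; other 4xx errors will fail again unchanged.
    return None
```

Since both endpoints are idempotent on clientEventId, retrying a batch whose first attempt may have partly succeeded is safe.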

Latency

Ingest is near-synchronous. The server returns as soon as the batch is written to the primary table; downstream work (citation matching, read-count rollup, alert dispatch) runs asynchronously, so your request time is not coupled to it.