Overview
The Buzo REST API is a thin HTTP layer over the same ingest path the SDK uses. It exposes the two endpoints needed to ship retrieval and generation events to Buzo — nothing else is publicly accessible over REST today.
When to use the REST API
Most customers should use buzo-sdk — it handles batching, retries, circuit breaking, and the LangChain run-tree correlation for you. Reach for the REST API directly when one of the following applies:
- Non-Node stacks. The SDK is TypeScript / Node only. Python, Go, Rust, Java, .NET, PHP — call the REST API from your language of choice. This is the most common case.
- Edge runtimes with strict bundle limits. Environments where shipping `buzo-sdk` plus `@langchain/core` is too heavy; a raw `fetch()` is lighter.
- Historical backfill. Replaying months of retrieval logs from a warehouse (Snowflake, BigQuery, S3). Ingest is idempotent on `clientEventId`, so retries never double-count.
- Custom ingest pipelines. A Kafka consumer, Temporal workflow, CLI tool, or any other process that isn't the typical live agent request.
- Self-hosted or proxy setups where you want to inspect the wire format before anything leaves your network.
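For the backfill case, idempotency only helps if every replay of a given row produces the same `clientEventId`. The following is a minimal sketch of one way to do that, not a format Buzo prescribes. It assumes that `clientEventId` accepts an arbitrary string and that each warehouse row has a stable table name and primary key; the function name, the namespace URL, and both parameters are hypothetical.

```python
import uuid

# Hypothetical namespace for this backfill pipeline. Any fixed UUID works,
# as long as it never changes between runs.
BACKFILL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/buzo-backfill")


def client_event_id(source_table: str, row_key: str) -> str:
    """Derive a deterministic ID from a row's stable identity (UUIDv5)."""
    return str(uuid.uuid5(BACKFILL_NAMESPACE, f"{source_table}:{row_key}"))
```

Because the ID depends only on the row's identity, rerunning a failed backfill job resubmits the same IDs, and the server can treat them as duplicates instead of new events.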
Base URL
https://api.buzo.ai
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | `/v1/retrieval-traces` | Ingest a batch of retrieval events (up to 100 per batch). |
| POST | `/v1/generation-traces` | Ingest a batch of LLM generation events (up to 50 per batch). |
Conventions
- All requests and responses are JSON (`application/json`).
- All requests require a Bearer API key. Keys start with `ak_live_`.
- Successful writes return `200 OK` with a summary body. `4xx` errors return a JSON body of the form `{ "detail": "…" }`. `5xx` errors are transient and should be retried with exponential backoff.
- Both ingest endpoints are idempotent on `(organization_id, clientEventId)`. Re-submitting the same `clientEventId` is a no-op and counts as a duplicate.
- Vector content is optional. Retrieval events accept IDs and scores by default; passing the retrieved `content` per item (v0.4+, opt-in via `resultsCapture`) extends citation matching to vectors Buzo has not scanned. See Capture modes.
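These conventions can be sketched with the Python standard library alone. This is an illustrative client under stated assumptions, not an official one: the helper names (`build_request`, `backoff_delay`, `error_detail`, `post_json`), the timeout, the attempt count, and the jitter strategy are all choices made for this example. The request body is passed through untouched, since its schema is defined per endpoint and not on this page.

```python
import json
import random
import time
import urllib.error
import urllib.request

BASE_URL = "https://api.buzo.ai"


def build_request(path: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST carrying the Bearer API key."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter (base and cap are assumptions)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def error_detail(raw: bytes):
    """Extract "detail" from a 4xx body; None if the body is not the expected JSON."""
    try:
        parsed = json.loads(raw)
    except ValueError:
        return None
    return parsed.get("detail") if isinstance(parsed, dict) else None


def post_json(path: str, body: dict, api_key: str, max_attempts: int = 5) -> dict:
    """POST a body, retrying 5xx responses; 4xx responses raise immediately."""
    for attempt in range(max_attempts):
        try:
            request = build_request(path, body, api_key)
            with urllib.request.urlopen(request, timeout=10) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as err:
            if 500 <= err.code < 600 and attempt < max_attempts - 1:
                time.sleep(backoff_delay(attempt))
                continue
            if 400 <= err.code < 500:
                detail = error_detail(err.read())
                raise RuntimeError(f"request rejected ({err.code}): {detail}") from err
            raise  # 5xx on the final attempt: surface the original error
    raise RuntimeError("max_attempts must be at least 1")
```

Retrying the same body is safe only because ingest is idempotent on `clientEventId`; a retried batch that was in fact written the first time is counted as duplicates. In this sketch, network-level failures (`urllib.error.URLError`) are not retried, and every 4xx, including `429`, is raised to the caller.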
Rate limits
| Endpoint | Requests / min | Max batch size | Body limit |
|---|---|---|---|
| `/v1/retrieval-traces` | 600 | 100 events | 5 MB |
| `/v1/generation-traces` | 300 | 50 events | 10 MB |
Generation limits are tighter because output payloads are larger. If you hit a rate limit, the response is 429 Too Many Requests with a Retry-After header.
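A client can stay within these limits by splitting events into batches before sending and by waiting for the advertised interval after a `429`. The sketch below shows both pieces in isolation. The helper names are hypothetical, and it assumes `Retry-After` carries a number of seconds; this page does not say which form the header takes, so the HTTP-date form falls back to a default delay. The body-size limits are not checked here.

```python
from typing import Iterator, Mapping, Sequence

# Max batch sizes copied from the table above.
MAX_BATCH = {"/v1/retrieval-traces": 100, "/v1/generation-traces": 50}


def batches(events: Sequence[dict], path: str) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices no larger than the endpoint's batch limit."""
    size = MAX_BATCH[path]
    for start in range(0, len(events), size):
        yield events[start:start + size]


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; use `default` if missing or not numeric.

    Note: a plain dict lookup is case-sensitive, while the header mappings
    returned by http.client and urllib are case-insensitive.
    """
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default
```

For example, 250 retrieval events become batches of 100, 100, and 50, while the same 250 generation events become five batches of 50.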
Latency
Ingest is near-synchronous. The server returns as soon as the batch is written to the primary table; downstream work (citation matching, read-count rollup, alert dispatch) runs asynchronously, so your request time is not coupled to it.