API Reference

All request and response bodies are JSON. The base URL defaults to http://localhost:3000.

Authentication

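When FIRNFLOW_API_KEY is configured, every protected endpoint requires Authorization: Bearer <token>. Two key tiers exist: a read/write key and, optionally, a separate admin key for admin routes. When a separate admin key is configured, a read/write key presented on an admin route is rejected with 403.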

/health is always public. /metrics is public unless FIRNFLOW_METRICS_TOKEN is set, in which case the same Bearer header is required. Token comparisons are constant-time (subtle::ConstantTimeEq). See configuration for the full env-var list, including the optional FIRNFLOW_RATE_LIMIT_RPS / FIRNFLOW_PREAUTH_IP_LIMIT_RPS rate-limit knobs.

Rejection responses on protected endpoints (401, 403, and 429) are listed under Error responses.

Service-level only
A token authorises operations against the firnflow process, not against a specific namespace. Per-tenant namespace isolation requires an authenticating gateway in front of Firn.
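
As a minimal client-side sketch of the Bearer scheme described above, the helper below builds (but does not send) a request using only the Python standard library. The name build_request and the token value are illustrative assumptions, not part of Firn.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # default base URL from this reference


def build_request(path, token=None, body=None, method=None):
    """Build a urllib Request; the Bearer header is added only when a token is given."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Hypothetical token; in practice it would be the key configured on the server.
req = build_request("/ns/demo/query", token="example-token",
                    body={"vector": [1.0, 0.0, 0.0, 0.0], "k": 5})
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a server.
```

Leaving token unset produces a request with no Authorization header, which is what /health (and /metrics, unless FIRNFLOW_METRICS_TOKEN is set) accepts.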

Namespaces

Every data operation is scoped to a namespace. Namespace names must be lowercase alphanumeric with hyphens, and no longer than 64 characters. Each namespace maps to an isolated object-storage prefix under the configured FIRNFLOW_STORAGE_URI — for example s3://bucket/namespace/ or gs://bucket/tenants/acme/namespace/.

Valid: my-project, embeddings-v2, prod-search.
Invalid: My_Project (uppercase and underscores), a-very-long-name-that-exceeds-the-sixty-four-character-limit-imposed-by-firn.
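
The naming rule can be checked client-side before sending a request. The sketch below assumes the rule translates directly to a single regular expression; the server may apply further checks (for example on leading or trailing hyphens) that this reference does not state.

```python
import re

# Assumption: "lowercase alphanumeric with hyphens, at most 64 characters"
# maps to this pattern. An empty name is treated as invalid.
NAMESPACE_RE = re.compile(r"[a-z0-9-]{1,64}")


def is_valid_namespace(name: str) -> bool:
    """Return True when name satisfies the documented namespace rule."""
    return NAMESPACE_RE.fullmatch(name) is not None
```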

Endpoints

GET /health Liveness check

Returns 200 OK with body ok. Use this for load balancer health checks and container readiness probes.

Example

curl http://localhost:3000/health

Response

ok

GET /metrics Prometheus metrics

Returns all Prometheus metrics in text exposition format (text/plain; version=0.0.4). See the monitoring guide for the full metric list and PromQL examples.
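
For scripts that only need one counter, the text exposition format is simple enough to read without a client library. The following is a rough sketch, not a full parser: it sums every sample of a metric across label sets. The sample text, its label names, and firnflow_other_total are made up for illustration; only firnflow_cache_misses_total is a name taken from this reference.

```python
def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of metric_name across all label sets."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        # The value is the first token after the name (and labels, if any).
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        total += float(rest.split()[0])
    return total


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE firnflow_cache_misses_total counter
firnflow_cache_misses_total{namespace="demo"} 3
firnflow_cache_misses_total{namespace="demo-mv"} 2
firnflow_other_total 7
"""
misses = sum_metric(SAMPLE, "firnflow_cache_misses_total")
```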

Example

curl http://localhost:3000/metrics

POST /ns/{namespace}/upsert Insert or update vectors and text

Appends rows to the namespace's Lance table. The vector dimension and the vector kind (single-vector or multivector) are inferred from the first upsert and enforced on subsequent calls. After a successful write, all cached query results for this namespace are invalidated.

Each row carries one of two vector payload shapes, depending on the namespace's kind:

  • Single-vector namespaces: vector: float32[] — one dense vector of length dim.
  • Multivector namespaces: vectors: float32[][] — a non-empty list of equal-length inner vectors. Used for ColBERT / ColPali / ColQwen2 late-interaction retrieval.

At most one of the two fields may be set on a row. The first row of the first upsert into a fresh namespace fixes the kind for its lifetime; payloads in the wrong shape return 400.
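
The shape rules above can be mirrored in a client-side pre-flight check. This is a sketch under stated assumptions: check_row and the kind labels "single" and "multi" are hypothetical names, and a row with neither field set is let through because this reference does not say how the server treats it.

```python
def check_row(row: dict, kind: str, dim: int) -> None:
    """Raise ValueError for payload problems this reference lists as 400s.

    kind is "single" or "multi" (labels chosen for this sketch);
    dim is the namespace dimension (inner dimension for multivector).
    """
    has_vector = row.get("vector") is not None
    has_vectors = row.get("vectors") is not None
    if has_vector and has_vectors:
        raise ValueError("both vector and vectors set on the same row")
    if has_vector:
        if kind != "single":
            raise ValueError("vector sent to a multivector namespace")
        if len(row["vector"]) != dim:
            raise ValueError("vector dimension mismatch")
    if has_vectors:
        if kind != "multi":
            raise ValueError("vectors sent to a single-vector namespace")
        if not row["vectors"]:
            raise ValueError("vectors must be a non-empty list")
        for inner in row["vectors"]:
            # An empty inner vector also fails here, since 0 != dim.
            if len(inner) != dim:
                raise ValueError("inner sub-vector dimension mismatch")
```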

Request body

Field | Type | Required | Description
rows | array | Yes | List of rows to insert
rows[].id | u64 | Yes | Unique identifier for the row
rows[].vector | float32[] | One of | Single dense vector (single-vector namespaces). Length must match the namespace dimension.
rows[].vectors | float32[][] | One of | Bag of equal-length sub-vectors (multivector namespaces). Each inner vector must match the namespace's inner sub-vector dimension; outer list length is the per-row sub-vector count.
rows[].text | string | No | Text payload for full-text search

Response (200)

Field | Type | Description
upserted | integer | Number of rows accepted

Examples

Single-vector upsert:

curl -X POST http://localhost:3000/ns/demo/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "text": "hello world"},
      {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0]}
    ]
  }'
{"upserted": 2}

Multivector upsert:

curl -X POST http://localhost:3000/ns/demo-mv/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]},
      {"id": 2, "vectors": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]}
    ]
  }'

Errors

  • 400: invalid namespace name; vector dimension mismatch; payload shape mismatch (e.g. vector on a multivector namespace or vice versa); both vector and vectors set on the same row; empty inner sub-vector; mixed inner sub-vector dim within one row
  • 500: object-storage write failure (client gets generic error; full details logged server-side)

POST /ns/{namespace}/query Vector, full-text, or hybrid search

Queries the namespace through the cache-aside path. On a cache hit, the result is returned from RAM or NVMe with zero object-storage access. On a miss, the query runs against the configured backend via LanceDB and the result is cached.

Query modes

The query mode is determined by which fields are present. The vector field uses vector: float32[] for single-vector namespaces and vectors: float32[][] for multivector namespaces:

Mode | Vector field | text | Description
Single-vector | vector set | absent | Nearest-neighbour search (L2 distance by default)
Multivector | vectors set | absent | Late-interaction MaxSim search against a multivector namespace (cosine distance). An IVF_PQ index is what makes this tractable on real corpora; un-indexed queries fall back to a brute-force scan.
FTS | none | set | BM25 full-text search (requires FTS index)
Hybrid | vector or vectors set | set | Both, fused via Reciprocal Rank Fusion (RRF)
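
For readers unfamiliar with RRF, here is a generic sketch of the technique. It is not Firn's implementation: the constant k=60 is the value commonly used in the RRF literature, and the constant and tie-breaking that Firn applies are not stated in this reference. Each input list holds row ids, best first.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(id) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = {}
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] = scores.get(row_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda row_id: (-scores[row_id], row_id))


vector_ranking = [1, 2, 3]  # ids ordered by ascending distance
text_ranking = [3, 1, 4]    # ids ordered by descending BM25 score
fused = rrf_fuse([vector_ranking, text_ranking])
```

Because RRF uses only ranks, it can combine a distance (lower is better) with a BM25 score (higher is better) without normalising either one.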

Request body

Field | Type | Required | Description
vector | float32[] | One of | Single-vector query payload. Length must match the namespace dimension.
vectors | float32[][] | One of | Multivector query payload: a bag of equal-length sub-vectors. Each inner vector must match the namespace's inner sub-vector dimension.
k | integer | Yes | Number of results to return
nprobes | integer | No | IVF partitions to probe (default 20). Higher values trade latency for recall.
text | string | No* | Text query for full-text or hybrid search

* At least one of vector, vectors, or text must be present. Setting both vector and vectors on the same request returns 400.
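
The field-presence rules above amount to a small decision function. The sketch below is a client-side illustration only; query_mode and the returned labels are hypothetical names chosen to match the Mode column.

```python
def query_mode(request: dict) -> str:
    """Classify a query body by which of vector, vectors, and text are present."""
    has_vector = request.get("vector") is not None
    has_vectors = request.get("vectors") is not None
    has_text = request.get("text") is not None
    if has_vector and has_vectors:
        raise ValueError("vector and vectors are mutually exclusive")
    if not (has_vector or has_vectors or has_text):
        raise ValueError("one of vector, vectors, or text is required")
    if has_text:
        return "hybrid" if (has_vector or has_vectors) else "fts"
    return "single-vector" if has_vector else "multivector"
```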

Response (200)

Field | Type | Description
query_id | string | Deterministic hash of the query parameters (the cache key)
results | array | Ordered list of matching rows
results[].id | u64 | Row identifier
results[].score | float32 | Distance (vector / multivector), BM25 score (FTS), or relevance score (hybrid)
results[].vector | float32[] | The stored vector for single-vector hits; empty ([]) for multivector hits, because the bag is intentionally not echoed.
results[].text | string? | The stored text (null if none)

Examples

Single-vector search:

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"vector": [1.0, 0.0, 0.0, 0.0], "k": 5}'

Multivector search:

curl -X POST http://localhost:3000/ns/demo-mv/query \
  -H 'Content-Type: application/json' \
  -d '{"vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], "k": 5}'

Full-text search:

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"text": "search query terms", "k": 10}'

Hybrid search (single-vector + FTS):

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [1.0, 0.0, 0.0, 0.0],
    "text": "search terms",
    "k": 10,
    "nprobes": 40
  }'

GET /ns/{namespace}/list Cursor-paginated list ordered by _ingested_at

Returns rows in ingest order for "recent content" flows (landing pages, new-uploads feeds, back-catalogue browsing). Results are ordered by the reserved _ingested_at system column (microsecond server-side timestamp, set at first write and never mutated).

This endpoint intentionally bypasses the foyer cache. Pagination tails would push hot query results out of RAM and NVMe without an offsetting hit rate, so /list goes directly to the Lance dataset.

Query parameters

Param | Default | Description
order_by | _ingested_at | V1 supports the reserved system column only; other values return 400. User-column ordering will follow scalar-index support. Build the BTree via POST /ns/{namespace}/scalar-index to remove the full-fragment scan cost on large namespaces.
order | desc | asc or desc.
limit | 50 | Rows per page. Hard-capped at 500.
cursor | none | Opaque token from the previous response's next_cursor. Value-based, so it survives concurrent writes. The format is implementation-defined; do not parse or construct it by hand.

Response (200)

Field | Type | Description
rows | array | Rows in the requested order
rows[].id | u64 | Row identifier
rows[].vector | float32[] | The stored vector
rows[].text | string? | The stored text (null if none)
rows[].ingested_at_micros | i64 | Server-side microsecond timestamp the row was first written
next_cursor | string? | Pass verbatim as ?cursor= on the next call; null on the final page
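
The cursor contract lends itself to a simple loop: request a page, yield its rows, and stop when next_cursor is null. In this sketch the HTTP call is injected as fetch_page, a caller-supplied function that takes a URL path and returns the decoded JSON body, so the loop can be exercised without a server. The names list_url, iter_rows, and fetch_page are assumptions made for illustration.

```python
from urllib.parse import urlencode


def list_url(namespace, limit=50, order="desc", cursor=None):
    """Build the /list path and query string for one page."""
    if limit > 500:
        raise ValueError("limit over 500 is rejected with 400")
    params = {"limit": limit, "order": order}
    if cursor is not None:
        params["cursor"] = cursor  # passed through verbatim, never parsed
    return f"/ns/{namespace}/list?{urlencode(params)}"


def iter_rows(fetch_page, namespace, **kwargs):
    """Yield rows page by page until next_cursor is null."""
    cursor = None
    while True:
        page = fetch_page(list_url(namespace, cursor=cursor, **kwargs))
        yield from page["rows"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break
```

A real fetch_page would issue the GET against the base URL (with the Bearer header when an API key is configured) and return the parsed response.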

Status codes

Code | When
200 | Success
400 | Invalid order_by, malformed cursor, or limit over 500
501 | Namespace's Lance table pre-dates the _ingested_at column; recreate the namespace to enable the endpoint

Examples

First page, newest first:

curl 'http://localhost:3000/ns/demo/list?limit=50'

Next page via cursor:

curl 'http://localhost:3000/ns/demo/list?limit=50&cursor=0006012a9abb9c800000000000000007'

DELETE /ns/{namespace} Delete a namespace and all its data

Removes every object under the namespace's prefix and evicts all cached query results. This is irreversible.

Response (200)

Field | Type | Description
objects_deleted | integer | Number of objects removed from the backend

Example

curl -X DELETE http://localhost:3000/ns/demo
{"objects_deleted": 12}

POST /ns/{namespace}/warmup Pre-warm the cache (async)

Accepts a list of queries and runs them in a background task to populate the cache. Returns 202 Accepted immediately. Useful for warming the cache after a deployment or before expected traffic.

Request body

Field | Type | Required | Description
queries | QueryRequest[] | Yes | List of query objects (same schema as /query)

Response (202)

Field | Type | Description
queued | integer | Number of queries submitted for background execution

Example

curl -X POST http://localhost:3000/ns/demo/warmup \
  -H 'Content-Type: application/json' \
  -d '{
    "queries": [
      {"vector": [1.0, 0.0, 0.0, 0.0], "k": 5},
      {"vector": [0.0, 1.0, 0.0, 0.0], "k": 5}
    ]
  }'
{"queued": 2}

Monitoring warmup progress
Watch the firnflow_cache_misses_total metric to track how many warmup queries have completed. Failures inside the background task are logged server-side but do not affect the HTTP response.

POST /ns/{namespace}/index Build an ANN vector index (async)

Builds an IVF_PQ (Inverted File with Product Quantisation) index on the namespace's vector column. Returns 202 Accepted and builds in the background. Building an index dramatically reduces cold query latency (25x speedup on AWS S3).

Request body

Field | Type | Required | Default | Description
kind | string | Yes | - | Index type. Only "ivf_pq" is supported.
num_partitions | u32 | No | sqrt(row_count) | Number of IVF partitions
num_sub_vectors | u32 | No | dim / 16 | Number of PQ sub-vectors (must divide dimension evenly)
num_bits | u32 | No | 8 | PQ codebook bit width. Accepted values: 4 or 8. Setting 4 halves the per-vector index storage at the expense of some recall; 4-bit additionally requires num_sub_vectors to be even.
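
The defaults and constraints in the table can be made explicit when constructing a request body. The sketch below is a client-side illustration: index_request is a hypothetical helper, and the rounding it applies (floor of the square root, integer division for dim / 16) is an assumption, since this reference does not say how the server rounds.

```python
import math


def index_request(row_count, dim, num_partitions=None, num_sub_vectors=None, num_bits=8):
    """Build an /index body, filling in the documented defaults explicitly."""
    if num_partitions is None:
        num_partitions = max(1, math.isqrt(row_count))  # sqrt(row_count), floored (assumed)
    if num_sub_vectors is None:
        num_sub_vectors = max(1, dim // 16)  # dim / 16, integer division (assumed)
    if dim % num_sub_vectors != 0:
        raise ValueError("num_sub_vectors must divide the dimension evenly")
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    if num_bits == 4 and num_sub_vectors % 2 != 0:
        raise ValueError("4-bit PQ requires an even num_sub_vectors")
    return {
        "kind": "ivf_pq",
        "num_partitions": num_partitions,
        "num_sub_vectors": num_sub_vectors,
        "num_bits": num_bits,
    }


body = index_request(row_count=100_000, dim=1536)
```

For 100,000 rows at dim=1536 this yields 316 partitions and 96 sub-vectors under the assumed rounding.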

Response (202)

{"status": "index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/index \
  -H 'Content-Type: application/json' \
  -d '{"kind": "ivf_pq"}'

Index build time
Index builds can take minutes for large datasets (147 s for 100k vectors at dim=1536 on MinIO). Monitor firnflow_index_build_duration_seconds to track completion. Queries against the namespace continue to work during the build using a linear scan.

POST /ns/{namespace}/fts-index Build a BM25 full-text search index (async)

Builds a BM25 full-text search index on the namespace's text column. Required before FTS or hybrid queries will return results. Returns 202 Accepted.

Request body

No request body required.

Response (202)

{"status": "fts index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/fts-index

Prerequisite
At least one row must have a non-null text field before building an FTS index.

POST /ns/{namespace}/scalar-index Build BTree index on _ingested_at to accelerate /list (async)

Builds a BTree scalar index on the reserved _ingested_at column. With the index in place, /list cursor pages do an index range scan instead of a full-fragment scan, and the leading ORDER BY _ingested_at short-circuits the in-memory sort step. Returns 202 Accepted.

Request body

No request body required. v1 hardcodes the column to _ingested_at — the same constraint /list puts on order_by.

Response (202)

{"status": "scalar index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/scalar-index

Idempotent and self-maintaining
Repeat calls rebuild the index in place. POST /compact incrementally absorbs new rows into the existing BTree, so a separate rebuild is not needed after compaction. Monitor firnflow_index_build_duration_seconds{kind="scalar"} to track completion.

POST /ns/{namespace}/compact Compact data files (async)

Merges small Lance data fragments into fewer, larger files to reduce object-storage round-trips on cold queries. Returns 202 Accepted. Also invalidates the cache for this namespace, since file offsets change after compaction.

Request body

No request body required.

Response (202)

{"status": "compaction queued"}

Example

curl -X POST http://localhost:3000/ns/demo/compact

When to compact
Compact after many small upsert batches have created fragment sprawl. The server logs fragments_removed and fragments_added when the compaction completes.

Error responses

All errors return a JSON body with an error field.

Status | Cause | Example
400 | Invalid namespace name, dimension mismatch, empty query, unsupported index kind | {"error": "invalid namespace: must be lowercase alphanumeric and hyphens, max 64 chars"}
401 | Missing, malformed, or unknown Authorization: Bearer header. Only emitted when an API key is configured. Includes WWW-Authenticate. | {"error": "unauthorized"}
403 | Valid read/write key on an admin route while a separate admin key is configured. | {"error": "forbidden"}
429 | Rejected by the per-principal or pre-auth IP rate limiter. Response includes Retry-After in seconds. | {"error": "rate limited"}
500 | Object-storage connectivity, cache failure, or internal error | {"error": "internal error"}

On 500 errors, the full error details are logged server-side via tracing::error! but scrubbed from the client response to prevent leaking internal state.
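
A client can use the status codes above to decide whether a retry is worthwhile. The following sketch shows one possible policy, not a recommendation from the Firn authors: it honours Retry-After on 429, backs off exponentially on 500 and common gateway errors, and gives up on everything else. The function name, the attempt limit, and the backoff cap are all assumptions, and whether retrying a failed write is safe depends on the operation and is not covered here.

```python
def retry_delay(status, headers, attempt, max_attempts=5):
    """Return seconds to wait before retrying, or None if a retry is pointless.

    attempt counts retries already made, starting at 0.
    """
    if attempt >= max_attempts:
        return None
    if status == 429:
        # Retry-After is documented as a number of seconds.
        try:
            return float(headers.get("Retry-After", 1))
        except ValueError:
            return 1.0
    if status in (500, 502, 503, 504):
        return min(2.0 ** attempt, 30.0)  # exponential backoff, capped (assumed policy)
    # 400, 401, 403, and the 501 from /list will not succeed if repeated unchanged.
    return None
```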