API Reference

All request and response bodies are JSON. The base URL defaults to http://localhost:3000.

Authentication

When FIRNFLOW_API_KEY is configured, every protected endpoint requires Authorization: Bearer <token>. Two tiers exist:

Read/write (FIRNFLOW_API_KEY) — upsert, query, list, warmup.
Admin (FIRNFLOW_ADMIN_API_KEY) — delete, index, fts-index, scalar-index, compact. The admin key also satisfies read/write routes. If no admin key is configured, the read/write key authorises admin routes too (single-key fallback).

/health is always public. /metrics is public unless FIRNFLOW_METRICS_TOKEN is set, in which case the same Bearer header is required. Token comparisons are constant-time (subtle::ConstantTimeEq). See configuration for the full env-var list, including the optional FIRNFLOW_RATE_LIMIT_RPS / FIRNFLOW_PREAUTH_IP_LIMIT_RPS rate-limit knobs.

Rejection responses on protected endpoints:

401 Unauthorized — header missing, malformed, or token unknown. Includes WWW-Authenticate: Bearer realm="firnflow".
403 Forbidden — valid token but the route requires admin scope and only the read/write key was presented.
429 Too Many Requests — rejected by either rate limiter; includes Retry-After seconds.
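
The tier rules and the 401/403 split above can be summarised as a small decision function. This is a minimal sketch, not the server's code: the function name `authorize` and its parameters are hypothetical, `hmac.compare_digest` stands in for the constant-time comparison the server performs with `subtle::ConstantTimeEq`, and treating an unset FIRNFLOW_API_KEY as "all endpoints open" is an assumption drawn from the wording above.

```python
import hmac
from typing import Optional


def authorize(header: Optional[str], admin_route: bool,
              api_key: Optional[str], admin_key: Optional[str]) -> int:
    """Return the HTTP status implied by the documented rules (200 = allowed)."""
    if api_key is None:
        # Assumption: with no API key configured, protected routes are open.
        return 200
    if header is None or not header.startswith("Bearer "):
        return 401  # header missing or malformed
    token = header[len("Bearer "):].encode()
    # Constant-time comparisons, evaluated for both keys.
    is_admin = admin_key is not None and hmac.compare_digest(token, admin_key.encode())
    is_read_write = hmac.compare_digest(token, api_key.encode())
    if not (is_admin or is_read_write):
        return 401  # token unknown
    if admin_route and admin_key is not None and not is_admin:
        return 403  # valid read/write token, but the route needs admin scope
    # Admin key on any route, or single-key fallback when no admin key exists.
    return 200
```

With both keys configured, `authorize("Bearer <read/write key>", True, ...)` yields 403; with only the read/write key configured, the same call yields 200 because of the single-key fallback.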

Service-level only

A token authorises operations against the firnflow process, not against a specific namespace. Per-tenant namespace isolation requires an authenticating gateway in front of Firn.

Namespaces

Every data operation is scoped to a namespace. Namespace names must be lowercase alphanumeric with hyphens, and no longer than 64 characters. Each namespace maps to an isolated object-storage prefix under the configured FIRNFLOW_STORAGE_URI — for example s3://bucket/namespace/ or gs://bucket/tenants/acme/namespace/.

Valid: my-project, embeddings-v2, prod-search.
Invalid: My_Project (uppercase and underscores), a-very-long-name-that-exceeds-the-sixty-four-character-limit-imposed-by-firn (76 characters, over the 64-character limit).
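
A client can check names before sending a request. The sketch below encodes only the rule stated above (lowercase letters, digits, hyphens, at most 64 characters). The helper name `is_valid_namespace` is hypothetical, and rejecting the empty string is an assumption; the server may apply further rules, for example on leading or trailing hyphens.

```python
import re

# Lowercase alphanumerics and hyphens, 1 to 64 characters.
# Assumption: an empty name is invalid.
NAMESPACE_RE = re.compile(r"[a-z0-9-]{1,64}")


def is_valid_namespace(name: str) -> bool:
    """Client-side pre-check mirroring the documented naming rule."""
    return NAMESPACE_RE.fullmatch(name) is not None
```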

Endpoints

GET /health Liveness check

Returns 200 OK with body ok. Use this for load balancer health checks and container readiness probes.

Example

curl http://localhost:3000/health

Response

ok

GET /metrics Prometheus metrics

Returns all Prometheus metrics in text exposition format (text/plain; version=0.0.4). See the monitoring guide for the full metric list and PromQL examples.

Example

curl http://localhost:3000/metrics

POST /ns/{namespace}/upsert Insert or update vectors and text

Appends rows to the namespace's Lance table. The vector dimension and the vector kind (single-vector or multivector) are inferred from the first upsert and enforced on subsequent calls. After a successful write, all cached query results for this namespace are invalidated.

Each row carries one of two vector payload shapes, depending on the namespace's kind:

Single-vector namespaces: vector: float32[] — one dense vector of length dim.
Multivector namespaces: vectors: float32[][] — a non-empty list of equal-length inner vectors. Used for ColBERT / ColPali / ColQwen2 late-interaction retrieval.

At most one of the two fields may be set on a row. The first row of the first upsert into a fresh namespace fixes the kind for its lifetime; payloads in the wrong shape return 400.
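
The per-row shape rules can be checked client-side before an upsert. This is a rough sketch under stated assumptions: `row_shape_error` is a hypothetical helper, it checks only what a single row can reveal (it cannot know the namespace's dimension or kind), and the returned strings are illustrative rather than the server's error messages.

```python
from typing import Any, Optional


def row_shape_error(row: dict[str, Any]) -> Optional[str]:
    """Return a description of the first shape problem in a row, or None."""
    has_vector = row.get("vector") is not None
    has_vectors = row.get("vectors") is not None
    if has_vector and has_vectors:
        return "both vector and vectors set"
    if has_vectors:
        bag = row["vectors"]
        if not bag:
            return "vectors must be a non-empty list"
        if any(len(inner) == 0 for inner in bag):
            return "empty inner sub-vector"
        if len({len(inner) for inner in bag}) != 1:
            return "mixed inner sub-vector dim"
    return None
```

Running this over every row before the request avoids a round-trip for the 400 cases that do not depend on server-side state.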

Request body

Field	Type	Required	Description
`rows`	array	Yes	List of rows to insert
`rows[].id`	u64	Yes	Unique identifier for the row
`rows[].vector`	float32[]	One of	Single dense vector (single-vector namespaces). Length must match the namespace dimension.
`rows[].vectors`	float32[][]	One of	Bag of equal-length sub-vectors (multivector namespaces). Each inner vector must match the namespace's inner sub-vector dimension; outer list length is the per-row sub-vector count.
`rows[].text`	string	No	Text payload for full-text search

Response (200)

Field	Type	Description
`upserted`	integer	Number of rows accepted

Examples

Single-vector upsert:

curl -X POST http://localhost:3000/ns/demo/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "text": "hello world"},
      {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0]}
    ]
  }'

{"upserted": 2}

Multivector upsert:

curl -X POST http://localhost:3000/ns/demo-mv/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]},
      {"id": 2, "vectors": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]}
    ]
  }'

Errors

400 — invalid namespace name; vector dimension mismatch; payload shape mismatch (e.g. vector on a multivector namespace or vice versa); both vector and vectors set on the same row; empty inner sub-vector; mixed inner sub-vector dim within one row
500 — object-storage write failure (client gets generic error; full details logged server-side)

POST /ns/{namespace}/query Vector, full-text, or hybrid search

Queries the namespace through the cache-aside path. On a cache hit, the result is returned from RAM or NVMe with zero object-storage access. On a miss, the query runs against the configured backend via LanceDB and the result is cached.
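
The cache-aside flow can be illustrated with an in-memory stand-in. Everything here is a hypothetical sketch: the class `NamespaceCache`, the use of SHA-256 over canonical JSON as the key, and the plain dict are assumptions for illustration, not the server's tiered RAM/NVMe implementation or its actual hashing scheme.

```python
import hashlib
import json
from typing import Any, Callable


class NamespaceCache:
    """Toy cache-aside wrapper: look up first, fall back to the backend on a miss."""

    def __init__(self, backend: Callable[[str, dict], list]):
        self._backend = backend
        self._entries: dict[tuple[str, str], list] = {}

    @staticmethod
    def query_id(request: dict[str, Any]) -> str:
        # Assumption: key = SHA-256 of the request serialised with sorted keys,
        # so field order does not change the key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def query(self, namespace: str, request: dict[str, Any]) -> list:
        key = (namespace, self.query_id(request))
        if key in self._entries:
            return self._entries[key]  # hit: backend is not touched
        result = self._backend(namespace, request)  # miss: run the real query
        self._entries[key] = result
        return result

    def invalidate(self, namespace: str) -> None:
        # What a successful write to the namespace would trigger.
        for key in [k for k in self._entries if k[0] == namespace]:
            del self._entries[key]
```

The same pattern explains why identical requests are cheap after the first call and why any write to a namespace makes the next query for it a miss.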

Query modes

The query mode is determined by which fields are present. The vector field uses vector: float32[] for single-vector namespaces and vectors: float32[][] for multivector namespaces:

Mode	Vector field	`text`	Description
Single-vector	`vector` set	absent	Nearest-neighbour search (L2 distance by default)
Multivector	`vectors` set	absent	Late-interaction MaxSim search against a multivector namespace (cosine distance; an IVF_PQ index is what makes this tractable on real corpora — un-indexed queries fall back to a brute-force scan)
FTS	none	set	BM25 full-text search (requires FTS index)
Hybrid	`vector` or `vectors` set	set	Both, fused via Reciprocal Rank Fusion (RRF)
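
For readers unfamiliar with Reciprocal Rank Fusion, the idea is that each ranked list contributes 1 / (k + rank) to a row's fused score, so rows ranked highly by both searches rise to the top. The sketch below is a generic RRF implementation, not Firn's code; the function name `rrf_fuse` and the constant k = 60 (a commonly used default) are assumptions, and the constant the server uses is not stated here.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Fuse several ranked lists of row ids into one list of (id, score)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, row_id in enumerate(ranking, start=1):
            scores[row_id] = scores.get(row_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = [1, 2, 3]  # ids ordered by vector distance
fts_hits = [3, 1, 4]     # ids ordered by BM25 score
fused_ids = [row_id for row_id, _ in rrf_fuse([vector_hits, fts_hits])]
```

Here rows 1 and 3 appear in both lists and therefore outrank rows 2 and 4, which appear in only one.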

Request body

Field	Type	Required	Description
`vector`	float32[]	One of	Single-vector query payload. Length must match the namespace dimension.
`vectors`	float32[][]	One of	Multivector query payload — a bag of equal-length sub-vectors. Each inner vector must match the namespace's inner sub-vector dimension.
`k`	integer	Yes	Number of results to return
`nprobes`	integer	No	IVF partitions to probe (default 20). Higher values trade latency for recall.
`text`	string	No*	Text query for full-text or hybrid search

* At least one of vector, vectors, or text must be present. Setting both vector and vectors on the same request returns 400.
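
The field-presence rules above map onto a short classifier. This is an illustrative sketch with a hypothetical helper name (`query_mode`); it mirrors the mode table and the two 400 conditions, and raises `ValueError` where the server would reject the request.

```python
from typing import Any


def query_mode(request: dict[str, Any]) -> str:
    """Classify a query body as single-vector, multivector, fts, or hybrid."""
    has_vector = request.get("vector") is not None
    has_vectors = request.get("vectors") is not None
    has_text = request.get("text") is not None
    if has_vector and has_vectors:
        raise ValueError("vector and vectors are mutually exclusive")
    if not (has_vector or has_vectors or has_text):
        raise ValueError("one of vector, vectors, or text is required")
    if has_text:
        return "hybrid" if (has_vector or has_vectors) else "fts"
    return "multivector" if has_vectors else "single-vector"
```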

Response (200)

Field	Type	Description
`query_id`	string	Deterministic hash of the query parameters (the cache key)
`results`	array	Ordered list of matching rows
`results[].id`	u64	Row identifier
`results[].score`	float32	Distance (vector / multivector), BM25 score (FTS), or relevance score (hybrid)
`results[].vector`	float32[]	The stored vector for single-vector hits; empty (`[]`) for multivector hits — the bag is intentionally not echoed.
`results[].text`	string?	The stored text (null if none)

Examples

Single-vector search:

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"vector": [1.0, 0.0, 0.0, 0.0], "k": 5}'

Multivector search:

curl -X POST http://localhost:3000/ns/demo-mv/query \
  -H 'Content-Type: application/json' \
  -d '{"vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], "k": 5}'

Full-text search:

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"text": "search query terms", "k": 10}'

Hybrid search (single-vector + FTS):

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [1.0, 0.0, 0.0, 0.0],
    "text": "search terms",
    "k": 10,
    "nprobes": 40
  }'

GET /ns/{namespace}/list Cursor-paginated list ordered by _ingested_at

Returns rows in ingest order for "recent content" flows (landing pages, new-uploads feeds, back-catalogue browsing). Results are ordered by the reserved _ingested_at system column (microsecond server-side timestamp, set at first write and never mutated).

This endpoint intentionally bypasses the foyer cache. Pagination tails would push hot query results out of RAM and NVMe without an offsetting hit rate, so /list goes directly to the Lance dataset.

Query parameters

Param	Default	Description
`order_by`	`_ingested_at`	V1 supports the reserved system column only; other values return 400. User-column ordering will follow scalar-index support. Build the BTree via `POST /ns/{namespace}/scalar-index` to remove the full-fragment scan cost on large namespaces.
`order`	`desc`	`asc` or `desc`.
`limit`	50	Rows per page. Maximum 500; larger values return 400.
`cursor`	none	Opaque token from the previous response's `next_cursor`. Value-based, survives concurrent writes. Format is implementation-defined — do not parse or construct by hand.

Response (200)

Field	Type	Description
`rows`	array	Rows in the requested order
`rows[].id`	u64	Row identifier
`rows[].vector`	float32[]	The stored vector
`rows[].text`	string?	The stored text (null if none)
`rows[].ingested_at_micros`	i64	Server-side microsecond timestamp the row was first written
`next_cursor`	string?	Pass verbatim as `?cursor=` on the next call; `null` on the final page

Status codes

Code	When
200	Success
400	Invalid `order_by`, malformed `cursor`, or `limit` over 500
501	Namespace's Lance table pre-dates the `_ingested_at` column; recreate the namespace to enable the endpoint

Examples

First page, newest first:

curl 'http://localhost:3000/ns/demo/list?limit=50'

Next page via cursor:

curl 'http://localhost:3000/ns/demo/list?limit=50&cursor=0006012a9abb9c800000000000000007'
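
A client typically walks the whole listing by feeding each `next_cursor` back until it is null. The sketch below separates the loop from the transport so the loop can be exercised without a server. The names `iter_rows` and `http_fetcher` are hypothetical, and the fetcher assumes an unauthenticated server at the given base URL.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator, Optional


def iter_rows(fetch_page: Callable[[Optional[str]], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every row, following next_cursor until the final page."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["rows"]
        cursor = page.get("next_cursor")
        if cursor is None:  # null on the final page
            break


def http_fetcher(base_url: str, namespace: str, limit: int = 50):
    """Build a fetch_page callable backed by GET /ns/{namespace}/list."""
    def fetch(cursor: Optional[str]) -> dict[str, Any]:
        params: dict[str, Any] = {"limit": limit}
        if cursor is not None:
            params["cursor"] = cursor  # passed verbatim, never parsed
        url = f"{base_url}/ns/{namespace}/list?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch
```

Usage against a local instance would look like `for row in iter_rows(http_fetcher("http://localhost:3000", "demo")): ...`.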

DELETE /ns/{namespace} Delete a namespace and all its data

Removes every object under the namespace's prefix and evicts all cached query results. This is irreversible.

Response (200)

Field	Type	Description
`objects_deleted`	integer	Number of objects removed from the backend

Example

curl -X DELETE http://localhost:3000/ns/demo

{"objects_deleted": 12}

POST /ns/{namespace}/warmup Pre-warm the cache (async)

Accepts a list of queries and runs them in a background task to populate the cache. Returns 202 Accepted immediately. Useful for warming the cache after a deployment or before expected traffic.

Request body

Field	Type	Required	Description
`queries`	QueryRequest[]	Yes	List of query objects (same schema as `/query`)

Response (202)

Field	Type	Description
`queued`	integer	Number of queries submitted for background execution

Example

curl -X POST http://localhost:3000/ns/demo/warmup \
  -H 'Content-Type: application/json' \
  -d '{
    "queries": [
      {"vector": [1.0, 0.0, 0.0, 0.0], "k": 5},
      {"vector": [0.0, 1.0, 0.0, 0.0], "k": 5}
    ]
  }'

{"queued": 2}

Monitoring warmup progress

Watch the firnflow_cache_misses_total metric to track how many warmup queries have completed: each one that is not already cached counts as a miss when it runs. Failures inside the background task are logged server-side but do not affect the HTTP response.

POST /ns/{namespace}/index Build an ANN vector index (async)

Builds an IVF_PQ (Inverted File with Product Quantisation) index on the namespace's vector column. Returns 202 Accepted and builds in the background. Building an index cuts cold query latency (25x speedup on AWS S3).

Request body

Field	Type	Required	Default	Description
`kind`	string	Yes	-	Index type. Only `"ivf_pq"` is supported.
`num_partitions`	u32	No	sqrt(row_count)	Number of IVF partitions
`num_sub_vectors`	u32	No	dim / 16	Number of PQ sub-vectors (must divide dimension evenly)
`num_bits`	u32	No	8	PQ codebook bit width. Accepted values: 4 or 8. Setting 4 halves the per-vector index storage in exchange for some recall; 4-bit additionally requires `num_sub_vectors` to be even.
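
The defaults and constraints in the table can be worked through numerically. The sketch below is illustrative only: `resolve_index_params` is a hypothetical helper, flooring sqrt(row_count) and dim / 16 is an assumption about rounding, and `code_bytes_per_vector` counts only the PQ codes (num_sub_vectors × num_bits / 8), ignoring any other index overhead.

```python
import math
from typing import Optional


def resolve_index_params(row_count: int, dim: int,
                         num_partitions: Optional[int] = None,
                         num_sub_vectors: Optional[int] = None,
                         num_bits: int = 8) -> dict[str, int]:
    """Apply the documented defaults and reject combinations the table rules out."""
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    if num_partitions is None:
        num_partitions = max(1, math.isqrt(row_count))  # assumed floor of sqrt
    if num_sub_vectors is None:
        num_sub_vectors = dim // 16  # assumed integer division
    if num_sub_vectors <= 0 or dim % num_sub_vectors != 0:
        raise ValueError("num_sub_vectors must divide the dimension evenly")
    if num_bits == 4 and num_sub_vectors % 2 != 0:
        raise ValueError("4-bit PQ requires an even num_sub_vectors")
    return {
        "num_partitions": num_partitions,
        "num_sub_vectors": num_sub_vectors,
        "num_bits": num_bits,
        "code_bytes_per_vector": num_sub_vectors * num_bits // 8,
    }
```

For 100k rows at dim=1536 this gives 316 partitions and 96 sub-vectors: 96 bytes of PQ codes per vector at 8 bits, 48 bytes at 4 bits.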

Response (202)

{"status": "index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/index \
  -H 'Content-Type: application/json' \
  -d '{"kind": "ivf_pq"}'

Index build time

Index builds can take minutes for large datasets (147s for 100k vectors at dim=1536 on MinIO). Monitor firnflow_index_build_duration_seconds to track completion. Queries against the namespace continue to work during the build, falling back to a linear scan.

POST /ns/{namespace}/fts-index Build a BM25 full-text search index (async)

Builds a BM25 full-text search index on the namespace's text column. Required before FTS or hybrid queries will return results. Returns 202 Accepted.

Request body

No request body required.

Response (202)

{"status": "fts index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/fts-index

Prerequisite

At least one row must have a non-null text field before building an FTS index.

POST /ns/{namespace}/scalar-index Build BTree index on _ingested_at to accelerate /list (async)

Builds a BTree scalar index on the reserved _ingested_at column. With the index in place, /list cursor pages do an index range scan instead of a full-fragment scan, and the leading ORDER BY _ingested_at short-circuits the in-memory sort step. Returns 202 Accepted.

Request body

No request body required. v1 hardcodes the column to _ingested_at — the same constraint /list puts on order_by.

Response (202)

{"status": "scalar index build queued"}

Example

curl -X POST http://localhost:3000/ns/demo/scalar-index

Idempotent and self-maintaining

Repeat calls rebuild the index in place. POST /ns/{namespace}/compact incrementally absorbs new rows into the existing BTree, so a separate rebuild is not needed after compaction. Monitor firnflow_index_build_duration_seconds{kind="scalar"} for completion.

POST /ns/{namespace}/compact Compact data files (async)

Merges small Lance data fragments into fewer, larger files to reduce object-storage round-trips on cold queries. Returns 202 Accepted. Compaction also invalidates the cache for this namespace, since file offsets change after it runs.

Request body

No request body required.

Response (202)

{"status": "compaction queued"}

Example

curl -X POST http://localhost:3000/ns/demo/compact

When to compact

Compact after many small upsert batches have created fragment sprawl. The server logs fragments_removed and fragments_added when the compaction completes.

Error responses

All errors return a JSON body with an error field.

Status	Cause	Example
`400`	Invalid namespace name, dimension mismatch, empty query, unsupported index kind	`{"error": "invalid namespace: must be lowercase alphanumeric and hyphens, max 64 chars"}`
`401`	Missing, malformed, or unknown `Authorization: Bearer` header. Only emitted when an API key is configured. Includes `WWW-Authenticate`.	`{"error": "unauthorized"}`
`403`	Valid read/write key on an admin route while a separate admin key is configured.	`{"error": "forbidden"}`
`429`	Rejected by the per-principal or pre-auth IP rate limiter. Response includes `Retry-After` in seconds.	`{"error": "rate limited"}`
`500`	Object-storage connectivity, cache failure, or internal error	`{"error": "internal error"}`

On 500 errors, the full error details are logged server-side via tracing::error! but scrubbed from the client response to prevent leaking internal state.
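
A client can turn the table above into a simple retry policy. The sketch below is one possible policy, not a recommendation from the server: the helper names `retry_delay` and `error_message` are hypothetical, the exponential backoff for 5xx responses and the 1-second fallback are design choices made for illustration, and the header lookup assumes the mapping uses the exact key `Retry-After`. Whether retrying after a 500 is safe depends on the operation; upsert appends rows, so a blind retry may not be appropriate there.

```python
import json
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None when a retry will not help."""
    if status == 429:
        value = headers.get("Retry-After", "")
        # Retry-After is documented as seconds; fall back to 1s if absent.
        return float(value) if value.isdigit() else 1.0
    if status >= 500:
        # Illustrative exponential backoff, capped at 30 seconds.
        return min(30.0, 0.5 * 2 ** attempt)
    # 400, 401, 403: the request itself must change before it can succeed.
    return None


def error_message(body: bytes) -> str:
    """Extract the error field from a JSON error body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable error body"
    if isinstance(payload, dict):
        return str(payload.get("error", "unknown error"))
    return "unknown error"
```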