API Reference
All request and response bodies are JSON. The base URL defaults to http://localhost:3000.
Authentication
When FIRNFLOW_API_KEY is configured, every protected endpoint requires Authorization: Bearer <token>. Two tiers exist:
- Read/write (FIRNFLOW_API_KEY): upsert, query, list, warmup.
- Admin (FIRNFLOW_ADMIN_API_KEY): delete, index, fts-index, scalar-index, compact. The admin key also satisfies read/write routes. If no admin key is configured, the read/write key authorises admin routes too (single-key fallback).
/health is always public. /metrics is public unless FIRNFLOW_METRICS_TOKEN is set, in which case the same Bearer header is required. Token comparisons are constant-time (subtle::ConstantTimeEq). See configuration for the full env-var list, including the optional FIRNFLOW_RATE_LIMIT_RPS / FIRNFLOW_PREAUTH_IP_LIMIT_RPS rate-limit knobs.
Rejection responses on protected endpoints:
- 401 Unauthorized: header missing, malformed, or token unknown. Includes WWW-Authenticate: Bearer realm="firnflow".
- 403 Forbidden: valid token, but the route requires admin scope and only the read/write key was presented.
- 429 Too Many Requests: rejected by either rate limiter. Includes Retry-After (in seconds).
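The tier rules and rejection codes above can be summarised as a small decision function. This is a minimal Python sketch, not firnflow's implementation. The server is Rust and uses subtle::ConstantTimeEq; here hmac.compare_digest plays that constant-time role. The function name authorize and its arguments are hypothetical. The sketch assumes auth is disabled exactly when no read/write key is configured, and it ignores rate limiting (429).

```python
import hmac
from typing import Optional


def _eq(a: str, b: str) -> bool:
    # Constant-time comparison (stand-in for subtle::ConstantTimeEq).
    return hmac.compare_digest(a.encode(), b.encode())


def authorize(header: Optional[str], admin_route: bool,
              api_key: Optional[str], admin_key: Optional[str]) -> int:
    """Return the status a request would get: 200, 401 or 403.

    Hypothetical helper; assumes auth is off when api_key is None.
    """
    if api_key is None:
        return 200  # no key configured: protected routes are open
    if header is None or not header.startswith("Bearer "):
        return 401  # missing or malformed Authorization header
    token = header[len("Bearer "):]
    if not token:
        return 401
    if admin_key is not None and _eq(token, admin_key):
        return 200  # the admin key satisfies both tiers
    if _eq(token, api_key):
        if admin_route and admin_key is not None:
            return 403  # read/write key presented on an admin route
        return 200  # read/write route, or single-key fallback
    return 401  # unknown token
```

For example, authorize("Bearer rw", True, "rw", "adm") yields 403. The same call with admin_key=None yields 200 because of the single-key fallback.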
Namespaces
Every data operation is scoped to a namespace. Namespace names must be lowercase alphanumeric with hyphens, and no longer than 64 characters. Each namespace maps to an isolated object-storage prefix under the configured FIRNFLOW_STORAGE_URI — for example s3://bucket/namespace/ or gs://bucket/tenants/acme/namespace/.
Valid: my-project, embeddings-v2, prod-search.
Invalid: My_Project (uppercase and underscores), a-very-long-name-that-exceeds-the-sixty-four-character-limit-imposed-by-firn (76 characters).
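A client can pre-check names before calling the API. This is a minimal sketch with a hypothetical is_valid_namespace helper that mirrors the documented rule: lowercase ASCII letters, digits and hyphens, at most 64 characters. The rule as documented does not say whether leading or trailing hyphens are rejected, or whether there is a minimum length. The sketch only requires at least one character.

```python
import re

# Hypothetical client-side mirror of the documented namespace rule.
# Leading/trailing-hyphen handling on the server is not documented here.
NAMESPACE_RE = re.compile(r"[a-z0-9-]{1,64}")


def is_valid_namespace(name: str) -> bool:
    # fullmatch anchors the pattern to the whole string.
    return NAMESPACE_RE.fullmatch(name) is not None
```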
Endpoints
GET /health
Returns 200 OK with body ok. Use this for load balancer health checks and container readiness probes.
Example
curl http://localhost:3000/health
Response
ok
GET /metrics
Returns all Prometheus metrics in text exposition format (text/plain; version=0.0.4). See the monitoring guide for the full metric list and PromQL examples.
Example
curl http://localhost:3000/metrics
POST /ns/{namespace}/upsert
Appends rows to the namespace's Lance table. The vector dimension and the vector kind (single-vector or multivector) are inferred from the first upsert and enforced on subsequent calls. After a successful write, all cached query results for this namespace are invalidated.
Each row carries one of two vector payload shapes, depending on the namespace's kind:
- Single-vector namespaces: vector: float32[], one dense vector of length dim.
- Multivector namespaces: vectors: float32[][], a non-empty list of equal-length inner vectors. Used for ColBERT / ColPali / ColQwen2 late-interaction retrieval.
At most one of the two fields may be set on a row. The first row of the first upsert into a fresh namespace fixes the kind for its lifetime; payloads in the wrong shape return 400.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| rows | array | Yes | List of rows to insert |
| rows[].id | u64 | Yes | Unique identifier for the row |
| rows[].vector | float32[] | One of | Single dense vector (single-vector namespaces). Length must match the namespace dimension. |
| rows[].vectors | float32[][] | One of | Bag of equal-length sub-vectors (multivector namespaces). Each inner vector must match the namespace's inner sub-vector dimension; the outer list length is the per-row sub-vector count. |
| rows[].text | string | No | Text payload for full-text search |
Response (200)
| Field | Type | Description |
|---|---|---|
| upserted | integer | Number of rows accepted |
Examples
Single-vector upsert:
curl -X POST http://localhost:3000/ns/demo/upsert \
-H 'Content-Type: application/json' \
-d '{
"rows": [
{"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "text": "hello world"},
{"id": 2, "vector": [0.0, 1.0, 0.0, 0.0]}
]
}'
{"upserted": 2}
Multivector upsert:
curl -X POST http://localhost:3000/ns/demo-mv/upsert \
-H 'Content-Type: application/json' \
-d '{
"rows": [
{"id": 1, "vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]},
{"id": 2, "vectors": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]}
]
}'
Errors
- 400: invalid namespace name; vector dimension mismatch; payload shape mismatch (e.g. vector on a multivector namespace or vice versa); both vector and vectors set on the same row; empty inner sub-vector; mixed inner sub-vector dim within one row.
- 500: object-storage write failure (the client gets a generic error; full details are logged server-side).
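The row-level 400 conditions above can be pre-checked before a batch is sent. This is a minimal sketch with a hypothetical check_rows helper. It assumes the namespace's kind ("single" or "multi") and dimension are either passed in or, for a fresh namespace, fixed by the first vector-bearing row of the batch. The server remains the authority, and rows with neither field are left for it to judge.

```python
from typing import Any, Dict, List, Optional


def check_rows(rows: List[Dict[str, Any]],
               kind: Optional[str] = None,
               dim: Optional[int] = None) -> Optional[str]:
    """Return the first problem found in an upsert batch, or None.

    Hypothetical pre-check of the documented 400 conditions. kind and dim
    are the namespace's settings if known; None lets the first row fix them.
    """
    for i, row in enumerate(rows):
        has_vec, has_bag = "vector" in row, "vectors" in row
        if has_vec and has_bag:
            return f"row {i}: both vector and vectors set"
        if not (has_vec or has_bag):
            continue  # left for the server to judge
        if has_vec:
            row_kind, vecs = "single", [row["vector"]]
        else:
            row_kind, vecs = "multi", row["vectors"]
            if not vecs:
                return f"row {i}: vectors must be non-empty"
            if any(len(v) == 0 for v in vecs):
                return f"row {i}: empty inner sub-vector"
            if len({len(v) for v in vecs}) > 1:
                return f"row {i}: mixed inner sub-vector dim"
        if kind is None:
            kind = row_kind  # first vector-bearing row fixes the kind
        elif row_kind != kind:
            return f"row {i}: payload shape mismatch"
        if dim is None:
            dim = len(vecs[0])  # first vector-bearing row fixes the dimension
        elif len(vecs[0]) != dim:
            return f"row {i}: vector dimension mismatch"
    return None
```

Both upsert examples above pass this check. Passing kind="single" with the multivector batch reports a shape mismatch on row 0.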
POST /ns/{namespace}/query
Queries the namespace through the cache-aside path. On a cache hit, the result is returned from RAM or NVMe with zero object-storage access. On a miss, the query runs against the configured backend via LanceDB and the result is cached.
Query modes
The query mode is determined by which fields are present. The vector field uses vector: float32[] for single-vector namespaces and vectors: float32[][] for multivector namespaces:
| Mode | Vector field | text | Description |
|---|---|---|---|
| Single-vector | vector set | absent | Nearest-neighbour search (L2 distance by default) |
| Multivector | vectors set | absent | Late-interaction MaxSim search against a multivector namespace (cosine distance; an IVF_PQ index is what makes this tractable on real corpora — un-indexed queries fall back to a brute-force scan) |
| FTS | none | set | BM25 full-text search (requires FTS index) |
| Hybrid | vector or vectors set | set | Both, fused via Reciprocal Rank Fusion (RRF) |
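Late-interaction (ColBERT-style) MaxSim scores a document by matching each query sub-vector to its most similar document sub-vector, then summing those maxima. This is a minimal pure-Python sketch of that scoring rule, using cosine similarity as in the table above. It is not firnflow's or LanceDB's code, and the way the server turns this similarity into the distance reported in results[].score is not specified here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    # Assumes non-zero vectors (a zero norm would divide by zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """Sum, over query sub-vectors, of the best cosine similarity in doc."""
    return sum(max(cosine(q, d) for d in doc) for q in query)
```

Scoring the multivector query example used below against the two rows from the multivector upsert example gives 2.0 for row 1 and about 1.414 for row 2, so row 1 ranks first.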
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| vector | float32[] | One of | Single-vector query payload. Length must match the namespace dimension. |
| vectors | float32[][] | One of | Multivector query payload: a bag of equal-length sub-vectors. Each inner vector must match the namespace's inner sub-vector dimension. |
| k | integer | Yes | Number of results to return |
| nprobes | integer | No | IVF partitions to probe (default 20). Higher values trade latency for recall. |
| text | string | No* | Text query for full-text or hybrid search |
* At least one of vector, vectors, or text must be present. Setting both vector and vectors on the same request returns 400.
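The mode table and the rule above combine into one classification step. This is a minimal sketch with a hypothetical query_mode helper. It treats a field as present when its key exists and is not null, and it raises ValueError wherever the API would return 400.

```python
from typing import Any, Dict


def query_mode(req: Dict[str, Any]) -> str:
    """Classify a /query body per the mode table (hypothetical helper)."""
    has = {f: req.get(f) is not None for f in ("vector", "vectors", "text")}
    if has["vector"] and has["vectors"]:
        raise ValueError("vector and vectors are mutually exclusive")  # 400
    has_vec = has["vector"] or has["vectors"]
    if has_vec and has["text"]:
        return "hybrid"
    if has["vectors"]:
        return "multivector"
    if has["vector"]:
        return "single-vector"
    if has["text"]:
        return "fts"
    raise ValueError("one of vector, vectors or text is required")  # 400
```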
Response (200)
| Field | Type | Description |
|---|---|---|
| query_id | string | Deterministic hash of the query parameters (the cache key) |
| results | array | Ordered list of matching rows |
| results[].id | u64 | Row identifier |
| results[].score | float32 | Distance (vector / multivector), BM25 score (FTS), or relevance score (hybrid) |
| results[].vector | float32[] | The stored vector for single-vector hits; empty ([]) for multivector hits, since the bag is intentionally not echoed. |
| results[].text | string? | The stored text (null if none) |
Examples
Single-vector search:
curl -X POST http://localhost:3000/ns/demo/query \
-H 'Content-Type: application/json' \
-d '{"vector": [1.0, 0.0, 0.0, 0.0], "k": 5}'
Multivector search:
curl -X POST http://localhost:3000/ns/demo-mv/query \
-H 'Content-Type: application/json' \
-d '{"vectors": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], "k": 5}'
Full-text search:
curl -X POST http://localhost:3000/ns/demo/query \
-H 'Content-Type: application/json' \
-d '{"text": "search query terms", "k": 10}'
Hybrid search (single-vector + FTS):
curl -X POST http://localhost:3000/ns/demo/query \
-H 'Content-Type: application/json' \
-d '{
"vector": [1.0, 0.0, 0.0, 0.0],
"text": "search terms",
"k": 10,
"nprobes": 40
}'
GET /ns/{namespace}/list (rows ordered by _ingested_at)
Returns rows in ingest order for "recent content" flows (landing pages, new-uploads feeds, back-catalogue browsing). Results are ordered by the reserved _ingested_at system column (microsecond server-side timestamp, set at first write and never mutated).
This endpoint intentionally bypasses the foyer cache. Pagination tails would push hot query results out of RAM and NVMe without an offsetting hit rate, so /list goes directly to the Lance dataset.
Query parameters
| Param | Default | Description |
|---|---|---|
| order_by | _ingested_at | V1 supports the reserved system column only; other values return 400. User-column ordering will follow scalar-index support. Build the BTree via POST /ns/{namespace}/scalar-index to remove the full-fragment scan cost on large namespaces. |
| order | desc | asc or desc. |
| limit | 50 | Rows per page. Maximum 500; larger values return 400. |
| cursor | none | Opaque token from the previous response's next_cursor. Value-based, so it survives concurrent writes. The format is implementation-defined; do not parse or construct it by hand. |
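A value-based cursor records the last position seen rather than an offset. Rows written between two page requests therefore cannot shift later pages. The in-memory sketch below illustrates that idea (keyset pagination) for order=desc. It assumes the position is the pair (_ingested_at, id) and uses a hypothetical page_desc helper. The real cursor is opaque and implementation-defined, so clients should never build one by hand.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (ingested_at_micros, id): hypothetical position key


def page_desc(rows: List[Row], limit: int,
              after: Optional[Row] = None) -> Tuple[List[Row], Optional[Row]]:
    """Return one newest-first page and the position to resume from."""
    ordered = sorted(rows, reverse=True)
    if after is not None:
        # Keyset step: keep only rows strictly older than the last one seen.
        ordered = [r for r in ordered if r < after]
    page = ordered[:limit]
    next_pos = page[-1] if len(ordered) > limit else None  # None on the final page
    return page, next_pos
```

Take rows at timestamps 100 to 400 with limit=2. The first page returns ids 4 and 3. If a row with timestamp 500 is written before the second request, resuming after (300, 3) still returns ids 2 and 1. An offset of 2 would have returned id 3 again.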
Response (200)
| Field | Type | Description |
|---|---|---|
| rows | array | Rows in the requested order |
| rows[].id | u64 | Row identifier |
| rows[].vector | float32[] | The stored vector |
| rows[].text | string? | The stored text (null if none) |
| rows[].ingested_at_micros | i64 | Server-side microsecond timestamp the row was first written |
| next_cursor | string? | Pass verbatim as ?cursor= on the next call; null on the final page |
Status codes
| Code | When |
|---|---|
| 200 | Success |
| 400 | Invalid order_by, malformed cursor, or limit over 500 |
| 501 | Namespace's Lance table pre-dates the _ingested_at column; recreate the namespace to enable the endpoint |
Examples
First page, newest first:
curl 'http://localhost:3000/ns/demo/list?limit=50'
Next page via cursor:
curl 'http://localhost:3000/ns/demo/list?limit=50&cursor=0006012a9abb9c800000000000000007'
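To walk every page, a client passes next_cursor back verbatim until it is null. This is a minimal standard-library sketch with hypothetical helper names (iter_rows, http_get_json). The fetch callable is injectable so the loop can be exercised without a server. For a protected deployment, pass the read/write key as token.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

Fetch = Callable[[str], Dict[str, Any]]


def http_get_json(url: str, token: Optional[str] = None) -> Dict[str, Any]:
    # Plain GET with an optional Bearer token; raises HTTPError on 4xx/5xx.
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_rows(fetch: Fetch, base: str, namespace: str,
              limit: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield rows newest-first across all pages (hypothetical helper)."""
    cursor: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if cursor is not None:
            params["cursor"] = cursor  # opaque: passed back verbatim
        url = f"{base}/ns/{namespace}/list?{urllib.parse.urlencode(params)}"
        body = fetch(url)
        yield from body["rows"]
        cursor = body.get("next_cursor")
        if cursor is None:
            return
```

Against a live server this would be called as iter_rows(lambda u: http_get_json(u, token), "http://localhost:3000", "demo").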
DELETE /ns/{namespace}
Removes every object under the namespace's prefix and evicts all cached query results. This is irreversible.
Response (200)
| Field | Type | Description |
|---|---|---|
| objects_deleted | integer | Number of objects removed from the backend |
Example
curl -X DELETE http://localhost:3000/ns/demo
{"objects_deleted": 12}
POST /ns/{namespace}/warmup
Accepts a list of queries and runs them in a background task to populate the cache. Returns 202 Accepted immediately. Useful for warming the cache after a deployment or before expected traffic.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| queries | QueryRequest[] | Yes | List of query objects (same schema as /query) |
Response (202)
| Field | Type | Description |
|---|---|---|
| queued | integer | Number of queries submitted for background execution |
Example
curl -X POST http://localhost:3000/ns/demo/warmup \
-H 'Content-Type: application/json' \
-d '{
"queries": [
{"vector": [1.0, 0.0, 0.0, 0.0], "k": 5},
{"vector": [0.0, 1.0, 0.0, 0.0], "k": 5}
]
}'
{"queued": 2}
Use the firnflow_cache_misses_total metric to track how many warmup queries have completed. Failures inside the background task are logged server-side but do not affect the HTTP response.
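When warmup is triggered from a deploy script rather than curl, the request has the same shape as the example above, plus a Bearer header when a key is configured. This is a minimal standard-library sketch with a hypothetical warmup_request helper. It only builds the request; sending it with urllib.request.urlopen should yield 202 and a body like {"queued": n}.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def warmup_request(base: str, namespace: str, queries: List[Dict[str, Any]],
                   token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /ns/{namespace}/warmup request."""
    body = json.dumps({"queries": queries}).encode()
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # read/write key suffices
    return urllib.request.Request(f"{base}/ns/{namespace}/warmup", data=body,
                                  headers=headers, method="POST")
```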
POST /ns/{namespace}/index
Builds an IVF_PQ (Inverted File with Product Quantisation) index on the namespace's vector column. Returns 202 Accepted and builds in the background. Building an index dramatically reduces cold query latency (25x speedup on AWS S3).
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| kind | string | Yes | - | Index type. Only "ivf_pq" is supported. |
| num_partitions | u32 | No | sqrt(row_count) | Number of IVF partitions |
| num_sub_vectors | u32 | No | dim / 16 | Number of PQ sub-vectors (must divide the dimension evenly) |
| num_bits | u32 | No | 8 | PQ codebook bit width. Accepted values: 4 or 8. Setting 4 halves the per-vector index storage cost at some loss of recall; 4-bit additionally requires num_sub_vectors to be even. |
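The defaults and constraints in the table can be resolved on the client before calling the endpoint. This is a hedged sketch with a hypothetical resolve_ivf_pq helper. It assumes sqrt(row_count) and dim / 16 are floored with a minimum of 1, which the table does not specify, and the server may resolve defaults differently. The code_bytes value counts only the PQ codes: one num_bits-wide code per sub-vector. That is why 4-bit codes take half the bytes of 8-bit ones, and why an even sub-vector count keeps them packing into whole bytes.

```python
import math
from typing import Dict, Optional


def resolve_ivf_pq(dim: int, row_count: int, num_partitions: Optional[int] = None,
                   num_sub_vectors: Optional[int] = None,
                   num_bits: int = 8) -> Dict[str, int]:
    """Fill in documented defaults and check the documented constraints.

    Hypothetical helper; the floor-and-min-1 rounding is an assumption.
    """
    if num_partitions is None:
        num_partitions = max(1, math.isqrt(row_count))  # ~sqrt(row_count)
    if num_sub_vectors is None:
        num_sub_vectors = max(1, dim // 16)  # ~dim / 16
    if num_bits not in (4, 8):
        raise ValueError("num_bits must be 4 or 8")
    if dim % num_sub_vectors != 0:
        raise ValueError("num_sub_vectors must divide the dimension evenly")
    if num_bits == 4 and num_sub_vectors % 2 != 0:
        raise ValueError("4-bit PQ requires an even num_sub_vectors")
    return {
        "num_partitions": num_partitions,
        "num_sub_vectors": num_sub_vectors,
        "num_bits": num_bits,
        # Bytes of PQ code per vector: one num_bits-wide code per sub-vector.
        "code_bytes": num_sub_vectors * num_bits // 8,
    }
```

For 128-dimensional vectors and 10,000 rows, this yields 100 partitions and 8 sub-vectors. That is 8 bytes of code per vector at 8 bits, or 4 bytes at 4 bits.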
Response (202)
{"status": "index build queued"}
Example
curl -X POST http://localhost:3000/ns/demo/index \
-H 'Content-Type: application/json' \
-d '{"kind": "ivf_pq"}'
Monitor firnflow_index_build_duration_seconds to track completion. Queries against the namespace keep working during the build, using a linear scan.
POST /ns/{namespace}/fts-index
Builds a BM25 full-text search index on the namespace's text column. Required before FTS or hybrid queries will return results. Returns 202 Accepted.
Request body
No request body required.
Response (202)
{"status": "fts index build queued"}
Example
curl -X POST http://localhost:3000/ns/demo/fts-index
Make sure rows were upserted with a text field before building an FTS index.
POST /ns/{namespace}/scalar-index (BTree on _ingested_at to accelerate /list, async)
Builds a BTree scalar index on the reserved _ingested_at column. With the index in place, /list cursor pages do an index range scan instead of a full-fragment scan, and the leading ORDER BY _ingested_at short-circuits the in-memory sort step. Returns 202 Accepted.
Request body
No request body required. v1 hardcodes the column to _ingested_at — the same constraint /list puts on order_by.
Response (202)
{"status": "scalar index build queued"}
Example
curl -X POST http://localhost:3000/ns/demo/scalar-index
POST /ns/{namespace}/compact incrementally absorbs new rows into the existing BTree, so a separate rebuild is not needed after compaction. Monitor firnflow_index_build_duration_seconds{kind="scalar"} to track completion of the build.
POST /ns/{namespace}/compact
Merges small Lance data fragments into fewer, larger files to reduce object-storage round-trips on cold queries. Returns 202 Accepted. Also invalidates the cache for this namespace, since file offsets change after compaction.
Request body
No request body required.
Response (202)
{"status": "compaction queued"}
Example
curl -X POST http://localhost:3000/ns/demo/compact
Completion is reported server-side with fragments_removed and fragments_added counts.
Error responses
All errors return a JSON body with an error field.
| Status | Cause | Example |
|---|---|---|
| 400 | Invalid namespace name, dimension mismatch, empty query, unsupported index kind | {"error": "invalid namespace: must be lowercase alphanumeric and hyphens, max 64 chars"} |
| 401 | Missing, malformed, or unknown Authorization: Bearer header. Only emitted when an API key is configured. Includes WWW-Authenticate. | {"error": "unauthorized"} |
| 403 | Valid read/write key on an admin route while a separate admin key is configured. | {"error": "forbidden"} |
| 429 | Rejected by the per-principal or pre-auth IP rate limiter. Response includes Retry-After in seconds. | {"error": "rate limited"} |
| 500 | Object-storage connectivity, cache failure, or internal error | {"error": "internal error"} |
On 500 errors, the full error details are logged server-side via tracing::error! but scrubbed from the client response to prevent leaking internal state.
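Every error carries an {"error": ...} body, and 429 adds Retry-After, so a client can handle both uniformly. This is a minimal sketch with hypothetical names (call_with_retry, ApiError). The send callable stands in for whatever HTTP transport is used and must return (status, headers, body). The sketch retries only 429, honouring Retry-After. The 1-second fallback when the header is absent is an assumption.

```python
import json
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


class ApiError(Exception):
    """Hypothetical error type carrying the status and the "error" field."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def call_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call send(); on 429 wait Retry-After seconds and retry; raise ApiError otherwise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        status, headers, body = send()
        if 200 <= status < 300:
            return body
        if status == 429 and attempt < max_attempts:
            # Honour Retry-After (seconds); the "1" fallback is an assumption.
            sleep(float(headers.get("Retry-After", "1")))
            continue
        try:
            message = json.loads(body).get("error", "")
        except ValueError:
            message = body.decode(errors="replace")
        raise ApiError(status, message)
```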