Quickstart

Get Firn running locally with Docker. The first docker compose up --build compiles the server from source, so give it a few minutes; later starts take seconds.

Prerequisites

Docker and Docker Compose V2

Default-open local stack

This quickstart talks to a default-open local stack — no Authorization header is required. Before exposing Firn on a network you do not trust, set FIRNFLOW_API_KEY and send Authorization: Bearer <token> with every request. See deployment and configuration for the full story.
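Once an API key is configured, every curl call in this quickstart needs the extra header. A minimal sketch of a wrapper follows. The function name firn_curl and the client-side variable FIRN_TOKEN are hypothetical and not part of Firn; the sketch only assumes the Authorization: Bearer <token> scheme described above.

```shell
# Hypothetical wrapper: adds the bearer header only when FIRN_TOKEN is set,
# so the same commands work against both an open and a protected stack.
# FIRN_TOKEN is an illustrative client-side variable, not a Firn setting.
firn_curl() {
  if [ -n "${FIRN_TOKEN:-}" ]; then
    curl -H "Authorization: Bearer ${FIRN_TOKEN}" "$@"
  else
    curl "$@"
  fi
}

# Usage:
#   FIRN_TOKEN=... firn_curl http://localhost:3000/health
```

With a wrapper like this, the remaining examples can be run unchanged by substituting firn_curl for curl.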

1. Clone and start the stack

This launches MinIO (local S3) and the Firn API server together.

git clone https://github.com/gordonmurray/firnflow
cd firnflow
docker compose up --build

Once you see listening on 0.0.0.0:3000, the API is ready. MinIO is available at localhost:9000 (API) and localhost:9001 (console, credentials: minioadmin / minioadmin).

2. Check health

curl http://localhost:3000/health

Expected response:

ok
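
Because the first build can take a few minutes, a script may want to wait for the health endpoint instead of watching the logs. The following is a sketch only: wait_for_firn and FIRN_POLL_SECONDS are hypothetical names, and the retry count and interval are arbitrary defaults.

```shell
# Hypothetical helper: polls /health until it answers successfully or the
# attempt budget runs out. Prints "ready" on success, returns 1 on timeout.
wait_for_firn() {
  url="${1:-http://localhost:3000/health}"
  attempts="${2:-60}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS keeps it quiet unless it fails.
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "${FIRN_POLL_SECONDS:-2}"
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Usage:
#   wait_for_firn && echo "API is up"
```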

3. Upsert vectors

Insert a few vectors into the demo namespace. Firn auto-detects the vector dimension from the first upsert.

curl -X POST http://localhost:3000/ns/demo/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vector": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]},
      {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]},
      {"id": 3, "vector": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
    ]
  }'

Response:

{"upserted": 3}

4. Query for nearest neighbours

Search the demo namespace for the 2 closest vectors.

curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{"vector": [1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], "k": 2}'

Response (the first query hits the backend and populates the cache):

{
  "query_id": "a1b2c3d4",
  "results": [
    {"id": 1, "score": 0.01, "vector": [1.0, 0.0, ...], "text": null, "ingested_at_micros": 1718000000000000},
    {"id": 2, "score": 0.99, "vector": [0.0, 1.0, ...], "text": null, "ingested_at_micros": 1718000000000000}
  ]
}

Hits carry the stored vector by default. If you only need ids, scores, and text, add "include_vector": false to the request — at realistic dimensions the vectors are most of the response bytes.
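
For illustration, the step 4 query with vectors omitted from the hits might be wrapped like this. The function name query_without_vectors is hypothetical; the request fields are the ones named above, and the exact shape of the trimmed response is not shown here because the source does not specify it.

```shell
# Hypothetical helper: runs a k-nearest query and asks the server to leave
# the stored vectors out of each hit.
# $1 = namespace, $2 = query vector as a JSON array, $3 = k (default 2)
query_without_vectors() {
  ns="$1"
  vector_json="$2"
  k="${3:-2}"
  curl -s -X POST "http://localhost:3000/ns/${ns}/query" \
    -H 'Content-Type: application/json' \
    -d "{\"vector\": ${vector_json}, \"k\": ${k}, \"include_vector\": false}"
}

# Usage:
#   query_without_vectors demo '[1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]' 2
```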

Cache in action

Run the same query again. The second call returns from cache with zero object-storage requests. You can verify this at /metrics.
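
One way to check this is to compare the object-storage request counter before and after the repeated query. The sketch below assumes /metrics returns Prometheus text format with the sample value as the last field on each line; s3_query_requests is a hypothetical helper name.

```shell
# Hypothetical helper: reads /metrics output on stdin and prints the summed
# firnflow_s3_requests_total value for one namespace with operation="query".
# Prints 0 when no matching sample is present.
s3_query_requests() {
  ns="$1"
  awk -v ns="$ns" '
    index($0, "firnflow_s3_requests_total{") == 1 &&
    index($0, "namespace=\"" ns "\"") &&
    index($0, "operation=\"query\"") { sum += $NF }
    END { print sum + 0 }'
}

# Usage (against the running stack):
#   before=$(curl -s http://localhost:3000/metrics | s3_query_requests demo)
#   ... repeat the query from step 4 ...
#   after=$(curl -s http://localhost:3000/metrics | s3_query_requests demo)
#   [ "$before" = "$after" ] && echo "served from cache"
```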

5. Try semantic caching

The exact cache only helps when the same query repeats. For single-vector workloads, you can opt into semantic caching so a near-duplicate query can reuse a previous top-k result when cosine similarity clears your threshold.

# First query: populates the exact cache and the semantic sidecar.
curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "k": 2,
    "semantic_cache": {"enabled": true, "min_similarity": 0.995}
  }'

# Near-duplicate query: exact cache misses, semantic cache may hit.
curl -X POST http://localhost:3000/ns/demo/query \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.0, 0.999, 0.001, 0.0, 0.0, 0.0, 0.0, 0.0],
    "k": 2,
    "semantic_cache": {"enabled": true, "min_similarity": 0.995}
  }'

Approximate reuse

A semantic-cache hit reuses an earlier result; it does not run a fresh search for the new vector. Start with the default threshold (0.995) and lower min_similarity only after measuring result quality for your model and corpus. V1 supports single-vector queries only, with no text or vectors field.
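
To get a feel for what a 0.995 threshold means, it can help to compute the cosine similarity of two query vectors by hand. The sketch below uses the textbook formula, dot(a, b) / (|a| * |b|); it is not taken from Firn's code, and the function name cosine is hypothetical.

```shell
# Hypothetical helper: cosine similarity of two comma-separated vectors.
# Exits non-zero on a dimension mismatch or a zero-length vector.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m || n == 0) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared norm of a
      nb  += y[i] * y[i]   # squared norm of b
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.6f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Usage, with the two query vectors from step 5:
#   cosine "0,1,0,0,0,0,0,0" "0,0.999,0.001,0,0,0,0,0"
```

For the two vectors in step 5 the result is very close to 1, comfortably above 0.995, which is why the second query is a candidate for a semantic-cache hit.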

6. Add text for full-text search

Upsert rows with both vectors and text, then build an FTS index.

# Upsert with text
curl -X POST http://localhost:3000/ns/articles/upsert \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "text": "Introduction to vector databases"},
      {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0], "text": "Full-text search with BM25 scoring"},
      {"id": 3, "vector": [0.0, 0.0, 1.0, 0.0], "text": "Hybrid search combines vector and text"}
    ]
  }'

# Build the FTS index (async, returns 202)
curl -X POST http://localhost:3000/ns/articles/fts-index
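
# Because the build is asynchronous, a script can at least confirm that the
# request was accepted. A sketch only: build_fts_index is a hypothetical
# helper, and it checks just the 202 status mentioned above. It does not
# know when the index has finished building.

```shell
# Hypothetical helper: requests an FTS index build and checks for HTTP 202.
build_fts_index() {
  ns="$1"
  # -o /dev/null discards the body; -w prints only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    "http://localhost:3000/ns/${ns}/fts-index")
  if [ "$status" = "202" ]; then
    echo "index build accepted for ${ns}"
  else
    echo "unexpected status ${status} for ${ns}" >&2
    return 1
  fi
}

# Usage:
#   build_fts_index articles
```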

Then run a text search:

curl -X POST http://localhost:3000/ns/articles/query \
  -H 'Content-Type: application/json' \
  -d '{"text": "vector databases", "k": 2}'

Or a hybrid search (vector + text together):

curl -X POST http://localhost:3000/ns/articles/query \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.9, 0.1, 0.0, 0.0],
    "text": "vector databases",
    "k": 2
  }'

7. Check the metrics

See exactly how many object-storage requests Firn has saved you.

curl -s http://localhost:3000/metrics | grep firnflow

Key metric families to look for:

firnflow_cache_hits_total{namespace="demo"} ...
firnflow_cache_misses_total{namespace="demo"} ...
firnflow_semantic_cache_hits_total{namespace="demo"} ...
firnflow_s3_requests_total{namespace="demo",operation="query"} ...
firnflow_active_namespaces ...
firnflow_cached_handles ...

firnflow_cache_misses_total increments when a query misses the exact RAM/NVMe result cache. If the semantic cache then serves that query, firnflow_semantic_cache_hits_total increments and firnflow_s3_requests_total{operation="query"} does not. firnflow_cached_handles tracks namespaces with a warm LanceDB connection in the in-process pool: the first request to any namespace opens the connection, and every later request reuses it.
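
As a rough illustration, the exact-cache hit ratio for a namespace can be derived from the two counters above. The sketch assumes Prometheus text format with the value as the last field on each line; cache_hit_ratio is a hypothetical helper name, and semantic-cache hits are deliberately left out of the ratio.

```shell
# Hypothetical helper: reads /metrics output on stdin and prints
# hits / (hits + misses) for one namespace, or "n/a" with no traffic yet.
cache_hit_ratio() {
  ns="$1"
  awk -v ns="$ns" '
    index($0, "firnflow_cache_hits_total{") == 1 &&
    index($0, "namespace=\"" ns "\"") { hits += $NF }
    index($0, "firnflow_cache_misses_total{") == 1 &&
    index($0, "namespace=\"" ns "\"") { misses += $NF }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", hits / total
    }'
}

# Usage:
#   curl -s http://localhost:3000/metrics | cache_hit_ratio demo
```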

Next steps

API reference - all nine endpoints with full request and response schemas
Configuration - tune cache sizes, connect to AWS S3, native GCS, or any other supported object-storage backend
Deployment - run Firn in production with Docker or on Kubernetes
Monitoring - Prometheus metrics, PromQL examples, and alerting rules