# Architecture
How Firn turns cheap object storage into a fast, multi-tenant search engine.
## Tiered storage
Firn's core idea is a three-tier storage hierarchy. Data lives permanently on S3, but queries are served from the fastest available tier:
| Tier | Backed by | Latency | Role |
|---|---|---|---|
| L1: RAM | foyer in-memory cache | sub-microsecond | Hottest queries, configurable size |
| L2: NVMe | foyer disk cache | microseconds | Overflow from RAM tier, larger capacity |
| L3: Object storage | LanceDB on S3/MinIO/R2 | milliseconds to seconds | Source of truth, unlimited capacity, near-zero idle cost |
The cache stores complete, serialised query result sets. When a query hits the cache, zero bytes are read from S3. When it misses, the result is fetched from S3 and stored in the cache for future use.
## Query path

Every query follows this exact sequence:

1. Hash the query. The full `QueryRequest(vector, k, nprobes, text)` is serialised with bincode and hashed with xxh3 to produce a deterministic `QueryHash`.
2. Build the cache key. The key is the tuple `(namespace, generation, query_hash)`. The generation counter ensures stale results are never returned after a write.
3. Check foyer. The HybridCache checks RAM first, then NVMe. On a hit, the serialised result is returned and the `cache_hits_total` metric is recorded.
4. On a miss, query S3. LanceDB runs the query against the Lance table on S3 (vector nearest-neighbour, BM25 FTS, or hybrid). The result is serialised with bincode and stored in foyer. The `cache_misses_total` and `s3_requests_total` metrics are recorded.
5. Return the result. The query duration (cache hit or miss) is recorded in the `query_duration_seconds` histogram.
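The lookup steps above can be sketched with std-only stand-ins: a `HashMap` plus `DefaultHasher` in place of foyer and xxh3, and a `Vec<u8>` for the serialised result set. All names here are illustrative, not Firn's actual API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Simplified model of the query path. The real implementation hashes the
// bincode-serialised request with xxh3 and checks a foyer HybridCache.
struct QueryRequest {
    vector: Option<Vec<f32>>,
    k: usize,
    nprobes: usize,
    text: Option<String>,
}

#[derive(Hash, PartialEq, Eq)]
struct CacheKey {
    namespace: String,
    generation: u64,
    query_hash: u64,
}

fn query_hash(req: &QueryRequest) -> u64 {
    let mut h = DefaultHasher::new();
    if let Some(v) = &req.vector {
        for x in v {
            x.to_bits().hash(&mut h); // f32 is not Hash; hash the bit pattern
        }
    }
    (req.k, req.nprobes, &req.text).hash(&mut h);
    h.finish()
}

// On a hit, the serialised result set is returned with zero S3 reads.
fn lookup(
    cache: &HashMap<CacheKey, Vec<u8>>,
    namespace: &str,
    generation: u64,
    req: &QueryRequest,
) -> Option<Vec<u8>> {
    let key = CacheKey {
        namespace: namespace.to_string(),
        generation,
        query_hash: query_hash(req),
    };
    cache.get(&key).cloned()
}

fn main() {
    let req = QueryRequest { vector: Some(vec![0.1, 0.2]), k: 10, nprobes: 20, text: None };
    let mut cache = HashMap::new();
    assert!(lookup(&cache, "docs", 0, &req).is_none()); // cold: falls through to S3
    let key = CacheKey { namespace: "docs".into(), generation: 0, query_hash: query_hash(&req) };
    cache.insert(key, vec![1, 2, 3]); // store the serialised result after the miss
    assert_eq!(lookup(&cache, "docs", 0, &req), Some(vec![1, 2, 3]));
}
```

Note how the generation is part of the key: the same query at a newer generation simply misses, which is the whole invalidation mechanism.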
## Write path

Writes follow a strict ordering to prevent stale cache state:

1. Append to S3. Rows are appended to the namespace's Lance table via LanceDB's native append API.
2. Invalidate the cache. Only after the S3 write succeeds is the namespace's generation counter atomically incremented, making all previously cached entries for that namespace unreachable by key.
3. Record metrics. `write_duration_seconds` and `s3_requests_total{operation=upsert}` are updated.
The "invalidate after confirmed write" ordering is critical: if the S3 write fails, the cache is left untouched with valid data. There is no window where the cache is empty and queries would storm S3.
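A minimal sketch of this ordering, assuming a single `AtomicU64` in place of Firn's per-namespace `DashMap` entry, with `s3_append` as a hypothetical stand-in for LanceDB's append call:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch of "invalidate after confirmed write": the generation is bumped
// only once the append has succeeded.
fn append_then_invalidate<E>(
    generation: &AtomicU64,
    s3_append: impl FnOnce() -> Result<(), E>,
) -> Result<u64, E> {
    // 1. Append to S3 first. On failure, `?` returns early and the
    //    generation (and therefore the cache) is left untouched.
    s3_append()?;
    // 2. Only after the write is confirmed, bump the generation so every
    //    previously cached entry for this namespace becomes unreachable.
    Ok(generation.fetch_add(1, Ordering::SeqCst) + 1)
}

fn main() {
    let generation = AtomicU64::new(0);
    // Failed write: cache untouched, generation still 0, no query storm.
    let failed: Result<u64, &str> = append_then_invalidate(&generation, || Err("s3 error"));
    assert!(failed.is_err());
    assert_eq!(generation.load(Ordering::SeqCst), 0);
    // Successful write: generation bumped, old cache entries unreachable.
    let ok: Result<u64, &str> = append_then_invalidate(&generation, || Ok(()));
    assert_eq!(ok, Ok(1));
}
```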
## Cache invalidation
Firn uses a generation counter strategy for cache invalidation. This was chosen after evaluating several alternatives.
### How it works

A `DashMap<NamespaceId, AtomicU64>` tracks a generation number for each namespace. The foyer cache key includes the generation:

```rust
CacheKey {
    namespace: NamespaceId,
    generation: u64,
    query: QueryHash,
}
```
On any write to a namespace, the generation is atomically incremented. All previously cached entries (at the old generation) are now unreachable by key construction. foyer's normal LFU/LRU eviction reclaims their space over time.
### Why this approach
| Property | Value |
|---|---|
| Invalidation cost | O(1) per write, regardless of cached query count |
| Auxiliary memory | One u64 per namespace (not per cache entry) |
| Race conditions | None. The generation is captured at query start, so a concurrent write cannot cause a stale result to be cached at the new generation. |
| Trade-off | Stale-generation entries remain in the cache until evicted by foyer. Under write-heavy workloads, the NVMe tier may hold unreachable entries temporarily. |
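The race-freedom property deserves a concrete illustration. The following is a minimal model, not Firn's code: a query captures the generation at start, a write bumps it mid-flight, and the late cache fill lands at the old, unreachable generation.

```rust
use std::collections::HashMap;

// Minimal model: lookups always use the current generation, but a cache
// fill uses the generation captured when the query began.
struct Model {
    generation: u64,
    entries: HashMap<(u64, u64), Vec<u8>>, // (generation, query_hash) -> result
}

impl Model {
    fn get(&self, query_hash: u64) -> Option<&Vec<u8>> {
        self.entries.get(&(self.generation, query_hash))
    }
    fn fill(&mut self, captured_generation: u64, query_hash: u64, result: Vec<u8>) {
        // Insert at the captured generation, NOT the current one.
        self.entries.insert((captured_generation, query_hash), result);
    }
}

fn main() {
    let mut m = Model { generation: 0, entries: HashMap::new() };
    let captured = m.generation;   // query starts, captures generation 0
    m.generation += 1;             // concurrent write invalidates the namespace
    m.fill(captured, 42, vec![1]); // stale result is cached at generation 0...
    assert!(m.get(42).is_none());  // ...and is unreachable at generation 1
}
```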
## Namespace isolation

Each namespace is a fully isolated unit:

- S3 prefix: `s3://bucket/namespace/` (no data sharing between namespaces)
- Vector dimension: inferred from the first upsert; subsequent upserts to the same namespace must match
- Cache: one shared foyer instance, but keys are scoped by namespace
- Lifecycle: lazy creation (no S3 objects until the first write), full cleanup on delete
Cross-namespace queries are not supported and return a 400 error.
## Per-namespace vector dimensions
There is no global vector dimension setting. Each namespace independently determines its dimension:
- On the first upsert, the dimension is inferred from the first row's vector length
- On subsequent upserts, every row is validated against the resolved dimension
- On re-open (after restart), the dimension is read from the existing Lance table schema
This means a single Firn instance can serve namespaces at different dimensions simultaneously, for example 384-dim sentence embeddings alongside 1536-dim OpenAI embeddings.
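A sketch of this resolution logic, under the assumption that it reduces to "infer on first upsert, validate thereafter" (type and method names are illustrative):

```rust
// Per-namespace dimension resolution: the first upsert fixes the
// dimension, and every later row is validated against it.
struct NamespaceDim {
    dim: Option<usize>, // None until the first upsert (or read from the table schema on re-open)
}

impl NamespaceDim {
    fn validate_upsert(&mut self, rows: &[Vec<f32>]) -> Result<usize, String> {
        let first = rows.first().ok_or_else(|| "empty upsert".to_string())?;
        let dim = *self.dim.get_or_insert(first.len()); // infer from the first row
        for (i, row) in rows.iter().enumerate() {
            if row.len() != dim {
                return Err(format!("row {i}: expected dim {dim}, got {}", row.len()));
            }
        }
        Ok(dim)
    }
}

fn main() {
    let mut ns384 = NamespaceDim { dim: None };
    let mut ns1536 = NamespaceDim { dim: None };
    assert_eq!(ns384.validate_upsert(&[vec![0.0; 384]]), Ok(384));
    assert_eq!(ns1536.validate_upsert(&[vec![0.0; 1536]]), Ok(1536)); // dims coexist per instance
    assert!(ns384.validate_upsert(&[vec![0.0; 1536]]).is_err()); // mismatch rejected
}
```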
## ANN index

Firn supports explicit IVF_PQ (Inverted File with Product Quantisation) index builds via the `/ns/{ns}/index` endpoint. The index is optional; without it, queries perform a linear scan.
### When to build an index
- Once the namespace has enough rows that cold query latency matters (usually >10k vectors)
- After the bulk of your data is loaded (building on a partial dataset then upserting more data degrades index quality until rebuilt)
### Impact on latency

On AWS S3 with 100k vectors at dim=1536:

| | Without index | With IVF_PQ | Speedup |
|---|---|---|---|
| Cold query (p50) | 25.14 s | 979 ms | 25.7x |
| Warm query (p50) | 66 µs | 72 µs | (cache dominates) |
The index matters most for cold queries. Once a result is cached, the index makes no difference.
### The `nprobes` parameter

When an IVF_PQ index exists, the `nprobes` query parameter controls how many IVF partitions are searched. Higher values improve recall but increase latency. The default is 20.
## Full-text search

Each namespace's schema includes a nullable text column. When text data is present and an FTS index has been built (via `/ns/{ns}/fts-index`), three query modes are available:

- Vector-only: provide `vector`, omit `text`
- FTS-only: provide `text`, omit `vector`
- Hybrid: provide both; LanceDB automatically fuses results via Reciprocal Rank Fusion (RRF)
## Compaction
Each upsert creates a new Lance data fragment on S3. After many small upserts, the namespace accumulates many small files, which increases cold query latency (more S3 GET requests per scan). Compaction merges these fragments into fewer, larger files.
- Triggered explicitly via `POST /ns/{ns}/compact`
- Runs in the background (returns 202)
- Invalidates the cache after completion (file offsets change)
- Target: 1 million rows per fragment
## Concurrency

Firn relies on LanceDB's native concurrency model, which uses S3 conditional writes (`If-None-Match: *`) to prevent conflicts between concurrent writers. This has been stress-tested with multiple simultaneous writers on both MinIO and AWS S3, with 100 runs each showing zero row-count discrepancies.
## Serialisation
Cached result sets are serialised with bincode 2 (serde path). This was benchmarked against realistic payloads:
| Result set size | Round-trip p99 |
|---|---|
| 10 results (1536-dim) | 32 µs |
| 100 results (1536-dim) | 318 µs |
| 1000 results (1536-dim) | 3 ms |
The architecture includes a documented upgrade path to rkyv (zero-copy deserialisation) if serialisation overhead becomes a bottleneck at scale.