Architecture

How Firn turns cheap object storage into a fast, multi-tenant search engine.

Tiered storage

Firn's core idea is a three-tier storage hierarchy. Data lives permanently on S3, but queries are served from the fastest available tier:

| Tier | Backed by | Latency | Role |
| --- | --- | --- | --- |
| L1: RAM | foyer in-memory cache | sub-microsecond | Hottest queries, configurable size |
| L2: NVMe | foyer disk cache | microseconds | Overflow from RAM tier, larger capacity |
| L3: Object storage | LanceDB on S3/MinIO/R2 | milliseconds to seconds | Source of truth, unlimited capacity, near-zero idle cost |

The cache stores complete, serialised query result sets. When a query hits the cache, zero bytes are read from S3. When it misses, the result is fetched from S3 and stored in the cache for future use.

Query path

Every query follows this exact sequence:

  1. Hash the query. The full QueryRequest (vector, k, nprobes, text) is serialised with bincode and hashed with xxh3 to produce a deterministic QueryHash.
  2. Build the cache key. The key is a tuple of (namespace, generation, query_hash). The generation counter ensures stale results are never returned after a write.
  3. Check foyer. The HybridCache checks RAM first, then NVMe. On a hit, the serialised result is returned and a cache_hits_total metric is recorded.
  4. On miss: query S3. LanceDB runs the query against the Lance table on S3 (vector nearest-neighbour, BM25 FTS, or hybrid). The result is serialised with bincode and stored in foyer. The cache_misses_total and s3_requests_total metrics are recorded.
  5. Return the result. The query duration (cache hit or miss) is recorded in the query_duration_seconds histogram.
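
The sequence above can be sketched in a few lines. This is a minimal illustration, not Firn's implementation: a std `HashMap` and `DefaultHasher` stand in for foyer and xxh3, and the S3 fetch is a stub; all names here are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type QueryHash = u64;
// (namespace, generation, query_hash) — mirrors the cache key described above.
type CacheKey = (String, u64, QueryHash);

// Step 1: hash the serialised query (DefaultHasher stands in for xxh3).
fn hash_query(query: &str) -> QueryHash {
    let mut h = DefaultHasher::new();
    query.hash(&mut h);
    h.finish()
}

// Stub for the LanceDB query against S3.
fn fetch_from_s3(query: &str) -> Vec<u8> {
    format!("results for {query}").into_bytes()
}

// Steps 2-5: build the key, check the cache, fall back to "S3" on a miss.
// Returns the result bytes and whether the lookup was a cache hit.
fn run_query(
    cache: &mut HashMap<CacheKey, Vec<u8>>,
    namespace: &str,
    generation: u64,
    query: &str,
) -> (Vec<u8>, bool) {
    let key = (namespace.to_string(), generation, hash_query(query));
    if let Some(result) = cache.get(&key) {
        return (result.clone(), true); // hit: zero bytes read from S3
    }
    let result = fetch_from_s3(query); // miss: query S3, then populate the cache
    cache.insert(key, result.clone());
    (result, false)
}

fn main() {
    let mut cache = HashMap::new();
    let (_, hit) = run_query(&mut cache, "docs", 0, "rust tutorial");
    assert!(!hit); // first query misses and populates the cache
    let (_, hit) = run_query(&mut cache, "docs", 0, "rust tutorial");
    assert!(hit); // identical query at the same generation hits
}
```

Note that the generation is part of the key: the same query at a newer generation is a miss by construction, which is exactly how the write path invalidates stale results.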

Write path

Writes follow a strict ordering to prevent stale cache state:

  1. Append to S3. Rows are appended to the namespace's Lance table via LanceDB's native append API.
  2. Invalidate the cache. Only after the S3 write succeeds, the namespace's generation counter is atomically incremented. This makes all previously cached entries for that namespace unreachable by key.
  3. Record metrics. write_duration_seconds and s3_requests_total{operation=upsert} are updated.

The "invalidate after confirmed write" ordering is critical: if the S3 write fails, the cache is left untouched with valid data. There is no window where the cache is empty and queries would storm S3.
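
The ordering can be made concrete with a small sketch. The append is a stub that can be forced to fail; the point is that the generation (and therefore the cache) is only touched after the write succeeds. Function names are illustrative, not Firn's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for LanceDB's native append; on failure, no state changes.
fn append_to_s3(rows: &[&str], fail: bool) -> Result<(), String> {
    if fail {
        return Err("S3 write failed".into());
    }
    let _ = rows;
    Ok(())
}

// Invalidate only after the S3 write is confirmed. The `?` returns early
// on failure, leaving the generation — and every cached entry — intact.
fn write(generation: &AtomicU64, rows: &[&str], fail: bool) -> Result<u64, String> {
    append_to_s3(rows, fail)?;
    Ok(generation.fetch_add(1, Ordering::SeqCst) + 1)
}

fn main() {
    let generation = AtomicU64::new(0);
    // A failed write leaves the cache serving valid (old) data.
    assert!(write(&generation, &["row"], true).is_err());
    assert_eq!(generation.load(Ordering::SeqCst), 0);
    // A successful write bumps the generation, invalidating old entries.
    assert_eq!(write(&generation, &["row"], false).unwrap(), 1);
}
```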

Cache invalidation

Firn uses a generation counter strategy for cache invalidation. This was chosen after evaluating several alternatives.

How it works

A DashMap<NamespaceId, AtomicU64> tracks a generation number for each namespace. The foyer cache key includes the generation:

```rust
CacheKey {
    namespace: NamespaceId,
    generation: u64,
    query: QueryHash,
}
```

On any write to a namespace, the generation is atomically incremented. All previously cached entries (at the old generation) are now unreachable by key construction. foyer's normal LFU/LRU eviction reclaims their space over time.
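
A self-contained sketch of the mechanism, with a plain `HashMap` standing in for both DashMap and foyer (names are illustrative):

```rust
use std::collections::HashMap;

// (namespace, generation, query_hash) — mirrors Firn's CacheKey.
type CacheKey = (String, u64, u64);

struct Store {
    generations: HashMap<String, u64>, // stand-in for DashMap<NamespaceId, AtomicU64>
    cache: HashMap<CacheKey, Vec<u8>>,
}

impl Store {
    fn generation(&self, ns: &str) -> u64 {
        *self.generations.get(ns).unwrap_or(&0)
    }

    // O(1) invalidation: one counter bump makes every cached entry for
    // this namespace unreachable by key construction, however many exist.
    fn invalidate(&mut self, ns: &str) {
        *self.generations.entry(ns.to_string()).or_insert(0) += 1;
    }

    fn get(&self, ns: &str, query_hash: u64) -> Option<&Vec<u8>> {
        self.cache.get(&(ns.to_string(), self.generation(ns), query_hash))
    }

    fn put(&mut self, ns: &str, query_hash: u64, result: Vec<u8>) {
        let key = (ns.to_string(), self.generation(ns), query_hash);
        self.cache.insert(key, result);
    }
}

fn main() {
    let mut store = Store { generations: HashMap::new(), cache: HashMap::new() };
    store.put("docs", 42, b"old result".to_vec());
    assert!(store.get("docs", 42).is_some());
    store.invalidate("docs"); // a write happened
    assert!(store.get("docs", 42).is_none()); // old entry now unreachable
}
```

The stale entry is still physically in the map — nothing scans or deletes it — which is the trade-off noted below: unreachable entries occupy space until normal eviction reclaims them.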

Why this approach

| Property | Value |
| --- | --- |
| Invalidation cost | O(1) per write, regardless of cached query count |
| Auxiliary memory | One u64 per namespace (not per cache entry) |
| Race conditions | None. The generation is captured at query start, so a concurrent write cannot cause a stale result to be cached at the new generation. |
| Trade-off | Stale-generation entries remain in the cache until evicted by foyer. Under write-heavy workloads, the NVMe tier may hold unreachable entries temporarily. |

Namespace isolation

Each namespace is a fully isolated unit, with its own Lance table, its own generation counter, and its own cache key space.

Cross-namespace queries are not supported and return a 400 error.

Per-namespace vector dimensions

There is no global vector dimension setting. Each namespace determines its dimension independently.

This means a single Firn instance can serve namespaces at different dimensions simultaneously, for example 384-dim sentence embeddings alongside 1536-dim OpenAI embeddings.
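
One way to picture this property is a per-namespace dimension check. The mechanism below (dimension locked by the first upsert into a namespace) is an assumed illustration, not a description of Firn's actual code:

```rust
use std::collections::HashMap;

// Hypothetical sketch: each namespace's dimension is fixed by its first
// upsert, and later upserts must match it.
struct Dimensions {
    by_namespace: HashMap<String, usize>,
}

impl Dimensions {
    fn check(&mut self, ns: &str, vector: &[f32]) -> Result<(), String> {
        // First write to a namespace locks in its dimension.
        let dim = self.by_namespace.entry(ns.to_string()).or_insert(vector.len());
        if *dim != vector.len() {
            return Err(format!("namespace {ns} expects dim={dim}, got {}", vector.len()));
        }
        Ok(())
    }
}

fn main() {
    let mut dims = Dimensions { by_namespace: HashMap::new() };
    // One instance serves 384-dim and 1536-dim namespaces side by side.
    assert!(dims.check("sentences", &vec![0.0; 384]).is_ok());
    assert!(dims.check("openai", &vec![0.0; 1536]).is_ok());
    // Mismatched dimensions within a namespace are rejected.
    assert!(dims.check("sentences", &vec![0.0; 1536]).is_err());
}
```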

ANN index

Firn supports explicit IVF_PQ (Inverted File with Product Quantisation) index builds via the /ns/{ns}/index endpoint. The index is optional; without it, queries perform a linear scan.

When to build an index

The index is worth building once cold queries dominate: without it, every cache miss performs a linear scan of the full table on S3, so miss latency grows with namespace size. The benchmark below quantifies the effect at 100k vectors.

Impact on latency

On AWS S3 with 100k vectors at dim=1536:

|  | Without index | With IVF_PQ | Speedup |
| --- | --- | --- | --- |
| Cold query (p50) | 25.14 s | 979 ms | 25.7x |
| Warm query (p50) | 66 µs | 72 µs | (cache dominates) |

The index matters most for cold queries. Once a result is cached, the index makes no difference.

The nprobes parameter

When an IVF_PQ index exists, the nprobes query parameter controls how many IVF partitions are searched. Higher values improve recall but increase latency. The default is 20.
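
The recall/latency trade-off behind nprobes can be shown with a toy one-dimensional IVF: vectors are bucketed by nearest centroid, and a search scans only the nprobes closest partitions. This is a sketch of the concept, not LanceDB's IVF_PQ implementation.

```rust
fn dist(a: f32, b: f32) -> f32 {
    (a - b).abs()
}

// Assign each vector to the partition of its nearest centroid.
fn build_ivf(centroids: &[f32], vectors: &[f32]) -> Vec<Vec<f32>> {
    let mut partitions = vec![Vec::new(); centroids.len()];
    for &v in vectors {
        let (i, _) = centroids
            .iter()
            .enumerate()
            .min_by(|a, b| dist(*a.1, v).partial_cmp(&dist(*b.1, v)).unwrap())
            .unwrap();
        partitions[i].push(v);
    }
    partitions
}

// Scan only the `nprobes` partitions whose centroids are closest to the query.
fn search(centroids: &[f32], partitions: &[Vec<f32>], query: f32, nprobes: usize) -> Option<f32> {
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by(|&a, &b| {
        dist(centroids[a], query).partial_cmp(&dist(centroids[b], query)).unwrap()
    });
    order
        .iter()
        .take(nprobes)
        .flat_map(|&i| partitions[i].iter().copied())
        .min_by(|a, b| dist(*a, query).partial_cmp(&dist(*b, query)).unwrap())
}

fn main() {
    let centroids = [0.0f32, 10.0, 20.0];
    let vectors = [1.0, 2.0, 9.0, 11.0, 19.0, 21.0];
    let partitions = build_ivf(&centroids, &vectors);
    // Probing only the nearest partition misses the true neighbour of 5.4 (2.0);
    // a second probe finds it: higher nprobes, higher recall, more data scanned.
    assert_eq!(search(&centroids, &partitions, 5.4, 1), Some(9.0));
    assert_eq!(search(&centroids, &partitions, 5.4, 2), Some(2.0));
}
```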

Full-text search

Each namespace's schema includes a nullable text column. When text data is present and an FTS index has been built (via /ns/{ns}/fts-index), three query modes are available: vector nearest-neighbour, BM25 full-text, and hybrid (both combined).

Compaction

Each upsert creates a new Lance data fragment on S3. After many small upserts, the namespace accumulates many small files, which increases cold query latency (more S3 GET requests per scan). Compaction merges these fragments into fewer, larger files.

Concurrency

Firn relies on LanceDB's native concurrency model, which uses S3 conditional writes (If-None-Match: *) to prevent conflicts between concurrent writers. This has been stress-tested with multiple simultaneous writers on both MinIO and AWS S3, with 100 runs each showing zero row count discrepancies.
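
The conditional-write primitive behaves like put-if-absent: a PUT with If-None-Match: * succeeds only if no object exists at that key yet, so exactly one of two racing writers wins each commit. A minimal sketch of that semantics (the manifest key naming here is illustrative, not LanceDB's exact layout):

```rust
use std::collections::HashMap;

// Stand-in for an S3 PUT with `If-None-Match: *`: succeeds only if the
// key is not yet present; otherwise the caller gets a precondition failure.
fn put_if_absent(store: &mut HashMap<String, Vec<u8>>, key: &str, body: Vec<u8>) -> bool {
    if store.contains_key(key) {
        return false; // 412 Precondition Failed: another writer committed first
    }
    store.insert(key.to_string(), body);
    true
}

fn main() {
    let mut store = HashMap::new();
    // Two writers race to commit the same table version.
    assert!(put_if_absent(&mut store, "manifest/v7", b"writer A".to_vec()));
    assert!(!put_if_absent(&mut store, "manifest/v7", b"writer B".to_vec()));
    // The loser observes the conflict and retries against the next version.
    assert!(put_if_absent(&mut store, "manifest/v8", b"writer B".to_vec()));
}
```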

Single-node design

Firn currently operates as a single-node service. The cache is in-process (not distributed). Horizontal scaling would require a shared cache layer or request routing, which is not yet implemented.

Serialisation

Cached result sets are serialised with bincode 2 (serde path). This was benchmarked against realistic payloads:

| Result set size | Round-trip p99 |
| --- | --- |
| 10 results (1536-dim) | 32 µs |
| 100 results (1536-dim) | 318 µs |
| 1000 results (1536-dim) | 3 ms |

The architecture includes a documented upgrade path to rkyv (zero-copy deserialisation) if serialisation overhead becomes a bottleneck at scale.