Vector search on object storage

A multi-tenant search engine that pairs LanceDB with a tiered RAM + NVMe cache. Every namespace lives on S3 with near-zero idle cost. Every cache hit costs zero S3 requests.

Warm query latency (AWS S3): 72 µs
ANN index speedup on cold queries: 25x
S3 requests per cache hit: 0
Search modes: 3 (vector, FTS, hybrid)

How it works

Firn stores every namespace under its own S3 prefix using LanceDB as the storage engine. A tiered cache powered by foyer sits in front of S3, serving repeated queries from RAM or NVMe in microseconds instead of milliseconds. Writes automatically invalidate the cache for the affected namespace using an O(1) generation counter strategy.
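The generation-counter strategy can be sketched as follows. This is an assumed design, not Firn's actual code: every cache key embeds the namespace's current generation, so a write only has to bump one counter and every stale entry becomes unreachable, with no scan over the cache.

```rust
use std::collections::HashMap;

// Sketch of O(1) generation-based invalidation (hypothetical types and
// names; Firn's real cache is foyer-backed and tiered across RAM + NVMe).
struct TieredCache {
    // namespace -> current generation
    generations: HashMap<String, u64>,
    // (namespace, generation, query) -> cached result bytes
    entries: HashMap<(String, u64, String), Vec<u8>>,
}

impl TieredCache {
    fn new() -> Self {
        Self { generations: HashMap::new(), entries: HashMap::new() }
    }

    fn get(&self, ns: &str, query: &str) -> Option<&Vec<u8>> {
        let g = *self.generations.get(ns).unwrap_or(&0);
        // Lookups always use the *current* generation, so entries written
        // under an older generation can never be returned.
        self.entries.get(&(ns.to_string(), g, query.to_string()))
    }

    fn put(&mut self, ns: &str, query: &str, result: Vec<u8>) {
        let g = *self.generations.get(ns).unwrap_or(&0);
        self.entries.insert((ns.to_string(), g, query.to_string()), result);
    }

    // A write to the namespace bumps its generation: O(1), no cache scan.
    fn invalidate(&mut self, ns: &str) {
        *self.generations.entry(ns.to_string()).or_insert(0) += 1;
    }
}

fn main() {
    let mut cache = TieredCache::new();
    cache.put("tenant-a", "query-1", vec![1, 2, 3]);
    assert!(cache.get("tenant-a", "query-1").is_some()); // warm hit
    cache.invalidate("tenant-a");                        // a write lands
    assert!(cache.get("tenant-a", "query-1").is_none()); // stale entry orphaned
}
```

Orphaned entries are not freed immediately; in a real tiered cache they would simply age out under the eviction policy, which is the usual trade-off for O(1) invalidation.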

The result: your data lives cheaply on S3, but hot queries feel local. The /metrics endpoint shows exactly how many S3 requests the cache is saving you.


Performance on real AWS S3

Benchmarked with 100,000 vectors at 1536 dimensions (OpenAI embedding size) against eu-west-1 S3.

| Phase | Path | p50 latency |
| --- | --- | --- |
| Linear scan | Cold (S3) | 25.14 s |
| Linear scan | Warm (cache) | 66 µs |
| IVF_PQ indexed | Cold (S3) | 979 ms |
| IVF_PQ indexed | Warm (cache) | 72 µs |

Without an index, each cache miss costs over 25 seconds. With IVF_PQ, that drops to under 1 second. The cache then eliminates S3 entirely for repeated queries, bringing latency to microseconds regardless of index type.

Key technologies