Vector search on object storage
Firn is a multi-tenant vector search engine that pairs LanceDB with a tiered RAM + NVMe cache. Every namespace lives on S3 with near-zero idle cost, and a cache hit makes no S3 requests at all.
How it works
Firn stores every namespace under its own S3 prefix, using LanceDB as the storage engine. A tiered cache powered by foyer sits in front of S3 and serves repeated queries from RAM or NVMe in microseconds, rather than the hundreds of milliseconds (or more) a cold S3 query takes. Writes automatically invalidate the cache for the affected namespace using an O(1) generation counter.
The result: your data lives cheaply on S3, but hot queries feel local. The /metrics endpoint shows exactly how many S3 requests the cache is saving you.
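The generation counter invalidation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Firn's actual code: `NamespaceCache`, `CacheKey`, and the method names are hypothetical, and a plain `HashMap` stands in for the foyer cache. The assumed idea is that each cache key embeds the namespace's current generation, so bumping one counter on write makes every older entry unreachable without scanning or deleting anything.

```rust
use std::collections::HashMap;

/// Hypothetical cache key. Because the generation is part of the key,
/// entries stored under an older generation can never match again.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct CacheKey {
    namespace: String,
    generation: u64,
    query: String,
}

/// Hypothetical stand-in for the tiered cache: a HashMap instead of foyer.
#[derive(Default)]
struct NamespaceCache {
    generations: HashMap<String, u64>,
    entries: HashMap<CacheKey, Vec<u8>>,
}

impl NamespaceCache {
    /// Build the key for a query using the namespace's current generation
    /// (namespaces that were never written to start at generation 0).
    fn key(&self, namespace: &str, query: &str) -> CacheKey {
        CacheKey {
            namespace: namespace.to_string(),
            generation: self.generations.get(namespace).copied().unwrap_or(0),
            query: query.to_string(),
        }
    }

    /// Look up a cached result for the current generation only.
    fn get(&self, namespace: &str, query: &str) -> Option<&Vec<u8>> {
        let key = self.key(namespace, query);
        self.entries.get(&key)
    }

    /// Store a result under the current generation.
    fn put(&mut self, namespace: &str, query: &str, value: Vec<u8>) {
        let key = self.key(namespace, query);
        self.entries.insert(key, value);
    }

    /// O(1) invalidation: bump the counter instead of deleting entries.
    /// Stale entries stay in the map until something evicts them.
    fn invalidate(&mut self, namespace: &str) {
        *self.generations.entry(namespace.to_string()).or_insert(0) += 1;
    }
}
```

In this sketch, calling `invalidate("tenant-a")` after a write makes every earlier `get("tenant-a", ...)` miss, while entries for other namespaces stay readable. Stale entries are not removed eagerly here; in a real cache they would age out through normal eviction.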
Documentation
Quickstart
Start Docker Compose, run your first upsert and first query, and check your metrics in under two minutes.
API Reference
All nine endpoints with request and response schemas, status codes, and curl examples.
Configuration
Environment variables for S3 backends, cache sizing, bind address, and logging.
Architecture
Tiered storage, cache invalidation, namespace isolation, and the query and write paths.
Deployment
Run locally with Docker Compose, deploy to production with the multi-stage Dockerfile, or connect to AWS S3, MinIO, or Cloudflare R2.
Monitoring
Prometheus metrics, PromQL examples, alerting rules, and Grafana dashboard guidance.
Performance on real AWS S3
Benchmarked with 100,000 vectors at 1536 dimensions (a common OpenAI embedding size) against S3 in eu-west-1.
| Search mode | Cache state | p50 latency |
|---|---|---|
| Linear scan | Cold (S3) | 25.14 s |
| Linear scan | Warm (cache) | 66 µs |
| IVF_PQ indexed | Cold (S3) | 979 ms |
| IVF_PQ indexed | Warm (cache) | 72 µs |
Without an index, a cache miss costs about 25 seconds at p50. With IVF_PQ, that drops to just under 1 second. A cache hit skips S3 entirely, bringing p50 latency down to tens of microseconds regardless of index type.
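To relate these figures to a real workload, the arithmetic can be sketched as below. This is a back-of-envelope estimate, not part of Firn: the function names are hypothetical, and it treats the p50 values from the table as if they were typical per-query costs, which is only a rough approximation of a true average.

```rust
/// Estimated per-query latency in microseconds for a given cache hit ratio.
/// Assumption: `warm_us` and `cold_us` are representative per-query costs
/// (here, the p50 values from the benchmark table).
fn blended_latency_us(hit_ratio: f64, warm_us: f64, cold_us: f64) -> f64 {
    hit_ratio * warm_us + (1.0 - hit_ratio) * cold_us
}

/// How many times faster the warm (cached) path is than the cold (S3) path.
fn speedup(warm_us: f64, cold_us: f64) -> f64 {
    cold_us / warm_us
}
```

With the IVF_PQ figures (72 µs warm, 979 ms cold), `speedup(72.0, 979_000.0)` comes out at roughly 13,600x, and `blended_latency_us(0.99, 72.0, 979_000.0)` is about 9.86 ms. Even at a 99% hit ratio, the estimate is dominated by the rare cold queries, which is why indexing still matters alongside caching.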
Key technologies
- LanceDB - vector and BM25 search engine that runs natively on object storage.
- foyer - hybrid cache (RAM + NVMe) with LFU/LRU eviction policies.
- axum - async Rust HTTP framework.
- Prometheus - native metrics for cache hits, misses, and S3 cost savings.