Vector search on object storage

A multi-tenant search engine that pairs LanceDB with a tiered RAM + NVMe cache. Every namespace lives on S3 with near-zero idle cost. Every cache hit costs zero S3 requests.

Warm query latency (AWS S3): 72 µs
ANN index speedup on cold queries: 25x
S3 requests per cache hit: 0
Search modes: 3 (vector, FTS, hybrid)

How it works

Firn stores every namespace under its own S3 prefix using LanceDB as the storage engine. A tiered cache powered by foyer sits in front of S3, serving repeated queries from RAM or NVMe in microseconds instead of milliseconds. Writes automatically invalidate the cache for the affected namespace using an O(1) generation counter strategy.
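The generation-counter strategy can be sketched as follows. This is an assumed design, not Firn's actual code: every cache key embeds the namespace's current generation, so a write only has to bump one counter and every stale entry becomes unreachable, with no scan over the cache.

```rust
use std::collections::HashMap;

// Sketch of O(1) generation-based invalidation (hypothetical types and
// names; Firn's real cache is foyer-backed and tiered across RAM + NVMe).
struct TieredCache {
    // namespace -> current generation
    generations: HashMap<String, u64>,
    // (namespace, generation, query) -> cached result bytes
    entries: HashMap<(String, u64, String), Vec<u8>>,
}

impl TieredCache {
    fn new() -> Self {
        Self { generations: HashMap::new(), entries: HashMap::new() }
    }

    fn get(&self, ns: &str, query: &str) -> Option<&Vec<u8>> {
        let g = *self.generations.get(ns).unwrap_or(&0);
        // Lookups always use the *current* generation, so entries written
        // under an older generation can never be returned.
        self.entries.get(&(ns.to_string(), g, query.to_string()))
    }

    fn put(&mut self, ns: &str, query: &str, result: Vec<u8>) {
        let g = *self.generations.get(ns).unwrap_or(&0);
        self.entries.insert((ns.to_string(), g, query.to_string()), result);
    }

    // A write to the namespace bumps its generation: O(1), no cache scan.
    fn invalidate(&mut self, ns: &str) {
        *self.generations.entry(ns.to_string()).or_insert(0) += 1;
    }
}

fn main() {
    let mut cache = TieredCache::new();
    cache.put("tenant-a", "query-1", vec![1, 2, 3]);
    assert!(cache.get("tenant-a", "query-1").is_some()); // warm hit
    cache.invalidate("tenant-a");                        // a write lands
    assert!(cache.get("tenant-a", "query-1").is_none()); // stale entry orphaned
}
```

Orphaned entries are not freed immediately; in a real tiered cache they would simply age out under the eviction policy, which is the usual trade-off for O(1) invalidation.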

The result: your data lives cheaply on S3, but hot queries feel local. The /metrics endpoint shows exactly how many S3 requests the cache is saving you.


Performance on real AWS S3

Benchmarked with 100,000 vectors at 1536 dimensions (OpenAI embedding size) against eu-west-1 S3.

| Phase | Path | p50 latency |
| --- | --- | --- |
| Linear scan | Cold (S3) | 25.14 s |
| Linear scan | Warm (cache) | 66 µs |
| IVF_PQ indexed | Cold (S3) | 979 ms |
| IVF_PQ indexed | Warm (cache) | 72 µs |

Without an index, each cache miss costs over 25 seconds. With IVF_PQ, that drops to under 1 second. The cache then eliminates S3 entirely for repeated queries, bringing latency to microseconds regardless of index type.

Key technologies