# Monitoring

Firn exposes Prometheus metrics at `GET /metrics` that give full visibility into cache effectiveness and S3 cost savings.
## Scrape configuration

Add Firn to your Prometheus scrape targets:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: firn
    scrape_interval: 15s
    static_configs:
      - targets: ['firn:3000']
```

The endpoint returns metrics in the Prometheus text exposition format (`text/plain; version=0.0.4`).
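The exposition format is plain text, so it is easy to consume outside Prometheus too. A rough sketch of pulling counter values out of a scrape in Python (the sample payload is illustrative, not captured Firn output):

```python
import re

def parse_counters(text: str) -> dict[str, float]:
    """Parse counter/gauge samples from Prometheus text exposition format.

    Skips comment lines (# HELP / # TYPE) and keys each sample by its
    full name including any {label="..."} set.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is: <name>{labels} <value> [timestamp]
        m = re.match(r'^(\S+?(?:\{[^}]*\})?)\s+([0-9.eE+-]+)', line)
        if m:
            samples[m.group(1)] = float(m.group(2))
    return samples

sample = """\
# HELP firnflow_cache_hits_total Total cache hits.
# TYPE firnflow_cache_hits_total counter
firnflow_cache_hits_total{namespace="production"} 9500
firnflow_cache_misses_total{namespace="production"} 500
"""

counters = parse_counters(sample)
print(counters['firnflow_cache_hits_total{namespace="production"}'])  # 9500.0
```

This is enough for smoke tests and ad-hoc scripts; for anything more, the `prometheus_client` library ships a full parser.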
## Metric reference

### Cache metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_cache_hits_total` | Counter | `namespace` | Total cache hits. Each hit means the result was served from RAM or NVMe with zero S3 access. |
| `firnflow_cache_misses_total` | Counter | `namespace` | Total cache misses. Each miss triggered a query to S3 via LanceDB and populated the cache. |
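The hit rate derived from these two counters is usually computed in PromQL (see the examples later in this page), but the arithmetic is worth sanity-checking; a minimal Python sketch, guarding the zero-query case:

```python
def cache_hit_rate(hits: float, misses: float) -> float:
    """Cache hit rate from the two counters; 0.0 when no queries yet."""
    total = hits + misses
    return hits / total if total else 0.0

print(cache_hit_rate(9500, 500))  # 0.95
```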
### Latency metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_query_duration_seconds` | Histogram | `namespace`, `query_type` | End-to-end query latency through the cache-aside path, including serialisation. The `query_type` label is `vector`, `fts`, or `hybrid`. |
| `firnflow_write_duration_seconds` | Histogram | `namespace` | Upsert or delete latency, including cache invalidation time. |
| `firnflow_index_build_duration_seconds` | Histogram | `namespace`, `kind` | Time to build a vector or FTS index. Buckets go up to 600 seconds. The `kind` label is `ivf_pq` or `fts`. |
| `firnflow_compaction_duration_seconds` | Histogram | `namespace` | Time to compact data files. Buckets go up to 600 seconds. |
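Each histogram is exported as cumulative `_bucket` counters, and `histogram_quantile` estimates a quantile by linear interpolation inside the bucket that crosses the target rank. A simplified Python sketch of that estimate (bucket bounds and counts are illustrative, not Firn's actual buckets, and the real function also handles the `+Inf` bucket and per-series rates):

```python
def quantile_from_buckets(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets,
    linearly interpolating inside the crossing bucket, the way
    histogram_quantile does."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate between the previous bound and this one.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts: 80 queries <= 5ms, 95 <= 50ms, 100 <= 1s.
buckets = [(0.005, 80), (0.05, 95), (1.0, 100)]
print(quantile_from_buckets(0.50, buckets))  # 0.003125
```

This also shows why bucket layout matters: a p99 that lands in a wide bucket (50ms to 1s here) is a coarse estimate.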
### Cost metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_s3_requests_total` | Counter | `namespace`, `operation` | Number of Firn-initiated operations that hit S3. Operations: `query` (cache miss), `upsert`, `delete`. This is the primary signal for whether the cache is saving you S3 costs. |
| `firnflow_active_namespaces` | Gauge | none | Number of distinct namespaces that have been accessed since startup. |
## The key metric

`firnflow_s3_requests_total` is the metric that proves the cache is working. Compare `firnflow_s3_requests_total{operation="query"}` against `firnflow_cache_misses_total`; they should be equal. The difference between total queries and cache misses is the number of S3 requests the cache has eliminated.
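That invariant is cheap to assert in a smoke test against scraped counter values; a minimal sketch (the counter values are illustrative):

```python
def s3_requests_saved(hits: float, misses: float, s3_query_requests: float) -> float:
    """S3 requests eliminated by the cache. Sanity-checks the invariant
    that every cache miss corresponds to exactly one S3 query request."""
    assert misses == s3_query_requests, "cache misses should equal S3 query requests"
    return hits  # each hit is one S3 request that did not happen

print(s3_requests_saved(9500, 500, 500))  # 9500
```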
## PromQL examples

### Cache hit rate (per namespace)

The fraction of queries served from cache without touching S3:

```promql
firnflow_cache_hits_total{namespace="production"}
/
(firnflow_cache_hits_total{namespace="production"} + firnflow_cache_misses_total{namespace="production"})
```

### Cache hit rate (global, over 5 minutes)

```promql
sum(rate(firnflow_cache_hits_total[5m]))
/
(sum(rate(firnflow_cache_hits_total[5m])) + sum(rate(firnflow_cache_misses_total[5m])))
```

### Query latency p50 / p99

```promql
# p50
histogram_quantile(0.50, rate(firnflow_query_duration_seconds_bucket[5m]))

# p99
histogram_quantile(0.99, rate(firnflow_query_duration_seconds_bucket[5m]))
```

### S3 request rate (per namespace)

```promql
rate(firnflow_s3_requests_total{namespace="production"}[5m])
```

### S3 requests saved (total avoided queries)

```promql
sum(firnflow_cache_hits_total)
```

Each cache hit is one S3 request that did not happen.
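To put a rough dollar figure on those avoided requests, multiply by your per-request price. A back-of-the-envelope sketch (the request count and the $0.0004-per-1,000-GETs S3 Standard rate are illustrative; check current pricing for your region and storage class):

```python
SAVED_REQUESTS = 12_000_000   # e.g. sum(firnflow_cache_hits_total) across the fleet
PRICE_PER_1000_GETS = 0.0004  # illustrative S3 Standard GET price, USD

savings = SAVED_REQUESTS / 1000 * PRICE_PER_1000_GETS
print(f"${savings:.2f}")  # $4.80
```

One logical query can fan out to more than one S3 GET, so in practice this figure is a conservative lower bound.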
### Write throughput

```promql
rate(firnflow_s3_requests_total{operation="upsert"}[5m])
```
## Alerting rules

Suggested Prometheus alerting rules for production deployments:

```yaml
# alerts.yml
groups:
  - name: firn
    rules:
      # Cache hit rate dropping below 80% over 15 minutes
      - alert: FirnLowCacheHitRate
        expr: |
          sum(rate(firnflow_cache_hits_total[15m]))
          /
          (sum(rate(firnflow_cache_hits_total[15m]))
           + sum(rate(firnflow_cache_misses_total[15m])))
          < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Firn cache hit rate is below 80%"
          description: >
            The cache hit rate has been below 80% for 10 minutes.
            This may indicate the working set exceeds cache capacity
            or a write-heavy workload is causing frequent invalidation.

      # Query latency p99 above 1 second (cold queries are slow)
      - alert: FirnHighQueryLatency
        expr: |
          histogram_quantile(0.99,
            rate(firnflow_query_duration_seconds_bucket[5m])
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Firn query latency p99 above 1 second"
          description: >
            High query latency suggests frequent cache misses
            hitting S3. Consider increasing cache size, building an
            index, or warming the cache.

      # S3 request rate spike (unexpected backend load)
      - alert: FirnHighS3RequestRate
        expr: |
          sum(rate(firnflow_s3_requests_total{operation="query"}[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Firn S3 query request rate above 10/s"
          description: >
            The cache is not absorbing enough queries. This increases
            S3 costs and latency. Check if the working set has
            changed or if write-heavy invalidation is the cause.
```
## Grafana dashboard

A minimal Grafana dashboard for Firn should include these panels:

| Panel | Type | PromQL |
|---|---|---|
| Cache hit rate | Gauge | `sum(rate(firnflow_cache_hits_total[5m])) / (sum(rate(firnflow_cache_hits_total[5m])) + sum(rate(firnflow_cache_misses_total[5m])))` |
| Query latency (p50, p99) | Time series | `histogram_quantile(0.50, rate(firnflow_query_duration_seconds_bucket[5m]))` and the same at `0.99` |
| S3 requests/sec by operation | Time series | `sum by (operation) (rate(firnflow_s3_requests_total[5m]))` |
| S3 requests saved (counter) | Stat | `sum(firnflow_cache_hits_total)` |
| Active namespaces | Stat | `firnflow_active_namespaces` |
| Write latency (p50, p99) | Time series | `histogram_quantile(0.50, rate(firnflow_write_duration_seconds_bucket[5m]))` and the same at `0.99` |
| Cache hits vs misses | Time series (stacked) | `rate(firnflow_cache_hits_total[5m])` and `rate(firnflow_cache_misses_total[5m])` |
## Interpreting the metrics

### Healthy signals

- Cache hit rate above 80% for read-heavy workloads
- `firnflow_s3_requests_total{operation="query"}` rate is low and stable
- Query latency p99 under 10ms (warm queries dominate)

### Warning signals

- Falling cache hit rate: the working set may exceed cache capacity. Increase `FIRNFLOW_CACHE_MEMORY_BYTES` or `FIRNFLOW_CACHE_NVME_BYTES`.
- High `firnflow_s3_requests_total` rate: too many cache misses are hitting S3. This costs money and adds latency. Consider cache warmup, a larger cache, or building an index.
- Rising query latency: if cold queries dominate, build an IVF_PQ index. If warm queries are slow, check for serialisation overhead with large result sets.
- Write duration spikes: may indicate S3 throttling or contention. Check S3 request metrics and consider compaction.
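When a falling hit rate points at capacity, a back-of-the-envelope sizing check helps before touching `FIRNFLOW_CACHE_MEMORY_BYTES`. A sketch with illustrative numbers (the entry count and average result size are assumptions, not measured Firn values):

```python
WORKING_SET_QUERIES = 200_000  # distinct hot queries (illustrative)
AVG_RESULT_BYTES = 32 * 1024   # average cached result size (illustrative)

# Rough floor for cache capacity to hold the whole working set.
required_bytes = WORKING_SET_QUERIES * AVG_RESULT_BYTES
required_gib = required_bytes / 2**30
print(f"{required_gib} GiB")  # 6.103515625 GiB
```

If the required capacity exceeds RAM, split it across `FIRNFLOW_CACHE_MEMORY_BYTES` and `FIRNFLOW_CACHE_NVME_BYTES` rather than letting misses fall through to S3.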