Monitoring

Firn exposes Prometheus metrics at GET /metrics that give full visibility into cache effectiveness and S3 cost savings.

Scrape configuration

Add Firn to your Prometheus scrape targets:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: firn
    scrape_interval: 15s
    static_configs:
      - targets: ['firn:3000']
```

The endpoint returns metrics in Prometheus text exposition format (text/plain; version=0.0.4).
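A response might look like the following (values are illustrative and the HELP strings are paraphrased; the metrics themselves are documented below):

```text
# HELP firnflow_cache_hits_total Total cache hits.
# TYPE firnflow_cache_hits_total counter
firnflow_cache_hits_total{namespace="production"} 48211
# HELP firnflow_cache_misses_total Total cache misses.
# TYPE firnflow_cache_misses_total counter
firnflow_cache_misses_total{namespace="production"} 1893
# TYPE firnflow_active_namespaces gauge
firnflow_active_namespaces 3
```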

Metric reference

Cache metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_cache_hits_total` | Counter | `namespace` | Total cache hits. Each hit means the result was served from RAM or NVMe with zero S3 access. |
| `firnflow_cache_misses_total` | Counter | `namespace` | Total cache misses. Each miss triggered a query to S3 via LanceDB and populated the cache. |

Latency metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_query_duration_seconds` | Histogram | `namespace`, `query_type` | End-to-end query latency through the cache-aside path, including serialisation. The `query_type` label is `vector`, `fts`, or `hybrid`. |
| `firnflow_write_duration_seconds` | Histogram | `namespace` | Upsert or delete latency, including cache invalidation time. |
| `firnflow_index_build_duration_seconds` | Histogram | `namespace`, `kind` | Time to build a vector or FTS index. Buckets go up to 600 seconds. The `kind` label is `ivf_pq` or `fts`. |
| `firnflow_compaction_duration_seconds` | Histogram | `namespace` | Time to compact data files. Buckets go up to 600 seconds. |

Cost metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `firnflow_s3_requests_total` | Counter | `namespace`, `operation` | Number of Firn-initiated operations that hit S3. Operations: `query` (cache miss), `upsert`, `delete`. This is the primary signal for whether the cache is saving you S3 costs. |
| `firnflow_active_namespaces` | Gauge | none | Number of distinct namespaces that have been accessed since startup. |

**The key metric:** `firnflow_s3_requests_total` is the metric that proves the cache is working. `firnflow_s3_requests_total{operation="query"}` and `firnflow_cache_misses_total` should be equal; the difference between total queries and cache misses is how many S3 requests the cache has eliminated.
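To put a rough dollar figure on those avoided requests, multiply the hit counter by your S3 GET request price. The price below is an assumption ($0.0004 per 1,000 GET requests, the published S3 Standard rate in many regions at the time of writing); substitute your region's actual pricing:

```promql
# Approximate S3 GET spend avoided since startup (USD),
# assuming $0.0004 per 1,000 GET requests -- check your region's pricing
sum(firnflow_cache_hits_total) * 0.0004 / 1000
```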

PromQL examples

Cache hit rate (per namespace)

The fraction of queries served from cache without touching S3:

```promql
firnflow_cache_hits_total{namespace="production"}
/
(firnflow_cache_hits_total{namespace="production"} + firnflow_cache_misses_total{namespace="production"})
```

Cache hit rate (global, over 5 minutes)

```promql
sum(rate(firnflow_cache_hits_total[5m]))
/
(sum(rate(firnflow_cache_hits_total[5m])) + sum(rate(firnflow_cache_misses_total[5m])))
```
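If you chart or alert on the hit rate in several places, a Prometheus recording rule avoids re-evaluating the expression everywhere. A sketch — the rule name below is a suggestion following the usual `level:metric:operations` convention, not something Firn ships:

```yaml
# recording-rules.yml (rule name is a suggestion, not part of Firn)
groups:
  - name: firn-recording
    rules:
      - record: job:firnflow_cache_hit_rate:ratio_rate5m
        expr: |
          sum(rate(firnflow_cache_hits_total[5m]))
          /
          (sum(rate(firnflow_cache_hits_total[5m]))
           + sum(rate(firnflow_cache_misses_total[5m])))
```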

Query latency p50 / p99

Aggregate the buckets by `le` before taking the quantile; otherwise `histogram_quantile` returns one series per `namespace` and `query_type` combination:

```promql
# p50
histogram_quantile(0.50, sum by (le) (rate(firnflow_query_duration_seconds_bucket[5m])))

# p99
histogram_quantile(0.99, sum by (le) (rate(firnflow_query_duration_seconds_bucket[5m])))
```
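Because the histogram carries `namespace` and `query_type` labels, you can also break the quantile down by query type by keeping that label in the bucket aggregation:

```promql
# p99 per query type (vector / fts / hybrid)
histogram_quantile(0.99,
  sum by (le, query_type) (rate(firnflow_query_duration_seconds_bucket[5m]))
)
```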

S3 request rate (per namespace)

```promql
rate(firnflow_s3_requests_total{namespace="production"}[5m])
```

S3 requests saved (total avoided queries)

```promql
sum(firnflow_cache_hits_total)
```

Each cache hit is one S3 request that did not happen.

Write throughput

```promql
rate(firnflow_s3_requests_total{operation="upsert"}[5m])
```
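One derived signal worth watching (a heuristic built from the documented metrics, not an official Firn metric): the share of S3 traffic caused by writes. If this climbs toward 1 while the cache hit rate falls, write-driven invalidation is the likely cause:

```promql
# Fraction of S3 requests that are writes rather than cache-miss queries
sum(rate(firnflow_s3_requests_total{operation=~"upsert|delete"}[5m]))
/
sum(rate(firnflow_s3_requests_total[5m]))
```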

Alerting rules

Suggested Prometheus alerting rules for production deployments:

```yaml
# alerts.yml
groups:
  - name: firn
    rules:

      # Cache hit rate dropping below 80% over 15 minutes
      - alert: FirnLowCacheHitRate
        expr: |
          sum(rate(firnflow_cache_hits_total[15m]))
          /
          (sum(rate(firnflow_cache_hits_total[15m]))
           + sum(rate(firnflow_cache_misses_total[15m])))
          < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Firn cache hit rate is below 80%"
          description: >
            The cache hit rate has been below 80% for 10 minutes.
            This may indicate the working set exceeds cache capacity
            or a write-heavy workload is causing frequent invalidation.

      # Query latency p99 above 1 second (cold queries are slow)
      - alert: FirnHighQueryLatency
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(firnflow_query_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Firn query latency p99 above 1 second"
          description: >
            High query latency suggests frequent cache misses
            hitting S3. Consider increasing cache size, building an
            index, or warming the cache.

      # S3 request rate spike (unexpected backend load)
      - alert: FirnHighS3RequestRate
        expr: |
          sum(rate(firnflow_s3_requests_total{operation="query"}[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Firn S3 query request rate above 10/s"
          description: >
            The cache is not absorbing enough queries. This increases
            S3 costs and latency. Check if the working set has
            changed or if write-heavy invalidation is the cause.
```

Grafana dashboard

A minimal Grafana dashboard for Firn should include these panels:

| Panel | Type | PromQL |
|---|---|---|
| Cache hit rate | Gauge | `sum(rate(firnflow_cache_hits_total[5m])) / (sum(rate(firnflow_cache_hits_total[5m])) + sum(rate(firnflow_cache_misses_total[5m])))` |
| Query latency (p50, p99) | Time series | `histogram_quantile(0.50, sum by (le) (rate(firnflow_query_duration_seconds_bucket[5m])))` (repeat with `0.99`) |
| S3 requests/sec by operation | Time series | `sum by (operation) (rate(firnflow_s3_requests_total[5m]))` |
| S3 requests saved (counter) | Stat | `sum(firnflow_cache_hits_total)` |
| Active namespaces | Stat | `firnflow_active_namespaces` |
| Write latency (p50, p99) | Time series | `histogram_quantile(0.50, sum by (le) (rate(firnflow_write_duration_seconds_bucket[5m])))` (repeat with `0.99`) |
| Cache hits vs misses | Time series (stacked) | `rate(firnflow_cache_hits_total[5m])` and `rate(firnflow_cache_misses_total[5m])` |

Interpreting the metrics

Healthy signals

Warning signals