Prometheus

Metrics collection, PromQL, exporters, and alerting

Prometheus

Prometheus is a time-series database and monitoring system. It pulls metrics from targets at regular intervals, stores them, and supports powerful querying with PromQL.

Prometheus Architecture

Metric Types & PromQL

  • Counter — only increases (requests_total, errors_total)
  • Gauge — can go up or down (temperature, active_connections)
  • Histogram — distribution of values in buckets (request_duration_seconds)
  • Summary — similar to histogram with quantiles
promql
# Request rate (per second) over 5 minutes
rate(http_requests_total[5m])

# Error rate percentage
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) * 100

# 95th percentile latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# CPU usage per pod
rate(container_cpu_usage_seconds_total{pod=~"myapp.*"}[5m])

# Memory usage in MB
container_memory_usage_bytes{pod=~"myapp.*"} / 1024 / 1024

Exporters

  • Node Exporter — system metrics (CPU, memory, disk, network)
  • cAdvisor — container metrics (built into kubelet)
  • Blackbox Exporter — probe endpoints (HTTP, DNS, TCP)
  • App-level — instrument code with client libraries (prom-client for Node.js)

💬 Counter vs Gauge — when to use which?

Counter: for things that only go up — requests served, errors occurred, bytes sent. Always use rate() to get useful per-second values. Gauge: for things that fluctuate — temperature, queue depth, active users. Can be used directly.