We set up Prometheus for monitoring our microservices. Here’s how.

Why Prometheus?

  • Pull-based metrics collection
  • Powerful query language (PromQL)
  • Built-in alerting rules, with notifications routed through Alertmanager
  • Great for dynamic environments (Kubernetes)
  • Open source

Architecture

Microservices → Prometheus → Grafana
                    ↓
                Alertmanager

Installation

# Prometheus
docker run -d \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

# Grafana
docker run -d \
  -p 3000:3000 \
  grafana/grafana

Configuration

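# prometheus.yml
global:
  scrape_interval: 15s  # how often each target is scraped

scrape_configs:
  # Prometheus must be able to resolve and reach these hostnames,
  # e.g. by running on the same Docker network as the services.
  - job_name: 'api'
    static_configs:
      - targets: ['api:8080']

  - job_name: 'worker'
    static_configs:
      - targets: ['worker:8080']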
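
With this config, Prometheus sends an HTTP GET to /metrics (the default metrics_path) on each target every 15 seconds and parses the plain-text samples it gets back. Here is a rough sketch of what a single scrape amounts to, using only the Go standard library. The helpers parseSamples, scrapeOnce, and demoScrape are hypothetical names for illustration, and the parser only handles simple "series value" lines, not the full exposition format (no timestamps, for example).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// parseSamples reads a simplified subset of the Prometheus text format:
// one "series value" pair per line, skipping blank lines and # comments.
func parseSamples(r io.Reader) (map[string]float64, error) {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // HELP/TYPE metadata or blank line
		}
		// Assumption: no timestamps, so the value is the last field.
		i := strings.LastIndex(line, " ")
		if i < 0 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[strings.TrimSpace(line[:i])] = v
	}
	return samples, sc.Err()
}

// scrapeOnce performs a single pull against a target's /metrics endpoint.
func scrapeOnce(baseURL string) (map[string]float64, error) {
	resp, err := http.Get(baseURL + "/metrics")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return parseSamples(resp.Body)
}

// demoScrape starts a fake in-process target and scrapes it once.
func demoScrape() (map[string]float64, error) {
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "# TYPE http_requests_total counter\n"+
			"http_requests_total{method=\"GET\",status=\"200\"} 42\n")
	}))
	defer target.Close()
	return scrapeOnce(target.URL)
}

func main() {
	samples, err := demoScrape()
	if err != nil {
		panic(err)
	}
	for series, value := range samples {
		fmt.Printf("%s = %g\n", series, value)
	}
}
```

The point of the pull model is visible here: the service only has to serve text over HTTP, and the scraper decides when and how often to collect it.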

Instrumenting Applications

Go

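package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// httpRequests counts handled requests by method, endpoint, and status.
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "endpoint", "status"},
)

func init() {
    // Register with the default registry, which promhttp.Handler() serves.
    prometheus.MustRegister(httpRequests)
}

func handler(w http.ResponseWriter, r *http.Request) {
    // Status is hard-coded here for brevity. Raw URL paths can produce an
    // unbounded number of label values, so prefer route templates.
    httpRequests.WithLabelValues(r.Method, r.URL.Path, "200").Inc()
    // Handle request
}

func main() {
    http.HandleFunc("/", handler)
    http.Handle("/metrics", promhttp.Handler()) // endpoint Prometheus scrapes
    log.Fatal(http.ListenAndServe(":8080", nil))
}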
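
The handler above labels every request with status "200". One way to record the real status is to wrap the ResponseWriter in a small middleware. The sketch below uses only the standard library so it runs on its own; statusRecorder, recordFunc, and instrument are hypothetical names, and in the service above the record hook would presumably call httpRequests.WithLabelValues(method, endpoint, status).Inc().

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// statusRecorder wraps a ResponseWriter and remembers the status code written.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// recordFunc is a hypothetical hook standing in for the counter increment.
type recordFunc func(method, endpoint, status string)

// instrument wraps next and reports each request's labels once it finishes.
// The endpoint is passed in as a fixed route name rather than read from the URL.
func instrument(endpoint string, record recordFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Default to 200: net/http sends 200 if WriteHeader is never called.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		record(r.Method, endpoint, strconv.Itoa(rec.status))
	})
}

// demoInstrument sends one successful and one failing request through the
// middleware and returns the label sets it observed.
func demoInstrument() []string {
	var seen []string
	record := func(method, endpoint, status string) {
		seen = append(seen, method+" "+endpoint+" "+status)
	}
	ok := instrument("/api/users", record, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok") // implicit 200
	}))
	fail := instrument("/api/orders", record, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "boom", http.StatusInternalServerError)
	}))
	ok.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/api/users", nil))
	fail.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("POST", "/api/orders", nil))
	return seen
}

func main() {
	for _, labels := range demoInstrument() {
		fmt.Println(labels)
	}
}
```

Passing the endpoint as a fixed route name keeps the number of label values bounded, which matters because every distinct label combination becomes its own time series.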

Java

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

public class MyApp {
    static final Counter requests = Counter.build()
        .name("http_requests_total")
        .help("Total HTTP requests")
        .labelNames("method", "endpoint", "status")
        .register();
    
    public static void main(String[] args) throws Exception {
        // Serve metrics over HTTP on port 8080 for Prometheus to scrape
        HTTPServer server = new HTTPServer(8080);
        
        // Your app logic
        requests.labels("GET", "/api/users", "200").inc();
    }
}

Queries

# Request rate
rate(http_requests_total[5m])

# Error rate (fraction of requests that returned 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# 95th percentile latency (aggregated across all series)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Memory usage
process_resident_memory_bytes

Grafana Dashboards

We created Grafana dashboards for:

  • Request rate
  • Error rate
  • Latency (p50, p95, p99)
  • CPU/Memory usage
  • Database connections

Alerting

# alert.rules
groups:
  - name: api
    rules:
      - alert: HighErrorRate
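        # Fire when more than 5% of an instance's requests return 5xx
        expr: |
          sum by (instance) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (instance) (rate(http_requests_total[5m])) > 0.05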
        for: 5m
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
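
The for: 5m clause means the expression has to stay true across consecutive evaluations for five minutes before the alert fires. Until then it is only pending, and a single false evaluation resets the clock. Here is a minimal sketch of that behavior in Go. The names alertState, evaluate, and demoAlert are hypothetical, the ratios are invented, and the once-a-minute evaluation is an assumption for the example.

```go
package main

import (
	"fmt"
	"time"
)

// alertState tracks how long a rule's expression has been continuously true.
type alertState struct {
	activeSince time.Time // zero value means the condition is not active
}

// evaluate returns "inactive", "pending", or "firing" for one evaluation.
func (a *alertState) evaluate(now time.Time, conditionTrue bool, hold time.Duration) string {
	if !conditionTrue {
		a.activeSince = time.Time{} // any false evaluation resets the timer
		return "inactive"
	}
	if a.activeSince.IsZero() {
		a.activeSince = now
	}
	if now.Sub(a.activeSince) >= hold {
		return "firing"
	}
	return "pending"
}

// demoAlert evaluates a made-up series of error ratios once per minute
// against a 0.05 threshold with a five-minute hold.
func demoAlert() []string {
	ratios := []float64{0.01, 0.08, 0.09, 0.02, 0.07, 0.07, 0.08, 0.09, 0.10, 0.12}
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	var alert alertState
	states := make([]string, 0, len(ratios))
	for i, ratio := range ratios {
		now := start.Add(time.Duration(i) * time.Minute)
		states = append(states, alert.evaluate(now, ratio > 0.05, 5*time.Minute))
	}
	return states
}

func main() {
	for minute, state := range demoAlert() {
		fmt.Printf("t+%dm: %s\n", minute, state)
	}
}
```

In this run the brief spike at minutes 1 and 2 never fires because the ratio drops back at minute 3. Only the sustained breach starting at minute 4 reaches "firing", at minute 9. That is the trade-off behind for: fewer noisy alerts in exchange for a delay before real ones go out.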

The Results

  • Full visibility into all services
  • Alerts before users notice issues
  • Easy troubleshooting with metrics

Prometheus is now essential to our operations.

Questions? Ask away!