Service Mesh Monitoring with Istio and Prometheus: Complete Observability
We had 30 microservices. Monitoring required instrumenting each service: adding metrics meant code changes, testing, and deployment. It took weeks.
Then I deployed the Istio service mesh. Got traffic metrics, tracing, and a service graph for all services instantly. Zero code changes. Complete observability in one day.
The Problem
Traditional monitoring:
- Instrument each service manually
- Different libraries for different languages
- Inconsistent metrics
- No automatic service dependencies
- Code changes for every new metric
We needed something better.
Istio Overview
Service mesh provides:
- Traffic management: Routing, load balancing
- Security: mTLS, authorization
- Observability: Metrics, logs, traces
All without changing application code!
Installing Istio
Download:
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.3.0 sh -
cd istio-1.3.0
export PATH=$PWD/bin:$PATH
Install the demo profile (on Istio 1.6 and later the equivalent command is istioctl install --set profile=demo):
istioctl manifest apply --set profile=demo
Enable automatic sidecar injection for the default namespace:
kubectl label namespace default istio-injection=enabled
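Verify the label (note: only pods created after labeling get the sidecar, so restart existing workloads to inject it):
kubectl get namespace -L istio-injection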
Automatic Metrics
Deploy app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: web-app:latest
        ports:
        - containerPort: 8080
Istio automatically injects the Envoy sidecar and collects metrics!
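Each pod should now show two containers, the app plus istio-proxy. Output will look roughly like this (pod name is illustrative):
kubectl get pods -l app=web-app
# NAME                       READY   STATUS    RESTARTS   AGE
# web-app-6d4b9f7c8d-x7k2p   2/2     Running   0          1m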
Available Metrics
Istio provides:
Request metrics:
istio_requests_total
istio_request_duration_milliseconds (named istio_request_duration_seconds on releases before 1.5)
istio_request_bytes
istio_response_bytes
TCP metrics:
istio_tcp_sent_bytes_total
istio_tcp_received_bytes_total
istio_tcp_connections_opened_total
istio_tcp_connections_closed_total
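These carry the same destination labels as the HTTP metrics, so for example, bytes sent to a TCP backend (the Redis service name here is illustrative):
sum(rate(istio_tcp_sent_bytes_total{destination_service="redis.default.svc.cluster.local"}[5m]))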
Prometheus Queries
Request rate:
rate(istio_requests_total{destination_service="web-app.default.svc.cluster.local"}[5m])
Error rate (5xx requests per second):
rate(istio_requests_total{destination_service="web-app.default.svc.cluster.local",response_code=~"5.."}[5m])
Latency p95 (aggregate the buckets by le, otherwise you get one quantile per pod and source):
histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_service="web-app.default.svc.cluster.local"}[5m])) by (le)
)
Success rate:
sum(rate(istio_requests_total{destination_service="web-app.default.svc.cluster.local",response_code!~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_service="web-app.default.svc.cluster.local"}[5m]))
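These ratio expressions get long to repeat across dashboards and alerts. A Prometheus recording rule can precompute them; a minimal sketch (the rule name service:request_success_rate:5m is our own convention):
groups:
- name: istio-slo
  rules:
  - record: service:request_success_rate:5m
    expr: |
      sum(rate(istio_requests_total{response_code!~"5.."}[5m])) by (destination_service)
        /
      sum(rate(istio_requests_total[5m])) by (destination_service)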
Service Dependencies
Automatic service graph:
# Incoming traffic
sum(rate(istio_requests_total{destination_service="web-app.default.svc.cluster.local"}[5m])) by (source_app)
# Outgoing traffic
sum(rate(istio_requests_total{source_app="web-app"}[5m])) by (destination_service)
Kiali Dashboard
Visualize the service mesh. The demo profile already bundles Kiali; on newer Istio releases, deploy it from the addons directory first:
kubectl apply -f samples/addons/kiali.yaml
kubectl port-forward svc/kiali 20001:20001 -n istio-system
Access: http://localhost:20001
Shows:
- Service topology
- Traffic flow
- Health status
- Request rates
- Error rates
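Recent istioctl builds also ship a shortcut that handles the port-forward for you:
istioctl dashboard kiali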
Grafana Dashboards
Istio includes pre-built dashboards. They ship with the demo profile; on newer releases, apply the Grafana addon first:
kubectl apply -f samples/addons/grafana.yaml
kubectl port-forward svc/grafana 3000:3000 -n istio-system
Dashboards:
- Istio Mesh Dashboard
- Istio Service Dashboard
- Istio Workload Dashboard
- Istio Performance Dashboard
Distributed Tracing
Istio integrates with Jaeger, also bundled with the demo profile (newer releases: apply the addon first):
kubectl apply -f samples/addons/jaeger.yaml
kubectl port-forward svc/jaeger-query 16686:16686 -n istio-system
Envoy generates and reports spans automatically, but to stitch one request's spans into a single trace, your services must forward the tracing headers on their outbound calls.
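Per the Istio docs, these are the headers to copy from incoming to outgoing requests:
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-ot-span-context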
Custom Metrics
Add custom metrics without code changes. With the Mixer API (Istio 1.4 and earlier) this takes three resources: a metric instance, a Prometheus handler, and a rule binding the two:
apiVersion: config.istio.io/v1alpha2
kind: metric
metadata:
  name: doublerequestcount
spec:
  value: "2"
  dimensions:
    source: source.workload.name | "unknown"
    destination: destination.workload.name | "unknown"
  monitored_resource_type: '"UNSPECIFIED"'
---
apiVersion: config.istio.io/v1alpha2
kind: prometheus
metadata:
  name: doublehandler
spec:
  metrics:
  - name: doublerequestcount
    instance_name: doublerequestcount.metric.default
    kind: COUNTER
    label_names:
    - source
    - destination
---
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: doubleprom
spec:
  actions:
  - handler: doublehandler.prometheus
    instances:
    - doublerequestcount.metric
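Note that Mixer, and with it the config above, was removed in Istio 1.8. On current releases, custom metric dimensions are configured through the Telemetry API instead; a rough sketch (the resource name and the request_host tag are our own examples):
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: custom-tags
  namespace: istio-system
spec:
  metrics:
  - providers:
    - name: prometheus
    overrides:
    - tagOverrides:
        request_host:
          value: request.host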
Traffic Splitting Metrics
Monitor canary deployments:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
  - web-app
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: web-app
        subset: v2
  - route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
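The v1 and v2 subsets referenced above must be defined in a DestinationRule, and the pods need version: v1 / version: v2 labels for both the subsets and the destination_version metric label below to resolve:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2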
Query canary metrics:
# v1 traffic
rate(istio_requests_total{destination_version="v1"}[5m])
# v2 traffic
rate(istio_requests_total{destination_version="v2"}[5m])
# v2 error rate
rate(istio_requests_total{destination_version="v2",response_code=~"5.."}[5m])
/
rate(istio_requests_total{destination_version="v2"}[5m])
Alerting Rules
Prometheus alerts for Istio:
groups:
- name: istio
  rules:
  - alert: HighErrorRate
    expr: |
      sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service)
        /
      sum(rate(istio_requests_total[5m])) by (destination_service)
        > 0.05
    for: 5m
    annotations:
      summary: "High error rate for {{ $labels.destination_service }}"
  - alert: HighLatency
    expr: |
      histogram_quantile(0.95,
        sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service)
      ) > 1000
    for: 5m
    annotations:
      summary: "High latency for {{ $labels.destination_service }}"
mTLS Monitoring
Monitor mutual TLS:
# mTLS connections
sum(rate(istio_requests_total{connection_security_policy="mutual_tls"}[5m]))
# Non-mTLS connections
sum(rate(istio_requests_total{connection_security_policy!="mutual_tls"}[5m]))
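If the non-mTLS counter stays above zero, track down the plaintext callers before enforcing encryption mesh-wide. On Istio 1.5+, enforcement is a PeerAuthentication resource (older releases used MeshPolicy):
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT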
Resource Usage
Monitor Envoy sidecar resources:
# CPU usage
rate(container_cpu_usage_seconds_total{container="istio-proxy"}[5m])
# Memory usage
container_memory_working_set_bytes{container="istio-proxy"}
# Network I/O (network metrics are per pod, not per container, so filter by pod name)
rate(container_network_transmit_bytes_total{pod=~"web-app-.*"}[5m])
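If the proxies consume too much, their requests can be tuned per pod with annotations on the Deployment's pod template; an illustrative snippet (values are examples, not recommendations):
template:
  metadata:
    annotations:
      sidecar.istio.io/proxyCPU: "100m"
      sidecar.istio.io/proxyMemory: "128Mi"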
Performance Impact
Measure Istio overhead by comparing the mesh-reported p95 with the application's own latency histogram. Unlike everything above, this comparison does require app-level instrumentation (an http_request_duration_seconds histogram here):
# Approximate latency added by the sidecar
histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le)
)
-
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
) * 1000
Typical overhead: 1-3ms.
Troubleshooting
High latency:
# Check Envoy stats
kubectl exec -it pod-name -c istio-proxy -- curl localhost:15000/stats
# Check config
istioctl proxy-config cluster pod-name
Missing metrics:
# Verify sidecar injection
kubectl get pod pod-name -o jsonpath='{.spec.containers[*].name}'
# Check Prometheus targets
kubectl port-forward svc/prometheus 9090:9090 -n istio-system
# Visit http://localhost:9090/targets
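Stale configuration: if metrics or routing look wrong after a config change, check that every sidecar is in sync with the control plane:
# Should report SYNCED for each proxy
istioctl proxy-status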
Real-World Dashboard
Our production Grafana dashboard:
{
  "panels": [
    {
      "title": "Request Rate",
      "targets": [
        { "expr": "sum(rate(istio_requests_total[5m])) by (destination_service)" }
      ]
    },
    {
      "title": "Error Rate",
      "targets": [
        { "expr": "sum(rate(istio_requests_total{response_code=~\"5..\"}[5m])) by (destination_service)" }
      ]
    },
    {
      "title": "P95 Latency",
      "targets": [
        { "expr": "histogram_quantile(0.95, rate(istio_request_duration_milliseconds_bucket[5m]))" }
      ]
    },
    {
      "title": "Service Dependencies",
      "type": "graph",
      "targets": [
        { "expr": "sum(rate(istio_requests_total[5m])) by (source_app, destination_service)" }
      ]
    }
  ]
}
Results
Before:
- Manual instrumentation
- Weeks to add metrics
- Inconsistent across services
- No service dependencies
After:
- Automatic metrics
- Zero code changes
- Consistent metrics
- Complete service graph
- Deployed in 1 day
Lessons Learned
- Start with Istio early - Easier than retrofitting
- Monitor sidecar overhead - Usually minimal
- Use Kiali - Visual service mesh understanding
- Leverage built-in dashboards - Don’t reinvent
- Combine with app metrics - Istio + custom metrics
Conclusion
Istio service mesh provides complete observability without code changes. Essential for microservices at scale.
Key takeaways:
- Automatic traffic metrics
- Distributed tracing
- Service dependency graph
- Zero application changes
- Consistent observability
Deploy Istio. Get instant observability for all your services.