
Monitoring Kubernetes with Prometheus and Grafana

Set up a production-ready observability stack for Kubernetes using Prometheus for metrics collection and Grafana for visualization. Learn to configure alerts, dashboards, and scrape targets.

January 10, 2026·Phan Minh Anh

Why Observability Matters

In distributed systems, you can't fix what you can't see. A proper monitoring stack gives you real-time visibility into cluster health, application performance, and infrastructure metrics.

Architecture Overview

┌─────────────────────────────────────────┐
│  Kubernetes Cluster                      │
│  ┌──────────┐  ┌──────────┐             │
│  │ App Pods │  │  Nodes   │             │
│  └────┬─────┘  └────┬─────┘             │
│       │              │   /metrics        │
│  ┌────▼──────────────▼─────┐            │
│  │      Prometheus          │            │
│  └────────────┬─────────────┘            │
│               │                          │
│  ┌────────────▼─────────────┐            │
│  │        Grafana           │            │
│  └──────────────────────────┘            │
└─────────────────────────────────────────┘
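
The diagram describes a pull model: each target serves plain-text metrics at /metrics, and Prometheus fetches that page on a schedule. The sketch below imitates both sides with only the Python standard library. The metric name `demo_requests_total`, its value, and the helper names are made up for illustration, and the parser handles only simple `name{labels} value` lines without timestamps, so treat it as a picture of the flow rather than a replacement for Prometheus.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metrics page, in the Prometheus text exposition format.
METRICS_TEXT = (
    "# HELP demo_requests_total Total requests handled.\n"
    "# TYPE demo_requests_total counter\n"
    'demo_requests_total{method="GET"} 42\n'
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Plays the role of an app pod exposing /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def parse_exposition(text):
    """Parse simple sample lines into {"name{labels}": value}.

    Simplification: skips comment lines and assumes no timestamps.
    """
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(url):
    """Plays the role of Prometheus: fetch one /metrics page and parse it."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_exposition(resp.read().decode("utf-8"))


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        print(scrape(f"http://127.0.0.1:{port}/metrics"))
    finally:
        server.shutdown()
        server.server_close()
```

In the real stack, Prometheus repeats this fetch for every discovered target at each scrape interval and stores the samples as time series that Grafana then queries.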

Installing with Helm

# Add the prometheus-community chart repository, which hosts kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the full monitoring stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values values.yaml

Prometheus Configuration

# values.yaml
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
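        spec:
          # Assumes a StorageClass named gp3 exists in your cluster (common on AWS with the EBS CSI driver); substitute your own
          storageClassName: gp3
          accessModes: ["ReadWriteOnce"]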
          resources:
            requests:
              storage: 50Gi

grafana:
  # Placeholder: replace before installing, and keep real credentials out of version control
  adminPassword: "your-secure-password"
  persistence:
    enabled: true
    size: 10Gi
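
Whether 50Gi is enough for 30 days depends on how many samples you ingest. The Prometheus documentation gives a rough sizing rule: disk space ≈ retention in seconds × ingested samples per second × bytes per sample, with samples averaging about 1 to 2 bytes each. The sketch below applies that rule. The ingestion rate of 10,000 samples per second is a made-up example, not a measurement from any cluster, and the function names are illustrative.

```python
def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough Prometheus disk estimate: retention * ingestion rate * sample size.

    bytes_per_sample defaults to 2.0, the upper end of the 1-2 byte average
    quoted in the Prometheus storage documentation.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


def to_gib(num_bytes):
    """Convert bytes to GiB, the unit behind Kubernetes 'Gi' quantities."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical workload: 10,000 samples/s kept for the 30d retention above.
    needed = estimate_disk_bytes(retention_days=30, samples_per_second=10_000)
    print(f"Estimated disk: {to_gib(needed):.1f} GiB")
```

On a running server you can read the actual ingestion rate with `rate(prometheus_tsdb_head_samples_appended_total[5m])` and plug it in. Leave headroom on top of the estimate, since this rule covers only the stored samples.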

Instrumenting Your Application

Expose Prometheus metrics from your application with the official Python client, prometheus_client:

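from prometheus_client import Counter, Histogram, start_http_server

# Counter labelled by HTTP method, endpoint, and response status
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'])
# Histogram of request durations in seconds, using the client's default buckets
REQUEST_LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

@REQUEST_LATENCY.time()
def handle_request(method, endpoint):
    # your logic here; the status is hard-coded for brevity
    REQUEST_COUNT.labels(method=method, endpoint=endpoint, status='200').inc()

if __name__ == '__main__':
    # Serve /metrics on port 8000 (an example port) so Prometheus can scrape it
    start_http_server(8000)
    handle_request('GET', '/')
    # In a real service, the web framework keeps the process running after this point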
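
A Histogram does not store individual latencies. It keeps cumulative counts per upper bound (the `le` label), and quantiles are estimated from those counts at query time. The sketch below mimics that with the standard library only, as a simplified model of what PromQL's `histogram_quantile` does, not its actual implementation. The bucket bounds are the prometheus_client defaults; the sample latencies and function names are invented.

```python
import bisect
import math

# Default Histogram bucket upper bounds in prometheus_client, in seconds.
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
                   0.75, 1.0, 2.5, 5.0, 7.5, 10.0, math.inf)


def cumulative_counts(observations, bounds=DEFAULT_BUCKETS):
    """Return one cumulative count per bound: observations <= that bound."""
    per_bucket = [0] * len(bounds)
    for value in observations:
        # bisect_left finds the first bound that is >= value.
        per_bucket[bisect.bisect_left(bounds, value)] += 1
    running, cumulative = 0, []
    for count in per_bucket:
        running += count
        cumulative.append(running)
    return cumulative


def estimate_quantile(q, cumulative, bounds=DEFAULT_BUCKETS):
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    if math.isinf(bounds[idx]):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[idx - 1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = cumulative[idx - 1] if idx > 0 else 0
    in_bucket = cumulative[idx] - below
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


if __name__ == "__main__":
    # Invented request latencies in seconds.
    latencies = [0.02, 0.03, 0.04, 0.2, 0.3, 0.4, 0.6, 0.9, 1.5, 3.0]
    counts = cumulative_counts(latencies)
    print("median estimate:", estimate_quantile(0.5, counts))
```

The estimate can only be as precise as the bucket layout, so if your latencies cluster inside one wide bucket, consider passing custom `buckets` to `Histogram`.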

Key Alerts to Configure

groups:
- name: cluster-health
  rules:
  - alert: HighCPUUsage
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
    for: 5m
    annotations:
      summary: "Node {{ $labels.instance }} CPU usage above 90%"

  - alert: PodCrashLooping
    expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
    for: 5m
    annotations:
      summary: "Pod {{ $labels.pod }} is crash-looping"
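
A note on the CPU alert: node_cpu_seconds_total is a counter of CPU seconds that only ever grows, so usage is normally derived from its per-second rate rather than from its raw value. The sketch below walks through that arithmetic for two cores. The sample values and function names are invented, and unlike PromQL's `rate()` it ignores counter resets and extrapolation.

```python
def per_second_rate(samples):
    """Average per-second increase between the first and last sample.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def cpu_usage_fraction(idle_samples_by_core):
    """Usage = 1 - average idle rate across cores (each rate is in [0, 1])."""
    rates = [per_second_rate(samples) for samples in idle_samples_by_core]
    return 1.0 - sum(rates) / len(rates)


if __name__ == "__main__":
    # Invented idle-counter samples for two cores over a 5-minute window.
    core0 = [(0, 1000.0), (300, 1030.0)]  # idle for 30 of 300 seconds
    core1 = [(0, 2000.0), (300, 2015.0)]  # idle for 15 of 300 seconds
    usage = cpu_usage_fraction([core0, core1])
    print(f"usage={usage:.3f}, would alert: {usage > 0.9}")
```

The raw counter values (1000.0, 2000.0) say nothing about load on their own; only their growth over the window does.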

Essential Grafana Dashboards

Kubernetes Cluster Overview (Dashboard ID: 315) — node CPU, memory, disk
Kubernetes Pod Resources (Dashboard ID: 6417) — per-pod resource usage
Node Exporter Full (Dashboard ID: 1860) — deep node-level metrics

Import each one in the Grafana UI by entering its dashboard ID on the import page; Grafana then fetches the dashboard from grafana.com.

Pro Tips

Define recording rules to precompute expensive queries, so dashboards and alerts read a stored series instead of re-evaluating the query.
Use Alertmanager to route alerts to Slack, PagerDuty, or email.
Enable persistent storage for Prometheus: losing metrics history is painful.
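
To make the routing tip concrete, here is a deliberately simplified model of label-based routing: an ordered list of routes, first match wins, with a default receiver as fallback. Alertmanager's real routing tree is richer (nested routes, `continue`, grouping, regex matchers), so this is only a way to picture the idea. The receiver names and the `severity` label are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """A receiver plus the exact label values an alert must carry to reach it."""
    receiver: str
    match: dict = field(default_factory=dict)


def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose labels all match."""
    for route in routes:
        if all(alert_labels.get(key) == value for key, value in route.match.items()):
            return route.receiver
    return default_receiver


# Hypothetical routing table: page on critical, chat on warning, email otherwise.
ROUTES = [
    Route(receiver="pagerduty-oncall", match={"severity": "critical"}),
    Route(receiver="slack-platform", match={"severity": "warning"}),
]

if __name__ == "__main__":
    alert = {"alertname": "PodCrashLooping", "severity": "warning"}
    print(pick_receiver(alert, ROUTES, default_receiver="email-default"))
```

For this to work in practice, your alert rules need to set the labels you route on, for example a `severity` label under `labels:` in each rule.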