Debug and Optimize

This lesson brings together debugging techniques and performance optimization into a practical workflow for identifying and resolving production issues.

Performance Profiling

Start by measuring current resource usage across the cluster:

# Cluster-wide resource consumption
kubectl top nodes

# Per-pod resource usage in a namespace
kubectl top pods -n production --sort-by=memory

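# Identify containers (including init containers) missing a CPU or memory limit
kubectl get pods -n production -o json | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data['items']:
    spec = pod['spec']
    # Check init containers too; a partial limits block (e.g. memory only) is also reported
    for c in spec.get('initContainers', []) + spec['containers']:
        limits = c.get('resources', {}).get('limits', {})
        missing = [r for r in ('cpu', 'memory') if r not in limits]
        if missing:
            print(f\"No {'/'.join(missing)} limit: {pod['metadata']['name']}/{c['name']}\")
"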

# Check resource quotas and current usage
kubectl describe resourcequota -n production
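`kubectl describe resourcequota` prints used and hard values side by side but leaves the arithmetic to you. The sketch below assumes you feed it the parsed output of `kubectl get resourcequota -n production -o json`, converts Kubernetes quantity strings such as `250m` and `8Gi` into numbers, and ranks each tracked resource by percent used. The helper names (`parse_quantity`, `quota_utilization`) are illustrative, and the suffix table covers the common binary and decimal suffixes rather than every form the API accepts.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes: binary (Ki, Mi, ...) and decimal (n, u, m, k, M, ...).
# Plain numbers and exponent forms such as "1e3" fall through to Decimal().
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as "250m", "8Gi" or "20" to a Decimal."""
    q = str(q).strip()
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * Decimal(factor)
    return Decimal(q)


def quota_utilization(quota_list: dict) -> list:
    """Return (quota, resource, percent_used) tuples, highest usage first."""
    rows = []
    for quota in quota_list.get("items", []):
        status = quota.get("status", {})
        hard, used = status.get("hard", {}), status.get("used", {})
        for resource, limit in hard.items():
            limit_value = parse_quantity(limit)
            if limit_value == 0:
                continue  # a zero quota forbids the resource; nothing to divide by
            used_value = parse_quantity(used.get(resource, "0"))
            percent = float(used_value / limit_value * 100)
            rows.append((quota["metadata"]["name"], resource, percent))
    # sorted() is stable, so resources with equal usage keep their original order.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Resources near 100% are the ones most likely to cause rejected pod creations, so they are worth reviewing first.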

Debugging Workflow

Follow a structured approach when investigating issues:

# Step 1: Get the big picture
kubectl get pods -n production -o wide
kubectl get events -n production --sort-by=.lastTimestamp

# Step 2: Drill into problem pods
kubectl describe pod problematic-pod -n production
# --previous shows logs from the last terminated container, useful after a crash or restart
kubectl logs problematic-pod -n production --previous

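# Step 3: Interactive debugging (adds an ephemeral busybox container to the pod;
# add --target=<container> to share that container's process namespace)
kubectl debug problematic-pod -it --image=busybox -n production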

# Step 4: Network-level debugging
kubectl run netshoot --rm -it --image=nicolaka/netshoot -n production -- bash
# Inside: dig, curl, tcpdump, ss, ip route
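Steps 1 and 2 can be partly automated. The sketch below assumes you feed it the parsed output of `kubectl get pods -n production -o json` and flags pods that are not `Running` or `Succeeded`, containers stuck in a waiting state (for example `CrashLoopBackOff`), containers whose last termination reason was `OOMKilled`, and containers that restarted at least `restart_threshold` times. The function name and the default threshold of 3 are illustrative choices, not Kubernetes conventions.

```python
def find_problem_pods(pod_list: dict, restart_threshold: int = 3) -> list:
    """Return human-readable findings for pods and containers that look unhealthy."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            findings.append(f"{name}: phase {phase}")
        for cs in status.get("containerStatuses", []):
            # A crash-looping container often sits in a waiting state while the pod is "Running".
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                findings.append(f"{name}/{cs['name']}: waiting ({waiting.get('reason', 'unknown')})")
            # lastState records why the previous instance of the container exited.
            last = cs.get("lastState", {}).get("terminated")
            if last and last.get("reason") == "OOMKilled":
                findings.append(f"{name}/{cs['name']}: last terminated OOMKilled")
            if cs.get("restartCount", 0) >= restart_threshold:
                findings.append(f"{name}/{cs['name']}: {cs['restartCount']} restarts")
    return findings
```

Each flagged pod is a candidate for the `kubectl describe` and `kubectl logs --previous` commands in step 2.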

Optimization Strategies

# Right-size resource requests using VPA recommendations
# (requires the Vertical Pod Autoscaler; kubectl describe vpa shows per-container targets)
kubectl get vpa -n production

# Check HPA status and scaling behavior
kubectl get hpa -n production
kubectl describe hpa myapp-hpa -n production
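An HPA whose current replica count already equals `maxReplicas` cannot scale out any further, so sustained load at that point usually calls for a higher ceiling or lighter pods. This sketch assumes the parsed output of `kubectl get hpa -n production -o json` and reads only `spec.maxReplicas` and `status.currentReplicas`, which appear in both the `autoscaling/v1` and `autoscaling/v2` APIs; `hpas_at_ceiling` is a hypothetical helper name.

```python
def hpas_at_ceiling(hpa_list: dict) -> list:
    """Return a line for each HPA whose current replicas have reached maxReplicas."""
    flagged = []
    for hpa in hpa_list.get("items", []):
        maximum = hpa.get("spec", {}).get("maxReplicas")
        current = hpa.get("status", {}).get("currentReplicas", 0)
        if maximum is not None and current >= maximum:
            flagged.append(f"{hpa['metadata']['name']}: {current}/{maximum} replicas")
    return flagged
```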

# Show per-container usage; containers using far less than they request are overprovisioned
kubectl top pods -n production --containers
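`kubectl top` reports usage but not requests, so spotting overprovisioning means joining its output with the pod specs. The sketch below does that under two assumptions: the `kubectl top pods --containers` output has the four columns `POD NAME CPU(cores) MEMORY(bytes)`, and a container counts as overprovisioned when a single usage snapshot falls below an arbitrary `ratio` (50% here) of its request. One snapshot can easily mislead, so treat the result as a starting point and confirm it against usage history or VPA recommendations before lowering any request. All function names are illustrative.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as "250m" or "128Mi" to a Decimal."""
    q = str(q).strip()
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * Decimal(factor)
    return Decimal(q)


def parse_top_containers(text: str) -> dict:
    """Map (pod, container) -> (cpu_cores, memory_bytes) from `kubectl top pods --containers` text."""
    usage = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "POD":
            continue  # skip the header row and anything unexpected
        pod, container, cpu, memory = fields
        usage[(pod, container)] = (parse_quantity(cpu), parse_quantity(memory))
    return usage


def overprovisioned(pod_list: dict, usage: dict, ratio: float = 0.5) -> list:
    """Flag containers whose snapshot usage is below `ratio` times their request."""
    flagged = []
    threshold = Decimal(str(ratio))
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for c in pod["spec"].get("containers", []):
            used = usage.get((pod_name, c["name"]))
            if used is None:
                continue  # no metrics for this container
            requests = c.get("resources", {}).get("requests", {})
            for index, resource in enumerate(("cpu", "memory")):
                if resource not in requests:
                    continue
                requested = parse_quantity(requests[resource])
                if requested > 0 and used[index] < requested * threshold:
                    flagged.append(f"{pod_name}/{c['name']}: {resource} below {ratio:.0%} of request")
    return flagged
```

To use it, capture the `kubectl top pods -n production --containers` output as text and `kubectl get pods -n production -o json` as parsed JSON, then pass both to `overprovisioned`.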

# Review PodDisruptionBudgets
kubectl get pdb -n production
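A PodDisruptionBudget that currently allows zero disruptions blocks voluntary evictions, which can stall node drains and cluster upgrades. This sketch assumes the parsed output of `kubectl get pdb -n production -o json` (the `policy/v1` API) and flags budgets whose `status.disruptionsAllowed` is 0; a budget the controller has not reconciled yet has no status and is flagged as well. `blocking_pdbs` is a hypothetical name.

```python
def blocking_pdbs(pdb_list: dict) -> list:
    """Return a line for each PDB that currently allows no voluntary disruptions."""
    flagged = []
    for pdb in pdb_list.get("items", []):
        status = pdb.get("status", {})
        # A missing status (not yet reconciled) is treated as 0 allowed disruptions.
        if status.get("disruptionsAllowed", 0) == 0:
            healthy = status.get("currentHealthy", 0)
            expected = status.get("expectedPods", 0)
            flagged.append(
                f"{pdb['metadata']['name']}: 0 disruptions allowed ({healthy}/{expected} healthy)"
            )
    return flagged
```

A budget that stays at 0 while all its pods are healthy is often configured too strictly, for example `minAvailable` equal to the replica count.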

Continuous Improvement

# Audit requests for every container (values are comma-separated per pod);
# compare them with kubectl top output over time
kubectl get pods -n production \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Check for stale deployments (scaled to zero, or fewer available replicas than desired)
kubectl get deployments -n production \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas,\
AVAILABLE:.status.availableReplicas,CREATED:.metadata.creationTimestamp
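The custom-columns view leaves you to compare timestamps and replica counts by eye. As a sketch, assuming the parsed output of `kubectl get deployments -n production -o json`, the function below flags deployments scaled to zero that are older than an arbitrary `min_age_days` (30 here) and deployments with fewer available replicas than desired. It assumes `spec.replicas` is 1 when absent (the API default) and that `status.availableReplicas` may be omitted when it is 0. Note that `creationTimestamp` only tells you how old the Deployment is, not when it was last scaled; `stale_deployments` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def stale_deployments(deploy_list: dict, now=None, min_age_days: int = 30) -> list:
    """Flag old deployments scaled to zero and deployments missing available replicas."""
    now = now or datetime.now(timezone.utc)
    flagged = []
    for d in deploy_list.get("items", []):
        name = d["metadata"]["name"]
        # creationTimestamp is RFC 3339 in UTC, e.g. "2024-01-01T00:00:00Z".
        created = datetime.strptime(
            d["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        desired = d.get("spec", {}).get("replicas", 1)
        available = d.get("status", {}).get("availableReplicas", 0)
        if desired == 0 and now - created > timedelta(days=min_age_days):
            flagged.append(f"{name}: scaled to zero, created {created.date()}")
        elif available < desired:
            flagged.append(f"{name}: {available}/{desired} replicas available")
    return flagged
```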

Build debugging into your operational routine rather than treating it as an emergency activity. Regular profiling catches problems such as missing limits, crash-looping pods, or autoscalers stuck at their ceiling before they become incidents.
