Debug and Optimize
This lesson brings together debugging techniques and performance optimization into a practical workflow for identifying and resolving production issues.
Performance Profiling
Start by measuring current resource usage across the cluster (kubectl top needs the Metrics API, usually provided by metrics-server):
# Cluster-wide resource consumption
kubectl top nodes
# Per-pod resource usage in a namespace
kubectl top pods -n production --sort-by=memory
# Identify pods with no resource limits set
kubectl get pods -n production -o json | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
for pod in data['items']:
    for c in pod['spec']['containers']:
        if 'limits' not in c.get('resources', {}):
            print(f\"No limits: {pod['metadata']['name']}/{c['name']}\")
"
# Check resource quotas and current usage
kubectl describe resourcequota -n production
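The inline check above only reports containers with no limits block at all. As a rough sketch, the same idea can be split per resource and extended to init containers. The function name `missing_limits` is hypothetical, and the input is assumed to be the parsed output of `kubectl get pods -n production -o json`.

```python
def missing_limits(pod_list):
    """Return (pod, container, resource) for every missing CPU or memory limit."""
    findings = []
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        # Init containers run before the app containers and can also lack limits.
        containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
        for container in containers:
            limits = (container.get("resources") or {}).get("limits") or {}
            for resource in ("cpu", "memory"):
                if resource not in limits:
                    findings.append(
                        (pod["metadata"]["name"], container["name"], resource)
                    )
    return findings
```

One way to use it is to save the function in a script, load the JSON with `json.load(sys.stdin)`, and print one line per finding. Reporting CPU and memory separately makes it easier to see containers that set only one of the two limits.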
Debugging Workflow
Follow a structured approach when investigating issues:
# Step 1: Get the big picture
kubectl get pods -n production -o wide
kubectl get events -n production --sort-by=.lastTimestamp
# Step 2: Drill into problem pods
kubectl describe pod problematic-pod -n production
kubectl logs problematic-pod -n production --previous
# Step 3: Interactive debugging
kubectl debug problematic-pod -it --image=busybox -n production
# Step 4: Network-level debugging
kubectl run netshoot --rm -it --image=nicolaka/netshoot -n production -- bash
# Inside: dig, curl, tcpdump, ss, ip route
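Step 1 of the workflow can be partly automated. The sketch below scans the parsed output of `kubectl get pods -n production -o json` and flags containers that are stuck waiting, were last terminated with OOMKilled, or have restarted often. The function name `triage_pods` and the default threshold of 3 restarts are assumptions chosen for illustration, not recommended values.

```python
def triage_pods(pod_list, restart_threshold=3):
    """Return (pod, container, reasons) for containers that look unhealthy."""
    findings = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            reasons = []
            # A waiting state carries a reason such as CrashLoopBackOff.
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                reasons.append(waiting.get("reason", "Waiting"))
            # lastState records how the previous container instance ended.
            terminated = status.get("lastState", {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                reasons.append("OOMKilled")
            restarts = status.get("restartCount", 0)
            if restarts >= restart_threshold:
                reasons.append(f"restarts={restarts}")
            if reasons:
                findings.append((pod_name, status["name"], reasons))
    return findings
```

Each flagged pod is a candidate for the `kubectl describe pod` and `kubectl logs --previous` commands in Step 2.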
Optimization Strategies
# Right-size resource requests using VPA recommendations (requires the Vertical Pod Autoscaler to be installed)
kubectl get vpa -n production
# Check HPA status and scaling behavior
kubectl get hpa -n production
kubectl describe hpa myapp-hpa -n production
# Identify overprovisioned pods: compare per-container usage against the configured requests
kubectl top pods -n production --containers
# Review PodDisruptionBudgets
kubectl get pdb -n production
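Spotting overprovisioned containers means comparing observed usage against requests, and both are expressed as Kubernetes quantity strings such as `250m` or `128Mi`. The following is a minimal sketch of that comparison. The helper names, the dictionary layout keyed by (pod, container), and the 30% threshold are all assumptions for illustration, and the parser handles only the common suffixes, not exponent notation.

```python
# Multipliers for common Kubernetes quantity suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
}


def parse_quantity(text):
    """Convert a quantity such as '250m', '0.5', or '128Mi' to a float."""
    text = text.strip()
    for length in (2, 1):
        suffix = text[-length:]
        if suffix in _SUFFIXES:
            return float(text[:-length]) * _SUFFIXES[suffix]
    return float(text)  # plain number: cores or bytes


def find_overprovisioned(usage, requests, threshold=0.3):
    """Flag resources whose usage is below `threshold` times the request.

    Both arguments map (pod, container) to {"cpu": "...", "memory": "..."}.
    """
    findings = []
    for key, requested in requests.items():
        used = usage.get(key, {})
        for resource, request_text in requested.items():
            if resource not in used:
                continue  # no usage sample for this resource
            request_value = parse_quantity(request_text)
            if request_value <= 0:
                continue  # avoid dividing by a zero request
            ratio = parse_quantity(used[resource]) / request_value
            if ratio < threshold:
                findings.append((key[0], key[1], resource, round(ratio, 2)))
    return findings
```

A single `kubectl top` sample is a point-in-time reading, so a low ratio is a hint to investigate rather than proof that a request can be lowered.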
Continuous Improvement
# List CPU and memory requests (first container of each pod) to compare against kubectl top output
kubectl get pods -n production \
-o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[0].resources.requests.cpu,MEM_REQ:.spec.containers[0].resources.requests.memory'
# Check for stale deployments: scaled to zero, not fully available, or created long ago
kubectl get deployments -n production \
-o custom-columns='NAME:.metadata.name,REPLICAS:.spec.replicas,AVAILABLE:.status.availableReplicas,CREATED:.metadata.creationTimestamp'
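The deployment listing can also be filtered programmatically. This sketch takes the parsed output of `kubectl get deployments -n production -o json` and returns deployments that are scaled to zero or have fewer available replicas than desired, along with their age in days. The function name `stale_deployments` and the definition of "stale" used here are assumptions, so adjust them to your own policy.

```python
from datetime import datetime, timezone


def stale_deployments(deployment_list, now):
    """Return (name, flag, age_days) for deployments that need a closer look."""
    findings = []
    for dep in deployment_list.get("items", []):
        desired = dep.get("spec", {}).get("replicas", 1)
        # availableReplicas is omitted from the status when it is zero.
        available = dep.get("status", {}).get("availableReplicas", 0)
        created = datetime.strptime(
            dep["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        age_days = (now - created).days
        if desired == 0:
            findings.append((dep["metadata"]["name"], "scaled-to-zero", age_days))
        elif available < desired:
            findings.append((dep["metadata"]["name"], "unavailable", age_days))
    return findings
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.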
Build debugging into your operational routine rather than treating it as an emergency activity. Regular profiling catches issues before they become incidents.