Scaling Deployments

Scaling adjusts the number of pod replicas running your application. Kubernetes supports both manual scaling with kubectl scale and automatic scaling with the Horizontal Pod Autoscaler (HPA).

Manual Scaling with kubectl scale

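Set the replica count directly. The command updates the Deployment's spec.replicas, and the Deployment controller then creates or terminates pods to match: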

# Scale up to 5 replicas
kubectl scale deployment/web-app --replicas=5

# Scale down to 2 replicas
kubectl scale deployment/web-app --replicas=2

# Scale multiple deployments at once
kubectl scale deployment/api deployment/worker --replicas=3

To avoid accidental scaling, add a precondition with --current-replicas. The command fails unless the Deployment currently has exactly that many replicas:

# Only scale if currently at 3 replicas
kubectl scale deployment/web-app --current-replicas=3 --replicas=5

Horizontal Pod Autoscaler (HPA) Basics

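The HPA periodically (every 15 seconds by default) adjusts a workload's replica count based on observed metrics such as average CPU utilization. The quickest way to create one is kubectl autoscale: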

# Create an HPA targeting 50% CPU utilization
kubectl autoscale deployment/web-app --min=2 --max=10 --cpu-percent=50

This creates an HPA named web-app that keeps average CPU utilization near 50% by scaling the Deployment between 2 and 10 replicas. View its status:

kubectl get hpa
kubectl describe hpa web-app
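The Kubernetes documentation describes the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No change is made when that ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to minReplicas and maxReplicas. The bash function below is a minimal sketch of that rule for a CPU utilization target. The name hpa_desired_replicas is hypothetical, and the sketch ignores unready pods, missing metrics, and scale-down stabilization.

```shell
#!/usr/bin/env bash
# Sketch of the HPA scaling rule for a CPU utilization target.
# hpa_desired_replicas is a hypothetical helper, not part of kubectl.
# Args (integers): current replicas, current avg utilization %, target %,
#                  minReplicas, maxReplicas.
# Ignores unready pods, missing metrics, and stabilization windows.
hpa_desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  local diff=$(( util - target ))
  if (( diff < 0 )); then diff=$(( -diff )); fi

  local desired
  if (( 10 * diff <= target )); then
    # |util/target - 1| <= 0.1: inside the default tolerance, no change.
    desired=$current
  else
    # ceil(current * util / target) using integer arithmetic.
    desired=$(( (current * util + target - 1) / target ))
  fi

  # Clamp the result to [minReplicas, maxReplicas].
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}
```

With 4 replicas averaging 90% CPU against a 50% target, hpa_desired_replicas 4 90 50 2 10 prints 8. At 53% it prints 4, because that reading falls inside the tolerance band.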

The same HPA expressed as an autoscaling/v2 manifest:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

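CPU and memory targets depend on the resource metrics API (metrics.k8s.io), which is normally served by the Metrics Server, so install it in the cluster first. Because averageUtilization is a percentage of each container's CPU request, the target pods must also set CPU requests.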
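Utilization is measured against CPU requests, so a pod without a request gives the HPA nothing to compute a percentage from. The following is a minimal sketch of the calculation. It uses a hypothetical avg_cpu_utilization helper that takes one usage:request pair in millicores per pod and divides total usage by total requests. When every pod has the same request, as pods from a single Deployment normally do, this equals the mean per-pod utilization.

```shell
#!/usr/bin/env bash
# Sketch: average CPU utilization (%) of a set of pods, measured against
# their CPU requests. avg_cpu_utilization is a hypothetical helper.
# Args: one "usage:request" pair per pod, both in millicores.
avg_cpu_utilization() {
  if (( $# == 0 )); then
    echo "no pods given" >&2
    return 1
  fi
  local usage_total=0 request_total=0 pair usage request
  for pair in "$@"; do
    usage=${pair%%:*}
    request=${pair##*:}
    if (( ${request:-0} == 0 )); then
      # Without a CPU request there is nothing to take a percentage of.
      echo "pod has no CPU request: $pair" >&2
      return 1
    fi
    usage_total=$(( usage_total + usage ))
    request_total=$(( request_total + request ))
  done
  # Total usage as a percentage of total requests (integer division).
  echo $(( usage_total * 100 / request_total ))
}
```

Three pods that each request 200m and use 90m, 120m, and 90m give avg_cpu_utilization 90:200 120:200 90:200, which prints 50. That matches the 50% target exactly, so an HPA with that target would leave the replica count unchanged.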

Scaling to Zero

Kubernetes Deployments support scaling to zero replicas:

kubectl scale deployment/web-app --replicas=0

This removes all pods but keeps the Deployment resource intact. It is useful for temporarily deactivating workloads, cutting costs in non-production environments, or pausing processing jobs. To bring the workload back, run kubectl scale again with a nonzero --replicas value.

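Standard HPA cannot scale to zero. Its minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled, and its resource metrics come from running pods, so with no pods there is nothing to trigger a scale-up. For scale-to-zero with automatic wake-up, use KEDA (Kubernetes Event-driven Autoscaling):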

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app-scaler
spec:
  scaleTargetRef:
    name: web-app
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
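    - type: prometheus
      metadata:
        # Hypothetical address; point this at your Prometheus server
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"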

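KEDA polls the trigger and activates the Deployment, scaling it from zero to one replica, when the query result rises above the activation threshold (0 by default). It also creates and manages an HPA that scales from one replica up to maxReplicaCount. Once the trigger has stayed inactive for cooldownPeriod (300 seconds by default), KEDA scales the Deployment back to zero. Because KEDA owns the replica count, manual kubectl scale changes to a KEDA-managed Deployment are reverted.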
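The function below is a rough sketch of the replica count the ScaledObject above converges on. The name keda_desired_replicas is hypothetical. The result is zero while the Prometheus query returns 0. Otherwise it is ceil(query result / threshold), capped at maxReplicaCount. The sketch assumes integer metric values, the Prometheus trigger's default activationThreshold of 0, and the default AverageValue metric type. It ignores cooldownPeriod, HPA tolerance, and stabilization windows.

```shell
#!/usr/bin/env bash
# Sketch of the replica count KEDA and its managed HPA aim for with the
# ScaledObject above (threshold 100, minReplicaCount 0, maxReplicaCount 10).
# keda_desired_replicas is a hypothetical helper, not part of KEDA.
# Assumes integer metric values, activationThreshold 0, AverageValue target;
# ignores cooldownPeriod, HPA tolerance, and stabilization windows.
keda_desired_replicas() {
  local metric=$1 threshold=${2:-100} max=${3:-10}
  local desired
  if (( metric <= 0 )); then
    # Trigger inactive: KEDA scales down to minReplicaCount (0).
    desired=0
  else
    # Trigger active: the HPA targets an average of $threshold per replica.
    desired=$(( (metric + threshold - 1) / threshold ))
    if (( desired > max )); then desired=$max; fi
  fi
  echo "$desired"
}
```

At 350 requests per second, keda_desired_replicas 350 prints 4. At 0 it prints 0, and a load of 5000 is capped at 10.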