Requests vs limits

Requests: the amount of a resource the scheduler reserves for the container. The scheduler places the pod only on a node whose unreserved allocatable capacity covers the pod's total requests, so the container can count on at least this much being available.

Limits: the maximum the container is allowed to consume. Enforcement differs by resource type:

  • CPU: throttled (slowed down, not killed)
  • Memory: OOMKilled (the container is terminated, then restarted according to the pod's restartPolicy)

What happens when requests are omitted

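When a container sets a limit for a resource but no request, Kubernetes sets the request equal to the limit, unless an admission-time default such as a LimitRange supplies a different value. From the scheduler's perspective, a pod with only limits defined therefore reserves the full limit amount. If neither requests nor limits are set and the namespace has no LimitRange defaults, the container has no request at all.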
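
As a minimal sketch of this defaulting, consider a container that declares only limits. The pod name, container name, and image below are placeholders, and the outcome described in the comments assumes no LimitRange in the namespace changes the defaults.

```yaml
# Hypothetical pod that sets limits but no requests.
apiVersion: v1
kind: Pod
metadata:
  name: limits-only-demo                   # placeholder name
spec:
  containers:
    - name: app                            # placeholder name
      image: registry.example.com/app:1.0  # placeholder image
      resources:
        limits:
          cpu: 400m
          memory: 512Mi
        # No requests block here. Once the pod is created, the stored
        # spec is expected to show requests of cpu: 400m, memory: 512Mi.
```

Reading the pod back with kubectl get pod limits-only-demo -o yaml should show the filled-in requests under spec.containers[0].resources.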

Bursting pattern

Set requests lower than limits to allow the pod to burst during spikes while reserving less capacity on the node:

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 400m
    memory: 512Mi

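This is useful for workloads that are mostly idle but occasionally spike: the pod reserves only 100m CPU on the node but can use up to 400m when the node has spare CPU. Because the scheduler counts only requests, the limits on a node can add up to more than its capacity, so burst headroom is not guaranteed.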

Tip

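QoS class: when every container in the pod sets CPU and memory requests equal to its limits, the pod gets Guaranteed QoS (last to be evicted under node pressure). When at least one container sets a request or limit but the pod does not meet the Guaranteed criteria, for example because requests are lower than limits, it gets Burstable. When no container sets any requests or limits, it gets BestEffort (evicted first).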
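
For illustration, a single-container pod that should qualify for Guaranteed QoS might look like the sketch below. The names, image, and resource values are placeholders, not recommendations.

```yaml
# Hypothetical pod: cpu and memory are both set, and each request
# equals its limit, which is the condition for Guaranteed QoS.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo                # placeholder name
spec:
  containers:
    - name: app                            # placeholder name
      image: registry.example.com/app:1.0  # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 256Mi
# After creation, the assigned class is reported in the pod status:
#   status:
#     qosClass: Guaranteed
```

Lowering either request below its limit would be expected to move the pod to Burstable, and removing the resources block entirely to BestEffort.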

Warning

Memory OOMKill: unlike CPU, which is only throttled at its limit, a container that exceeds its memory limit is killed, not slowed down. Set memory limits with headroom above observed peak usage, and monitor actual usage before tightening them.
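
An OOM kill can usually be confirmed from the pod status. The excerpt below is an illustrative sketch of the relevant fields, not output captured from a real cluster; the container name and restart count are placeholders.

```yaml
# Illustrative excerpt of pod status after a container exceeded its
# memory limit and was restarted.
status:
  containerStatuses:
    - name: app              # placeholder name
      restartCount: 1        # placeholder value
      lastState:
        terminated:
          reason: OOMKilled
          exitCode: 137      # 128 + 9, i.e. the process received SIGKILL
```

If restartCount keeps climbing with this reason, the limit is probably too low for the workload's real peak usage.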