The core problem

gRPC runs over HTTP/2, which multiplexes many calls over a single persistent TCP connection. Kubernetes Services do L4 (connection-level) load balancing via kube-proxy — they pick a backend pod when a TCP connection opens, then get out of the way. If the gRPC client never closes the connection, kube-proxy never gets another turn.

```mermaid
flowchart LR
    client[omni] -->|"1 TCP connection, ∞ gRPC calls"| svc["K8s Service<br/>ClusterIP"]
    svc --> pod1[Pod 1 - all load]
    svc -.->|idle| pod2[Pod 2]
    svc -.->|idle| pod3[Pod 3]
```

Context: voltnet-omni → charge-points calls. Adding replicas didn't distribute load; omni latched onto one pod, which died under surge.
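
A minimal sketch of the problematic pattern, assuming an in-cluster client and insecure transport for brevity (the health-check stub stands in for the real charge-points RPCs):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// One ClientConn created at startup and reused for every request:
	// kube-proxy picks a backend pod once, when the TCP connection opens.
	conn, err := grpc.Dial(
		"charge-points:50051", // plain ClusterIP Service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)
	for i := 0; i < 1000; i++ {
		// All 1000 calls multiplex onto the same HTTP/2 connection,
		// so they all land on the same pod regardless of replica count.
		_, _ = client.Check(context.Background(), &healthpb.HealthCheckRequest{})
	}
}
```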

Why DNS round-robin didn’t work

gRPC’s round_robin policy needs multiple A records from DNS — one per pod. A standard K8s Service returns a single ClusterIP. Nothing to round-robin across → no children to pick from.
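
A quick way to see this from inside the cluster (a sketch; `charge-points` is the standard Service and `charge-points-headless` is the headless variant introduced in Option A below):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	for _, host := range []string{"charge-points", "charge-points-headless"} {
		addrs, err := net.LookupHost(host)
		if err != nil {
			fmt.Println(host, "lookup failed:", err)
			continue
		}
		// ClusterIP Service -> exactly one A record (the virtual IP).
		// Headless Service  -> one A record per ready pod.
		fmt.Println(host, "->", addrs)
	}
}
```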

Why dial-per-request works (current fix)

Opening a fresh TCP connection per call forces kube-proxy to make a new routing decision each time, which exploits the L4 balancing instead of fighting it. Cost: ~77ms per call for the TCP + HTTP/2 handshake.
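
A minimal sketch of the dial-per-request workaround (insecure transport and a health-check RPC used as stand-ins): every call pays for its own connection, so every call gets its own routing decision from kube-proxy.

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// callOnce dials, makes one RPC, and tears the connection down again.
// The ~77ms handshake cost is paid on every invocation.
func callOnce(ctx context.Context) error {
	conn, err := grpc.Dial(
		"charge-points:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		return err
	}
	defer conn.Close() // closing forces the next call onto a fresh connection

	_, err = healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	return err
}

func main() {
	if err := callOnce(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```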

Proper solutions

Option A: Headless Service + gRPC round_robin

clusterIP: None makes DNS return one A record per pod instead of a ClusterIP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: charge-points-headless
spec:
  clusterIP: None
  ports:
    - port: 50051
  selector:
    app: charge-points
```

The client then dials the headless name through gRPC's DNS resolver with round_robin:

```go
conn, err := grpc.Dial(
    "dns:///charge-points-headless:50051",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()), // grpc.Dial requires transport creds; use real TLS in production
)
```

Warning

Critical port gotcha: with a headless service, kube-proxy is bypassed and the client connects directly to the pod IP. Dial the pod's container port (e.g. :8080), not the Service port. With a ClusterIP Service, kube-proxy handles the port translation; with a headless Service it doesn't, which only surfaces when you switch from one to the other.
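
A small illustration, assuming hypothetical ports where the pods listen on 8080 while the old ClusterIP Service exposed 50051 with targetPort 8080:

```go
package chargepoints

import "google.golang.org/grpc"

// With the ClusterIP Service, kube-proxy rewrote 50051 -> 8080 on the way in.
// With a headless Service nothing rewrites the port, so dial the pods' port.
func dialHeadless(opts ...grpc.DialOption) (*grpc.ClientConn, error) {
	// Wrong once the Service is headless: 50051 is the Service port; no pod listens on it.
	// return grpc.Dial("dns:///charge-points-headless:50051", opts...)

	// Right: the port the containers actually listen on.
	return grpc.Dial("dns:///charge-points-headless:8080", opts...)
}
```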

Four components required — all of them:

  1. clusterIP: None — DNS returns pod IPs instead of ClusterIP
  2. dns:/// prefix — tells gRPC to use DNS resolver (default treats address as a static endpoint and uses pick_first)
  3. round_robin policy — gRPC defaults to pick_first without this
  4. Connection pooling — eliminates the 77ms handshake per call by holding one persistent connection per backend pod

Headless alone defaults to pick_first. Connection pooling alone without round_robin reintroduces pod affinity. All four together give you even distribution with no handshake overhead.
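
Putting the pieces together (a sketch, assuming the headless Service above and insecure transport for brevity): the ClientConn is created once and shared, gRPC's DNS resolver discovers the pod IPs, and round_robin keeps a persistent subchannel per backend.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Components 1+2+3: headless Service name, dns:/// resolver, round_robin policy.
	conn, err := grpc.Dial(
		"dns:///charge-points-headless:50051",
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Component 4: "pooling" here just means holding this one ClientConn for the
	// life of the process; gRPC keeps a persistent connection per pod and rotates
	// calls across them, so no per-call handshake.
	client := healthpb.NewHealthClient(conn)
	for i := 0; i < 10; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
		cancel()
		log.Println(i, err)
	}
}
```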

Option B: Istio (Envoy sidecar)

The Envoy sidecar intercepts all traffic and does L7 (request-level) routing, so each gRPC call is routed independently even over a persistent connection. The balancing behavior is set on a DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: charge-points   # illustrative
spec:
  host: charge-points
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
    connectionPool:
      http:
        maxRequestsPerConnection: 10  # forces periodic connection rotation
```

Warning

kubectl port-forward bypasses the Envoy sidecar, so always test gRPC load balancing from inside the mesh, not via port-forward. The sidecar costs ~20% overhead, but you get circuit breaking, retries, and observability in return.

Summary

| Approach | Balancing level | Works? |
| --- | --- | --- |
| Standard K8s Service | L4, per connection | ✅ only per-connection |
| Dial-per-request | L4 (hack) | ✅ with 77ms latency cost |
| Headless Service + round_robin | L7 client-side | ✅ with caveats |
| Istio | L7 proxy | ✅ cleanest |

See also