## The core problem
gRPC runs over HTTP/2, which multiplexes many calls over a single persistent TCP connection. Kubernetes Services do L4 (connection-level) load balancing via kube-proxy — they pick a backend pod when a TCP connection opens, then get out of the way. If the gRPC client never closes the connection, kube-proxy never gets another turn.
```mermaid
flowchart LR
    client[omni] -->|"1 TCP connection, ∞ gRPC calls"| svc["K8s Service<br/>ClusterIP"]
    svc --> pod1[Pod 1 - all load]
    svc -.->|idle| pod2[Pod 2]
    svc -.->|idle| pod3[Pod 3]
```
Context: voltnet-omni → charge-points calls. Adding replicas didn’t distribute load — omni latched onto one pod, which died under surge.
## Why DNS round-robin didn't work
gRPC's `round_robin` policy needs multiple A records from DNS — one per pod. A standard K8s Service returns a single ClusterIP, so there is nothing to round-robin across: the balancer has no children to pick from.
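A conceptual sketch of the gap — the `pick` helper and the IP addresses are illustrative, not the real gRPC resolver API:

```go
package main

import "fmt"

// pick models which address the client's load-balancing policy uses for
// call n, given the address list the DNS resolver returned.
func pick(policy string, addrs []string, call int) string {
	if policy == "round_robin" {
		return addrs[call%len(addrs)] // cycle through resolved addresses
	}
	return addrs[0] // pick_first (the default): always the first address
}

func main() {
	clusterIP := []string{"10.96.7.20"}                      // standard Service: one A record (the VIP)
	headless := []string{"10.1.0.5", "10.1.0.6", "10.1.0.7"} // headless Service: one A record per pod

	fmt.Println(pick("round_robin", clusterIP, 1)) // 10.96.7.20 — nothing to rotate across
	fmt.Println(pick("round_robin", headless, 1))  // 10.1.0.6 — now there are children to pick from
}
```

With one ClusterIP, even `round_robin` degenerates to pinning; the policy only helps once the resolver hands it more than one address.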
## Why dial-per-request works (current fix)
Opening a fresh TCP connection per call forces kube-proxy to make a new routing decision each time — it exploits the L4 balancing instead of fighting it. Cost: ~77ms per call for the TCP + HTTP/2 handshake.
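The difference can be shown with a toy model — pod names and call counts are illustrative, and `distribute` is a hypothetical stand-in for kube-proxy's actual selection logic:

```go
package main

import "fmt"

// distribute models kube-proxy's L4 behaviour: one routing decision per
// TCP connection, reused for every call that rides that connection.
func distribute(pods []string, calls int, dialPerRequest bool) map[string]int {
	counts := map[string]int{}
	pinned := pods[0] // decision made once, at connect time
	for call := 0; call < calls; call++ {
		backend := pinned
		if dialPerRequest {
			backend = pods[call%len(pods)] // fresh connection → fresh pick
		}
		counts[backend]++
	}
	return counts
}

func main() {
	pods := []string{"pod-1", "pod-2", "pod-3"}
	fmt.Println("persistent: ", distribute(pods, 9, false)) // all 9 calls on one pod
	fmt.Println("per-request:", distribute(pods, 9, true))  // 3 calls per pod
}
```

One persistent connection concentrates every call on the pod chosen at connect time; a dial per request spreads the same calls evenly, at the price of a handshake each.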
## Proper solutions
### Option A: Headless Service + gRPC round_robin
`clusterIP: None` makes DNS return one A record per pod instead of a single ClusterIP.
```yaml
spec:
  clusterIP: None
  ports:
    - port: 50051
  selector:
    app: charge-points
```

```go
conn, err := grpc.Dial(
	"dns:///charge-points-headless:50051",
	grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
	// plaintext in-cluster traffic; import google.golang.org/grpc/credentials/insecure
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
```

> [!WARNING]
> **Critical port gotcha** — with a headless Service, kube-proxy is bypassed and the client connects directly to the pod IP. Dial the pod's container port (e.g. `:8080`), not the service port. With a ClusterIP Service, kube-proxy translates the service port to the pod's target port; with a headless Service nothing does. This only surfaces when switching.
Four components required — all of them:
- `clusterIP: None` — DNS returns pod IPs instead of a ClusterIP
- `dns:///` prefix — tells gRPC to use its DNS resolver (without it, the address is treated as a static endpoint and `pick_first` is used)
- `round_robin` policy — gRPC defaults to `pick_first` without it
- Connection pooling — eliminates the 77ms handshake per call by holding one persistent connection per backend pod
Headless alone defaults to `pick_first`. Connection pooling alone, without round-robin, reintroduces pod affinity. All four together give you even distribution with no handshake overhead.
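A minimal sketch of the pooling component. Note that in grpc-go, a single long-lived `ClientConn` with `round_robin` already maintains one subchannel per resolved address, so "pooling" can simply mean keeping that one connection open; the explicit app-level pool below is illustrative, with strings standing in for per-pod `*grpc.ClientConn` values:

```go
package main

import (
	"fmt"
	"sync"
)

// rrPool rotates across one persistent connection per backend pod,
// so no call ever pays the dial handshake.
type rrPool struct {
	mu    sync.Mutex
	next  int
	conns []string // stand-ins for *grpc.ClientConn, one per pod IP
}

// pick returns the next connection in round-robin order.
func (p *rrPool) pick() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	c := p.conns[p.next%len(p.conns)]
	p.next++
	return c
}

func main() {
	pool := &rrPool{conns: []string{"10.1.0.5:8080", "10.1.0.6:8080", "10.1.0.7:8080"}}
	for call := 0; call < 4; call++ {
		fmt.Println(pool.pick()) // cycles .5, .6, .7, then back to .5
	}
}
```

The mutex matters: gRPC calls typically come from many goroutines, and an unguarded counter would race.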
### Option B: Istio (recommended for production)
The Envoy sidecar intercepts all traffic and does L7 (request-level) routing — each gRPC call is routed independently, even over a persistent connection.
```yaml
trafficPolicy:
  loadBalancer:
    simple: LEAST_REQUEST
  connectionPool:
    http:
      maxRequestsPerConnection: 10  # forces periodic connection rotation
```

> [!WARNING]
> `kubectl port-forward` bypasses Envoy — always test gRPC load balancing from inside the mesh, not via port-forward.

Overhead is ~20%, but you get circuit breaking, retries, and observability in return.
## Summary
| Approach | Balancing level | Works? |
|---|---|---|
| Standard K8s Service | L4, per connection | ❌ balances per connection only — pins long-lived gRPC |
| Dial-per-request | L4 (hack) | ✅ with 77ms latency cost |
| Headless Service + round_robin | L7 client-side | ✅ with caveats |
| Istio | L7 proxy | ✅ cleanest |
## See also
- setnx-dedup-multi-pod — multi-pod coordination pattern (Redis SETNX)
- websocket-transport — WebSocket transport patterns, similar concurrent connection concerns