The core problem
gRPC runs over HTTP/2, which multiplexes many calls over a single persistent TCP connection. Kubernetes Services do L4 (connection-level) load balancing via kube-proxy — they pick a backend pod when a TCP connection opens, then get out of the way. If the gRPC client never closes the connection, kube-proxy never gets another turn.
flowchart LR
client[omni] -->|"1 TCP connection, ∞ gRPC calls"| svc["K8s Service<br/>ClusterIP"]
svc --> pod1[Pod 1 - all load]
svc -.->|idle| pod2[Pod 2]
svc -.->|idle| pod3[Pod 3]
Context: voltnet-omni → charge-points calls. Adding replicas didn’t distribute load — omni latched onto one pod, which died under surge.
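The latching behaviour can be reproduced in a few lines. This is a toy model, not kube-proxy itself: the pod names and the `open_connection` helper are made up for illustration, and the only property it borrows from the real system is that a backend is chosen once, when the connection opens.

```python
import random
from collections import Counter

PODS = ["pod-1", "pod-2", "pod-3"]  # hypothetical replica names


def open_connection(rng):
    """The only routing decision an L4 Service makes: pick a pod at connect time."""
    return rng.choice(PODS)


def run_calls_on_one_connection(n_calls, seed=0):
    """Send n_calls over a single long-lived connection, as an HTTP/2 client does."""
    rng = random.Random(seed)
    pod = open_connection(rng)  # chosen once, never revisited
    hits = Counter({p: 0 for p in PODS})
    for _ in range(n_calls):
        hits[pod] += 1  # every multiplexed call rides the same connection
    return hits


hits = run_calls_on_one_connection(1000)
print(dict(hits))
```

Whichever pod wins the single pick receives all 1000 calls; the other two stay at zero no matter how many replicas are added.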
Why DNS round-robin didn’t work
gRPC’s round_robin policy needs multiple A records from DNS — one per pod. A standard K8s Service returns a single ClusterIP. Nothing to round-robin across → no children to pick from.
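A quick way to see what the client has to work with is to resolve the name and count the addresses. The sketch below uses only the standard library; `resolve_ipv4` and `can_round_robin` are illustrative helpers, and the Service hostname mentioned in the comment is a guess at the naming, not taken from the real cluster.

```python
import socket


def resolve_ipv4(host, port):
    """Return the distinct IPv4 addresses (A records) the resolver gives for host."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_round_robin(addresses):
    """round_robin only has something to rotate across with two or more addresses."""
    return len(addresses) >= 2


# A literal IP resolves to itself, so this runs anywhere. From inside a pod,
# substitute the Service DNS name (for example a hypothetical
# "charge-points.default.svc.cluster.local") to see what the gRPC client sees.
addresses = resolve_ipv4("127.0.0.1", 8080)
print(addresses, can_round_robin(addresses))
```

Against a standard Service this should return a single ClusterIP, which is exactly the case where round_robin has nothing to rotate across.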
Why dial-per-request works (current fix)
Opening a fresh TCP connection per call forces kube-proxy to make a new routing decision each time. Exploits L4 balancing. Cost: ~77ms per call for TCP + HTTP/2 handshake.
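A rough sketch of the trade-off, assuming the Service picks a backend at random on each new connection and that every dial pays the full 77ms quoted above. The pod names are made up, and the uniform random pick is a simplification of what kube-proxy does.

```python
import random
from collections import Counter

HANDSHAKE_MS = 77  # per-call TCP + HTTP/2 setup cost quoted in the text
PODS = ["pod-1", "pod-2", "pod-3"]  # hypothetical replica names


def dial_per_request(n_calls, seed=0):
    """Open a fresh connection for every call and total the handshake time."""
    rng = random.Random(seed)
    hits = Counter()
    handshake_ms = 0
    for _ in range(n_calls):
        hits[rng.choice(PODS)] += 1  # new connection, so a new L4 pick
        handshake_ms += HANDSHAKE_MS
    return hits, handshake_ms


hits, overhead_ms = dial_per_request(1000)
print(dict(hits))
print("cumulative handshake time:", overhead_ms / 1000, "s")
```

Load spreads across all three pods, but 1000 calls accumulate 77 seconds of handshake time that a reused connection would pay only once.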
Proper solutions
Option A: Headless Service + gRPC round_robin
clusterIP: None makes DNS return one A record per pod instead of a ClusterIP.
Warning
Critical port gotcha — With a headless service, kube-proxy is bypassed; the client connects directly to the pod IP. Dial the pod port (e.g. :8080), not the service port. With ClusterIP, kube-proxy handles the translation; with headless it doesn’t. This only surfaces when switching.
Four components required — all of them:
- clusterIP: None — DNS returns pod IPs instead of ClusterIP
- dns:/// prefix — tells gRPC to use DNS resolver (default treats address as a static endpoint and uses pick_first)
- round_robin policy — gRPC defaults to pick_first without this
- Connection pooling — eliminates the 77ms handshake per call by holding one persistent connection per backend pod
Headless alone defaults to pick_first. Connection pooling alone without round-robin reintroduces pod affinity. All four together give you even distribution with no per-call handshake overhead.
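On the client side, the pieces above come down to a target string and one channel option. The sketch below only builds those values so it runs without grpcio installed. The service name, namespace, and port are placeholders, and `headless_channel_args` is a hypothetical helper, not the voltnet-common API.

```python
import json


def headless_channel_args(service, namespace, pod_port):
    """Build the target and options for a round_robin gRPC channel.

    All arguments are illustrative; substitute the real Service name,
    namespace, and the pod's container port.
    """
    # dns:/// selects gRPC's DNS resolver. The port is the pod port, because
    # a headless Service does no port translation.
    target = f"dns:///{service}.{namespace}.svc.cluster.local:{pod_port}"
    # Without this service config the channel falls back to pick_first.
    service_config = {"loadBalancingConfig": [{"round_robin": {}}]}
    options = [("grpc.service_config", json.dumps(service_config))]
    return target, options


target, options = headless_channel_args("charge-points", "default", 8080)
print(target)
print(options)
# With grpcio installed, these would be passed as:
#   channel = grpc.insecure_channel(target, options=options)
```

The pooling component then amounts to creating that channel once and reusing it for every call; with round_robin the channel itself keeps a connection open to each resolved address.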
Option B: Service mesh (Istio / Linkerd)
A sidecar proxy intercepts all traffic and does L7 (request-level) routing — each gRPC call independently routed even over a persistent connection. Also eliminates DNS discovery lag via EndpointSlice Watch rather than polling.
Warning
kubectl port-forward bypasses the sidecar proxy — always test gRPC load balancing from inside the mesh, not via port-forward.
Linkerd
Linkerd injects a lightweight Rust sidecar (linkerd2-proxy, ~10–20MB). Its linkerd-destination control plane watches the EndpointSlice API directly — discovery is driven by Watch events rather than polling, significantly faster than the 30s DNS floor.
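To put a number on the polling side of that comparison, here is a back-of-the-envelope model. It assumes the client re-resolves DNS on a fixed 30s timer, which simplifies how real resolvers behave, and it does not model Watch latency at all.

```python
import math

DNS_REFRESH_S = 30  # the 30s floor quoted above, modeled here as a fixed timer


def polling_lag_s(ready_at_s, interval_s=DNS_REFRESH_S):
    """Seconds between a pod becoming Ready and the next scheduled DNS refresh."""
    next_refresh = math.ceil(ready_at_s / interval_s) * interval_s
    return next_refresh - ready_at_s


# A pod that becomes Ready at t = 1..30 seconds after the last refresh.
lags = [polling_lag_s(t) for t in range(1, 31)]
print("worst case:", max(lags), "s")
print("average:", sum(lags) / len(lags), "s")
```

Under this model a new pod waits about half the interval on average and close to the full interval in the worst case before it receives any traffic.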
Scenario comparison
| Scenario | Headless + DNS round-robin | Linkerd |
|---|---|---|
| Rolling deploy — pod terminated, replacement starts | New pod invisible for up to 30s; surviving pods absorb full load | New pod enters rotation quickly via EndpointSlice Watch |
| Pod crash — no replacement | TCP RST detected in ms; pod removed from DNS within 30s | Same; EndpointSlice Watch-driven |
| Scale up | New pod invisible for up to 30s | Receives traffic quickly once Ready |
| Scale down — intentional | Without PreStop, in-flight RPCs dropped mid-stream | Proxy drains in-flight RPCs before proxy terminates |
| Degraded pod (slow, not failing) | Stays in round-robin at full weight | EWMA load balancer reduces traffic to slow pods automatically |
| Network partition — pod alive but unreachable | Keepalive detects dead connection in ~40s | Same; keepalive detection applies equally |
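
The degraded-pod row is the least intuitive one, so here is a simplified illustration of latency-aware picking compared with plain round-robin. It is not Linkerd's implementation: the smoothing factor, the two-candidate sampling, and the fixed per-pod latencies are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter


class EwmaBalancer:
    """Toy latency-aware balancer: sample two pods, send to the lower EWMA."""

    def __init__(self, pods, alpha=0.3, seed=0):
        self.alpha = alpha  # weight given to the newest latency sample
        self.rng = random.Random(seed)
        self.ewma_ms = {p: 0.0 for p in pods}

    def pick(self):
        a, b = self.rng.sample(list(self.ewma_ms), 2)
        return a if self.ewma_ms[a] <= self.ewma_ms[b] else b

    def observe(self, pod, latency_ms):
        prev = self.ewma_ms[pod]
        self.ewma_ms[pod] = self.alpha * latency_ms + (1 - self.alpha) * prev


LATENCY_MS = {"pod-1": 10, "pod-2": 10, "pod-3": 200}  # pod-3 is slow, not failing
N_CALLS = 3000

balancer = EwmaBalancer(LATENCY_MS)
ewma_hits = Counter()
for _ in range(N_CALLS):
    pod = balancer.pick()
    ewma_hits[pod] += 1
    balancer.observe(pod, LATENCY_MS[pod])

pods = list(LATENCY_MS)
rr_hits = Counter(pods[i % len(pods)] for i in range(N_CALLS))

print("round-robin:", dict(rr_hits))
print("ewma:", dict(ewma_hits))
```

Round-robin keeps sending a third of the calls to the slow pod, while the latency-aware picker steers almost everything to the two healthy pods once it has seen one slow response.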
Properties
| Property | Value |
|---|---|
| DNS discovery lag at 2–3 replicas | ✅ Eliminated — EndpointSlice Watch |
| App changes | None — transparent proxy |
| New infrastructure | Control plane (linkerd-destination, linkerd-identity, linkerd-proxy-injector) |
| EKS / VPC CNI | Compatible |
| Observability | Built-in golden signals (latency, success rate, RPS) via Prometheus |
| Lock-in | Medium — Linkerd CRDs and annotations |
Stable release situation: as of February 2024, stable release artifacts require a paid Buoyant Enterprise for Linkerd subscription (employee-count based; free for ≤50 employees). Free path: edge releases. Pick the latest “recommended” release from the releases page; plan roughly quarterly upgrades. No CVE backports on the free tier.
Warning
Graceful shutdown caveat — the Linkerd proxy receives SIGTERM at the same time as the app container. If the app takes longer to shut down, outbound calls fail after the proxy terminates. Mitigation: --wait-before-exit-seconds on linkerd inject, or native sidecars (K8s v1.29+).
Migration paths:
- In: inject the mesh, keep services as ClusterIP, no app code changes
- Out: remove injection and revert services to ClusterIP; safe fallback is the original per-request pattern
Headless + DNS vs Linkerd
| | Headless + DNS (current) | Linkerd |
|---|---|---|
| DNS discovery lag at 2–3 replicas | ❌ 30s window | ✅ EndpointSlice Watch |
| App changes | Yes (dns:///) | None |
| New infrastructure | None | Control plane |
| Ops overhead | Low | Low–medium |
| Observability added | None | Golden signals |
| Lock-in | None | Medium |
| Rollback | Feature flag | Mesh removal |
Alternatives ruled out
- kuberesolver (sercand/kuberesolver v6) — watches EndpointSlice API directly; would eliminate DNS lag without a mesh. Ruled out: unmaintained (last main commit >1.5 years ago, no releases).
- Istio — operational complexity far exceeds current requirements for 1 DevOps.
- Envoy standalone — more LB control than needed; no proportional benefit over Linkerd.
- Cilium — requires replacing Amazon VPC CNI; hard blocker on EKS.
Summary
| Approach | Balancing level | Works? |
|---|---|---|
| Standard K8s Service | L4, per connection | ✅ only per-connection |
| Dial-per-request | L4 (hack) | ✅ with 77ms latency cost |
| Headless Service + round_robin | L7 client-side | ✅ with caveats |
| Service mesh (Linkerd / Istio) | L7 proxy | ✅ cleanest |
See also
- grpc-client-side-load-balancing-verification — practical verification method for coverage, reuse, and distribution
- grpc-connection-registry — implementation: headless services, voltnet-common round-robin API, connection pool
- grpc-keepalive-enhance-your-calm — ENHANCE_YOUR_CALM reconnect loop from keepalive misconfiguration
- grpc-dns-discovery-lag — 30s DNS polling floor and mitigation patterns
- websocket-transport — WebSocket transport patterns, similar concurrent connection concerns