The core problem

gRPC runs over HTTP/2, which multiplexes many calls over a single persistent TCP connection. Kubernetes Services do L4 (connection-level) load balancing via kube-proxy — they pick a backend pod when a TCP connection opens, then get out of the way. If the gRPC client never closes the connection, kube-proxy never gets another turn.

flowchart LR
    client[omni] -->|"1 TCP connection, ∞ gRPC calls"| svc["K8s Service<br/>ClusterIP"]
    svc --> pod1[Pod 1 - all load]
    svc -.->|idle| pod2[Pod 2]
    svc -.->|idle| pod3[Pod 3]

Context: voltnet-omnicharge-points calls. Adding replicas didn’t distribute load — omni latched onto one pod, which died under surge.
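The pinning effect can be reproduced with a toy model. This is only a sketch: `L4Balancer` is a hypothetical stand-in for kube-proxy, the pod names are made up, and random backend selection is an assumption about how the connection-level choice is made.

```python
import random
from collections import Counter

PODS = ["pod-1", "pod-2", "pod-3"]  # hypothetical backend names


class L4Balancer:
    """Toy stand-in for kube-proxy: picks a backend once, when a connection opens."""

    def __init__(self, pods, seed=0):
        self._pods = list(pods)
        self._rng = random.Random(seed)

    def open_connection(self):
        # The routing decision happens here and is never revisited.
        return self._rng.choice(self._pods)


def persistent_connection_load(balancer, calls):
    """Send every call over one long-lived connection, as an HTTP/2 client does."""
    pod = balancer.open_connection()
    return Counter({pod: calls})


load = persistent_connection_load(L4Balancer(PODS), calls=10_000)
print(load)  # one pod holds every call; the other two never appear
```

Adding more entries to `PODS` changes nothing in the output, which mirrors what happened when replicas were added.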

Why DNS round-robin didn’t work

gRPC’s round_robin policy needs multiple A records from DNS — one per pod. A standard K8s Service returns a single ClusterIP. Nothing to round-robin across → no children to pick from.
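A rough way to picture this: round-robin only rotates over the addresses the resolver hands it. The sketch below is not gRPC's implementation, and the IP addresses are invented for illustration.

```python
from itertools import cycle, islice


def round_robin(addresses, calls):
    """Assign calls to resolved addresses in strict rotation."""
    return list(islice(cycle(addresses), calls))


cluster_ip_answer = ["10.96.0.10"]  # hypothetical: single ClusterIP A record
headless_answer = ["10.0.1.5", "10.0.2.7", "10.0.3.9"]  # hypothetical: one A record per pod

print(round_robin(cluster_ip_answer, 4))
print(round_robin(headless_answer, 4))
```

With one address in the list, every call goes to the same place regardless of the policy name.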

Why dial-per-request works (current fix)

Opening a fresh TCP connection per call forces kube-proxy to make a new routing decision each time. Exploits L4 balancing. Cost: ~77ms per call for TCP + HTTP/2 handshake.
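The trade-off can be put in numbers with a small simulation. It assumes the per-connection backend choice is random and reuses the ~77ms handshake figure quoted above; the pod names and call volume are hypothetical.

```python
import random
from collections import Counter

HANDSHAKE_MS = 77  # handshake cost per call, from the figure above
PODS = ["pod-1", "pod-2", "pod-3"]  # hypothetical backend names


def dial_per_request(pods, calls, seed=0):
    """Open a new connection per call, so each call gets its own routing decision."""
    rng = random.Random(seed)
    load = Counter(rng.choice(pods) for _ in range(calls))
    overhead_ms = calls * HANDSHAKE_MS  # every call pays the handshake again
    return load, overhead_ms


load, overhead_ms = dial_per_request(PODS, calls=9_000)
print(load)  # roughly even split across the three pods
print(overhead_ms / 1000, "seconds of cumulative handshake time")
```

For 9,000 calls the handshakes alone add up to 693 seconds of latency spread across callers, which is the cost the options below try to remove.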

Proper solutions

Option A: Headless Service + gRPC round_robin

clusterIP: None makes DNS return one A record per pod instead of a ClusterIP.
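As an illustration, a headless Service manifest could be generated like this. Every name here (`charge-points-headless`, the `app` label, port 8080) is a placeholder rather than the real configuration. Setting `port` equal to `targetPort` is a design choice in this sketch, so that the number in the manifest is the one clients actually dial.

```python
import json


def headless_service(name, namespace, app_label, pod_port):
    """Build a headless Service manifest (kubectl also accepts JSON manifests)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",  # the literal string "None" makes the Service headless
            "selector": {"app": app_label},
            # port == targetPort: clients reach pods directly, so no translation happens
            "ports": [{"name": "grpc", "port": pod_port, "targetPort": pod_port}],
        },
    }


# All arguments below are hypothetical placeholders.
manifest = headless_service("charge-points-headless", "default", "charge-points", 8080)
print(json.dumps(manifest, indent=2))
```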

Warning

Critical port gotcha — With a headless service, kube-proxy is bypassed; the client connects directly to the pod IP. Dial the pod port (e.g. :8080), not the service port. With ClusterIP, kube-proxy handles the translation; with headless it doesn’t. This only surfaces when switching.

Four components required — all of them:

  1. clusterIP: None — DNS returns pod IPs instead of ClusterIP
  2. dns:/// prefix — tells gRPC to use DNS resolver (default treats address as a static endpoint and uses pick_first)
  3. round_robin policy — gRPC defaults to pick_first without this
  4. Connection pooling — eliminates the 77ms handshake per call by holding one persistent connection per backend pod

Headless alone still defaults to pick_first. Connection pooling without round_robin reintroduces pod affinity. All four together give you even distribution with no per-call handshake overhead.
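On the client side, components 2 to 4 might be wired up as sketched below. The service name, namespace and port are placeholders. The code only builds the target string and channel options, so it runs without grpcio installed; the commented line shows where a real channel would be created.

```python
import json


def grpc_channel_args(service, namespace, pod_port):
    """Return (target, options) for a client-side round_robin channel."""
    # dns:/// selects the DNS resolver; the port is the pod port, not a service port.
    target = f"dns:///{service}.{namespace}.svc.cluster.local:{pod_port}"
    service_config = {"loadBalancingConfig": [{"round_robin": {}}]}
    options = [("grpc.service_config", json.dumps(service_config))]
    return target, options


# Hypothetical service name, namespace and port.
target, options = grpc_channel_args("charge-points-headless", "default", 8080)
print(target)

# With grpcio installed, create the channel once and reuse it for every call:
#   channel = grpc.insecure_channel(target, options=options)
```

Reusing that one channel is what provides the pooling in this sketch: under round_robin the channel is expected to hold a connection per resolved address, so calls rotate across pods without paying a new handshake each time.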

Option B: Service mesh (Istio / Linkerd)

A sidecar proxy intercepts all traffic and does L7 (request-level) routing: each gRPC call is routed independently, even over a persistent connection. It also eliminates DNS discovery lag, because the mesh learns about endpoints through an EndpointSlice Watch rather than by polling DNS.

Warning

kubectl port-forward bypasses the sidecar proxy — always test gRPC load balancing from inside the mesh, not via port-forward.

Linkerd

Linkerd injects a lightweight Rust sidecar (linkerd2-proxy, ~10–20MB). Its linkerd-destination control plane watches the EndpointSlice API directly — discovery is driven by Watch events rather than polling, significantly faster than the 30s DNS floor.
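The gap between the two discovery models can be estimated with a simplified timing model. It assumes DNS re-resolution happens on a fixed 30s grid and that a Watch event arrives after a small propagation delay. The 0.5s value is invented for illustration and is not a measured Linkerd figure.

```python
import math

DNS_REFRESH_S = 30.0  # the 30s DNS floor mentioned above


def dns_visible_at(ready_at, refresh=DNS_REFRESH_S):
    """First re-resolution tick at or after the moment the pod became Ready."""
    return math.ceil(ready_at / refresh) * refresh


def watch_visible_at(ready_at, propagation=0.5):
    """Watch-driven discovery: visible shortly after Ready (delay is hypothetical)."""
    return ready_at + propagation


for ready_at in (1.0, 15.0, 29.0):
    dns_lag = dns_visible_at(ready_at) - ready_at
    watch_lag = watch_visible_at(ready_at) - ready_at
    print(f"ready at {ready_at:>4}s: DNS lag {dns_lag:>4}s, Watch lag {watch_lag}s")
```

In this model the DNS lag depends entirely on where in the refresh window the pod becomes Ready, while the Watch lag stays constant.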

Scenario comparison

| Scenario | Headless + DNS round-robin | Linkerd |
|---|---|---|
| Rolling deploy — pod terminated, replacement starts | New pod invisible for up to 30s; surviving pods absorb full load | New pod enters rotation quickly via EndpointSlice Watch |
| Pod crash — no replacement | TCP RST detected in ms; pod removed from DNS within 30s | Same; EndpointSlice Watch-driven |
| Scale up | New pod invisible for up to 30s | Receives traffic quickly once Ready |
| Scale down — intentional | Without PreStop, in-flight RPCs dropped mid-stream | Proxy drains in-flight RPCs before proxy terminates |
| Degraded pod (slow, not failing) | Stays in round-robin at full weight | EWMA load balancer reduces traffic to slow pods automatically |
| Network partition — pod alive but unreachable | Keepalive detects dead connection in ~40s | Same; keepalive detection applies equally |
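The degraded-pod row relies on latency-aware balancing. The sketch below illustrates the general EWMA idea only. It is not Linkerd's actual algorithm, and the class name, smoothing factor and latency values are all assumptions.

```python
class EwmaBackend:
    """Tracks an exponentially weighted moving average of observed latency."""

    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha  # weight given to the newest sample (hypothetical value)
        self.latency_ms = None

    def observe(self, sample_ms):
        if self.latency_ms is None:
            self.latency_ms = sample_ms
        else:
            self.latency_ms = self.alpha * sample_ms + (1 - self.alpha) * self.latency_ms


def pick(backends):
    """Prefer the backend with the lowest smoothed latency; untried backends score 0."""
    return min(backends, key=lambda b: b.latency_ms if b.latency_ms is not None else 0.0)


fast, slow = EwmaBackend("pod-1"), EwmaBackend("pod-2")
for _ in range(5):  # both pods start out healthy
    fast.observe(20.0)
    slow.observe(20.0)
for _ in range(5):  # pod-2 degrades without failing outright
    fast.observe(20.0)
    slow.observe(200.0)

print(pick([fast, slow]).name)
```

Plain round-robin has no equivalent of `observe`, which is why a slow but healthy pod keeps its full share of traffic in the headless setup.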

Properties

| Property | Value |
|---|---|
| DNS discovery lag at 2–3 replicas | ✅ Eliminated — EndpointSlice Watch |
| App changes | None — transparent proxy |
| New infrastructure | Control plane (linkerd-destination, linkerd-identity, linkerd-proxy-injector) |
| EKS / VPC CNI | Compatible |
| Observability | Built-in golden signals (latency, success rate, RPS) via Prometheus |
| Lock-in | Medium — Linkerd CRDs and annotations |

Stable release situation: as of February 2024, stable release artifacts require a paid Buoyant Enterprise for Linkerd subscription (employee-count based; free for ≤50 employees). Free path: edge releases. Pick the latest “recommended” release from the releases page; plan roughly quarterly upgrades. No CVE backports on the free tier.

Warning

Graceful shutdown caveat — the Linkerd proxy receives SIGTERM at the same time as the app container. If the app takes longer to shut down, outbound calls fail after the proxy terminates. Mitigation: --wait-before-exit-seconds on linkerd inject, or native sidecars (K8s v1.29+).

Migration paths:

  • In: inject the mesh, keep services as ClusterIP, no app code changes
  • Out: remove injection and revert services to ClusterIP; safe fallback is the original per-request pattern

Headless + DNS vs Linkerd

| | Headless + DNS (current) | Linkerd |
|---|---|---|
| DNS discovery lag at 2–3 replicas | ❌ 30s window | ✅ EndpointSlice Watch |
| App changes | Yes (dns:///) | None |
| New infrastructure | None | Control plane |
| Ops overhead | Low | Low–medium |
| Observability added | None | Golden signals |
| Lock-in | None | Medium |
| Rollback | Feature flag | Mesh removal |

Alternatives ruled out

  • kuberesolver (sercand/kuberesolver v6) — watches EndpointSlice API directly; would eliminate DNS lag without a mesh. Ruled out: unmaintained (last main commit >1.5 years ago, no releases).
  • Istio — operational complexity far exceeds current requirements for a team with a single DevOps engineer.
  • Envoy standalone — more LB control than needed; no proportional benefit over Linkerd.
  • Cilium — requires replacing Amazon VPC CNI; hard blocker on EKS.

Summary

| Approach | Balancing level | Works? |
|---|---|---|
| Standard K8s Service | L4, per connection | ✅ only per-connection |
| Dial-per-request | L4 (hack) | ✅ with 77ms latency cost |
| Headless Service + round_robin | L7 client-side | ✅ with caveats |
| Service mesh (Linkerd / Istio) | L7 proxy | ✅ cleanest |

See also