The core problem

gRPC runs over HTTP/2, which multiplexes many calls over a single persistent TCP connection. Kubernetes Services do L4 (connection-level) load balancing via kube-proxy — they pick a backend pod when a TCP connection opens, then get out of the way. If the gRPC client never closes the connection, kube-proxy never gets another turn.

```mermaid
flowchart LR
    client[omni] -->|"1 TCP connection, ∞ gRPC calls"| svc["K8s Service<br/>ClusterIP"]
    svc --> pod1[Pod 1 - all load]
    svc -.->|idle| pod2[Pod 2]
    svc -.->|idle| pod3[Pod 3]
```

Context: voltnet-omni → charge-points calls. Adding replicas didn't distribute load; omni latched onto one pod, which died under surge.
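
A minimal sketch of the problematic pattern, assuming an in-cluster client and insecure transport for brevity (the health-check stub stands in for the real charge-points RPCs):

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// One ClientConn created at startup and reused for every request:
	// kube-proxy picks a backend pod once, when the TCP connection opens.
	conn, err := grpc.Dial(
		"charge-points:50051", // plain ClusterIP Service name
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)
	for i := 0; i < 1000; i++ {
		// All 1000 calls multiplex onto the same HTTP/2 connection,
		// so they all land on the same pod regardless of replica count.
		_, _ = client.Check(context.Background(), &healthpb.HealthCheckRequest{})
	}
}
```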

Why DNS round-robin didn’t work

gRPC’s round_robin policy needs multiple A records from DNS — one per pod. A standard K8s Service returns a single ClusterIP. Nothing to round-robin across → no children to pick from.
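
A quick way to see this from inside the cluster (a sketch; `charge-points` is the standard Service and `charge-points-headless` is the headless variant introduced in Option A below):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	for _, host := range []string{"charge-points", "charge-points-headless"} {
		addrs, err := net.LookupHost(host)
		if err != nil {
			fmt.Println(host, "lookup failed:", err)
			continue
		}
		// ClusterIP Service -> exactly one A record (the virtual IP).
		// Headless Service  -> one A record per ready pod.
		fmt.Println(host, "->", addrs)
	}
}
```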

Why dial-per-request works (current fix)

Opening a fresh TCP connection per call forces kube-proxy to make a new routing decision each time, which exploits the L4 balancing instead of fighting it. Cost: ~77ms per call for the TCP + HTTP/2 handshake.
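
A minimal sketch of the dial-per-request workaround (insecure transport and a health-check RPC used as stand-ins): every call pays for its own connection, so every call gets its own routing decision from kube-proxy.

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// callOnce dials, makes one RPC, and tears the connection down again.
// The ~77ms handshake cost is paid on every invocation.
func callOnce(ctx context.Context) error {
	conn, err := grpc.Dial(
		"charge-points:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		return err
	}
	defer conn.Close() // closing forces the next call onto a fresh connection

	_, err = healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	return err
}

func main() {
	if err := callOnce(context.Background()); err != nil {
		log.Fatal(err)
	}
}
```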

Proper solutions

Option A: Headless Service + gRPC round_robin

clusterIP: None makes DNS return one A record per pod instead of a ClusterIP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: charge-points-headless
spec:
  clusterIP: None
  ports:
    - port: 50051
  selector:
    app: charge-points
```

The client then dials the headless name through gRPC's DNS resolver with round_robin:

```go
conn, err := grpc.Dial(
    "dns:///charge-points-headless:50051",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()), // grpc.Dial requires transport creds; use real TLS in production
)
```

Warning

Critical port gotcha: with a headless service, kube-proxy is bypassed and the client connects directly to the pod IP. Dial the pod's container port (e.g. :8080), not the Service port. With a ClusterIP Service, kube-proxy handles the port translation; with a headless Service it doesn't, which only surfaces when you switch from one to the other.
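
A small illustration, assuming hypothetical ports where the pods listen on 8080 while the old ClusterIP Service exposed 50051 with targetPort 8080:

```go
package chargepoints

import "google.golang.org/grpc"

// With the ClusterIP Service, kube-proxy rewrote 50051 -> 8080 on the way in.
// With a headless Service nothing rewrites the port, so dial the pods' port.
func dialHeadless(opts ...grpc.DialOption) (*grpc.ClientConn, error) {
	// Wrong once the Service is headless: 50051 is the Service port; no pod listens on it.
	// return grpc.Dial("dns:///charge-points-headless:50051", opts...)

	// Right: the port the containers actually listen on.
	return grpc.Dial("dns:///charge-points-headless:8080", opts...)
}
```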

Four components required — all of them:

  1. clusterIP: None — DNS returns pod IPs instead of ClusterIP
  2. dns:/// prefix — tells gRPC to use DNS resolver (default treats address as a static endpoint and uses pick_first)
  3. round_robin policy — gRPC defaults to pick_first without this
  4. Connection pooling — eliminates the 77ms handshake per call by holding one persistent connection per backend pod

Headless alone defaults to pick_first. Connection pooling alone without round_robin reintroduces pod affinity. All four together give you even distribution with no handshake overhead.
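
Putting the pieces together (a sketch, assuming the headless Service above and insecure transport for brevity): the ClientConn is created once and shared, gRPC's DNS resolver discovers the pod IPs, and round_robin keeps a persistent subchannel per backend.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Components 1+2+3: headless Service name, dns:/// resolver, round_robin policy.
	conn, err := grpc.Dial(
		"dns:///charge-points-headless:50051",
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Component 4: "pooling" here just means holding this one ClientConn for the
	// life of the process; gRPC keeps a persistent connection per pod and rotates
	// calls across them, so no per-call handshake.
	client := healthpb.NewHealthClient(conn)
	for i := 0; i < 10; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		_, err := client.Check(ctx, &healthpb.HealthCheckRequest{})
		cancel()
		log.Println(i, err)
	}
}
```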

Option B: Istio (Envoy sidecar)

The Envoy sidecar intercepts all traffic and does L7 (request-level) routing, so each gRPC call is routed independently even over a persistent connection. The balancing behavior is set on a DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: charge-points   # illustrative
spec:
  host: charge-points
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
    connectionPool:
      http:
        maxRequestsPerConnection: 10  # forces periodic connection rotation
```

Warning

kubectl port-forward bypasses the Envoy sidecar, so always test gRPC load balancing from inside the mesh, not via port-forward. The sidecar costs ~20% overhead, but you get circuit breaking, retries, and observability in return.

Summary

| Approach | Balancing level | Works? |
| --- | --- | --- |
| Standard K8s Service | L4, per connection | ✅ only per-connection |
| Dial-per-request | L4 (hack) | ✅ with 77ms latency cost |
| Headless Service + round_robin | L7 client-side | ✅ with caveats |
| Istio | L7 proxy | ✅ cleanest |

See also