Client-side load balancing for gRPC in Kubernetes requires four changes working together: headless services, dns:/// resolver prefix, round_robin policy, and a persistent connection pool. Any one of these alone is insufficient.
Without pooling (current UAT/prod)
kube-proxy picks a pod on each TCP dial. Per-request dial gives load distribution at ~77ms per RPC (TCP + TLS handshake).
voltnet-omni downstream
outbound RPC → dial TCP+TLS → kube-proxy → random pod
→ RPC
→ conn.Close()With pooling (dev)
One persistent *grpc.ClientConn per pod IP. DNS resolves all pod IPs at startup; round-robin distributes RPCs across sub-connections.
flowchart LR registry["ConnectionRegistry<br/>addr → *grpc.ClientConn"] registry -->|conn-A| podA[pod-A] registry -->|conn-B| podB[pod-B] registry -->|conn-C| podC[pod-C] rpc["outbound RPC"] -->|"round_robin policy"| registry
Headless services
clusterIP: None causes DNS to return one A record per pod instead of a single ClusterIP.
spec:
clusterIP: None # DNS returns all pod IPs instead of a single VIPWarning
Pod port, not service port — headless services bypass kube-proxy; the client dials the actual pod port. Standard ClusterIP services handle
:80→:8080translation; headless services do not. See configmap for per-service port mappings.
Client-side round-robin (voltnet-common)
DialGRPCWithOptions in voltnet-common. DialGRPC is a zero-option wrapper kept for legacy callers.
| Option | Default | Description |
|---|---|---|
EnableClientSideLoadBalancing() | disabled | Adds dns:/// prefix and round_robin policy |
WithKeepaliveTime(d) | 10s | Ping interval. Must be ≥ server EnforcementPolicy.MinTime — see grpc-keepalive-enhance-your-calm |
WithKeepaliveTimeout(d) | 5s | Time to wait for pong before declaring connection dead |
WithPermitWithoutStream(bool) | true | Send keepalive pings when no active streams exist |
WithDialTimeout(d) | 5s | Hard timeout on initial connection establishment |
The dns:/// prefix is required — without it, gRPC uses the passthrough resolver (single address, ignores LB policy). round_robin policy is required — gRPC defaults to pick_first without it.
Connection registry (voltnet-omni)
pkg/grpc/registry.go — singleton, thread-safe cache of address → *grpc.ClientConn. Connections established once and reused for the lifetime of the pod.
DialWithPooling handles GRPC_CONNECTION_POOLING_ENABLED transparently — when false it dials per-request; when true it returns a cached connection. All callers use defer cleanup(), so toggling the flag requires no code changes.
Cache key is the address, not the logical service name — prevents stale connections if an address changes across environments.
Graceful shutdown
cmd/main.go registers a shutdown handler that calls Close() on the registry before the process exits. Close() uses a 30-second timeout per connection — worst-case shutdown time with five downstream services: 2.5 minutes if all connections stall.
Note
Shutdown log lines show the address string (e.g.
"service": "voltnet-pricing-svc:8080"), not a human-readable name — the loop variable is namedservicebut holds the address.
Feature flag
| Env var | Default | Effect |
|---|---|---|
GRPC_CONNECTION_POOLING_ENABLED | false | true enables connection reuse; false preserves per-request dial |
Rollback: set to false and redeploy — no code changes needed.
See also
- grpc-load-balancing — the core problem and why dial-per-request works as a workaround
- grpc-client-side-load-balancing-verification — verification method for connection coverage and distribution
- grpc-keepalive-enhance-your-calm — ENHANCE_YOUR_CALM error from 10s keepalive vs server 5min MinTime
- grpc-dns-discovery-lag — 30s DNS polling floor and mitigation patterns