Client-side load balancing for gRPC in Kubernetes requires four changes working together: headless services, dns:/// resolver prefix, round_robin policy, and a persistent connection pool. Any one of these alone is insufficient.

Without pooling (current UAT/prod)

kube-proxy picks a pod on each TCP dial. Per-request dial gives load distribution at ~77ms per RPC (TCP + TLS handshake).

voltnet-omni                              downstream
  outbound RPC → dial TCP+TLS → kube-proxy → random pod
                → RPC
                → conn.Close()

With pooling (dev)

One persistent *grpc.ClientConn per pod IP. DNS resolves all pod IPs at startup; round-robin distributes RPCs across sub-connections.

flowchart LR
    registry["ConnectionRegistry<br/>addr → *grpc.ClientConn"]
    registry -->|conn-A| podA[pod-A]
    registry -->|conn-B| podB[pod-B]
    registry -->|conn-C| podC[pod-C]
    rpc["outbound RPC"] -->|"round_robin policy"| registry

Headless services

clusterIP: None causes DNS to return one A record per pod instead of a single ClusterIP.

spec:
  clusterIP: None  # DNS returns all pod IPs instead of a single VIP

Warning

Pod port, not service port — headless services bypass kube-proxy; the client dials the actual pod port. Standard ClusterIP services handle :80:8080 translation; headless services do not. See configmap for per-service port mappings.

Client-side round-robin (voltnet-common)

DialGRPCWithOptions in voltnet-common. DialGRPC is a zero-option wrapper kept for legacy callers.

OptionDefaultDescription
EnableClientSideLoadBalancing()disabledAdds dns:/// prefix and round_robin policy
WithKeepaliveTime(d)10sPing interval. Must be ≥ server EnforcementPolicy.MinTime — see grpc-keepalive-enhance-your-calm
WithKeepaliveTimeout(d)5sTime to wait for pong before declaring connection dead
WithPermitWithoutStream(bool)trueSend keepalive pings when no active streams exist
WithDialTimeout(d)5sHard timeout on initial connection establishment

The dns:/// prefix is required — without it, gRPC uses the passthrough resolver (single address, ignores LB policy). round_robin policy is required — gRPC defaults to pick_first without it.

Connection registry (voltnet-omni)

pkg/grpc/registry.go — singleton, thread-safe cache of address → *grpc.ClientConn. Connections established once and reused for the lifetime of the pod.

DialWithPooling handles GRPC_CONNECTION_POOLING_ENABLED transparently — when false it dials per-request; when true it returns a cached connection. All callers use defer cleanup(), so toggling the flag requires no code changes.

Cache key is the address, not the logical service name — prevents stale connections if an address changes across environments.

Graceful shutdown

cmd/main.go registers a shutdown handler that calls Close() on the registry before the process exits. Close() uses a 30-second timeout per connection — worst-case shutdown time with five downstream services: 2.5 minutes if all connections stall.

Note

Shutdown log lines show the address string (e.g. "service": "voltnet-pricing-svc:8080"), not a human-readable name — the loop variable is named service but holds the address.

Feature flag

Env varDefaultEffect
GRPC_CONNECTION_POOLING_ENABLEDfalsetrue enables connection reuse; false preserves per-request dial

Rollback: set to false and redeploy — no code changes needed.

See also