Reading metrics

The request lifecycle in order: blocked → connecting → tls_handshaking → sending → waiting → receiving

http_req_duration = http_req_sending + http_req_waiting + http_req_receiving. This is the time the system under test (SUT) spent processing and responding, and it deliberately excludes the connection stages.

| Metric | Reliability | Notes |
| --- | --- | --- |
| http_req_waiting (TTFB) | High | Measures server processing time only; excludes TLS, TCP, and connection overhead, which are tracked separately |
| http_req_tls_handshaking | Low (from laptop) | CPU-intensive crypto; inflated by client CPU under load and can show 3s p99 when real cost is 58ms |
| http_req_blocked | Medium | Time waiting for a free TCP connection slot; high values mean connection pool exhaustion, not a server problem |
| http_req_connecting | Medium | Actual TCP handshake duration; high values point to network latency or server backlog |

Tip

TTFB is the metric to optimise — compare http_req_waiting against a single curl TTFB to confirm whether degradation is server-side or test-infrastructure noise.

Why connection stage metrics show 0

http_req_connecting and http_req_tls_handshaking legitimately show 0 for many requests — this is correct behaviour, not a bug. k6 reuses HTTP keep-alive connections by default, so the TCP and TLS handshakes only happen on the first request of a VU (or after a connection is dropped). Subsequent requests in the same VU skip the connection stages entirely.

This makes these metrics noisy for analysis: one request pays the full handshake cost, the next pays nothing. Setting noConnectionReuse: true disables keep-alive and forces a fresh connection for every request, which makes the values consistent at the cost of higher per-request overhead. To drop connections only between iterations of a VU, use noVUConnectionReuse: true instead.
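As a rough illustration, the two connection-reuse settings could sit in a script's options like this. It is a minimal sketch of a config fragment, not a complete test script; which of the two you enable depends on whether you want a handshake on every request or only at the start of each iteration.

```javascript
// Sketch of k6 script options; enable only the setting that matches what you want to measure.
export const options = {
  // Disable HTTP keep-alive: every request opens a new connection,
  // so http_req_connecting and http_req_tls_handshaking are non-zero each time.
  noConnectionReuse: true,

  // Alternative: keep connections alive within an iteration,
  // but drop them between iterations of the same VU.
  // noVUConnectionReuse: true,
};
```

Either setting adds handshake load to both the generator and the server, so results from such a run are not directly comparable with a default keep-alive run.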

Comparing with JMeter / Gatling

JMeter and Gatling include the connection establishment time in their response time metric. k6’s http_req_duration excludes it. When benchmarking shows k6 reporting lower latency than JMeter for the same endpoint, this is the most likely cause — not that the endpoint got faster. Add http_req_connecting + http_req_tls_handshaking to http_req_duration to produce a comparable total, but note that connection reuse makes the result inconsistent across iterations (see above).
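The adjustment described above can be sketched as a small helper. The function name and the sample values are hypothetical; the field names follow the timings object that k6 attaches to each response (values in milliseconds).

```javascript
// Sketch: rebuild a JMeter/Gatling-style total from k6's per-request timings.
// `timings` mirrors the shape of a k6 response's timings object (milliseconds).
function comparableTotalMs(timings) {
  // http_req_duration is sending + waiting + receiving.
  const duration = timings.sending + timings.waiting + timings.receiving;
  // Add the connection stages that http_req_duration leaves out.
  return duration + timings.connecting + timings.tls_handshaking;
}

// Hypothetical sample: a first request that pays the full handshake cost...
const firstRequest = { connecting: 20, tls_handshaking: 60, sending: 1, waiting: 120, receiving: 4 };
// ...and a later request on a reused connection, where both stages are 0.
const reusedRequest = { connecting: 0, tls_handshaking: 0, sending: 1, waiting: 120, receiving: 4 };

console.log(comparableTotalMs(firstRequest));  // 205
console.log(comparableTotalMs(reusedRequest)); // 125
```

The gap between the two results is the inconsistency that connection reuse introduces: the same endpoint yields two different totals depending on whether the request opened the connection.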

Diagnosing high http_req_tls_handshaking

First rule out the load generator: if the value drops significantly when running from a more powerful machine, it’s client CPU, not the server.

If the generator is not the bottleneck, the server-side causes in order of likelihood:

  • Server CPU saturation — TLS crypto is CPU-intensive; check server CPU during the test
  • TLS version — TLS 1.2 requires 2 round trips to complete the handshake; TLS 1.3 requires 1. At high VU counts the extra RTT compounds. Verify the server is negotiating TLS 1.3
  • Suboptimal cipher suites — older cipher suites (RSA key exchange, non-ECDHE) are more computationally expensive; ensure the server prioritises ECDHE + AES-GCM or ChaCha20
  • Large certificate chain — if intermediate certificates are not bundled on the server, the client must fetch them separately, adding round trips. Check chain size with openssl s_client -connect host:443
  • Network distance — the TLS handshake requires multiple round trips; a geographically distant load generator multiplies this. Always run the load generator in the same region as the SUT
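To get a feel for how TLS version and network distance interact, the network share of a full handshake can be estimated from the round-trip counts given above. This is a back-of-the-envelope sketch: the function name and RTT values are hypothetical, and it ignores crypto CPU time, the TCP handshake, and session resumption.

```javascript
// Round trips needed for a full (non-resumed) TLS handshake, as stated in the list above.
const TLS_HANDSHAKE_ROUND_TRIPS = { '1.2': 2, '1.3': 1 };

// Estimate the network-only cost of one TLS handshake for a given round-trip time.
function tlsHandshakeNetworkMs(rttMs, tlsVersion) {
  const roundTrips = TLS_HANDSHAKE_ROUND_TRIPS[tlsVersion];
  if (roundTrips === undefined) {
    throw new RangeError(`unsupported TLS version: ${tlsVersion}`);
  }
  return roundTrips * rttMs;
}

// Hypothetical RTTs: 80 ms for a distant generator, 2 ms for one in the same region.
console.log(tlsHandshakeNetworkMs(80, '1.2')); // 160
console.log(tlsHandshakeNetworkMs(80, '1.3')); // 80
console.log(tlsHandshakeNetworkMs(2, '1.2'));  // 4
```

If the measured http_req_tls_handshaking is far above this estimate, the excess is more likely CPU (client or server) than network.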

High VU count at low RPS = slow server

If you see 70+ VUs accumulating while RPS stays at ~5, the server is slow — VUs are piling up waiting for responses. This is not a client-side issue.
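The arithmetic behind this follows Little's Law: busy VUs = throughput × time per iteration. A minimal sketch, assuming every VU is busy, each iteration makes one request, and the script has no sleep() between requests (the function name is hypothetical):

```javascript
// Little's Law for a closed workload: busy VUs = requests per second x seconds per iteration.
// Rearranged here to estimate how long each iteration must be taking.
function impliedSecondsPerIteration(activeVUs, requestsPerSecond) {
  if (requestsPerSecond <= 0) {
    throw new RangeError('requestsPerSecond must be positive');
  }
  return activeVUs / requestsPerSecond;
}

console.log(impliedSecondsPerIteration(70, 5)); // 14
```

With 70 VUs and about 5 RPS, each request is taking roughly 14 seconds end to end, which is why the VUs pile up.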

Validating results

Establish a curl baseline before every test run:

curl -o /dev/null -s -w \
  "dns: %{time_namelookup}s\ntls: %{time_appconnect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" \
  https://your-endpoint

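Compare k6’s http_req_waiting p50 against this single-request TTFB. curl’s timers are cumulative from the start of the request, so the ttfb value includes DNS, TCP, and TLS setup; subtract the tls value (time_appconnect) to get a figure comparable to http_req_waiting. If the two match, degradation is server-side.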
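Because curl reports cumulative timers, the ttfb line includes connection setup, while http_req_waiting does not. The sketch below parses the output of the curl command above and subtracts the TLS completion time to get a comparable figure. It assumes an HTTPS endpoint; the function names and the sample output are hypothetical, not real measurements.

```javascript
// Parse the "name: 0.123s" lines produced by the curl -w format string above.
function parseCurlTimings(output) {
  const timings = {};
  for (const line of output.split('\n')) {
    const match = line.match(/^(\w+):\s*([\d.]+)s$/);
    if (match) {
      timings[match[1]] = parseFloat(match[2]);
    }
  }
  return timings;
}

// curl timers are cumulative, so the server wait is roughly
// time-to-first-byte minus the moment the TLS handshake finished.
function serverWaitSeconds(timings) {
  return timings.ttfb - timings.tls;
}

// Hypothetical sample output, not a real measurement.
const sample = 'dns: 0.012s\ntls: 0.080s\nttfb: 0.230s\ntotal: 0.240s\n';
console.log(serverWaitSeconds(parseCurlTimings(sample)).toFixed(3)); // 0.150
```

In this sample the raw ttfb is 230 ms, but only about 150 ms of it is comparable to http_req_waiting.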

Cross-reference with Jaeger traces to identify which internal service is the bottleneck: k6 tells you the endpoint is slow; Jaeger tells you which downstream call is to blame.

EOF errors

EOF errors (reported as request failed: EOF or status code 0) have several distinct causes. The pattern of when they occur is the main diagnostic signal:

| Pattern | Likely cause | How to confirm |
| --- | --- | --- |
| All VUs at the exact same second | Infrastructure restart: pod, load balancer, or cache flushed existing connections | Check k9s for pod restart count spike at that timestamp |
| Scattered throughout the test | Server overloaded: TCP backlog full, server actively closing connections | Cross-reference with server CPU/memory; check if error rate tracks VU ramp |
| Only at stage transitions | Stale keep-alive connection race (see k6-script-config) | Set noVUConnectionReuse: true and rerun |
| Only during ramp-up | k6 hitting OS file descriptor limits; can’t open new TCP sockets | Check with ulimit -n; see k6-os-tuning |
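One way to tell the "same second" pattern from the "scattered" one is to bucket the error timestamps by second and see how concentrated they are. The helper below is a hypothetical sketch: it assumes you have already extracted the EOF error timestamps (as Unix seconds) from your test output, and the sample values are made up.

```javascript
// Group EOF error timestamps (Unix seconds) by second and report how concentrated they are.
function eofConcentration(errorTimestamps) {
  const perSecond = new Map();
  for (const ts of errorTimestamps) {
    const second = Math.floor(ts);
    perSecond.set(second, (perSecond.get(second) || 0) + 1);
  }
  let busiestSecond = null;
  let busiestCount = 0;
  for (const [second, count] of perSecond) {
    if (count > busiestCount) {
      busiestSecond = second;
      busiestCount = count;
    }
  }
  // Share of all errors that fall inside the single busiest second (0 when there are none).
  const share = errorTimestamps.length === 0 ? 0 : busiestCount / errorTimestamps.length;
  return { busiestSecond, busiestCount, share };
}

// Hypothetical samples: a burst inside one second versus errors spread over two minutes.
console.log(eofConcentration([100.1, 100.2, 100.4, 100.9]).share); // 1
console.log(eofConcentration([100, 130, 170, 220]).share);         // 0.25
```

A share close to 1 points at the infrastructure-restart row; a low share spread across the run points at overload. Stage-transition and ramp-up patterns still need to be matched against the test's stage timings by hand.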

Note

A k8s rolling deployment mid-test produces the same pattern as an infrastructure restart — all connections drop at once as the old pod is terminated. Avoid deploying during a test run.

See also

  • k6-script-config — pre-test configuration: connection reuse, TLS settings, tagging
  • k6-large-tests — capacity, OS tuning, script optimisations, distributed execution
  • k6-vu-saturation — load generator as the bottleneck
  • jaeger — tracing for identifying internal bottlenecks