Reading metrics
The request lifecycle in order: blocked → connecting → tls_handshaking → sending → waiting → receiving
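http_req_duration = http_req_sending + http_req_waiting + http_req_receiving. This is the time from sending the request to receiving the full response, so it reflects how long the SUT took to process and respond. It deliberately excludes the connection stages (blocked, connecting, tls_handshaking).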
| Metric | Reliability | Notes |
|---|---|---|
| http_req_waiting (TTFB) | High | Time waiting for the first response byte, dominated by server processing — excludes TLS, TCP, and connection overhead, which are tracked separately |
| http_req_tls_handshaking | Low (from laptop) | CPU-intensive crypto — inflated by client CPU under load; can show 3s p99 when real cost is 58ms |
| http_req_blocked | Medium | Time waiting for a free TCP connection slot — high values mean connection pool exhaustion, not a server problem |
| http_req_connecting | Medium | Actual TCP handshake duration — high values point to network latency or server backlog |
Tip
TTFB is the metric to optimise — compare http_req_waiting against a single curl TTFB to confirm whether degradation is server-side or test-infrastructure noise.
Why connection stage metrics show 0
http_req_connecting and http_req_tls_handshaking legitimately show 0 for many requests — this is correct behaviour, not a bug. k6 reuses HTTP keep-alive connections by default, so the TCP and TLS handshakes only happen on the first request of a VU (or after a connection is dropped). Subsequent requests in the same VU skip the connection stages entirely.
This makes these metrics noisy for analysis: one request pays the full handshake cost, the next pays nothing. Setting noConnectionReuse: true disables keep-alive, forcing a fresh connection for every request. That makes the values consistent, at the cost of higher per-request overhead.
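To see why the aggregate is misleading under connection reuse, the sketch below splits handshake samples into reused (0 ms) and fresh connections. It is plain JavaScript, not a k6 script; the function name and the sample values are hypothetical, standing in for http_req_tls_handshaking values exported from a run.

```javascript
// Split handshake timings (ms) into reused connections (0) and fresh ones (> 0).
// `samples` is a hypothetical array of http_req_tls_handshaking values.
function summariseHandshakes(samples) {
  const fresh = samples.filter((ms) => ms > 0);
  const freshTotal = fresh.reduce((sum, ms) => sum + ms, 0);
  return {
    reused: samples.length - fresh.length,
    fresh: fresh.length,
    // Mean over requests that actually performed a handshake.
    freshMean: fresh.length === 0 ? 0 : freshTotal / fresh.length,
    // Mean over all requests, diluted by the zeros from reused connections.
    overallMean: samples.length === 0 ? 0 : freshTotal / samples.length,
  };
}

// Illustrative data: two handshakes among ten requests.
const summary = summariseHandshakes([60, 0, 0, 0, 0, 58, 0, 0, 0, 0]);
console.log(summary);
```

With these made-up numbers the overall mean is far below the mean of the requests that really paid for a handshake, which is why it helps to look at the non-zero samples separately.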
Comparing with JMeter / Gatling
JMeter and Gatling include the connection establishment time in their response time metric. k6’s http_req_duration excludes it. When benchmarking shows k6 reporting lower latency than JMeter for the same endpoint, this is the most likely cause — not that the endpoint got faster. Add http_req_connecting + http_req_tls_handshaking to http_req_duration to produce a comparable total, but note that connection reuse makes the result inconsistent across iterations (see above).
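A minimal sketch of that adjustment, assuming a timings object shaped like the one k6 attaches to each response (fields duration, connecting, tls_handshaking). The helper name and all values are illustrative, and the code is plain JavaScript rather than a full k6 script.

```javascript
// Add connection setup back onto duration to approximate a JMeter-style total.
// `timings` mirrors the field names of k6's per-response timings; values are in ms.
function comparableTotalMs(timings) {
  return timings.duration + timings.connecting + timings.tls_handshaking;
}

// First request on a connection pays for TCP and TLS setup (made-up numbers).
const firstRequest = { connecting: 12, tls_handshaking: 58, duration: 125 };
// A later request on the same keep-alive connection pays nothing for setup.
const reusedRequest = { connecting: 0, tls_handshaking: 0, duration: 125 };

console.log(comparableTotalMs(firstRequest));
console.log(comparableTotalMs(reusedRequest));
```

The two requests have the same duration but different totals, which is the iteration-to-iteration inconsistency described above.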
Diagnosing high http_req_tls_handshaking
First rule out the load generator: if the value drops significantly when running from a more powerful machine, it’s client CPU, not the server.
If the generator is not the bottleneck, the server-side causes in order of likelihood:
- Server CPU saturation — TLS crypto is CPU-intensive; check server CPU during the test
- TLS version — TLS 1.2 requires 2 round trips to complete the handshake; TLS 1.3 requires 1. At high VU counts the extra RTT compounds. Verify the server is negotiating TLS 1.3
- Suboptimal cipher suites — older cipher suites (RSA key exchange, non-ECDHE) are more computationally expensive; ensure the server prioritises ECDHE + AES-GCM or ChaCha20
- Large certificate chain — if intermediate certificates are not bundled on the server, the client must fetch them separately, adding round trips. Check chain size with openssl s_client -connect host:443
- Network distance — the TLS handshake requires multiple round trips; a geographically distant load generator multiplies this. Always run the load generator in the same region as the SUT
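The round-trip counts above give a rough lower bound on handshake time for a given network distance. The sketch below is a back-of-the-envelope estimate only: it assumes a full handshake with no session resumption and ignores crypto CPU time, and the function name and RTT values are hypothetical.

```javascript
// Round trips needed for a full TLS handshake, per the figures above.
const TLS_ROUND_TRIPS = { '1.2': 2, '1.3': 1 };

// Network-only floor for the TLS handshake, given the round-trip time in ms.
function handshakeFloorMs(rttMs, tlsVersion) {
  const trips = TLS_ROUND_TRIPS[tlsVersion];
  if (trips === undefined) {
    throw new Error(`unsupported TLS version: ${tlsVersion}`);
  }
  return rttMs * trips;
}

// Cross-region generator (assumed 80 ms RTT) versus same-region (assumed 2 ms RTT).
console.log(handshakeFloorMs(80, '1.2'));
console.log(handshakeFloorMs(80, '1.3'));
console.log(handshakeFloorMs(2, '1.2'));
```

If the measured http_req_tls_handshaking sits close to this floor, distance and TLS version explain it; if it is far above, CPU on either side is the more likely cause.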
High VU count at low RPS = slow server
If you see 70+ VUs accumulating while RPS stays at ~5, the server is slow — VUs are piling up waiting for responses. This is not a client-side issue.
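The arithmetic behind this follows Little's law (concurrency = throughput × time in system). The sketch below assumes every VU is busy, each iteration makes one request, and there is no sleep between iterations; the function name is hypothetical.

```javascript
// Little's law rearranged: time per request = concurrent VUs / requests per second.
// Assumes one request per iteration, no think time, and all VUs in flight.
function impliedSecondsPerRequest(activeVUs, requestsPerSecond) {
  if (requestsPerSecond <= 0) {
    throw new Error('requestsPerSecond must be positive');
  }
  return activeVUs / requestsPerSecond;
}

// 70 VUs sustaining only 5 RPS implies each request takes about 14 seconds.
console.log(impliedSecondsPerRequest(70, 5));
```

Under those assumptions, the implied figure should be roughly in line with the observed http_req_duration; a large gap suggests sleeps or multiple requests per iteration.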
Validating results
Establish a curl baseline before every test run:
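
    curl -o /dev/null -s -w \
      "dns: %{time_namelookup}s\ntls: %{time_appconnect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" \
      https://your-endpoint

Compare k6’s http_req_waiting p50 against this single-request TTFB. curl's timers are cumulative from the start of the request, so time_starttransfer includes DNS, TCP, and TLS setup; subtract time_appconnect from it before comparing. If they match, degradation is server-side.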
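The comparison can be sketched as a small helper. This is an illustration in plain JavaScript, not part of k6: the function names, the nearest-rank percentile method, and the 25% tolerance are all assumptions, and k6's own percentile calculation may differ slightly.

```javascript
// Nearest-rank percentile over a list of samples (ms).
function percentile(values, p) {
  if (values.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[rank - 1];
}

// Compare the p50 of http_req_waiting samples against a curl baseline TTFB (ms).
// `tolerance` is a hypothetical relative threshold for calling the two a match.
function compareToBaseline(waitingMs, curlTtfbMs, tolerance = 0.25) {
  const p50 = percentile(waitingMs, 50);
  const ratio = p50 / curlTtfbMs;
  return { p50, ratio, matches: Math.abs(ratio - 1) <= tolerance };
}

// Made-up samples: the first run is close to the baseline, the second is not.
console.log(compareToBaseline([110, 120, 125, 130, 400], 118));
console.log(compareToBaseline([900, 950, 1000], 118));
```

Using the median rather than the mean keeps a single slow outlier (400 ms in the first sample set) from skewing the comparison.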
Cross-reference with jaeger traces to identify which internal service is the bottleneck — k6 tells you the endpoint is slow; Jaeger tells you which downstream call is to blame.
EOF errors
EOF errors (reported as request failed: EOF or status code 0) have several distinct causes. The pattern of when they occur is the main diagnostic signal:
| Pattern | Likely cause | How to confirm |
|---|---|---|
| All VUs at the exact same second | Infrastructure restart — pod, load balancer, or cache flushed existing connections | Check k9s for pod restart count spike at that timestamp |
| Scattered throughout the test | Server overloaded — TCP backlog full, server actively closing connections | Cross-reference with server CPU/memory; check if error rate tracks VU ramp |
| Only at stage transitions | Stale keep-alive connection race (see k6-script-config) | Set noVUConnectionReuse: true and rerun |
| Only during ramp-up | k6 hitting OS file descriptor limits — can’t open new TCP sockets | Check with ulimit -n; see k6-os-tuning |
Note
A k8s rolling deployment mid-test produces the same pattern as an infrastructure restart — all connections drop at once as the old pod is terminated. Avoid deploying during a test run.
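As a rough aid for telling the first two patterns apart, the sketch below buckets EOF error timestamps by second and checks whether most of them fall into a single second. It is plain JavaScript over hypothetical data; the function names and the 80% threshold are assumptions, and it does not detect the stage-transition or ramp-up patterns.

```javascript
// Count errors per wall-clock second. `timestampsMs` are error times in ms.
function errorsPerSecond(timestampsMs) {
  const buckets = new Map();
  for (const ts of timestampsMs) {
    const second = Math.floor(ts / 1000);
    buckets.set(second, (buckets.get(second) || 0) + 1);
  }
  return buckets;
}

// Label the errors 'single-burst' when one second holds at least `burstShare`
// of them (hypothetical threshold), otherwise 'scattered'.
function classifyEofPattern(timestampsMs, burstShare = 0.8) {
  if (timestampsMs.length === 0) {
    return 'none';
  }
  const largest = Math.max(...errorsPerSecond(timestampsMs).values());
  return largest / timestampsMs.length >= burstShare ? 'single-burst' : 'scattered';
}

console.log(classifyEofPattern([5000, 5100, 5200, 5900, 5999]));
console.log(classifyEofPattern([1000, 9000, 20000, 31000]));
```

A 'single-burst' result points at the restart or rolling-deployment rows of the table; 'scattered' points at server overload and is worth cross-checking against the VU ramp.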
See also
- k6-script-config — pre-test configuration: connection reuse, TLS settings, tagging
- k6-large-tests — capacity, OS tuning, script optimisations, distributed execution
- k6-vu-saturation — load generator as the bottleneck
- jaeger — tracing for identifying internal bottlenecks