Before running
Run from a remote machine, not your laptop. Client CPU contention inflates http_req_tls_handshaking — can show 3s p99 when actual TLS cost is 58ms.
Stabilise the environment first. Check k9s for pods with high restart counts before starting. A pod restarting mid-test makes results uninterpretable.
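If you want a one-off command instead of watching k9s, a kubectl sketch for spotting restart-prone pods (the `uat` namespace is an assumption — substitute your own):

```shell
# Hypothetical: list pods in the target namespace sorted by restart count,
# so crash-looping pods surface at the bottom of the output before you start
kubectl get pods -n uat \
  --sort-by='.status.containerStatuses[0].restartCount'
```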
Connection reuse
```javascript
export const options = {
  noConnectionReuse: true,
};
```

By default, k6 reuses HTTP keep-alive connections across iterations within the same VU. This creates a race condition at stage boundaries:
- The server decides an idle connection has timed out and sends a TCP FIN to close it.
- While that FIN is still in-flight (or sitting unprocessed in the OS receive buffer), the VU sends a new request on the same socket.
- The server receives the request on an already-closed socket and responds with TCP RST.
- k6 sees a “connection reset by peer” or EOF — which looks like a server error but is just a connection lifecycle mismatch.
The root cause is a timeout mismatch: as long as the client timeout ≥ server timeout, the server can close first and trigger this race. Setting noConnectionReuse: true avoids it by opening a fresh connection per iteration, at the cost of slightly higher per-request overhead (TCP + TLS handshake).
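If the per-iteration handshake cost is too high, k6 also offers a middle ground: noVUConnectionReuse closes connections between iterations (where the idle-timeout race occurs) but still reuses them within a single iteration. A sketch:

```javascript
// Sketch: connections are closed between iterations — the idle gap where
// the server-side timeout race bites — but requests within one iteration
// still share a keep-alive connection, so most handshake cost is avoided.
export const options = {
  noVUConnectionReuse: true,
};
```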
TLS in UAT
```javascript
export const options = {
  insecureSkipTLSVerify: true,
};
```

UAT environments might have inconsistent certs across pods. Without this, you get spurious TLS errors that look like server failures.
Tagging for per-endpoint metrics
k6 automatically applies a scenario system tag to every metric, but scenario tags don’t create per-endpoint sub-metric breakdowns — they just let you filter by scenario after the fact. To get http_req_waiting{test:X} scoped to a specific request, you must tag at the request level:
```javascript
http.get(url, { tags: { test: 'my-endpoint' } });
```

Scenario-level tags (set in options.scenarios[*].tags) propagate to all requests in that scenario but still require a matching threshold to surface as a sub-metric in the summary. Request-level tags are more precise and are the correct unit for per-endpoint breakdown.
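For completeness, a sketch of the scenario-level form (the scenario name, executor, and durations here are placeholders):

```javascript
// Sketch: scenario-level tags propagate to every request this scenario
// makes, equivalent to tagging each http.get() call by hand — but a
// threshold on the tag is still needed for the sub-metric to surface.
export const options = {
  scenarios: {
    checkout: {
      executor: 'constant-vus', // placeholder executor
      vus: 10,
      duration: '1m',
      tags: { test: 'checkout' },
    },
  },
};
```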
Exposing sub-metrics in handleSummary
In the open-source k6 CLI, sub-metrics are only computed and included in the handleSummary data object when a threshold is defined for that tag combination. Without a threshold, k6 doesn’t materialise the sub-metric at all — it won’t appear in the summary even if requests were tagged correctly.
Use a dummy threshold that always passes to force the sub-metric to appear without enforcing any pass/fail condition:
```javascript
export const options = {
  thresholds: {
    'http_req_waiting{test:my-endpoint}': ['p(99)>=0'], // always true, never fails
  },
};
```

Reading metrics
The request lifecycle in order: blocked → connecting → tls_handshaking → sending → waiting → receiving
| Metric | Reliability | Notes |
|---|---|---|
| http_req_waiting (TTFB) | High | Measures server processing time only — excludes TLS, TCP, and connection overhead, which are tracked separately |
| http_req_tls_handshaking | Low (from laptop) | CPU-intensive crypto — inflated by client CPU under load; can show 3s p99 when real cost is 58ms |
| http_req_blocked | Medium | Time waiting for a free TCP connection slot — high values mean connection pool exhaustion, not a server problem |
| http_req_connecting | Medium | Actual TCP handshake duration — high values point to network latency or server backlog |
Tip

TTFB is the metric to optimise — compare http_req_waiting against a single curl TTFB to confirm whether degradation is server-side or test-infrastructure noise.
High VU count at low RPS = slow server
If you see 70+ VUs accumulating while RPS stays at ~5, the server is slow — VUs are piling up waiting for responses. This is not a client-side issue.
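This follows from Little's law (concurrency ≈ throughput × latency); a quick sanity check in plain JavaScript using the numbers above:

```javascript
// Little's law: requests in flight ≈ arrival rate × time in system.
// 70 VUs each blocked on a response while only ~5 req/s complete implies
// each request is taking roughly 70 / 5 = 14 seconds of server time.
const vus = 70;
const rps = 5;
const impliedLatencySeconds = vus / rps;
console.log(impliedLatencySeconds); // 14
```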
Validating results
Establish a curl baseline before every test run:
```shell
curl -o /dev/null -s -w \
  "dns: %{time_namelookup}s\ntls: %{time_appconnect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" \
  https://your-endpoint
```

Compare k6’s http_req_waiting p50 against this single-request TTFB. If they match, degradation is server-side. Note that curl’s time_starttransfer includes DNS, TCP, and TLS setup, while http_req_waiting excludes them — subtract time_appconnect for a closer comparison.
Cross-reference with Jaeger traces to identify which internal service is the bottleneck — k6 tells you the endpoint is slow; Jaeger tells you which downstream call is to blame.
EOF errors
EOF errors (reported as request failed: EOF or status code 0) have several distinct causes. The pattern of when they occur is the main diagnostic signal:
| Pattern | Likely cause | How to confirm |
|---|---|---|
| All VUs at the exact same second | Infrastructure restart — pod, load balancer, or cache flushed existing connections | Check k9s for pod restart count spike at that timestamp |
| Scattered throughout the test | Server overloaded — TCP backlog full, server actively closing connections | Cross-reference with server CPU/memory; check if error rate tracks VU ramp |
| Only at stage transitions | Stale keep-alive connection race (see Connection reuse) | Set noConnectionReuse: true and rerun |
| Only during ramp-up | k6 hitting OS file descriptor limits — can’t open new TCP sockets | Check with ulimit -n; see k6-os-tuning |
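For the file-descriptor row, a quick check before a high-VU run (65536 is an arbitrary example value — size it to your peak concurrent connections):

```shell
# Show the current per-process open-file limit; every open TCP socket
# (one per in-flight connection, more with noConnectionReuse) counts against it.
ulimit -n
# Raise it for the current shell if it's low — capped by the hard limit (ulimit -Hn):
#   ulimit -n 65536
```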
Note
A k8s rolling deployment mid-test produces the same pattern as an infrastructure restart — all connections drop at once as the old pod is terminated. Avoid deploying during a test run.
See also
- k6-large-tests — capacity, OS tuning, script optimisations, distributed execution
- k6-vu-saturation — load generator as the bottleneck
- jaeger — tracing for identifying internal bottlenecks