Observation

The same k6 script produced different results on two machines: an 8 GB laptop and a 32 GB laptop. The discrepancy pointed to the load generator itself being the bottleneck, not the system under test.

Why this happens

Each k6 VU is a single-threaded JavaScript runtime with its own event loop. When the host runs out of CPU or memory, those event loops can’t process callbacks promptly — the delay shows up as artificial latency in the results, making the target system look slower than it is.

k6 can handle 30,000–40,000 VUs on a well-resourced machine, but on a constrained host even a few hundred VUs can saturate it.
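One way to find a given host’s ceiling is to ramp VUs in stages and watch the generator’s CPU at each step. A minimal sketch using k6’s ramping-vus executor — the scenario name, stage durations, and VU targets below are illustrative placeholders, not recommendations:

```javascript
// k6 options sketch: ramp VUs gradually so generator saturation
// shows up as a step you can correlate with host CPU/memory.
// Scenario name and all numbers are illustrative.
export const options = {
  scenarios: {
    find_ceiling: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },  // warm up
        { duration: '5m', target: 500 },  // check generator CPU here
        { duration: '5m', target: 1000 }, // stop ramping if CPU > ~80%
      ],
    },
  },
};
```

The first stage at which the generator’s CPU crosses the safe threshold is the effective VU limit for that machine.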

Warning

Skewed results from a saturated load generator are silent — the test completes and reports numbers, but those numbers reflect the generator’s limits, not the target system’s behaviour.

Signals that the generator is the bottleneck

  • Results differ significantly between machines with different RAM/CPU.
  • CPU utilisation on the test machine hits ~100% during the test.
  • Latency percentiles are suspiciously uniform or plateau in a way that mirrors CPU saturation, not server behaviour.

Guidelines

Metric                   Safe threshold
Load generator CPU       < 80%
Load generator memory    < 90%
  • Scale gradually — start with a low VU count and ramp up, watching generator metrics before reading target metrics.
  • Prefer arrival-rate executors (constant-arrival-rate, ramping-arrival-rate) over VU-based executors for high-throughput tests — they give more predictable resource usage.
  • Pre-allocate VUs with preAllocatedVUs rather than relying on maxVUs to scale mid-test, which can cause sudden resource spikes.
  • Run cloud tests on a consistent spec — don’t compare results across machines with different RAM unless you’ve verified the generator was not the bottleneck on both.
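The arrival-rate and pre-allocation guidelines can be combined in one scenario. The executor name and option keys below are real k6 config fields; the rate, duration, and VU counts are illustrative:

```javascript
// k6 options sketch combining the guidelines above.
// 'constant-arrival-rate' drives a fixed iteration rate regardless of
// response times; preAllocatedVUs reserves VUs up front so k6 does not
// allocate them mid-test. All numbers are placeholders.
export const options = {
  scenarios: {
    steady_rate: {
      executor: 'constant-arrival-rate',
      rate: 200,            // iterations per timeUnit
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 300, // reserved at start — no mid-test spike
      maxVUs: 300,          // equal to preAllocatedVUs: no dynamic scaling
    },
  },
};
```

Setting maxVUs equal to preAllocatedVUs disables mid-test VU allocation entirely, so the generator’s resource usage stays flat for the whole run.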

See also

  • k6-large-tests — OS tuning, hardware thresholds, script optimisations, and distributed execution
  • k6-metrics-and-config — metric interpretation, connection reuse, tagged sub-metrics, validation workflow