Towards effective assessment of steady state performance in Java software: are we there yet?

Status:: 🟩
Links:: Warmup of Java applications, Java Microbenchmark Harness

Metadata

Authors:: Traini, Luca; Cortellessa, Vittorio; Di Pompeo, Daniele; Tucci, Michele
Title:: Towards effective assessment of steady state performance in Java software: are we there yet?
Publication Title:: Empirical Software Engineering
Date:: 2022
URL:: https://doi.org/10.1007/s10664-022-10247-x
DOI:: 10.1007/s10664-022-10247-x

Notes & Annotations

Color-coded highlighting system used for annotations

📑 Annotations (imported on 2025-04-26 21:22:41)

traini.etal.2022.effectiveassessmentsteadystateperformancejavasoftware (pg. 47)

On the basis of our results, our practical suggestion is to never execute a benchmark for less than 5 s (and less than 50 invocations) before starting to collect measurements. When time does not represent a major concern, warmup should last for at least 30 s of continuous benchmark execution, and no less than 300 invocations.
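A minimal sketch of how these suggestions could be expressed as a JMH configuration. The class name, benchmark method, and workload are hypothetical and not from the paper; the warmup and measurement settings are just one way to satisfy the quoted thresholds (here, 6 warmup iterations of 5 s each to reach the conservative 30 s floor).

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
// Conservative setting from the quote above: ~30 s of continuous warmup
// (6 iterations x 5 s) before measurements are collected. The lighter
// floor would be at least 5 s of warmup (and at least 50 invocations).
@Warmup(iterations = 6, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class ParseBenchmark { // hypothetical benchmark, for illustration only

    @Benchmark
    public int parse() {
        // placeholder CPU-bound workload under test
        return Integer.parseInt("123456");
    }
}
```

Note that JMH's time-based warmup iterations invoke the benchmark method many times per iteration, so for short methods the 50/300-invocation floors from the quote are typically met automatically once the time thresholds are respected.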

traini.etal.2022.effectiveassessmentsteadystateperformancejavasoftware (pg. 48)

The results for RQ3 show that developer static configurations fail to accurately estimate the end of the warmup phase, often with a non-trivial estimation error (median: 28 s). Developers tend to overestimate warmup time more frequently than underestimating it (48% vs 32%). Nonetheless, both of these kinds of estimation errors produce relevant (though diverse) side effects. For example, we showed that overestimation produces severe time wastes (median: 33 s), thereby hampering the adoption of benchmarks for continuous performance assessment. On the other hand, underestimation often leads to performance measurements that significantly deviate from those collected in the steady state (median 7%), thus leading to poor results quality and potentially wrong judgements.