Recommendations for performing energy measurements
This list of recommendations is a work in progress…
Use an appropriate tool for what you want to measure
See Measure energy consumption of software#Overview of Tools.
Observe idle consumption
It's important that the energy consumption of the machine in idle mode (baseline) is the same between runs, so it doesn't influence the results.
Therefore, always measure the consumption in idle mode before performing your actual measurement.
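A minimal sketch of such a baseline check, assuming you already have idle power samples in watts from your measurement tool. The function names, the sample values, and the 5 % tolerance are illustrative assumptions, not part of any specific tool.

```python
from statistics import mean


def idle_baseline_watts(samples_watts):
    """Return the mean idle power (W) of a list of power samples."""
    return mean(samples_watts)


def baselines_comparable(baseline_a, baseline_b, tolerance=0.05):
    """Return True if two idle baselines differ by at most `tolerance` (relative)."""
    reference = max(baseline_a, baseline_b)
    if reference == 0:
        return True  # both baselines are zero
    return abs(baseline_a - baseline_b) / reference <= tolerance


# Hypothetical idle power samples (W), recorded before two separate runs.
idle_before_run_1 = [20.1, 19.9, 20.0, 20.2]
idle_before_run_2 = [20.4, 20.6, 20.5, 20.3]

b1 = idle_baseline_watts(idle_before_run_1)
b2 = idle_baseline_watts(idle_before_run_2)

if baselines_comparable(b1, b2):
    print("Idle baselines match; the runs can be compared.")
else:
    print("Idle baselines differ; investigate before comparing the runs.")
```

The tolerance is a judgment call and depends on how noisy your machine and measurement tool are.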
Use percentages instead of raw values
To compare results between measurements, use percentages (the relative change against a reference measurement) instead of raw values.
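As a small illustration, the relative change between a reference measurement and a candidate measurement could be computed as follows. The function name and the joule values are made up for this example.

```python
def percent_change(reference_joules, candidate_joules):
    """Return the change of the candidate relative to the reference, in percent."""
    if reference_joules <= 0:
        raise ValueError("reference energy must be positive")
    return (candidate_joules - reference_joules) / reference_joules * 100.0


# Hypothetical results: 1200 J before an optimization, 1050 J after it.
change = percent_change(1200.0, 1050.0)
print(f"Energy consumption changed by {change:+.1f} %")
```

A negative value means the candidate consumed less energy than the reference.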
Estimate network transmission costs
If you have a distributed system, you cannot directly measure the energy cost of the network communication. However, you can measure the number of bytes transmitted between your services and to external services. In an attributional context, you might use that number to estimate the energy consumption of the transmission. Do not do this in a consequential context!
See also Energy consumption of network communication.
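A rough sketch of such an attributional estimate: multiply the measured traffic by an energy intensity factor. The factor used below (0.01 kWh/GB) is a pure placeholder for illustration and not a reference value; choose and justify a coefficient for your own context.

```python
JOULES_PER_KWH = 3.6e6
BYTES_PER_GB = 1e9


def estimate_network_energy_joules(bytes_transferred, kwh_per_gb):
    """Estimate transmission energy (J) from traffic volume and an assumed intensity."""
    gigabytes = bytes_transferred / BYTES_PER_GB
    return gigabytes * kwh_per_gb * JOULES_PER_KWH


# Hypothetical values: 5 GB of measured traffic, placeholder intensity factor.
measured_bytes = 5_000_000_000
assumed_kwh_per_gb = 0.01  # assumption for illustration only

estimate = estimate_network_energy_joules(measured_bytes, assumed_kwh_per_gb)
print(f"Estimated transmission energy: {estimate:.0f} J")
```

Because the result scales linearly with the chosen factor, report the factor together with the estimate.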
Warming up of applications
In performance benchmarking of Java applications, warming up of the runtime environment is crucial. Depending on your measurement goal and your application under test, you should consider warming up of the application before executing your energy measurement.
For example, if you want to compare the energy consumption of a Java application with that of an application compiled to machine code, you need to warm up the Java application first to make a fair comparison at runtime.
See also Performance Testing#Warmup of Java applications.
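The general pattern of a measurement harness with a warm-up phase might look like the sketch below. It assumes a callable that triggers the workload on the application under test and a callable that returns a cumulative energy reading in joules. Both callables, and the `FakeEnergyCounter` used to make the example runnable, are hypothetical stand-ins for your real setup.

```python
def measure_after_warmup(workload, read_energy_joules,
                         warmup_iterations, measured_iterations):
    """Run unmeasured warm-up iterations, then return mean energy (J) per measured iteration."""
    for _ in range(warmup_iterations):
        workload()  # warm-up: deliberately not included in the measurement

    start = read_energy_joules()
    for _ in range(measured_iterations):
        workload()
    end = read_energy_joules()

    return (end - start) / measured_iterations


class FakeEnergyCounter:
    """Stand-in for a real energy meter: adds a fixed cost per workload call."""

    def __init__(self, joules_per_call):
        self.joules_per_call = joules_per_call
        self.total_joules = 0.0

    def workload(self):
        self.total_joules += self.joules_per_call

    def read(self):
        return self.total_joules


counter = FakeEnergyCounter(joules_per_call=2.0)
per_iteration = measure_after_warmup(
    counter.workload, counter.read,
    warmup_iterations=50, measured_iterations=10,
)
print(f"Mean energy per measured iteration: {per_iteration:.2f} J")
```

How many warm-up iterations are enough depends on the application and the runtime; the numbers here are arbitrary.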
Per-Process Measurements
Measuring how much energy an individual process has consumed is not straightforward and poses a number of challenges.
See Measure energy consumption of software per process for more information.
You should ask yourself what the goal of your energy measurement is and whether per-process measurement is necessary at all. For example, the Green Metrics Tool follows the philosophy that all components involved in the execution of a standard usage scenario should be measured to reflect actual use cases of the software.
Measured scenario
Measure either:
- a standard usage scenario, or
- a long-running test
Workload Type
There are different ways of running a benchmark:
- Fixed-work or fixed-time?
- Closed workload or open workload? (Open vs. Closed Model)
Fixed-work vs. fixed-time
When you want to compare the energy efficiency of two systems, the workload should be fixed (e.g. both systems under test receive the same number of API requests).
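To illustrate a fixed-work comparison, the sketch below integrates power samples over time (trapezoidal rule) to get the total energy of a run and divides it by the number of requests. All sample values, the request count, and the function names are invented for this example.

```python
def energy_joules(timestamps_s, power_watts):
    """Integrate power samples (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_watts):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return total


def joules_per_request(timestamps_s, power_watts, requests):
    """Return the energy per request (J) for a run that processed `requests` requests."""
    return energy_joules(timestamps_s, power_watts) / requests


REQUESTS = 1000  # identical, fixed amount of work for both systems

# Hypothetical runs: system A draws less power but needs longer to finish.
system_a = joules_per_request([0, 1, 2, 3, 4], [50, 50, 50, 50, 50], REQUESTS)
system_b = joules_per_request([0, 1, 2], [80, 80, 80], REQUESTS)

print(f"System A: {system_a:.3f} J/request")
print(f"System B: {system_b:.3f} J/request")
```

In this made-up case the system with the higher power draw still uses less energy per request because it finishes the same work sooner.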
Closed workload vs. open workload model
A closed workload model is better and fairer for comparing the energy efficiency of software systems.
- In a closed model, the amount of completed work is fixed. Every system processes exactly the same number of operations, so the comparison is about how efficiently they complete identical work.
- The utilization rate stays stable, meaning you're not introducing noise from workload fluctuations.
- Even though total duration may vary between systems (fast ones finish sooner, slow ones later), energy efficiency is about total energy per unit of work.
- You can measure metrics like joules per request, joules per transaction, or watt-hours per operation precisely and consistently.
In contrast, with an open workload:
- Arrival rates are fixed but completion rates vary, so a slower system might build up a backlog.
- Utilization and system saturation can vary, which introduces additional variables (queueing, retries, dropped work) that muddy a clean efficiency comparison.
- Different systems may not complete the same actual amount of work during the same time, so energy-to-work ratios become harder to trust.
- Isolating the energy per unit of useful work is only possible if all systems deliver the same quality and quantity of output, which is not guaranteed in an open model.
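The backlog effect of an open workload can be illustrated with a deliberately simplified, deterministic sketch: requests arrive at a fixed rate for a fixed time, and each system completes at most as many as its service rate allows. The rates and the function name are assumptions for illustration; real systems also show queueing effects, retries, and dropped work that this model ignores.

```python
def simulate_open_workload(arrival_rate_per_s, service_rate_per_s, duration_s):
    """Return (completed, backlog) for a fixed arrival rate over a fixed time window."""
    arrivals = arrival_rate_per_s * duration_s
    capacity = service_rate_per_s * duration_s
    completed = min(arrivals, capacity)
    backlog = arrivals - completed
    return completed, backlog


ARRIVAL_RATE = 100  # requests per second, identical for both systems
DURATION = 60       # seconds

# Hypothetical service rates: the fast system keeps up, the slow one does not.
fast_completed, fast_backlog = simulate_open_workload(ARRIVAL_RATE, 150, DURATION)
slow_completed, slow_backlog = simulate_open_workload(ARRIVAL_RATE, 80, DURATION)

print(f"Fast system: {fast_completed} completed, {fast_backlog} in backlog")
print(f"Slow system: {slow_completed} completed, {slow_backlog} in backlog")
```

Because the two systems complete different amounts of work in the same window, dividing the measured energy by the number of arrivals would not be a fair basis for comparison; only completed work counts.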