Recommendations for performing energy measurements

WIP

This list of recommendations is a work in progress…

Recommendations by Green Coding Solutions

Use an appropriate tool for what you want to measure

See Measure energy consumption of software#Overview of Tools.

Observe idle consumption

It’s important that the energy consumption of the machine in idle mode (baseline) is the same between runs, so it doesn’t influence the results.
Therefore, always measure the consumption in idle mode before performing your actual measurement.
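The baseline check can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `idle_baselines_match`, the power samples in watts, and the 5 % relative tolerance are all assumptions chosen for the example.

```python
from statistics import mean


def idle_baselines_match(baseline_a_watts, baseline_b_watts, tolerance=0.05):
    """Return True if the mean idle power of two runs differs by at most
    `tolerance` (relative to the larger mean).

    The 5 % default is an arbitrary illustrative threshold, not a standard.
    """
    a = mean(baseline_a_watts)
    b = mean(baseline_b_watts)
    return abs(a - b) <= tolerance * max(a, b)


# Hypothetical idle power samples (watts) taken before two measurement runs.
before_run_1 = [10.0, 10.2]
before_run_2 = [10.1, 10.3]
comparable = idle_baselines_match(before_run_1, before_run_2)
```

If the check fails, the two runs started from different idle states, and their results should probably not be compared directly.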

Use percentages instead of raw values

To compare results between measurements, use percentages instead of raw values.

noureddine.etal.2015.monitoringenergyhotspots (pg. 26)

These results outline the importance of using percentages when comparing energy consuming of software code. This is mainly due to the different hardware that machines use, thus consuming different amount of energy while still keeping similar energy trends and distribution in software.

noureddine.etal.2015.monitoringenergyhotspots (pg. 31)

The goal of our approach is to observe trends in energy consumption and profile applications to detect energy hotspots. Therefore, we argue that using percentages when comparing energy consumption of methods and classes is more useful and representative than raw values. Our approach is thus useful in profiling applications in order to find the origin of energy leaks. Developers can then provide hotfixes for the application in order to reduce its energy footprint.
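As a small illustration of the point made in the quotes, the sketch below converts raw per-method energy values into percentage shares. The function name `energy_shares` and all numbers are made up for the example; they are not taken from the cited paper.

```python
def energy_shares(energy_by_method):
    """Convert raw energy values (any unit) into percentage shares of the total."""
    total = sum(energy_by_method.values())
    if total <= 0:
        raise ValueError("total energy must be positive")
    return {name: 100.0 * value / total for name, value in energy_by_method.items()}


# Hypothetical joule values for the same code measured on two different machines.
machine_a = {"parse": 2.0, "render": 6.0}
machine_b = {"parse": 5.0, "render": 15.0}

shares_a = energy_shares(machine_a)  # {"parse": 25.0, "render": 75.0}
shares_b = energy_shares(machine_b)  # {"parse": 25.0, "render": 75.0}
```

The raw values differ by a factor of 2.5 between the two machines, but the distribution is identical, so the hotspot (`render`) is visible on both.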

Estimate network transmission costs

In a distributed system, you cannot directly measure the energy cost of network transmission. You can, however, measure the number of bytes transmitted between your services and to external services. In an attributional context, you might use that figure to estimate the energy consumption of the transmission. Do not do this in a consequential context!
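One possible form of such an attributional estimate is a simple linear model that multiplies the transferred data volume by an energy intensity factor. This is only a sketch under that assumption: the function name `estimate_network_energy_kwh` is hypothetical, and the intensity factor is deliberately left as a parameter because no value is given here; it has to come from a source you trust.

```python
def estimate_network_energy_kwh(bytes_transferred, kwh_per_gb):
    """Attributional estimate: transferred volume times an energy intensity factor.

    `kwh_per_gb` must be supplied by the caller; this sketch assumes a linear
    relationship and uses decimal gigabytes (1 GB = 1e9 bytes).
    """
    if bytes_transferred < 0 or kwh_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return bytes_transferred / 1e9 * kwh_per_gb


# Placeholder factor purely to show the arithmetic, not a real-world value.
estimate = estimate_network_energy_kwh(2_000_000_000, kwh_per_gb=0.5)  # 1.0 kWh
```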

Warming up of applications

In performance benchmarking of Java applications, warming up the runtime environment is crucial. Depending on your measurement goal and your application under test, you should consider warming up the application before starting your energy measurement.

For example, if you want to compare the energy consumption of a Java application with that of an application compiled to machine code, you need to warm up the Java application first to make the runtime comparison fair.
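A warm-up phase can be separated from the measured phase roughly as in the sketch below. It is written in Python only for illustration; `run_with_warmup` and `read_energy_joules` are hypothetical names, and `read_energy_joules` stands for whatever callable returns a cumulative energy counter from your measurement tool.

```python
def run_with_warmup(workload, warmup_runs, measured_runs, read_energy_joules):
    """Run `workload` unmeasured for `warmup_runs` iterations, then return the
    mean energy per iteration over `measured_runs` measured iterations.

    `read_energy_joules` is assumed to return a cumulative energy counter.
    """
    if measured_runs <= 0:
        raise ValueError("measured_runs must be positive")

    # Warm-up phase: results and energy are deliberately ignored.
    for _ in range(warmup_runs):
        workload()

    # Measured phase: only energy consumed after warm-up is counted.
    start = read_energy_joules()
    for _ in range(measured_runs):
        workload()
    return (read_energy_joules() - start) / measured_runs
```

How many warm-up iterations are enough depends on the runtime and the application, so `warmup_runs` should be chosen and reported per experiment.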

Per-Process Measurements

Measuring how much energy an individual process has consumed is not straightforward and poses a number of challenges.
See Measure energy consumption of software per process for more information.

You should ask yourself what the goal of your energy measurement is and whether per-process measurement is necessary at all. For example, the Green Metrics Tool follows the philosophy that all components involved in the execution of a standard usage scenario should be measured to reflect actual use cases of the software.

Measured scenario

Either

measure a standard usage scenario, or
run a long-running test

Workload Type

There are different ways of running a benchmark:

Fixed-work or fixed-time?
Closed workload or open workload? (Open vs. Closed Model)

Fixed-work vs. fixed-time

When you want to compare the energy efficiency of two systems, the amount of work should be fixed (e.g. both systems under test receive the same number of API requests).

Closed workload vs. open workload model

A closed workload model is better and fairer for comparing the energy efficiency of software systems.

In a closed model, the amount of completed work is fixed. Every system processes exactly the same number of operations, so the comparison is about how efficiently they complete identical work.
The utilization rate stays stable, meaning you're not introducing noise from workload fluctuations.
Even though total duration may vary between systems (fast ones finish sooner, slow ones later), energy efficiency is about total energy per unit of work.
You can measure metrics like joules per request, energy per transaction, or watt-hours per operation precisely and consistently.
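The arithmetic behind these points can be sketched as follows. The two systems and all numbers are invented for the example; the helper names `joules_per_request` and `average_power_watts` are hypothetical.

```python
def joules_per_request(total_energy_joules, completed_requests):
    """Energy per unit of work for a fixed amount of completed work."""
    if completed_requests <= 0:
        raise ValueError("completed_requests must be positive")
    return total_energy_joules / completed_requests


def average_power_watts(total_energy_joules, duration_seconds):
    """Average power draw over the run (joules per second)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_energy_joules / duration_seconds


# Both hypothetical systems process the same 1000 requests.
fast = {"energy_j": 500.0, "duration_s": 20.0}
slow = {"energy_j": 800.0, "duration_s": 40.0}

fast_jpr = joules_per_request(fast["energy_j"], 1000)               # 0.5 J/request
slow_jpr = joules_per_request(slow["energy_j"], 1000)               # 0.8 J/request
fast_watts = average_power_watts(fast["energy_j"], fast["duration_s"])  # 25.0 W
slow_watts = average_power_watts(slow["energy_j"], slow["duration_s"])  # 20.0 W
```

In this made-up case the slower system draws less power on average but needs more energy per request, which is why the comparison should be based on energy per unit of work rather than on power or duration alone.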

In contrast, with an open workload:

Arrival rates are fixed but completion rates vary, so a slower system might build up a backlog.
Utilization and system saturation can vary, which introduces additional variables (queueing, retries, dropped work) that muddy a clean efficiency comparison.
Different systems may not complete the same actual amount of work during the same time, so energy-to-work ratios become harder to trust.
Isolating the energy per unit of useful work is only possible if all systems deliver the same output quality and quantity, which an open workload does not guarantee.
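The backlog effect can be illustrated with a deliberately simplified, deterministic model: requests arrive at a fixed rate, each system has a fixed maximum service rate, and queueing dynamics, retries, and dropped work are ignored. The function name `open_workload_outcome` and the rates are assumptions made for this sketch.

```python
def open_workload_outcome(arrival_rate, service_rate, window_seconds):
    """Return (completed, backlog) for a fixed-rate open workload.

    Simplified fluid model: a system completes at most
    `service_rate * window_seconds` requests within the window.
    """
    arrived = arrival_rate * window_seconds
    capacity = service_rate * window_seconds
    completed = min(arrived, capacity)
    return completed, arrived - completed


# Same arrival rate (100 requests/s) for 60 s against two hypothetical systems.
fast_completed, fast_backlog = open_workload_outcome(100, 120, 60)  # (6000, 0)
slow_completed, slow_backlog = open_workload_outcome(100, 80, 60)   # (4800, 1200)
```

Because the two systems complete different amounts of work in the same window, dividing each system's measured energy by its own completed count compares unequal workloads.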