RAPL

What?

khan.etal.2018.raplactionexperiences (pg. 1)

To improve energy efficiency and comply with the power budgets, it is important to be able to measure the power consumption of cloud computing servers. Intel’s Running Average Power Limit (RAPL) interface is a powerful tool for this purpose. RAPL provides power limiting features and accurate energy readings for CPUs and DRAM, which are easily accessible through different interfaces on large distributed computing systems. Since its introduction, RAPL has been used extensively in power measurement and modeling.

RAPL vs. usage of power meters

khan.etal.2018.raplactionexperiences (pg. 24)

Our overall study suggests that RAPL has evolved toward a better energy measurement tool since its introduction in Sandybridge, and it has appeared to be a useful and efficient alternative for manually instrumented complex power monitors. With the Haswell architecture, RAPL has improved considerably, its power readings now closely match plug power readings and it has now introduced the new measurement domain PSys and improved the power performance in Skylake.

Intel vs. AMD Implementation

Intel's RAPL only reports energy at CPU-socket (package) granularity. AMD additionally offers per-physical-core energy counters. Source: @Qiao.etal.2024.EnergyawareProcessScheduling#^j0eaxc

RAPL survival guide

The road to Scaphandre v1.0 - Challenges and improvements to come on IT energy consumption evaluation | CNCF TAG Environmental Sustainability

According to my review of the literature, RAPL is accurate, starting from its second generation (post-Broadwell), but it is not covering a complete perimeter. As you have seen in the schema, “Package”, or “Pkg”, only includes the CPU (Core), the Ram (DRAM) and integrated GPU (Uncore) power. Comparing Pkg to an IPMI/DCMI-based or a SmartPDU-based evaluation will likely be disappointing if you look at energy consumed on a decent time-period. They are supposed to be closer as you look for times where the CPU is most active, and more different as the machine is close to idle.

Last but not least, while using RAPL metrics could feel empowering as you have a pretty precise view on your machine’s components energy consumption, there is a catch. It should be said that this consumption profile will likely be very specific to your hardware and configuration. The runtime context of a given software or service is also essential if you want to assess its energy consumption. Depending on its runtime, whether it’s running natively, in a virtual machine (hypervisor configuration will also be important then), or in a container and depending on the other services running on the physical host and their behavior, the evaluation may be more or less impacted. Moreover, from one machine to another, even if the hardware is the same, you may want to take a closer look at (at least): hyper threading, turbo boost, energy efficiency mode, …
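To compare Pkg against an IPMI or PDU reading over the same window, the cumulative RAPL energy counter first has to be turned into average power. The following is a minimal sketch, not a definitive implementation. It assumes the counter values come from the Linux powercap files energy_uj and max_energy_range_uj, and that the counter wraps at most once between two samples. The function names rapl_delta_uj and rapl_avg_watts are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Energy consumed between two energy_uj samples, in microjoules.
# Arguments: start_uj end_uj max_range_uj
# Assumption: the counter wrapped at most once between the two samples.
rapl_delta_uj() {
    if [ "$2" -ge "$1" ]; then
        echo $(( $2 - $1 ))
    else
        # Counter wrapped: remainder up to the range limit plus the new value.
        echo $(( $3 - $1 + $2 ))
    fi
}

# Average power in watts over an interval.
# Arguments: delta_uj seconds
rapl_avg_watts() {
    awk -v uj="$1" -v s="$2" 'BEGIN { printf "%.3f\n", uj / 1000000 / s }'
}
```

A typical use would read /sys/class/powercap/intel-rapl:0/energy_uj twice with a sleep in between and pass both values, together with the content of max_energy_range_uj, to rapl_delta_uj. On recent kernels reading energy_uj usually requires root.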

Check if available

Kernel version ≥ 5.3:

sudo modprobe intel_rapl_msr

Kernel version < 5.3:

sudo modprobe intel_rapl

Or use perf, which exposes the RAPL counters as power/energy-* events (for example power/energy-pkg/).
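Loading the module does not by itself confirm that any RAPL domain is exposed. As a rough sketch, the check below lists the zones found in the Linux powercap sysfs tree. It assumes the usual layout (directories named intel-rapl:N, each containing a name file). The function name list_rapl_domains is hypothetical, and the base directory is a parameter only so the function can be pointed at another location.

```shell
#!/bin/sh
# List RAPL zones as "<zone> <domain name>", one per line.
# Argument: base directory (defaults to /sys/class/powercap).
# Returns 0 if at least one zone was found, 1 otherwise.
list_rapl_domains() {
    base="${1:-/sys/class/powercap}"
    found=1
    for zone in "$base"/intel-rapl:*; do
        # Skips the unexpanded glob pattern when no zone exists.
        [ -r "$zone/name" ] || continue
        printf '%s %s\n' "${zone##*/}" "$(cat "$zone/name")"
        found=0
    done
    return "$found"
}
```

Called without arguments, for example as list_rapl_domains || echo "no RAPL domains found", it inspects the live system. Only the name files are read, so this check should not need root.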

🔗 References

Software Development Lifecycle for Energy Efficiency (Georgiou, Rizou, Spinellis 2020)

RAPL in Action (Khan, Hirki, Niemi, Nurminen, Ou 2018)

Rotem, E., Naveh, A., Ananthakrishnan, A., Weissmann, E., & Rajwan, D. (2012). Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge. IEEE Micro, 32(2), 20–27. https://doi.org/10.1109/MM.2012.12

Hackenberg, D., Schöne, R., Ilsche, T., Molka, D., Schuchart, J., & Geyer, R. (2015). An Energy Efficiency Feature Survey of the Intel Haswell Processor. 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 896–904. https://doi.org/10.1109/IPDPSW.2015.70

RAPL, SGX and energy filtering - Influences on power consumption | green-coding.berlin by Arne Tarara