Development and evaluation of a reference measurement model for assessing the resource and energy efficiency of software products and components—green software measurement model (GSMM)

Status:: 🟩
Links:: Measure energy consumption of software

Metadata

Authors:: Guldner, Achim; Bender, Rabea; Calero, Coral; Fernando, Giovanni S.; Funke, Markus; Gröger, Jens; Hilty, Lorenz M.; Hörnschemeyer, Julian; Hoffmann, Geerd-Dietger; Junger, Dennis; Kennes, Tom; Kreten, Sandro; Lago, Patricia; Mai, Franziska; Malavolta, Ivano; Murach, Julien; Obergöker, Kira; Schmidt, Benno; Tarara, Arne; De Veaugh-Geiss, Joseph P.; Weber, Sebastian; Westing, Max; Wohlgemuth, Volker; Naumann, Stefan
Title:: Development and evaluation of a reference measurement model for assessing the resource and energy efficiency of software products and components—green software measurement model (GSMM)
Publication Title:: "Future Generation Computer Systems"
Date:: 2024
URL:: https://www.sciencedirect.com/science/article/pii/S0167739X24000384
DOI:: 10.1016/j.future.2024.01.033

Bibliography

Guldner, A., Bender, R., Calero, C., Fernando, G. S., Funke, M., Gröger, J., Hilty, L. M., Hörnschemeyer, J., Hoffmann, G.-D., Junger, D., Kennes, T., Kreten, S., Lago, P., Mai, F., Malavolta, I., Murach, J., Obergöker, K., Schmidt, B., Tarara, A., … Naumann, S. (2024). Development and evaluation of a reference measurement model for assessing the resource and energy efficiency of software products and components—Green software measurement model (GSMM). Future Generation Computer Systems, 155, 402–418. https://doi.org/10.1016/j.future.2024.01.033

Abstract

In the past decade, research on measuring and assessing the environmental impact of software has gained significant momentum in science and industry. However, due to the large number of research groups, measurement setups, procedure models, tools, and general novelty of the research area, a comprehensive research framework has yet to be created. The literature documents several approaches from researchers and practitioners who have developed individual methods and models, along with more general ideas like the integration of software sustainability in the context of the UN Sustainable Development Goals, or science communication approaches to make the resource cost of software transparent to society. However, a reference measurement model for the energy and resource consumption of software is still missing. In this article, we jointly develop the Green Software Measurement Model (GSMM), in which we bring together the core ideas of the measurement models, setups, and methods of over 10 research groups in four countries who have done pioneering work in assessing the environmental impact of software. We briefly describe the different methods and models used by these research groups, derive the components of the GSMM from them, and then we discuss and evaluate the resulting reference model. By categorizing the existing measurement models and procedures and by providing guidelines for assimilating and tailoring existing methods, we expect this work to aid new researchers and practitioners who want to conduct measurements for their individual use cases.

Notes & Annotations

Color-coded highlighting system used for annotations

📑 Annotations (imported on 2024-03-28#18:04:32)

guldner.etal.2024.developmentevaluationreference (pg. 2)

Chowdhury et al. [20,21] proposed a model which is based on dynamic traces of system calls and CPU utilization in order to estimate the energy consumption of software.

[20] S. Chowdhury, A. Hindle, Greenoracle: estimating software energy consumption with energy measurement corpora, in: Proceedings of the 13th International Conference on Mining Software Repositories, 2016, pp. 49–60, http://dx.doi.org/10.1145/2901739.2901763.
[21] S. Chowdhury, S. Borle, S. Romansky, A. Hindle, Greenscaler: training software energy models with automatic test generation, Empir. Softw. Eng. 24 (2019) 1573–7616, http://dx.doi.org/10.1007/s10664-018-9640-7.

guldner.etal.2024.developmentevaluationreference (pg. 2)

Moreover, the influence of software architecture on energy consumption has been addressed by Guamán and Pérez [22] and Cabot et al. [23].

[22] D. Guamán, J. Pérez, Supporting sustainability and technical debt-driven design decisions in software architectures, in: Information Systems Development: Crossing Boundaries Between Development and Operations (DevOps) in Information Systems, AIS, 2021, p. na.
[23] J. Cabot, R. Capilla, C. Carrillo, H. Muccini, B. Penzenstadler, Measuring systems and architectures: A sustainability perspective, IEEE Softw. 36 (03) (2019) 98–100, http://dx.doi.org/10.1109/MS.2019.2897833.

guldner.etal.2024.developmentevaluationreference (pg. 2)

Currently, there is no consensus on measurement setups, methods, or techniques for data analysis. With each researcher applying their own methods, often with little to no documentation or publicly available data (e. g., in the form of replication packages), it is difficult and sometimes outright impossible to check or compare results obtained across studies, to replicate analyses, or to re-use data. To solve this problem, we propose establishing a reference model for measurement and analysis methods to assess the resource and energy efficiency of software.

guldner.etal.2024.developmentevaluationreference (pg. 3)

Ournani [41] and Schade [42] also provide an overview of software energy measurement tools, both software-based and hardware-based, and Jay et al. [36] compare a set of software-based measurement tools and investigate how measurements obtained through them correlate to those taken with an external power meter.

[36] M. Jay, V. Ostapenco, L. Lefèvre, D. Trystram, A.C. Orgerie, B. Fichel, An experimental comparison of software-based power meters: focus on CPU and GPU, in: CCGrid 2023 - 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, IEEE, Bangalore, India, 2023, pp. 1–13, URL https://inria.hal.science/hal-04030223.
[41] Z. Ournani, Software Eco-Design: Investigating and Reducing the Energy Consumption of Software (Ph.D. thesis), University of Lille, 2021, URL https://tel.archives-ouvertes.fr/tel-03429300.
[42] T. Schade, Greencoding-measuring-tools, 2023, URL http://github.com/schaDev/GreenCoding-measuring-tools. Accessed 21 November 2023.

guldner.etal.2024.developmentevaluationreference (pg. 3)

The Green Metrics Tool (see Section 6.4) is a versatile industry energy measurement tool from Green Coding Berlin. It isolates applications in containers for precise measurement of power, network, disk, and memory consumption, and supports various metrics. The tool is architecture-agnostic, it can be used for GUI applications, and it separates benchmark runs into distinct life cycle steps. It offers a detailed dashboard and an API for data analysis, focusing on machine-dependent factors to understand energy consumption impacts.

guldner.etal.2024.developmentevaluationreference (pg. 4)

The Cloud Energy Usage Estimation Model (see Section 6.5) developed by Green Coding Berlin is a machine learning model that estimates energy usage in environments where controlled measurements are not feasible. Based on research by Rteil et al. [51], it uses the SPECPower dataset to create an XGBoost model for estimating the AC power draw of servers. The model is particularly useful in settings (e. g., cloud) where detailed CPU information is not available.

[51] N. Rteil, R. Bashroush, R. Kenny, A. Wynne, Interact: IT infrastructure energy and cost analyzer tool for data centers, Sustain. Comput.: Inform. Syst. 33 (2022) http://dx.doi.org/10.1016/j.suscom.2021.100618.

guldner.etal.2024.developmentevaluationreference (image) (pg. 4)

Fig. 1. Components of the GSMM.

guldner.etal.2024.developmentevaluationreference (pg. 4)

Common measurement goals include the comparison of the software entity with itself over the development process, e. g., within a CI/CD pipeline, between releases, or when introducing new features. Furthermore, comparisons between different implementations, libraries, configurations, etc., and between different products performing similar tasks (e. g., within software product groups like browsers, media players, databases) are possible, and it is even feasible to compare individual functionalities or software features across product groups (e. g., there are many software products which provide a feature to "edit text").

guldner.etal.2024.developmentevaluationreference (image) (pg. 5)

Overview of examples of relevant metrics.

a External power meters or PDUs, e. g., from Janitza https://www.janitza.com/energy-and-power-quality-measurement-products.html [2023-11-06], GUDE https://gude-systems.com/en/cat/power-distribution-units/ [2023-11-06], internal power loggers like RAPL and nvidia-smi.
b Software-induced metrics calculated by subtracting the according baseline measurements from the scenario measurements.
c Resource loggers include collectl (https://collectl.sourceforge.net/ [2023-11-06]), Windows performance monitor (https://techcommunity.microsoft.com/t5/ask-the-performance-team/windows-performance-monitor-overview/ba-p/375481 [2023-11-06]), wireshark (https://www.wireshark.org/ [2023-11-06]), and nvidia-smi.
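
Footnote b (software-induced metrics as scenario minus baseline) can be illustrated with a minimal sketch. It assumes power samples in watts logged at a fixed interval and integrates them with a simple rectangle rule; the sample values, the 1 s interval, and the function names are hypothetical and not taken from the paper.

```python
from statistics import mean

def energy_wh(power_samples_w, interval_s):
    """Approximate energy (Wh) from evenly spaced power samples (W)."""
    return sum(power_samples_w) * interval_s / 3600.0

def software_induced(scenario_value, baseline_value):
    """Subtract the baseline measurement from the scenario measurement."""
    return scenario_value - baseline_value

# Hypothetical 1 Hz readings: idle system (baseline) vs. system running the software
baseline_w = [20.0, 20.5, 19.5, 20.0]
scenario_w = [30.0, 32.0, 31.0, 31.0]

induced_wh = software_induced(energy_wh(scenario_w, 1.0), energy_wh(baseline_w, 1.0))
induced_mean_w = software_induced(mean(scenario_w), mean(baseline_w))

print(f"software-induced energy: {induced_wh:.5f} Wh")
print(f"software-induced mean power: {induced_mean_w:.1f} W")
```

The same subtraction applies to other logged quantities (CPU load, memory, network traffic), as long as baseline and scenario cover the same duration on the same machine.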

guldner.etal.2024.developmentevaluationreference (pg. 5)

Regarding energy efficiency metrics, it is necessary to define "useful work", as described, e. g., in Johann et al. [55]. This, of course, depends strongly on the software product and is not always feasible to define.

[55] T. Johann, M. Dick, S. Naumann, E. Kern, How to measure energy-efficiency of software: Metrics and measurement results, in: 2012 1st International Workshop on Green and Sustainable Software, GREENS 2012 - Proceedings, 2012, pp. 51–54, http://dx.doi.org/10.1109/GREENS.2012.6224256.
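
One way to read the "useful work" requirement is as a ratio of useful work to energy consumed. The sketch below assumes that useful work can be counted as processed items; the function name, the item counts, and the joule values are hypothetical illustrations, not figures from the paper.

```python
def energy_efficiency(useful_work_items, energy_j):
    """Return useful work per joule; reject non-positive energy readings."""
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return useful_work_items / energy_j

# Hypothetical: two implementations process the same 10,000 records
eff_a = energy_efficiency(10_000, 250.0)
eff_b = energy_efficiency(10_000, 400.0)
more_efficient = "A" if eff_a > eff_b else "B"

print(f"A: {eff_a:.1f} items/J, B: {eff_b:.1f} items/J, more efficient: {more_efficient}")
```

The ratio is only meaningful when both implementations perform the same unit of work, which is why the definition of useful work has to come first.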

guldner.etal.2024.developmentevaluationreference (pg. 5)

Examples from the methods are the number of created, read, changed, deleted, or transmitted data points, the number of executed operations, or benchmarks. The benefit of these metrics is that they make different implementations directly comparable. If the items cannot be easily defined, e. g., when measuring a complete software product like a word processor, a possibility to compare the efficiency of one software over the other is to make their outcomes as equal as possible (e. g., create the same PDF document with the word processors) and then perform, for instance, a t-test as described in Kern et al. [46] to test if the means of the samples are different and thus determine the more efficient software.

[46] E. Kern, L.M. Hilty, A. Guldner, Y.V. Maksimov, A. Filler, J. Gröger, S. Naumann, Sustainable software products — towards assessment criteria for resource and energy efficiency, Future Gener. Comput. Syst. 86 (2018) 199–210, http://dx.doi.org/10.1016/j.future.2018.02.044.
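
The t-test comparison of two products with equalized outcomes can be sketched as follows. The annotation does not say which t-test variant is meant, so Welch's unequal-variance form is used here as an assumption. The per-run energy values and names are hypothetical, and only the test statistic and degrees of freedom are computed with the standard library.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical energy per run (Wh) for creating the same PDF in two word processors
editor_a = [1.0, 1.2, 1.1, 0.9, 1.3]
editor_b = [1.6, 1.4, 1.5, 1.7, 1.3]

t_stat, df = welch_t(editor_a, editor_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A p-value requires the t-distribution, which the standard library does not provide; `scipy.stats.ttest_ind(editor_a, editor_b, equal_var=False)` is one option if SciPy is available. A negative t statistic here indicates that the first sample has the lower mean energy per run.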

guldner.etal.2024.developmentevaluationreference (image) (pg. 6)