A machine learning system to make data centers more efficient

Darong Huang and David Atienza © Alex Widerski CC BY SA

The CloudProphet system under development in the Embedded Systems Lab aims to reduce the carbon footprint of data centers by using computing resources more efficiently.

Customers use data center resources in ways that are not only unpredictable but also opaque, since data center staff are not permitted to observe the processes their customers are running. To allocate computing resources efficiently, then, it would appear that a crystal ball is necessary. The crystal ball being developed at EPFL is called CloudProphet.

"Companies like Google and Amazon provide virtual machines for customers," explains David Atienza, head of the Embedded Systems Lab in the School of Engineering. "But these customers do not tell you anything about what they are actually doing, and we are not permitted to look inside. Therefore, the behavior of applications is hard to predict – they are black boxes."

CloudProphet identifies application processes from the outside and bases its performance predictions solely on hardware counter information, learning to anticipate an application's demands on resources. A set of neural networks works in concert, building up a picture of the application's predicted resource requirements.
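The idea of predicting an opaque workload's behavior from hardware counters alone can be illustrated with a minimal sketch. The counter feature, the training data, and the simple closed-form linear model below are all illustrative assumptions; the published system uses neural networks over many counter signals.

```python
# Minimal sketch: predict an application's runtime from a hardware-counter
# feature, without inspecting the application itself. The feature (cache
# misses per kilo-instruction), the sample data, and the linear model are
# hypothetical stand-ins for CloudProphet's neural-network approach.

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a * x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Hypothetical training data: counter readings vs. measured runtimes (s)
# collected from past workloads on the same hardware.
misses_per_ki = [1.0, 2.5, 4.0, 6.0, 8.5]
runtime_s = [10.2, 13.1, 16.0, 20.1, 25.0]

a, b = fit_line(misses_per_ki, runtime_s)

def predict_runtime(mpki):
    """Predict runtime for a new, opaque workload from its counters alone."""
    return a * mpki + b
```

In practice the prediction target would be whatever resource metric the scheduler needs (runtime, memory pressure, I/O demand), and the model would be trained on many counters at once; the point here is only that the input is external, counter-derived telemetry rather than knowledge of the customer's code.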

"Data centers do have diagnostic tools for the identification and performance prediction of applications, but they can only claim an improvement rate of 18%. We are achieving results that are orders of magnitude above that," explains ESL PhD student Darong Huang. "Our results have just been published in IEEE Transactions on Sustainable Computing. We hope that CloudProphet will pave the way to a more intelligent resource management system for modern data centers, thus reducing their carbon footprint."

Atienza explains that the PhD students and postdoctoral fellows working on these projects are funded by industry, and maintain the systems remotely.

"Industry project managers are invited to give advice on the physical and logistical constraints in place so that we get as close as possible to a real-world application," he says.

The new EPFL data center, the CCT building (Centrale de Chauffe par Thermopompe), presents another opportunity to apply this technology to the real world. In collaboration with Mario Paolone's Distributed Electrical Systems Lab, the EcoCloud Center, and the EPFL Energy Center, a framework will be set up to put these systems to the test.

"Once we can see to what extent the carbon footprint is reduced, we can look at whether the next step is to license the software, or to start a spin-off company," concludes Atienza.

References

D. Huang, L. Costero, A. Pahlevan, M. Zapater and D. Atienza, "CloudProphet: A Machine Learning-Based Performance Prediction for Public Clouds," in IEEE Transactions on Sustainable Computing, 2024, doi: 10.1109/TSUSC.2024.3359325.


Author: John Maxwell

Source: School of Engineering | STI

This content is distributed under a Creative Commons CC BY-SA 4.0 license. You may freely reproduce the text, videos and images it contains, provided that you indicate the author’s name and place no restrictions on the subsequent use of the content. If you would like to reproduce an illustration that does not contain the CC BY-SA notice, you must obtain approval from the author.