“I'm inspired by problems in biomedicine”

Tenure Track Assistant Professor Maria Brbic © EPFL 2023

Maria Brbic joined EPFL in September 2022 to head the Machine Learning for Biomedical Discovery Laboratory. She develops new machine learning methods that aim to move beyond the prevailing machine learning paradigm, so that machine learning can become a driving force of new scientific discoveries.

In the past decade there’s been an explosion in the growth of available genomic data, with an estimated 2 to 40 billion gigabytes now generated every year. This data helps researchers uncover information hidden in our DNA to better understand the way organisms function at their most fundamental levels.

Yet biomedical data is challenging in many ways. On the one hand, there is now so much of it that some researchers believe its sheer volume threatens to drown the field. On the other, biomedical data is incredibly heterogeneous and originates from different experimental conditions, which makes integrating and analyzing it very difficult. Additionally, in biomedical applications collecting high-quality labeled datasets is often impossible. All of this breaks the conventional ML assumption of large labeled datasets generated from a single source.

“Biological data is very complex and one problem is that we often can’t get a sufficient number of labeled examples to train machine learning models to recognize some very rare cases such as rare disease states. Additionally, datasets may be collected from different experiments, different patients, different disease states or different tissues. To get new biological insights, we need to combine these heterogeneous datasets,” explained Brbic.

“Given the tremendous amount of biological data that is being generated, machine learning becomes essential. However, existing methods we have in our toolbox are not sufficient to solve current problems. We want to be able to bridge this gap between traditional machine learning, which assumes very large amounts of labeled data, and biomedicine which is full of interesting and important problems that need solutions,” she continued.

Brbic is an assistant professor in the School of Computer and Communication Sciences (IC) with a courtesy appointment in the School of Life Sciences (SV). During her postdoctoral research at Stanford University, she worked on discovering novel cell types in large single cell datasets, one of the fundamental computational problems in single cell biology.

“Advances in biotechnology are giving us more and more complex biological data and we need new computational methods to be able to make sense of these and to give meaning to them. I think that both fields really need each other. Machine learning needs the insights that come from analyzing heterogeneous datasets to develop generalizable ML methods for real-world data, and biology needs machine learning to understand fundamental processes in biology,” said Brbic.

At EPFL, Brbic’s lab works on problems that are inspired by single cell genomics data. “Single cell genomics is a revolutionary technology that allows us to measure each individual cell in the human body, offering the potential to transform biology and medicine. There's huge excitement about it because previous technologies allowed us to measure only groups of cells, not individual cells. For the first time we have technologies to create a complete cellular makeup of the human body and understand what goes wrong on the cellular level in disease states.”

“I want to design machine learning methods that are able to discover novel things in the data, going beyond assigning something to a category it has seen before. I want to develop ML that has the ability to generalize across heterogeneous datasets and allows us to integrate and jointly analyze these datasets.”

In her interactions within the field, Brbic sees that most biologists are embracing machine learning methods and are excited to collaborate with ML researchers and experts.

“I'm trained in computer science but I've really enjoyed collaborating with biologists and medical researchers. Our skills are complementary and now there is a real need for machine learning people to be focused on biology! We need to find a way to bring these two communities together: convince machine learning researchers that these problems in biology are important and interesting and that, when we bridge the gap between the two worlds, together we can achieve amazing things.”


Author: Tanya Petersen

Source: Life Sciences | SV

This content is distributed under a Creative Commons CC BY-SA 4.0 license. You may freely reproduce the text, videos and images it contains, provided that you indicate the author’s name and place no restrictions on the subsequent use of the content. If you would like to reproduce an illustration that does not contain the CC BY-SA notice, you must obtain approval from the author.