A Big Data tool begins new era for biology and personalized medicine
29.11.17 - Researchers from EPFL have developed a novel series of systems genetics tools to identify new links between genes and phenotypes. The work, a hallmark of EPFL’s endeavors into the advancement of open science, brings biology to the cloud and sets the stage for the development of precision medicine. The study is published in Cell Systems.
Most complex traits and diseases, such as longevity, obesity, and diabetes, are largely influenced by genetic factors. But at the same time, they are also modulated by environmental stimuli, such as diet and physical activity. This interaction between the environment and genetic makeup makes every human unique, and underpins the need for personalized medicine.
This custom-built medical approach is founded on the premise that the prediction, diagnosis, and treatment of disease differ between individuals and depend on each person's genetic variations in combination with their specific environment. But to make personalized medicine a reality, we first need a deeper understanding of the interaction between genetics, environment, and disease.
To do this, hundreds of research groups have worked on a mouse population called the BXD, which serves as a model for studying the genetic basis of phenotypic traits and diseases. Over several decades, thousands of phenotype datasets from BXD mice, ranging from coat color to lifespan, have been collected into databases.
Researchers have also collected very large molecular datasets linked to gene expression in different organs of BXD mice, such as levels of mRNA, proteins, and metabolites. By now, the BXD community has gathered around 300 million phenotype data points from these animals, generating by far the largest coherent “phenome” (the set of all phenotypes expressed in an organism) for any animal experimental cohort. In short, the BXD phenome is a perfect resource for modeling human populations for genetic studies.
“These rich and large phenome data remain, however, largely unexploited, as they are difficult to access and the tools to analyze them require advanced skills,” says Johan Auwerx, whose lab at EPFL led the study with colleagues from Germany, the Netherlands, and the US, as well as EPFL professors Kristina Schoonjans and Stephan Morgenthaler.
The scientists addressed the problem by organizing all this knowledge in a cloud-based data warehouse that integrates all 300 million data points collected in the BXD mouse population. They also developed an easy-to-use toolkit for integrating the different layers of “omics” data from these mice.
The toolkit is already online at systems-genetics.org and is expected to significantly facilitate the discovery of gene-phenotype and gene-gene links. Scientists have already used the resource to identify hundreds of thousands of gene-phenotype associations, many of which are completely new. “One striking example is the link between ribosomal protein Rpl26 and body weight,” says Hao Li, first author of an article published today in Cell Systems. “Mice inheriting this gene from one parent are on average 10 g heavier than their cousins.”
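As a rough illustration of what a gene-phenotype association means in a population like the BXD, the sketch below groups strains by the parental allele they carry at one locus and compares the average trait value between the two groups. It is not the toolkit's method or interface: the function name `allele_effect`, the strain labels, the allele codes "B" and "D", and all weights are invented for illustration, and a real analysis would add statistical testing across many loci.

```python
from statistics import mean


def allele_effect(genotypes, phenotypes):
    """Mean trait difference (allele "D" minus allele "B") across strains.

    genotypes:  dict mapping strain name -> parental allele ("B" or "D")
    phenotypes: dict mapping strain name -> measured trait value
    Strains missing from either dict are ignored.
    """
    groups = {"B": [], "D": []}
    for strain, allele in genotypes.items():
        if strain in phenotypes and allele in groups:
            groups[allele].append(phenotypes[strain])
    if not groups["B"] or not groups["D"]:
        raise ValueError("need at least one strain per allele")
    return mean(groups["D"]) - mean(groups["B"])


# Hypothetical toy data, NOT real BXD measurements.
toy_genotypes = {"strain1": "B", "strain2": "D", "strain3": "B", "strain4": "D"}
toy_weights_g = {"strain1": 25.0, "strain2": 30.0, "strain3": 27.0, "strain4": 32.0}

effect = allele_effect(toy_genotypes, toy_weights_g)
print(f"Mean weight difference (D - B): {effect:.1f} g")
```

A large, consistent difference between the two groups is the kind of signal that would flag a locus as a candidate for influencing the trait.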
The toolkit can be applied to any population with available multi-layered data of distinct nature (referred to as “multi-omics data”). This enables the re-use of existing data to make new discoveries. “Findings in populations are generally robust and translate well across cohorts and species – making the data directly relevant to human biology,” says Li.
“We have deposited all the data and the toolkit in a public platform, which will help researchers identify and validate the functions of their genes of interest,” says Auwerx, placing the project within the context of EPFL’s ongoing efforts to advance open science. “This resource really is one of the first efforts to bring biology into the cloud, and lays the cornerstone of a new era of biology. Medical doctors will soon be able to use resources of similar nature to personalize treatments for their patients.”
Contributors
- EPFL Institute of Mathematics (Chair of Applied Statistics)
- EPFL Interfaculty Institute of Bioengineering (Laboratory of Metabolic Signaling)
- Humboldt-Universität zu Berlin
- Swiss Institute of Bioinformatics
- University Hospital of Lausanne
- University of Colorado
- University Medical Center Utrecht
- University of Tennessee
Funding
- China Scholarship Council
- École Polytechnique Fédérale de Lausanne (EPFL)
- Swiss National Science Foundation
- Velux Stiftung
- Kristian Gerhard Jebsen Foundation
- Swiss Initiative for Systems Biology (AgingX program)
- National Institutes of Health (NIH)
Reference
Hao Li, Xu Wang, Daria Rukina, Qingyao Huang, Tao Lin, Vincenzo Sorrentino, Hongbo Zhang, Maroun Bou Sleiman, Danny Arends, Aaron McDaid, Peiling Luan, Naveed Ziari, Laura A. Velázquez-Villegas, Karim Gariani, Zoltan Kutalik, Kristina Schoonjans, Richard A. Radcliffe, Pjotr Prins, Stephan Morgenthaler, Robert W. Williams, Johan Auwerx. An integrated systems genetics and omics toolkit to probe gene function. Cell Systems 29 November 2017. DOI: 10.1016/j.cels.2017.10.016