Blue Brain Nexus - brain scale data and knowledge management

© 2023 EPFL

Modern-day science typically involves iterative cycles of data discovery, acquisition, preparation, analysis, model building and validation, which often lead to knowledge discovery as well as knowledge sharing and dissemination. Given the ambitious goal of digitally building and simulating the mouse brain, the EPFL Blue Brain Project required a data and knowledge management system that could not only handle the enormous diversity and evolution of data at the scale of the whole brain, but also track the data’s provenance to ensure quality, reproducibility and accurate attribution throughout these iterative cycles. Accordingly, Blue Brain built and open-sourced Blue Brain Nexus as a key technology for organizing brain tissue data and models and as a complementary approach to classical neuroinformatics tools. Blue Brain Nexus has now grown into an ecosystem of secure, domain-agnostic, scalable and interoperable tools with a growing community of adopters, including large international organizations, across use cases that include neuroscience, psychiatry and open linked data.

“Building a digital copy of brain tissue is a complex problem and neuroscience is a very big, Big Data challenge,” explains Prof. Henry Markram, Founder and Director of the Blue Brain Project. The first step in Blue Brain’s data-driven research approach is the acquisition and organization of data from many sources, including neuroscience experiments, hundreds of thousands of published scientific papers, and brain databases all around the world. All of this data, acquired using machine learning, data science and knowledge engineering techniques, describes the structural and functional organization of the brain at various levels, from synapses and subcellular components to individual neurons, circuits, and entire brain regions. Next, multi-disciplinary teams of scientists and engineers analyze and integrate the collected data and infer the necessary missing information before synthesizing the data from which detailed brain tissue models are built. In the process, new knowledge is generated, shared and disseminated.

In a paper published in Semantic Web, the authors discuss how Blue Brain Nexus provides the foundation to address some of the key challenges that appear along the iterative data-driven research cycle.

Blue Brain Nexus is built on an open, scalable, extensible, standards-based and interoperable technology stack with a knowledge graph at its heart. In a knowledge graph, the entities of a domain of interest (e.g. a cell, cell type, neuron morphology, electrophysiology recording, cell density, brain tissue model or workflow execution) can be described with high-quality metadata and validated using ontologies and schemas based on open web standards. Entities can be semantically linked to one another to form a graph of data that represents factual knowledge about the chosen domain.

Within Blue Brain, this linking is done in the spatial context of a brain atlas, e.g. a brain tissue model derived from an experimental dataset about a cell of a given type located in a specific region of the mouse brain.
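
To make the idea concrete, here is a minimal sketch of a knowledge graph in Python. It is an illustration only and does not use the Blue Brain Nexus API: the `KnowledgeGraph` class, the `SCHEMAS` table, the `locatedIn` predicate and all identifiers are hypothetical names chosen for this example. It shows the three ingredients described above: entities carrying metadata, validation of that metadata against a schema, and semantic links between entities, here placing a neuron morphology in a brain region.

```python
from dataclasses import dataclass, field

# Hypothetical schemas: the metadata fields each entity type must provide.
SCHEMAS = {
    "BrainRegion": {"name"},
    "NeuronMorphology": {"name", "cell_type"},
}


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # id -> (type, metadata)
    links: list = field(default_factory=list)     # (subject, predicate, object)

    def add_entity(self, entity_id, entity_type, metadata):
        """Store an entity only if its metadata satisfies the schema."""
        missing = SCHEMAS[entity_type] - set(metadata)
        if missing:
            raise ValueError(
                f"{entity_id} is missing required fields: {sorted(missing)}"
            )
        self.entities[entity_id] = (entity_type, metadata)

    def link(self, subject, predicate, obj):
        """Add a typed link between two entities that already exist."""
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both ends of a link must be known entities")
        self.links.append((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return every entity reachable from subject through predicate."""
        return [o for s, p, o in self.links if s == subject and p == predicate]


graph = KnowledgeGraph()
graph.add_entity("region:thalamus", "BrainRegion", {"name": "Thalamus"})
graph.add_entity(
    "morph:001",
    "NeuronMorphology",
    {"name": "Example morphology", "cell_type": "interneuron"},
)
graph.link("morph:001", "locatedIn", "region:thalamus")

print(graph.objects("morph:001", "locatedIn"))
```

An entity whose metadata lacks a required field is rejected with a `ValueError`, which is the simplest form of the schema validation described above.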

“The description of data with rich and high quality metadata, as well as the ability to validate that metadata, are key to Blue Brain Nexus’ capability to assess the data’s quality, and the trustworthiness and reliability of a given data source. Simultaneously, it increases the data’s overall utility and longevity for downstream tasks and pipelines. Finally, the resulting semantic and rich graph structure can be leveraged to build inference rules in order to infer missing data and uncover hidden patterns,” explains Mohameth François Sy, Data and Knowledge Engineering Section Manager at Blue Brain.

Consistently managing data, metadata, ontologies, schemas and inference rules together while simultaneously tracking their provenance – the scientific context of their generation (who, when, where, how and why) – and making explicit the sources of data that drive the evolution of a knowledge framework, can facilitate the critical review, assessment and credit assignment for new discoveries.
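
As a rough sketch of what tracking provenance can look like, the Python example below attaches a record of who, when, where, how and why to each entity and walks the records backwards to find everything a result was derived from. It is not the Nexus data model: the `ProvenanceRecord` class, the `lineage` function and every identifier, name and date in the sample records are hypothetical and were invented for this illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    entity: str                # what was produced
    agent: str                 # who produced it
    generated_on: date         # when
    location: str              # where
    activity: str              # how
    purpose: str               # why
    derived_from: tuple = ()   # ids of the entities it was built from


def lineage(entity, records):
    """Return the ids of all upstream entities, nearest sources first."""
    by_entity = {record.entity: record for record in records}
    seen, queue = [], [entity]
    while queue:
        record = by_entity.get(queue.pop(0))
        for source in (record.derived_from if record else ()):
            if source not in seen:
                seen.append(source)
                queue.append(source)
    return seen


# Hypothetical records: a recording feeds a model, which feeds a simulation.
records = [
    ProvenanceRecord("recording:1", "Lab A", date(2021, 3, 1), "Site 1",
                     "patch-clamp recording", "characterize a cell type"),
    ProvenanceRecord("model:1", "Modeling team", date(2022, 6, 15), "Site 2",
                     "parameter fitting", "build a single cell model",
                     derived_from=("recording:1",)),
    ProvenanceRecord("simulation:1", "Simulation team", date(2023, 1, 10),
                     "Site 2", "network simulation", "validate the model",
                     derived_from=("model:1",)),
]

print(lineage("simulation:1", records))
```

With records like these, credit for a simulation result can be traced back through the model to the original recording and the people who produced it.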

“As part of our team science approach, Blue Brain’s scientists alongside our data and knowledge engineers use the Nexus ecosystem to manage a huge amount of neuroscience data. Using Blue Brain Nexus’ customizable web-based data studios and plugins, they search, explore and visualize neuroscience data (e.g. neuron morphologies, electrophysiology recordings, and brain atlases) to derive new data and build models,” explains Samuel Kerrien, Blue Brain Neuroinformatics Software Engineering Section Manager.

Using the creation capabilities of Nexus data studios, Blue Brain has published and shared a large amount of neuroscience data and many models with the research community. These are accessible to users through web applications and to machines as programmatically accessible knowledge graphs. For example, the recently released Thalamoreticular Microcircuitry data studio allows users to browse, visualize and download the experimental data (including interactive visualizations of 3D neuron morphologies and electrophysiology recordings) used to build digital reconstructions (including single-cell model and microcircuit reconstruction) as well as network simulations.

Nexus Fusion-based data studio accompanying the BBP Thalamoreticular Microcircuitry paper, showing: I) an abstract, II) experimental dataset and models, III) the 3D Neuronal Morphology viewer plugin and the Interactive Electrophysiology recordings viewer, both along with metadata.

Solving large-scale data integration and dissemination challenges globally

“While Blue Brain Project’s use cases are the main driver behind creating Blue Brain Nexus, we made it open source as we recognized that others could use and profit from it too,” explains Prof. Sean Hill, co-director of the Blue Brain Project and the Scientific Director of the Krembil Centre for Neuroinformatics, an interdisciplinary, computationally focused research institute located within the Centre for Addiction and Mental Health (CAMH), the largest mental health hospital in Canada. Early on, Hill saw that Nexus could solve the problem of linking clinical data with research data at CAMH, and he is responsible for its successful adoption at the Centre. This allowed for complex queries across data domains and brain hierarchies, integrating real-time clinical record information into a rich knowledge commons that can be applied to accelerate discovery and care.

By open-sourcing Nexus, the Blue Brain Project also supports the FAIR guiding principles for scientific data management and stewardship in the neuroscience and broader scientific community by enabling heterogeneous data generated in different contexts to be made Findable, Accessible, Interoperable and Reusable. Today, Blue Brain Nexus is deployed to solve large-scale data integration and dissemination challenges in computational modeling, neuroscience, psychiatry and open linked data. Adopters include the European Human Brain Project via the EBRAINS project, CAMH and the Swiss Research Data Connectome Project.

Find out more about Blue Brain Nexus - https://bluebrainnexus.io

References

Sy, M. F., Roman, B., Kerrien, S., Mendez, D. M., Genet, H., Wajerowicz, W., Dupont, M., Lavriushev, I., Machon, J., Pirman, K., Neela Mana, D., Stafeeva, N., Kaufmann, A.-K., Lu, H., Lurie, J., Fonta, P.-A., Martinez, A. G. R., Ulbrich, A. D., Lindqvist, C., Jimenez, S., Rotenberg, D., Markram, H., Hill, S. L. (2022). Blue Brain Nexus: An open, secure, scalable system for knowledge graph management and data-driven science. Semantic Web, 1–31. https://doi.org/10.3233/SW-222974