A tool to detect higher-order phenomena in real-world data
EPFL researchers have developed a novel approach to network analysis that allows them to reveal and interpret, for the first time, interactions among multiple variables in data from neuroscience, economics, and epidemiology.
Many phenomena – brain signals, stock prices, or COVID hospitalizations, for example – can be studied using time series data, which are collected as repeated measurements over a given time interval. Most tools for interpreting such data rely on what are known as pairwise statistics, which consider the interaction between only two variables at a time. But in the real world, events often depend on more than just two variables.
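To make that limitation concrete, here is a minimal toy sketch. It is an illustration only, not the method from the paper: the three signals `x`, `y` and `z` are hypothetical, with `z` defined as the product of `x` and `y`. Each signal has zero mean and unit variance, so the average product of any two of them is their correlation. Every pairwise value comes out as zero, yet the three-way product is always one, so a purely pairwise analysis would miss the dependency entirely.

```python
from itertools import product

# Hypothetical toy signals: x and y take every combination of -1/+1,
# and z is defined as their product (a purely three-way dependency).
samples = [(x, y, x * y) for x, y in product([-1, 1], repeat=2)]

def mean_product(rows, indices):
    """Average of the product of the chosen columns across all rows."""
    total = 0
    for row in rows:
        term = 1
        for i in indices:
            term *= row[i]
        total += term
    return total / len(rows)

# Pairwise view: one number per pair of signals.
pairwise = {pair: mean_product(samples, pair) for pair in [(0, 1), (0, 2), (1, 2)]}

# Higher-order view: one number for the whole triplet.
triple = mean_product(samples, (0, 1, 2))

print(pairwise)  # every pairwise value is 0.0
print(triple)    # 1.0
```

Real recordings are noisy and far larger, but the same effect can occur: group structure that no pair of variables reveals on its own.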
“Imagine a conversation in a pub between two people versus three or four, or imagine the interactions between a couple versus a couple with a child; the dynamics change completely the more variables you add,” explains Enrico Amico of the Medical Image Processing Lab (MIP:Lab). Amico is currently an SNSF Ambizione scholar hosted by the lab, which is run jointly by EPFL’s School of Engineering and the University of Geneva Faculty of Medicine.
“As a computational neuroscientist, I know that neuronal activity is coordinated by many different parts of the brain, but when I collect brain data, I am only able to analyze time series data related to pairs of network nodes; I cannot analyze higher-order (or group) interactions,” he says.
Recognizing the need for an improved computational framework for interpreting the complexity of real-world phenomena, Amico and Andrea Santoro of the Neuro-X Institute collaborated with colleagues from Austria’s Central European University and Italy’s CENTAI Institute to create a method for analyzing the higher-order organization of multivariate time series data. Their groundbreaking work has been published in Nature Physics.
“Simply put, we developed a method to detect and infer higher-order information from real data. This is part of an exciting new branch of higher-order mathematics with potential applications in many real-world systems, from neuroscience, finance, and epidemiology to medicine, climate science, ecology – anything, really,” Amico says.
Revealing multivariate interactions with data ‘Polaroids’
The researchers applied their new methodology to three complex real-world datasets on brain activity, stock price fluctuations, and 20th-century epidemics. Their higher-order approach was able to distinguish major features in each regime that could not be detected by standard pairwise statistics. As Amico puts it, each time series measurement acted as a kind of three-dimensional data “Polaroid”, or snapshot of the spatial configuration of the system under study.
For example, in the case of brain activity, the researchers’ multivariate time series method was able to detect oscillations between chaotic and synchronized neural interactions occurring in a brain at rest. Similarly, in the economic example, their method was better able to distinguish between periods of financial stability and crisis. In the epidemiological example, the researchers were even able to detect interactions between the spread of different diseases, like flu and pertussis.
“You might imagine that epidemics spread independently, but with our approach, we were able to classify different diseases with better accuracy, and even see how the spread of one interacted with the spread of another.”
Computing power – and creativity – is key
Amico explains that multivariate computations of this kind have not been attempted before largely because the necessary computing power has only recently become available. While the concept of multivariate time series analysis is simple enough, it is much easier said than done, as the complexity of the mathematical modelling grows exponentially with each added variable.
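One way to get a feel for this growth is to count how many distinct groups of variables would have to be examined. The sketch below assumes a hypothetical system of 100 recorded time series (the figure is an assumption chosen for illustration, not a number from the study) and counts the groups of two to five series using the binomial coefficient.

```python
from math import comb

# Hypothetical system size: 100 simultaneously recorded time series.
n_series = 100

# Number of distinct groups of k series that would need to be examined.
group_counts = {k: comb(n_series, k) for k in range(2, 6)}

for k, count in group_counts.items():
    print(f"groups of {k}: {count:,}")
# groups of 2: 4,950
# groups of 3: 161,700
# groups of 4: 3,921,225
# groups of 5: 75,287,520
```

Under this assumption, moving from pairs to triplets multiplies the number of groups by more than 30, and each further step adds another large factor, before any of the per-group computation is counted.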
“We are able to use ancient mathematics in new ways thanks to modern computing power, and access to big data. Computing power is key – and so is creativity. We are creating a new mathematics, and creative thinking is important for tackling these issues.”
So, when it comes to the number of variables that can be analyzed concurrently, is the sky the limit? In theory perhaps, but in practice, no.
“In our paper, we focused on three variables. I think that five would likely reach the limits of today’s maximum computing power,” Amico says.
Funding: SNSF COST project 'Mathematical models for interacting dynamics on networks' (grant no. IZCOZ0_198144).
Reference: Santoro, A., Battiston, F., Petri, G. et al. Higher-order organization of multivariate time series. Nat. Phys. (2023). https://doi.org/10.1038/s41567-022-01852-0