A Machine to Learn Them All

Separating active and inactive protein ligands, and predicting the Si(111)-7x7 reconstruction, by machine learning © COSMO / 2017 EPFL


Researchers at the Laboratory of Computational Science and Modelling at EPFL have developed a machine-learning model that may greatly accelerate drug discovery by accurately predicting the interactions between a protein and a drug molecule using only a handful of reference experiments or simulations. The algorithm, which can also tackle materials science problems such as modelling the structure of silicon surfaces, promises to revolutionise materials and chemical modelling, and gives insight into the nature of intermolecular forces.

Researchers have designed an algorithm that needs just a few training references to predict, with 99% accuracy, whether a candidate drug molecule will bind to a target protein. This is equivalent to predicting with near-certainty the activity of hundreds of compounds after actually running only a couple of dozen tests, and it could accelerate the screening of candidate molecules. The method is so precise that the single case in which it failed turned out to be due to a clerical error in the reference database.

The approach, developed by scientists from EPFL’s Laboratory of Computational Science and Modelling in collaboration with scientists at the University of Cambridge, the University of Warwick, the UK Science and Technology Facilities Council and the U.S. Naval Research Laboratory, can also identify which parts of the molecules are crucial for the interaction.

The researchers showed that the design of this algorithm, which combines local information from the neighbourhood of each atom in a structure, makes it applicable across many different classes of problems in chemistry, materials science, and biochemistry. The approach is remarkably successful in predicting the stability of organic molecules as well as the subtle properties of silicon surfaces that are crucial for microelectronic applications, and does so at a fraction of the computational effort involved in a quantum mechanical calculation.
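The idea of combining local, per-atom information into a comparison between whole structures can be illustrated with a minimal sketch. This is not the researchers' implementation: the descriptors below are random placeholder vectors standing in for a real description of each atom's neighbourhood, the kernel is a simple average of similarities over all pairs of atoms, and every function name is hypothetical. The sketch only shows how a model of this kind can learn from a handful of reference structures.

```python
import numpy as np

def structure_kernel(env_a, env_b, zeta=2):
    """Similarity of two structures, each given as an (n_atoms, n_features)
    array of per-atom descriptors. Averages the local similarity
    (normalised dot product raised to the power zeta) over all atom pairs."""
    a = env_a / np.linalg.norm(env_a, axis=1, keepdims=True)
    b = env_b / np.linalg.norm(env_b, axis=1, keepdims=True)
    return float(np.mean((a @ b.T) ** zeta))

def fit_krr(structures, targets, reg=1e-6):
    """Kernel ridge regression: solve (K + reg * I) w = y for the weights."""
    n = len(structures)
    K = np.array([[structure_kernel(s, t) for t in structures]
                  for s in structures])
    return np.linalg.solve(K + reg * np.eye(n),
                           np.asarray(targets, dtype=float))

def predict(new_structure, structures, weights):
    """Predict a property as a weighted sum of similarities to references."""
    k = np.array([structure_kernel(new_structure, s) for s in structures])
    return float(k @ weights)

# Placeholder data (assumption): three reference structures with 3, 5 and 4
# atoms, 8 descriptor features per atom, and made-up target values.
rng = np.random.default_rng(0)
train = [rng.random((n_atoms, 8)) for n_atoms in (3, 5, 4)]
targets = [0.1, -0.3, 0.2]

weights = fit_krr(train, targets)
estimate = predict(rng.random((6, 8)), train, weights)
```

Because the similarity is built from atom-centred pieces, the result does not depend on the order in which atoms are listed, and structures with different numbers of atoms can be compared directly.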

The model at the heart of the machine-learning approach also provides insight into the range and energy scale of intermolecular forces and allows us to understand how various electronic-structure methods disagree in the description of different kinds of interactions. That is, machine learning not only changes the way we calculate the properties of materials and molecules, it also teaches us something about chemistry and materials science.

The research, which has been published in Science Advances and received funding from the ERC Starting Grant HBMAP and the NCCR MARVEL, illustrates how chemical and materials discovery is now benefitting from the machine-learning and artificial-intelligence approaches that already underlie disruptive technologies from self-driving cars to Go-playing bots and automated medical diagnostics. New algorithms allow us to predict the behaviour of new materials and molecules with great accuracy and little computational effort, saving time and money in the process.