How the EPFL community rescued a PhD thesis

Kevin Jablonka (credit: own) and a word cloud of the color terms used in the survey

EPFL scientists combined citizen science with machine learning to predict the colors of millions of crystal structures of metal-organic frameworks, nanoporous materials with applications ranging from carbon capture to water purification.

In chemistry, it is common to report the color of a crystal once it has been synthesized; a casual perusal of the Cambridge Structural Database (CSD) reveals not just millions of crystal structures but also the reported colors of some of them.

But predicting the color of a crystal is difficult. To tackle this, scientists led by Berend Smit at EPFL’s School of Basic Sciences developed a machine-learning approach that harvests the CSD data for all metal-organic frameworks (MOFs), a class of materials with nano-sized pores that make them useful in technologies such as carbon capture, sensing, and water purification. Millions of different MOFs can be synthesized by combining organic linkers with metal nodes, and each combination can produce crystals of a very different color, a property directly relevant to applications such as photocatalysis, lighting, and sensing.

“We were completely stuck,” says Berend Smit. “We discovered that there were many crystals for which it was impossible to uniquely associate a name with a color, especially if the name does not appear on color tables like the ones on XKCD.” This wasn’t just a research problem, but also a massive hurdle for Kevin Jablonka, a doctoral student in Smit’s lab whose PhD focuses on this project.

So the scientists asked the EPFL community for help. “We asked them to pick a color for a given name,” explains Smit. The call didn’t go unheeded, with over 4000 people responding. “The responses allowed us to generate a distribution of colors for a given name, and with these distributions we could map the discrete color names in the CSD to numbers and see if our machine-learning model worked.”
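To make the name-to-number mapping concrete, here is a minimal Python sketch under stated assumptions: the survey responses below are invented for illustration, and summarizing each name by the per-channel mean and standard deviation of the picked RGB values is a simplification of ours, not necessarily the representation used in the study.

```python
from statistics import mean, pstdev

# Hypothetical survey responses: each participant picked an RGB color (0-255)
# for the color name shown to them. These values are made up for illustration.
responses = {
    "straw yellow": [(228, 217, 111), (240, 220, 130), (235, 210, 100)],
    "dark red": [(139, 0, 0), (120, 10, 10), (150, 5, 0)],
}


def summarize(picks):
    """Return the per-channel mean and population standard deviation of RGB picks."""
    channels = list(zip(*picks))  # [(r1, r2, ...), (g1, g2, ...), (b1, b2, ...)]
    means = tuple(mean(c) for c in channels)
    stds = tuple(pstdev(c) for c in channels)
    return means, stds


# Map each discrete color name to a numeric summary of its survey distribution.
name_to_rgb = {name: summarize(picks) for name, picks in responses.items()}

for name, (rgb_mean, rgb_std) in name_to_rgb.items():
    print(name, rgb_mean, rgb_std)
```

A fuller treatment would keep the whole distribution rather than two moments, since a name like "straw yellow" may be interpreted in more than one way by different respondents.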

If, for example, the model predicted the color of a crystal as “straw yellow”, the scientists compared that color with the distribution of colors that the EPFL community described as “straw yellow”. The overlap of their prediction with this distribution could then quantify the accuracy of their predictions.
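A minimal sketch of one possible overlap score, in Python. The metric (the fraction of survey picks within a Euclidean RGB distance `radius` of the predicted color), the default radius of 40, and the sample data are all hypothetical choices for illustration; the study's actual measure may differ.

```python
import math


def overlap_score(predicted_rgb, survey_picks, radius=40.0):
    """Fraction of survey picks lying within `radius` (Euclidean RGB distance)
    of the predicted color; 1.0 means every respondent's pick is close."""
    if not survey_picks:
        raise ValueError("need at least one survey pick")
    close = sum(1 for pick in survey_picks if math.dist(predicted_rgb, pick) <= radius)
    return close / len(survey_picks)


# Hypothetical "straw yellow" picks from the survey.
straw_yellow_picks = [(228, 217, 111), (240, 220, 130), (235, 210, 100), (200, 180, 60)]

print(overlap_score((232, 215, 110), straw_yellow_picks))  # three of four picks are close
print(overlap_score((0, 0, 255), straw_yellow_picks))      # a blue prediction overlaps nothing
```

A distance in a perceptual color space such as CIELAB would arguably be a better choice than raw RGB; RGB keeps the sketch dependency-free.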

But that wasn’t all: the work also offered a new way of reporting colors. “We also realized that we could not improve our machine learning further unless we improved the way colors are reported,” says Smit. “For this, we started a collaboration with Luc Patiny (EPFL) to develop an application for his Electronic Laboratory Notebook (ELN). We asked some of the chemists to synthesize colorful MOFs, take a picture of the crystals together with a color calibration card, and upload the picture to the ELN.”

The color app they developed recognizes the color calibration card, automatically corrects for differences in lighting and camera quality, and computes the average RGB value of the sample along with its standard deviation, which quantifies how homogeneous the sample is.
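The correction step can be sketched as follows, assuming a simple per-channel linear correction: a least-squares fit of gain and offset from the calibration card's patches. The function names, the patch values, and the linear model itself are hypothetical; the ELN app's actual algorithm is not described here and may be more sophisticated (for example, a full 3×3 color matrix).

```python
from statistics import mean, pstdev


def fit_channel(measured, reference):
    """Least-squares fit of reference = gain * measured + offset for one channel."""
    mx, my = mean(measured), mean(reference)
    var = sum((x - mx) ** 2 for x in measured)
    if var == 0:
        raise ValueError("calibration patches must differ in this channel")
    gain = sum((x - mx) * (y - my) for x, y in zip(measured, reference)) / var
    return gain, my - gain * mx


def calibrate_and_summarize(patch_measured, patch_reference, sample_pixels):
    """Correct sample pixels channel by channel using calibration-card patches,
    then return the mean RGB and per-channel standard deviation of the sample."""
    fits = [
        fit_channel([p[c] for p in patch_measured], [p[c] for p in patch_reference])
        for c in range(3)
    ]
    # Apply the fitted correction and clamp to the valid 0-255 range.
    corrected = [
        tuple(min(255.0, max(0.0, g * px[c] + o)) for c, (g, o) in enumerate(fits))
        for px in sample_pixels
    ]
    channels = list(zip(*corrected))
    return tuple(mean(ch) for ch in channels), tuple(pstdev(ch) for ch in channels)


# Hypothetical gray patches photographed under dim light: each reads half its true value.
measured = [(10, 10, 10), (60, 60, 60), (110, 110, 110)]
reference = [(20, 20, 20), (120, 120, 120), (220, 220, 220)]
sample = [(50, 40, 30), (54, 44, 34)]

print(calibrate_and_summarize(measured, reference, sample))
```

Here the fit recovers a gain of 2 and an offset of 0 for every channel, so the sample's corrected mean is (104, 84, 64) with a standard deviation of 4 per channel; a larger standard deviation would indicate a less homogeneous sample.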

“This is a great example of citizen science,” says Berend Smit. “And, to make it completely Open Science, the ELN automatically publishes all the photos and much more on CERN's Zenodo repository, from which the data can be explored with a web browser or downloaded for further analysis.”

Information on the electronic notebook: https://cheminfo.github.io/eln.epfl.ch/

Funding

European Research Council (ERC-Adv)

NCCR-MARVEL

Swiss National Science Foundation

PrISMa Project of the ACT Programme

Swiss Federal Office of Energy (SFOE)

References

Kevin Maik Jablonka, Seyed Mohamad Moosavi, Mehrdad Asgari, Christopher Ireland, Luc Patiny, Berend Smit. A data-driven perspective on the colours of metal–organic frameworks. Chemical Science, 28 December 2020. DOI: 10.1039/D0SC05337F