EPFL's Predikon: predicting voting results with machine learning

Switzerland votes © 2020 iStock

On September 27, Switzerland votes for the first time since the COVID-19 pandemic began, including on a contentious initiative to end the free movement of workers with the European Union. Predikon will predict the final outcome within minutes of the Swiss Federal Statistical Office releasing the first partial municipal results.

In the past half-decade, many pre-vote polls and initial vote counts around the world have turned out to be unreliable. Perhaps the two most notorious recent examples are the Brexit ‘leave’ vote in the UK and the election of Donald Trump as president of the United States. In both cases, not only were the majority of pre-vote polls wrong, but many of us went to bed with initial counts suggesting that the UK would remain in the EU and that Hillary Clinton would be the 45th U.S. president. The next morning’s results were confounding.

For the past six years, a group of researchers at EPFL’s Information and Network Dynamics Lab (INDY), part of the School of Computer and Communication Sciences, have been using probabilistic modelling, large-scale data analytics and machine learning to develop Predikon, in a bid to better predict final election and referendum results from partial, early ballot counts. In August, they presented a paper outlining their statistical method and results at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

With an obvious initial focus on Switzerland, PhD student Victor Kristof and Master’s student Alexander Immer (now a PhD student at ETH Zurich), led by Professors Matthias Grossglauser and Patrick Thiran, have been analysing voting data, searching for structure in the voting behaviour of the country’s 26 cantons and around 2,200 municipalities. “We require historical data, of course, to learn something interesting, and in Switzerland we have a lot of data thanks to direct democracy. We have now been able to input the results of more than 300 different votes going back to 1981 for these 2200 municipalities. In this endeavour, the Swiss Federal Statistical Office has been very transparent and helpful in understanding their data,” says Victor.

Whilst municipalities differ from one another, they are not completely independent. The researchers have developed an algorithm that learns how cultural, demographic and historical biases influence vote outcomes, and have used it to make accurate predictions from partial counts. In a country with four official languages (German, French, Italian and Romansh), the first version of Predikon, in 2014, indeed found that a municipality’s language influenced voting behaviour. Since then the tool’s algorithms have evolved, and for the four most recent votes it was able to predict the outcome from very early results in a small number of municipalities with a margin of error of about 1%. “We take the past national vote results of every municipality and develop a model of how they relate to each other. If we compute all those results the average will vary quite a lot but our algorithm is able to correct various linguistic, cultural and demographic biases. That allows us, with a few partial results and in whatever order they come in, to make a better prediction than just taking the average, as typically reported by news outlets,” explains Victor.
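To make the idea of bias correction concrete, here is a deliberately simplified sketch in Python. It is not Predikon's actual model, which the article does not specify in detail. It assumes each municipality sits at a roughly constant offset from the national result, learns that offset from past votes, and subtracts it from early results. The municipality names, voter counts, vote shares and function names are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: past "yes" shares per municipality, one entry per past vote.
HISTORY = {
    "A": [0.30, 0.45, 0.52],  # small municipalities that tend to vote "no"
    "B": [0.34, 0.49, 0.56],
    "C": [0.43, 0.58, 0.65],  # large municipalities that tend to vote "yes"
    "D": [0.41, 0.56, 0.63],
}
VOTERS = {"A": 1000, "B": 1000, "C": 4000, "D": 4000}  # assumed voter counts


def weighted_share(shares, voters):
    """Voter-weighted mean 'yes' share over the municipalities in `shares`."""
    total = sum(voters[m] for m in shares)
    return sum(voters[m] * s for m, s in shares.items()) / total


def learn_offsets(history, voters):
    """Average gap between each municipality and the national result."""
    n_votes = len(next(iter(history.values())))
    # National result of each past vote, rebuilt from all municipalities.
    national = [
        weighted_share({m: past[v] for m, past in history.items()}, voters)
        for v in range(n_votes)
    ]
    return {
        m: mean(past[v] - national[v] for v in range(n_votes))
        for m, past in history.items()
    }


def corrected_estimate(partial, offsets, voters):
    """Estimate the national share after removing each reporter's usual offset."""
    adjusted = {m: s - offsets[m] for m, s in partial.items()}
    return weighted_share(adjusted, voters)


offsets = learn_offsets(HISTORY, VOTERS)
partial = {"A": 0.42, "B": 0.46}  # only two municipalities have reported so far

naive = weighted_share(partial, VOTERS)
corrected = corrected_estimate(partial, offsets, VOTERS)
print(f"naive average of early results: {naive:.2f}")
print(f"offset-corrected estimate:      {corrected:.2f}")
```

With these toy numbers, the plain average of the two early results is about 0.44, which points to a rejection, while the offset-corrected estimate is about 0.52, which points to an acceptance. A constant offset per municipality is the crudest possible form of such a correction; a model of how municipalities relate to each other, as Victor describes, can capture far richer structure.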

In the last national vote before the coronavirus crisis hit, held on February 9, Predikon predicted the results of both the Amendment to the penal code and the military penal code (discrimination and incitement to hatred because of sexual orientation) and the Popular initiative for more affordable housing with near-pinpoint accuracy, within minutes of the first partial results. On September 27, we expect Predikon to tell us very early on whether the free movement of people between Switzerland and the EU will be restricted.

© 2020 EPFL

The underlying model is general and has been successfully applied to predict, in addition to Swiss referenda, the outcomes of German parliamentary elections and the popular vote of the 2016 US election. As for future applications, Matthias Grossglauser can imagine Predikon evolving, for example, to turn survey data collected before a vote into better outcome predictions. “We could try to exploit our model to improve predictions from crowdsourced or poll data. We could also explore the dynamics of how opinions shift in different areas over time due to population and/or demographic changes.”

And what of the question of Predikon’s impact? “We develop our statistical models and algorithms to make predictions and inferences and then there's always a quest to find new areas of application and opportunities for impact. Victor has been perfect to lead on Predikon’s continuing development. He cares about the environment and about society, and I think this is a project that has allowed us to do something meaningful and maybe enhance the democratic process,” Matthias concludes.

Visit Predikon at www.predikon.ch

Author: Tanya Petersen
Source: EPFL