“Data protection policies that seem clear are often ambiguous”

“Data protection policies that seem clear are in fact highly ambiguous,” says Hamza Harkous, developer of Polisis. © 2018 Alain Herzog

A new EU regulation that went into effect in May has prompted consumers to pay more attention to how websites use the personal data they collect. With Polisis, an AI-based program developed at EPFL, consumers can get quick, easy-to-read summaries of websites’ data protection policies. And the program is becoming highly popular: EPFL’s Technology Transfer Office has received numerous license requests. Hamza Harkous, one of the software’s developers, is pleasantly surprised by its success, and his team has just entered into a license agreement with US-based search engine DuckDuckGo. We spoke with Harkous about his one-of-a-kind program.

EPFL’s Technology Transfer Office (TTO) has recently seen a surge in requests for licenses for your software, which generates color-coded graphs highlighting the key elements of websites’ data protection policies. Who are all these requests coming from?

It’s true that the TTO has received over 20 license requests in the past few months. Most of them come from companies offering web services – such as personal data protection software and data monetization applications – as well as lawyers drafting data protection policies and disclaimers, who want to test their texts. Some ad agencies have requested licenses so that they can place ads in accordance with new regulatory requirements. A few of the requests we have received were for exclusivity agreements, which is something we want to avoid. We just entered into a license agreement with DuckDuckGo, a US search engine that stands apart for its commitment to protecting personal data. For instance, it doesn’t store any personal information about its users.

What benefits does your software offer DuckDuckGo?

Its developers plan to incorporate our AI algorithms into the search engine’s Privacy Essentials extension. This extension already provides a summary of important data protection information, but that summary is written by hand and covers the policies of just a few dozen websites. With our algorithms, the summary generation process can be automated and expanded to include tens of thousands of websites. It’s worth pointing out, however, that the summaries produced by our software have no legal value; they’re for information purposes only. That said, they are currently nearly 80% accurate.

Did you expect Polisis to be so successful when you were developing the algorithms?

We spent a year and a half developing the technology at EPFL’s Distributed Information Systems Laboratory, in collaboration with the University of Wisconsin and the University of Michigan. The development was done for research purposes, but the number of license requests the TTO has received since we put the program online has far exceeded our expectations. And over 30,000 people have tried it out. Part of Polisis’ success is due to the fact that it’s easy to use: you don’t have to be an expert to understand the color-coded graphs it generates. The EU’s General Data Protection Regulation (GDPR), which took effect in May, also helped. Our program clearly provides the easiest way to decipher all the different data protection policies, which are often long-winded but are now mandatory for all websites. The GDPR has also prompted people who are simply curious to take an interest in our program. That makes me almost wish we had created a start-up to market our technology!

Do you have any competitors?

There are other websites out there that analyze data protection policies, but they don’t work the same way. They highlight specific sentences and clauses based on what a user searches for, but they don’t provide automatically generated summaries of how websites use personal data.

Were you surprised by any of the things you found using your program?

It was interesting to see that some websites that you wouldn’t think are much concerned about data privacy actually show up as having robust policies. The developers of those websites probably hired an army of lawyers to comb through their policies and introduce generic terms that make it through filters. The end result is policies with extremely vague terms like “with our partners for different reasons,” “if necessary” and “effectively.” These policies seem clear when you first read them, but dig deeper and you’ll find that’s not the case. And the GDPR has not completely eliminated this trend. Our algorithms are designed to boost transparency and draw attention to these kinds of highly important details. We are currently working with Kassem Fawaz’s lab at the University of Wisconsin to improve our algorithms so they can better recognize generic terms.

Your software can help identify where personal data are sent and how they’re used, but can it check whether websites are actually following their data protection policies?

No, because we don’t have access to websites’ servers. But extensions like those provided by DuckDuckGo can spot advertising trackers and check whether websites disclose them in their data protection policies. That means you can see the websites’ privacy shortcomings – and avoid potential problems.