variation graph (vgteam)

Vgteam is pioneering privacy-preserving variation graphs, which make it possible to capture complex models and aggregate data resources with formal guarantees about the privacy of the individual data sources from which they were constructed. Variation graphs relate collections of sequences to one another as walks through a graph. They are most commonly applied to genomic data, where they support the compression and querying of very large collections of genomes. In this project, we will apply formal models of differential privacy to build variation graphs which do not leak information about the individuals whose genomes were used to build them. The same techniques allow us to extend these models to include phenotype and health information, maximizing their utility for biological research and clinical practice without risking the privacy of the participants who shared their data to build them. These tools are not limited to genomes, and will allow the production of privacy-preserving representations of collections of any sensitive data that can be represented in variation graph form, including collections of personal writing, web browsing histories, or the trajectories of individuals and vehicles through transportation networks.
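To make the idea of "sequences as walks through a graph" concrete, here is a minimal sketch in Python. It is not the vg toolkit's data model or API; the node labels, sample names and helper functions (`spell`, `edge_counts`) are hypothetical and chosen only for illustration. Each node carries a short fragment, each sequence in the collection is stored as an ordered list of node ids, and shared fragments are stored only once.

```python
from collections import Counter

# Hypothetical toy graph: node id -> sequence fragment carried by that node.
nodes = {1: "ACG", 2: "T", 3: "G", 4: "TTA"}

# Each sequence in the collection is stored as a walk (ordered list of node ids).
# Nodes 1 and 4 are shared by all samples; nodes 2 and 3 are alternative variants.
walks = {
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
    "sample_c": [1, 2, 4],
}

def spell(walk):
    """Reconstruct a sequence by concatenating the node labels along a walk."""
    return "".join(nodes[n] for n in walk)

def edge_counts(all_walks):
    """Count edge traversals (pairs of consecutive nodes) across all walks."""
    counts = Counter()
    for walk in all_walks.values():
        counts.update(zip(walk, walk[1:]))
    return counts

sequences = {name: spell(w) for name, w in walks.items()}
counts = edge_counts(walks)
```

In this toy example the aggregate view (`counts`) says how often each edge is used without listing whole sequences, but exact counts like these can still reveal information about individuals, which is why the project looks at formal privacy guarantees.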

Why does this actually matter to end users?

Worries over our health and safety will in many cases take precedence over the perceived value of our privacy. When it comes to our physical health and wellbeing, we are often in a strongly dependent position. Especially in times of great mental stress (like when a medical doctor breaks bad news to us) or fear (my daughter is late from school) we often lack the time and knowledge to really consider what data we actually want to make available and under which conditions. Many people in such situations reach a point of detachment and panic, where they hand out whatever data is requested of them by whoever promises to resolve the stress. And once data is out there, it is hard to trace back. But what if we do not have to give up our privacy for the sake of better and more personalized health care, or our safety? What if we can have both? An interesting example is genetic research, which can be crucial to identify hereditary diseases and ultimately create a type of care that perfectly fits your unique needs. It also involves extremely personal and uniquely identifying data (literally the DNA that made us the individuals we are), the wider availability of which has a potential impact on the privacy of your children and their children and their children's children. Who knows what future generations will have to endure, in good times and in bad times? With the technology easily available to them, would insurance companies, employers or governments be tempted to test for as yet undiscovered heart conditions or expensive and rare diseases - or worse? And yet we make important decisions about this in times of stress. Things do not have to be so black and white. Maybe doctors do not need to have access to all of our DNA in order to help us, so we don't have to share everything. Maybe we don't need to trace where our children are 24/7, as long as we know they are safe.
As it turns out, there are clever ways to aggregate data in a privacy-preserving way, preserving the characteristics needed and removing the rest. This project will build on so-called "variation graphs" to further explore and develop these technologies. There are applications in many other use cases as well - variation graphs can be used to produce privacy-preserving representations of collections of other sensitive data, including collections of personal writing, web browsing histories, or even quantified-self data. The general tenet is always to share only the relevant information, while preventing the identification of individuals. Variation graphs have huge potential. The primary demonstration case in this project is extremely ambitious: to enable the creation of searchable DNA databases that protect the contributing individuals in a provable way. Input data from healthy and non-healthy people can be transformed in such a way that the privacy of all involved is protected while intensive study of DNA data remains possible. This will greatly help to convince people that they can contribute to medical research. Of course, success in such a critical application breaks the ice for all other use cases where we see the benefit of big data but also the threats. Such a solution, if it becomes widely available, would be nothing short of revolutionary.
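As a rough illustration of what "protecting individuals in a provable way" can mean, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a single counting query. This is an assumption made for illustration, not a description of how the project constructs its graphs; the participant data and the function names `laplace_noise` and `private_count` are hypothetical. Adding or removing one participant changes the true count by at most 1 (the sensitivity), so noise with scale sensitivity / epsilon masks any single contribution.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data: which participants carry a given variant.
carriers = {"p1": True, "p2": False, "p3": True, "p4": True}
true_count = sum(carriers.values())

rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy = private_count(true_count, epsilon=0.5, rng=rng)
```

A smaller `epsilon` gives stronger privacy but noisier answers. Applying the same idea to an entire graph is considerably harder, because one person's genome touches many nodes and edges at once.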

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322. Applications are still open; you can apply today.

