Calls: Send in your ideas. Deadline August 1st, 2021.

OpenData

Projects to facilitate the creation, collection and curation of free information.

This page contains a concise overview of projects funded by the NLnet foundation that belong to OpenData (see the thematic index). More information is available on each of the projects listed on this page: just click on the title or the link at the bottom of the section on each project to read more. If a description on this page is a bit technical and terse, don't despair: the dedicated page has a more user-friendly description that should be intelligible for 'normal' people as well. If you cannot find a specific project you are looking for, please check the alphabetic index or just search for it (or search for a specific keyword).

Record Federation for Corteza Clouds — Data federation over ActivityPub

Corteza is a low-code platform for building cloud-based web applications. These are typically used for private, records-based management purposes (e.g. case management, insurance claims processing, public sector management applications, CRM, ERP), but they can also be public if required. Corteza has a modular architecture, and its data layer, presentation layer and automation layer can each be treated individually. Corteza Record Federation makes innovative use of the ActivityPub standard to describe how content from the Corteza data layer can be broadcast across large federations of Corteza clouds. All data types, simple or compound, as well as entire records and entire data models, are supported.

Whether in energy, finance, health, education or smart cities, many industries need to share complex data in real time or near real time, while preserving the digital sovereignty of a large number of disparate actors, protecting the privacy of user data and complying with the laws of whichever territories they operate in. Corteza Record Federation allows for the creation of private networks of decentralised “mini-clouds”, all self-hosted and controlled by their owners, in which this data exchange can happen as efficiently as, and more effectively than, on any single centralised cloud.
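To make the idea of broadcasting a record over ActivityPub more concrete, here is a minimal Python sketch that wraps a single record in an ActivityStreams 2.0 "Create" activity, the kind of message one federated node could deliver to its followers. The actor URLs, the module name and the layout of the object (a "Document" carrying the record values as JSON) are illustrative assumptions, not Corteza's actual federation payload.

```python
import json
from datetime import datetime, timezone

# JSON-LD context defined by ActivityStreams 2.0, the vocabulary ActivityPub uses.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def record_to_activity(actor: str, module: str, record: dict) -> dict:
    """Wrap one record in a "Create" activity addressed to the actor's followers.

    The object layout below is a hypothetical illustration, not Corteza's format.
    """
    record_url = f"{actor}/modules/{module}/records/{record['id']}"
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        # Followers collection: every federated node subscribed to this actor.
        "to": [f"{actor}/followers"],
        "published": datetime.now(timezone.utc).isoformat(),
        "object": {
            "id": record_url,
            "type": "Document",
            "name": f"{module} record {record['id']}",
            # Record values serialised as JSON so any field type can travel.
            "mediaType": "application/json",
            "content": json.dumps(record["values"], sort_keys=True),
        },
    }


activity = record_to_activity(
    actor="https://node-a.example/actor",
    module="claims",
    record={"id": "42", "values": {"status": "open", "amount": 1200}},
)
print(json.dumps(activity, indent=2))
```

A receiving node would parse the "content" back into record values and apply them to its own copy of the data model; how conflicts and access rights are handled is outside this sketch.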

>> Read more about Record Federation for Corteza Clouds

Folksonomy engine for the food ecosystem — Data modelling by the community

Everybody is interested in the food they eat, from many different angles, ranging from taste, cost, ingredients and nutrition to its impact on health, the environment and society. We also have many different names for the same food, the ways we prepare it and its other properties, some of them used only very locally. That means it is not always easy for everyone to effectively search open data sets like Open Food Facts. Open Food Facts, sometimes referred to as the "Wikipedia for food products", is the biggest open food database in the world.

The Folksonomy engine for the food ecosystem created within this project will unleash an ocean of new data and uses regarding food. Citizens, researchers, journalists, professionals, artists, communities, and innovators will be able to define and add new properties of their choice to food products on Open Food Facts for their own use or to enrich the shared knowledge. Open Food Facts already feeds hundreds of data reuses. Thousands more will become possible thanks to the new user defined properties.
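As a rough illustration of user-defined properties, the sketch below stores free-form key/value tags per product barcode, either in a shared layer or privately per user, and lets anyone query products by a property. The FolksonomyStore class, its method names and the example barcodes and keys are hypothetical; they are not the Open Food Facts API.

```python
from __future__ import annotations


class FolksonomyStore:
    """Hypothetical in-memory store of user-defined product properties.

    Entries are keyed by (barcode, owner, key); owner "" is the shared,
    public layer, any other owner is a private namespace.
    """

    def __init__(self) -> None:
        self._tags: dict[tuple[str, str, str], str] = {}

    @staticmethod
    def _norm(key: str) -> str:
        # Normalise keys so "Packaging:Material" and "packaging:material" match.
        return key.strip().lower()

    def set(self, barcode: str, key: str, value: str, owner: str = "") -> None:
        self._tags[(barcode, owner, self._norm(key))] = value

    def get(self, barcode: str, owner: str = "") -> dict[str, str]:
        """All properties one owner has attached to a product."""
        return {
            k: v for (bc, o, k), v in self._tags.items()
            if bc == barcode and o == owner
        }

    def find(self, key: str, value: str | None = None, owner: str = "") -> list[str]:
        """Barcodes carrying `key`, optionally restricted to one value."""
        key = self._norm(key)
        return sorted(
            bc for (bc, o, k), v in self._tags.items()
            if o == owner and k == key and (value is None or v == value)
        )


store = FolksonomyStore()
store.set("0000000000017", "packaging:material", "glass")
store.set("0000000000024", "Packaging:Material", "plastic")
store.set("0000000000017", "my:pantry", "yes", owner="alice")
print(store.find("packaging:material", "glass"))  # ['0000000000017']
print(store.find("my:pantry"))                    # [] (the tag is private to alice)
print(store.find("my:pantry", owner="alice"))     # ['0000000000017']
```

Keeping private and shared tags in separate namespaces lets people experiment with their own properties before proposing them to the shared knowledge base.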

>> Read more about Folksonomy engine for the food ecosystem

The Open Green Web — Ethical meta-search filter on green hosted websites

The World Wide Web has become a mainstay of our modern society, but it is also responsible for a significant use of natural resources. Over the last ten years, The Green Web Foundation (TGWF) has developed a global database of around 1000 hosting providers in 62 countries that deliver green hosting to their customers, to help speed a transition away from a fossil-fuel-powered web. This has resulted in roughly 1.5 billion lookups since 2011, through its browser-based plugins, manual checks on the TGWF website and its API, provided by an open source platform.

But what if you want to take things one step further? This project will create the world's first search engine with ethical filtering, which will exclusively show green-hosted results. In addition to giving environmentally conscious web users a new choice of search engine, all the code and data will be open sourced. This creates a reference implementation for wider adoption across the search provider industry, increasing demand for and visibility of how we power the web. The project builds upon the open source search engine Searx, and will collaborate with the developers of that search tool to make "green" search an optional feature for all installs of Searx.

>> Read more about The Open Green Web

Nominatim — Multi-lingual support in address search

Nominatim is an open-source geographic search engine (geocoder). It uses data from OpenStreetMap to build up a database and API that make it possible to search for any place on Earth and to look up addresses for any given geographic location. It is used as the main search engine on the OpenStreetMap website, where it serves millions of requests per day, but it can also be installed locally: you can easily set it up for a small country on your laptop. Nominatim has always aimed to be usable worldwide, for any place in any language. To that end, it has used generic, language-agnostic algorithms that assume a uniform data model. This has served us especially well while the OpenStreetMap database was in its early stages of development and changing fast. Now that it has matured, it is time to further improve the search experience by taking into account the particularities of different languages and the different practices when it comes to geographic addressing. We aim to restructure the part of the software that parses place names and search queries to make it more configurable and to make it easier to take languages and regional peculiarities into account.
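One way to picture language-specific query parsing is a normalisation step whose rules are loaded per language. The sketch below lowercases a query, strips accents, splits it into tokens and expands abbreviations from a per-language table. The ABBREVIATIONS tables and the normalise function are hypothetical and much simpler than Nominatim's real configuration.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical per-language abbreviation tables (token -> expansion).
ABBREVIATIONS: dict[str, dict[str, str]] = {
    "de": {"str": "strasse", "pl": "platz"},
    "en": {"st": "street", "rd": "road", "ave": "avenue"},
    "fr": {"av": "avenue", "bd": "boulevard"},
}


def normalise(query: str, lang: str) -> list[str]:
    """Turn a raw query into comparable tokens using the rules for `lang`."""
    text = query.lower().replace("ß", "ss")
    # Decompose accented letters and drop the combining marks: é -> e.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"\w+", text)
    table = ABBREVIATIONS.get(lang, {})  # unknown language: no expansion
    return [table.get(tok, tok) for tok in tokens]


print(normalise("Berliner Str. 5", "de"))         # ['berliner', 'strasse', '5']
print(normalise("Av. des Champs-Élysées", "fr"))  # ['avenue', 'des', 'champs', 'elysees']
print(normalise("Baker St", "en"))                # ['baker', 'street']
```

The same token can need different treatment per region: "St" expands to "street" here, but in many English place names it means "Saint", which is exactly the kind of peculiarity a configurable parser has to handle.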

>> Read more about Nominatim

Personal Food Facts — Privacy protecting personalized information about food

Open Food Facts is a collaborative open database containing data on 1 million food products from around the world. This project will allow users of our website, our mobile app and our ecosystem of 100+ mobile apps to get personalized search results (food products that match their personal preferences and dietary restrictions based on ingredients, allergens, nutritional quality, vegan and vegetarian products, kosher and halal foods, etc.) without sacrificing their privacy or having to send those preferences to us.
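The privacy idea can be sketched as follows: product data is fetched in bulk, while the user's preferences stay on the device and are only used for local filtering and ranking. The Product and Preferences classes, their fields and the ranking by Nutri-Score grade are assumptions made for this sketch, not the project's design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Product:
    code: str
    name: str
    allergens: set[str] = field(default_factory=set)
    labels: set[str] = field(default_factory=set)
    nutriscore: str = "e"  # grade "a" (best) to "e" (worst)


@dataclass
class Preferences:
    # Lives only on the user's device; nothing here is sent to a server.
    avoid_allergens: set[str] = field(default_factory=set)
    required_labels: set[str] = field(default_factory=set)


def personalise(products: list[Product], prefs: Preferences) -> list[Product]:
    """Filter and rank already-downloaded products against local preferences."""
    suitable = [
        p for p in products
        if not (p.allergens & prefs.avoid_allergens)  # no unwanted allergen
        and prefs.required_labels <= p.labels         # every required label present
    ]
    # Grades are single letters, so alphabetical order puts the best first.
    return sorted(suitable, key=lambda p: p.nutriscore)


catalog = [
    Product("001", "Peanut bar", {"peanuts"}, {"vegan"}, "c"),
    Product("002", "Oat biscuit", set(), {"vegan"}, "b"),
    Product("003", "Cheese cracker", {"milk"}, set(), "d"),
    Product("004", "Lentil crisps", set(), {"vegan", "halal"}, "a"),
]
prefs = Preferences(avoid_allergens={"peanuts"}, required_labels={"vegan"})
print([p.name for p in personalise(catalog, prefs)])  # ['Lentil crisps', 'Oat biscuit']
```

Because the server only ever sees generic product requests, it cannot tell which diet or allergy a given user has.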

>> Read more about Personal Food Facts

Plaudit — Make good science discoverable through endorsements

Plaudit is open source software that collects endorsements of scholarly content from the academic community, and leverages those to aid the discovery and rapid dissemination of scientific knowledge. Endorsements are made available as open data. The NGI Search & Discovery Grant will be used to make it simpler for third parties to re-use endorsement data by exposing it through web standards.

>> Read more about Plaudit

Software Heritage — Collect, preserve and share the source code of all software ever written

Software Heritage is a non-profit, multi-stakeholder initiative with the stated goal of collecting, preserving and sharing the source code of all software ever written, ensuring that current and future generations may discover its precious embedded knowledge. This ambitious mission requires proactively harvesting from a myriad of source code hosting platforms across the internet, each with its own protocol, and coping with a variety of version control systems, each with its own data model. Among other things, this project will help ingest the content of over 250,000 open source software projects that use the Mercurial version control system, which will be removed from the Bitbucket code hosting platform in June 2020.

>> Read more about Software Heritage

Solid-Search — Queries in a pod

Solid-Search aims to provide an open source module that adds full-text search functionality to Solid pods. Solid is an emerging specification initiated by the inventor of the World Wide Web, Sir Tim Berners-Lee. Solid aims to decentralize the web by decoupling applications from databases through Solid Pods (personal online datastores that are in full control of the data owner). Being able to search through your personal data on your Solid Pod is a must-have for the project to become truly successful. However, this requires technology that does not exist yet: a full-text search interface that works with schema-less RDF data. In order to maximize adoption and retain a modular, open approach, we will standardize the way in which data changes are described. By doing so, it will be relatively easy to introduce new search and query systems (such as search by location). The project will create the open source search back-end, improve linked data synchronisation specs, link the module to two Solid implementations, create a front-end for end users, and write a tutorial for adding data sources.
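To illustrate what full-text search over schema-less RDF could involve, the sketch below indexes every literal object of every triple regardless of its predicate, and keeps the index in sync through add and remove change events. The PodIndex class, the change-event methods and the rule of treating any object starting with http(s):// as an IRI are simplifying assumptions; a real implementation would take term types from an RDF library.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# (subject, predicate, object); the object is either an IRI or a literal.
Triple = Tuple[str, str, str]


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


class PodIndex:
    """Minimal full-text index over RDF triples that needs no schema.

    Every literal is indexed whatever its predicate, so new vocabularies
    require no configuration. add/remove play the role of change events.
    """

    def __init__(self) -> None:
        # term -> {subject -> number of that subject's literals containing the term}
        self._postings: Dict[str, Dict[str, int]] = defaultdict(dict)

    @staticmethod
    def _is_literal(obj: str) -> bool:
        # Simplification: anything that looks like an HTTP(S) IRI is a link.
        return not obj.startswith(("http://", "https://"))

    def add(self, triple: Triple) -> None:
        subject, _, obj = triple
        if self._is_literal(obj):
            for term in tokenize(obj):
                self._postings[term][subject] = self._postings[term].get(subject, 0) + 1

    def remove(self, triple: Triple) -> None:
        subject, _, obj = triple
        if self._is_literal(obj):
            for term in tokenize(obj):
                postings = self._postings.get(term, {})
                remaining = postings.get(subject, 0) - 1
                if remaining > 0:
                    postings[subject] = remaining
                else:
                    postings.pop(subject, None)

    def search(self, query: str) -> Set[str]:
        """Subjects whose literals together contain every query term."""
        result: Optional[Set[str]] = None
        for term in tokenize(query):
            subjects = set(self._postings.get(term, {}))
            result = subjects if result is None else result & subjects
        return result or set()


NOTE = "https://pod.example/notes/1"
PHOTO = "https://pod.example/photos/7"
idx = PodIndex()
idx.add((NOTE, "http://schema.org/text", "Meeting notes about Solid pods"))
idx.add((NOTE, "http://schema.org/author", "https://pod.example/profile#me"))
idx.add((PHOTO, "http://purl.org/dc/terms/title", "Holiday photo: solid rock climbing"))
print(sorted(idx.search("solid")))  # both subjects
print(idx.search("solid pods"))     # {'https://pod.example/notes/1'}
idx.remove((NOTE, "http://schema.org/text", "Meeting notes about Solid pods"))
print(idx.search("pods"))           # set()
```

Driving the index purely from change events is what would let other back-ends (for example a location index) subscribe to the same stream of changes.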

>> Read more about Solid-Search

StreetComplete — Fix open geodata with OpenStreetMap

The project will make collecting data for OpenStreetMap easier and more efficient. OpenStreetMap is the best source of information for general-purpose search engines that need geographic data about the locations and properties of various objects. These objects range from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches and so on. A search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and is already used for many purposes. Improving OSM increases the quality of services that use open data rather than proprietary datasets kept as trade secrets by established companies.

>> Read more about StreetComplete

WebXray Discovery — Expose tracking mechanism in search hubs

WebXray intends to build a filter extension for the popular and privacy-friendly meta-search engine Searx that will show users which third-party trackers are used on the sites in their results pages. Users get full transparency about which company operates which tracker, and will be able to filter out sites that use particular trackers. This filter tool will be built on the unique database WebXray maintains on the ownership of tracking companies that collect personal data of website visitors.

Mapping the ownership of tracking companies that sell behavioural profiles of individuals is critical for all privacy- and trust-enhancing technologies. Considerable scrutiny is given to the large players who conduct third-party tracking and advertising, whilst little scrutiny is given to the large number of smaller companies that collect and sell unknown volumes of personal data. Such collection is unsolicited, with invisible beneficiaries. The ease and speed of corporate registration give data brokers the opportunity to mitigate their liability when collecting data profiles. We must therefore establish a systematic database of data broker domain ownership.

The filter extension produced by the project will make this ownership database visible and actionable for end users. The project will also curate crowdsourced data and add it to the current ownership database, which is already comprehensive, containing detailed information on more than 1,000 ad tech tracking domains.
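A filter of this kind can be sketched as a lookup from tracker domains to owning companies, followed by a filter over the result list. In the sketch below, the OWNERS table, the company names and the assumption that each result already comes with the third-party domains found on the page are all hypothetical placeholders, not WebXray's data or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical excerpt of an ownership database: registered domain -> owner.
OWNERS: dict[str, str] = {
    "adcorp.example": "AdCorp",
    "metrics-b.example": "DataBroker Ltd",
}


@dataclass
class Result:
    url: str
    # Assumed to be known already, e.g. from an earlier crawl of the page.
    third_party_domains: set[str] = field(default_factory=set)


def owner_of(domain: str) -> str | None:
    """Look up a domain, then each parent domain, in the ownership table."""
    labels = domain.lower().split(".")
    for i in range(len(labels) - 1):  # stop before the bare top-level domain
        candidate = ".".join(labels[i:])
        if candidate in OWNERS:
            return OWNERS[candidate]
    return None


def tracker_owners(result: Result) -> set[str]:
    owners = {owner_of(d) for d in result.third_party_domains}
    owners.discard(None)  # domains missing from the database
    return owners


def filter_results(results: list[Result], blocked: set[str]) -> list[Result]:
    """Drop results that load trackers from any blocked company."""
    return [r for r in results if not (tracker_owners(r) & blocked)]


results = [
    Result("https://news.example/a", {"cdn.adcorp.example", "metrics-b.example"}),
    Result("https://blog.example/b", {"pixel.adcorp.example"}),
    Result("https://wiki.example/c", set()),
]
for r in results:
    print(r.url, sorted(tracker_owners(r)))
kept = filter_results(results, blocked={"DataBroker Ltd"})
print([r.url for r in kept])  # ['https://blog.example/b', 'https://wiki.example/c']
```

Matching on parent domains means a company's many tracking subdomains only need one entry in the ownership table.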

>> Read more about WebXray Discovery

Fashion Freedom — Supporting research, development, and education to bring the fashion industry into the 21st century

The Fashion Freedom Initiative wants to make sure that everyone benefits from new advances in technology in the fashion industry and beyond. It aims to assist the industry and the wider society in transitioning into a new phase where social responsibility, art, usability, privacy and sustainability are combined into a better and smarter fashion for everyone. Designing and making clothes isn't just a luxury for the affluent, or a prerogative of large factories and consumer brands: it is a universal need at the largest possible scale.

>> Read more about Fashion Freedom

LTSP Desktop — Remote desktop via an LTSP-Cluster

Thin clients (PCs where all data is kept on a remote server and only the desktop is kept locally) have been in use for a long time. These days, increased bandwidth and cloud computing allow us to go further and even stream the complete desktop from the Internet. The possibility of starting a desktop "on demand" from the cloud offers interesting new collaboration possibilities: any application can instantly become remotely accessible. For instance, a graphic design can be reviewed by an interface design specialist, or developers can program together and review code within a single IDE instance.

The goal of this project is to fully integrate remote access to a cluster of LTSP servers, which can be accessed directly or streamed from any private or public cloud (like Amazon EC2 or Eucalyptus).

Initially, the project is targeted at open source specialists, who will test the new functionality, translations and design. Development versions are simple to test: there is no need to "scrap" your own computer; simply instantiate a remote development desktop.

Schools are a second target. With the LTSP-Cluster, schools will be able to distribute any application to any computer. Schools all over the world will be able to provide the complete school environment to any child, whether they use a Windows, Linux or Mac computer, so that all students have access to the same educational tools.

>> Read more about LTSP Desktop

OpenStreetMapNL — maintenance software for OpenStreetMap Nederland

The geodata landscape is changing. Government data is becoming freely available more and more. The major map suppliers TeleAtlas and Navteq are losing their independent position through their acquisition by TomTom and Nokia respectively. At the same time, the importance of the 'Geographic Web' keeps growing, and users of geographic information are no longer satisfied with a passive role. The commercial suppliers are recalibrating their strategies in order to get their share of 'user generated content'.

In this changing landscape, OpenStreetMap is increasingly becoming a force to be reckoned with, particularly in the Netherlands. As an independent source of a high-quality, nationwide, complete and, moreover, freely usable geodataset of the Netherlands, OpenStreetMap is claiming a clear position. That will not go unnoticed. There will be more end users. More companies will become interested in using OpenStreetMap data in their systems, websites and applications. New applications will see the light of day. Perhaps further donations of geographic data will follow.

This project specifically focuses on:

  • Developing systems for backups, rollback capabilities, change notification, and the assignment of trust levels linked to contributors and their edits.
  • Developing a lightweight mobile editor that makes it possible to check and edit OpenStreetMap data directly 'in the field'.
  • Developing a more accessible interface through which 'laypeople' can submit simple changes.

>> Read more about OpenStreetMapNL

Searsia — Searsia is a protocol and implementation for large-scale federated web search.

Searsia provides the means to create a personal, private and configurable search engine that freely combines search results from a very large number of sources. Searsia enables existing sources to cooperate so that together they provide a search service that resembles today’s large search engines. In addition to using external services at will, you can also use it to integrate any private information from within your organisation, so that your users or community can use a single search engine to serve their needs.
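Combining results from many sources needs some merging strategy. A simple one, shown below, is round-robin interleaving of each source's ranked list while skipping duplicate URLs; it is a generic technique chosen for illustration, not necessarily how Searsia merges results. The source names and URLs are made up, with "intranet" standing in for private information from within an organisation.

```python
from itertools import zip_longest
from typing import Dict, List


def merge_round_robin(ranked: Dict[str, List[str]], limit: int = 10) -> List[str]:
    """Interleave per-source ranked URL lists, skipping duplicates.

    Sources earlier in the dict come first within each round; a URL returned
    by several sources keeps only the position where it first appears.
    """
    if limit <= 0:
        return []
    seen = set()
    merged: List[str] = []
    # zip_longest yields one "round": the i-th result of every source (or None).
    for round_ in zip_longest(*ranked.values()):
        for url in round_:
            if url is None or url in seen:
                continue
            seen.add(url)
            merged.append(url)
            if len(merged) == limit:
                return merged
    return merged


sources = {
    "web": ["https://a.example", "https://b.example", "https://c.example"],
    "wiki": ["https://w.example/1", "https://a.example"],
    "intranet": ["https://intranet.example/doc"],  # private, in-house source
}
print(merge_round_robin(sources))
# ['https://a.example', 'https://w.example/1', 'https://intranet.example/doc',
#  'https://b.example', 'https://c.example']
```

Round-robin needs no comparable scores across sources, which suits federated search where each source ranks with its own, unknown, method.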

>> Read more about Searsia

TOS;DR — A user rights initiative to rate and label website terms & privacy policies

Terms of service are often too long to read (reading all of these carefully wrought documents could quite literally cost you years of your life), yet it is very important to understand what is in them. After all, your actual legal position online depends on them in a very concrete way. The ratings from TOS;DR can help users get informed about their rights.

>> Read more about TOS;DR

Free Software Vulnerability Database — A resource to aggregate software updates

"Using Components with Known Vulnerabilities" is one of the OWASP Top 10 Most Critical Web Application Security Risks. Identifying such vulnerable components is currently hindered by data structures and tools that are (1) designed primarily for commercial/proprietary software components and (2) too dependent on the National Vulnerability Database (from the US Dept. of Commerce). With the explosion of Free and Open Source Software (FOSS) usage over the last decade, we need a new approach to efficiently identify security vulnerabilities in the FOSS components that form the basis of every modern software system and application. That approach should be based on open data and FOSS tools. The goal of this project is to create new FOSS tools to aggregate software component vulnerability data from multiple sources, organize that data with a new standard package identifier (Package URL or PURL) and automate the search for FOSS component security vulnerabilities. The expected benefits are improved security of software applications, through open tools and data freely available to everyone, and less dependence on a single foreign governmental data source or a few foreign commercial data providers.
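A Package URL has the general shape pkg:type/namespace/name@version?qualifiers#subpath. The sketch below parses that shape and uses the versionless identifier as a key into aggregated advisory data. The parser is simplified (it skips the specification's type-specific normalisation rules), and the ADVISORIES table and example-lib package are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class PackageURL:
    type: str
    name: str
    namespace: str | None = None
    version: str | None = None
    qualifiers: dict[str, str] = field(default_factory=dict)
    subpath: str | None = None

    def versionless(self) -> str:
        """Identifier without version, used here as a lookup key."""
        ns = f"{self.namespace}/" if self.namespace else ""
        return f"pkg:{self.type}/{ns}{self.name}"


def parse_purl(purl: str) -> PackageURL:
    """Parse pkg:type/namespace/name@version?qualifiers#subpath (simplified)."""
    if not purl.startswith("pkg:"):
        raise ValueError("a purl must start with 'pkg:'")
    rest = purl[len("pkg:"):].lstrip("/")
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)
    ptype, _, rest = rest.partition("/")
    version = None
    if "@" in rest:  # a literal "@" inside a namespace must be percent-encoded
        rest, version = rest.rsplit("@", 1)
        version = unquote(version)
    namespace, _, name = rest.strip("/").rpartition("/")
    if not ptype or not name:
        raise ValueError(f"missing type or name in {purl!r}")
    return PackageURL(
        type=ptype.lower(),
        name=unquote(name),
        namespace=unquote(namespace) or None,
        version=version,
        qualifiers=qualifiers,
        subpath=subpath.strip("/") or None,
    )


# Hypothetical aggregated advisory data: versionless purl -> affected versions.
ADVISORIES = {"pkg:pypi/example-lib": {"1.0.0", "1.0.1"}}


def is_affected(purl: str) -> bool:
    p = parse_purl(purl)
    return p.version in ADVISORIES.get(p.versionless(), set())


print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
print(parse_purl("pkg:npm/%40angular/core@12.0.0").namespace)  # @angular
print(is_affected("pkg:pypi/example-lib@1.0.1"))  # True
print(is_affected("pkg:pypi/example-lib@2.0.0"))  # False
```

Because the identifier is the same whichever source reported the advisory, data from several databases can be merged on one key instead of on vendor-specific product names.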

>> Read more about Free Software Vulnerability Database