NGI Zero Discovery

NLnet has an open call as well as thematic funds. This page contains an overview of the projects that fall within the topic NGI Zero Discovery. If you cannot find the project you are looking for, please check the alphabetic index.

Applications are still open; you can apply today.

A Distributed Software Stack For Co-operation

Perspectives aims to be to co-operation what ActivityPub is to social networks. It provides the conceptual building blocks for co-operation, laying the groundwork for a federated, fully distributed infrastructure that supports endless varieties of co-operation. The declarative Perspectives Language allows a model to be translated instantly into an application in which multiple users contribute to a shared process, each with their own unique perspective. The project builds a reference implementation of the distributed stack that executes these models of co-operation, and makes the information concerned searchable.

Real life is an endless affair of interlocking activities. Likewise, Perspectives models of services can overlap and build on common concepts, forming a federated conceptual space that allows users to move from one service to another as the need arises in a natural way. Such an infrastructure functions as a map, promoting discovery and decreasing dependency on explicit search. However, rather than being an on-line information source to be searched, such as the traditional Yellow Pages, Perspectives models allow their users (individuals and organisations alike) to interact and deal with each other on-line. Supply-demand matching in specific domains (e.g. local transport) integrates readily with such an infrastructure. Other patterns of integrating search with co-operation support form a promising area for further research.

>> Read more about A Distributed Software Stack For Co-operation

Blink RELOAD

Blink is a mature open source real-time communication application based on the IETF SIP standard that can be used on different operating systems. It offers audio, video, instant messaging and desktop sharing. Blink RELOAD aims to implement the REsource LOcation And Discovery specification (RELOAD), which describes a peer-to-peer network that allows participants to discover each other and to communicate using the IETF SIP protocol. This offers an alternative discovery mechanism, one that does not rely on server infrastructure, to allow participants to connect with each other and communicate. In addition, the RELOAD specification describes means by which participants can store, publish and share information in a way that is secure and fully under the control of the user, without a third party controlling the sharing process or the information being shared.

>> Read more about Blink RELOAD

DeltaBot

The DeltaBot project will research and develop decentralized, e2e-encrypting and socially trustworthy bots for Delta Chat (https://delta.chat). Bots will bridge with messaging platforms like IRC or Matrix, offer media archiving for their users and provide ActivityPub and RSS/Atom integration to allow users to discover new content. The project will not only provide well-tested and documented chat bots in Python but also help others write and deploy their own custom bots. Bots will perform e2e-encryption by default and we'll explore seamless ways to resist active MITM attacks.

>> Read more about DeltaBot

Discourse ActivityPub

Discourse is a modern open source discussion platform. In some ways it works similarly to email, but it is much better suited to large-scale group discussions that in turn become searchable (i.e. indexable) items of knowledge on the world wide web (given that the forum is publicly viewable). We are building a two-way mirror for Discourse topics, compatible with the ActivityPub standard. The first iteration of this will be "Live Topic Links": when a topic is created on Forum A by pasting a URL to a topic on another Discourse instance (Forum B), the user is prompted "would you like to sync replies between this forum and the forum you're linking to?" If the user clicks "yes", replies to the mirror topic on Forum B are synced back to the topic on Forum A, and vice versa (if Forum B has "whitelisted" Forum A).
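A minimal sketch of the sync rule described above, using hypothetical data structures rather than Discourse's or ActivityPub's actual APIs: a reply made on either forum is copied to its counterpart, with an origin marker so it is not echoed back.

```python
# Hypothetical illustration only; field and function names are assumptions.
def sync_reply(reply: dict, source_forum: dict, target_forum: dict) -> None:
    if reply.get("origin") == target_forum["url"]:
        return  # the reply originated on the target forum; don't echo it back
    target_forum["mirror_topic"].append({
        "body": reply["body"],
        "author": reply["author"],
        "origin": source_forum["url"],   # remember where it came from
    })

forum_a = {"url": "https://forum-a.example", "mirror_topic": []}
forum_b = {"url": "https://forum-b.example", "mirror_topic": []}
sync_reply({"body": "Nice idea!", "author": "alice", "origin": None}, forum_a, forum_b)
print(forum_b["mirror_topic"])
```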

>> Read more about Discourse ActivityPub

Discover and move your coins by yourself

The technologies behind cryptocurrencies - and bitcoin-based networks in particular - are probably among the most difficult of any network to understand, even for technical experts. Most users, even those familiar with the technology for years, have to rely on wallets or run and sync full nodes. Empirically we can see that they usually get lost at a certain point, especially when said wallets dictate the use of new "features" like BIP39 and the like, multisig, segwit and bech32. Most users don't understand where their coins are and on what addresses, what the format of these addresses is, what their seeds are and what they need to unlock their coins. This situation pushes users to give their private keys to dubious services, resulting in the loss of all of their coins. The alternative is to let exchanges manage their coins, which removes their agency and puts them at risk. The goal of this project is to correct this situation by allowing people to simply discover where their coins are and what their addresses are, whatever features are used. It will allow them to discover their addresses from one coin to another, rediscover their seed if they have lost part of it, sign and verify address ownership, discover public keys from private keys and create their hierarchical deterministic addresses: in fact, all the tools needed to discover and check what is related to their coins, and this for any bitcoin-based network. In addition, it allows them to create their transactions by themselves and send them to the networks, or just check them. The tool is a standalone, secure, open source webapp inside the browser that must be used offline; it is a browserification of a nodejs module that can also be used or modified by those who have the technical knowledge.
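As a rough illustration of the kind of primitive such a tool builds on (a sketch under assumptions, not the project's code, which is a nodejs module): deriving the BIP39 seed and the BIP32 master key from a mnemonic phrase, shown here in Python with the third-party mnemonic package.

```python
# Sketch only: derive a BIP39 seed and the BIP32 master secret key from a
# freshly generated mnemonic. Requires the third-party "mnemonic" package.
import hashlib
import hmac

from mnemonic import Mnemonic

mnemo = Mnemonic("english")
phrase = mnemo.generate(strength=128)            # 12 example words, not a real wallet
assert mnemo.check(phrase)                       # verify the built-in checksum

seed = Mnemonic.to_seed(phrase, passphrase="")   # 64-byte BIP39 seed
# Per BIP32, the master key is HMAC-SHA512 over the seed with key "Bitcoin seed".
digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
master_secret_key, master_chain_code = digest[:32], digest[32:]

print(phrase)
print("master key:", master_secret_key.hex())
```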

>> Read more about Discover and move your coins by yourself

elRepo.io - Resilient, human-centered, distributed content sharing and discovery.

In this project AlterMundi and NetHood collaborate to develop a critical missing part in decentralized and distributed p2p systems: content search. More specifically, this project will implement advanced search for elRepo.io, the self-hosted and distributed culture-sharing platform currently under active development by AlterMundi and partners. Search functionalities will expand on the already proven coupling of the libxapian searching and indexing library and turtle routing. The distributed search functionality will be implemented to be flexible and modular. It will become the meeting point of three complementary threads of ongoing work: libre technology and tools for building Community Networks (LibreRouter & LibreMesh), fully decentralized, secure and anonymous Friend2Friend software (Retroshare), and a transdisciplinary participatory methodology for local applications in Community Networks (netCommons).

>> Read more about elRepo.io - Resilient, human-centered, distributed content sharing and discovery.

Explain

The Explain project aims to bring open educational resources to the masses. Many disparate locations of learning material exist, but as of yet there isn't a single place which combines these resources to make them easily discoverable for learners. Using a broad array of deep content metadata extraction techniques developed in conjunction with the Delft University of Technology, the Explain search engine indexes content from a wide variety of sources. With this search engine, learners can then discover the learning material they need through a fine-grained topic search or through uploading their own content (e.g. exams, rubrics, excerpts) for which they require additional educational resources. The project focuses on usability and discoverability of resources.

>> Read more about Explain

Extending PeerTube

This project aims to extend PeerTube to support the availability, accessibility, and discoverability of large-scale public media collections on the next generation internet. Although PeerTube is technically capable of supporting the distribution of large public media collections, the platform currently lacks practical examples and extensive documentation to achieve this in a timely and cost-efficient way. This project will function as a proof-of-concept that will showcase several compelling improvements to the PeerTube software by [1] developing and demonstrating the means needed for this end by migrating a large corpus of open video content, [2] implementing trustworthy open licensing metadata standards for video publication through the PeerTube platform, and [3] emphasizing the importance of accompanying subtitle files by recommending ways to generate them.

>> Read more about Extending PeerTube

fediverse.space

Fediverse.space is a tool for understanding decentralized social networks. The fediverse, or federated universe, is the set of social media servers, hosted by individuals across the globe, forming a libre and more democratic alternative to traditional social media. When these servers are displayed in an intuitive visualization, clusters quickly emerge. For instance, servers with the same primary language will be close to each other. There are more subtle groupings, too: topics of discussion, types of users (serious vs. ironic), and political leanings all play a role. fediverse.space aims to be the best tool for understanding and discovering communities on this emerging social network.

>> Read more about fediverse.space

Free Software Vulnerability Database

"Using Components with Known Vulnerabilities" is one of the OWASP Top 10 Most Critical Web Application Security Risks. Identifying such vulnerable components is currently hindered by data structure and tools that are (1) designed primarily for commercial/proprietary software components and (2) too dependent on the National Vulnerability Database (from US Dept. of Commerce). With the explosion of Free and Open Source Software (FOSS) usage over the last decade we need a new approach in order to efficiently identify security vulnerabilities in FOSS components that are the basis of every modern software system and applications. And that approach should be based on open data and FOSS tools. The goal of this project is create new FOSS tools to aggregate software component vulnerability data from multiple sources, organize that data with a new standard package identifier (Package URL or PURL) and automate the search for FOSS component security vulnerabilities. The expected benefits are to contribute to the improved security of software applications with open tools and data available freely to everyone and to lessen the dependence on a single foreign governmental data source or a few foreign commercial data providers.

>> Read more about Free Software Vulnerability Database

Geographic tagging and discovery of Internet Routing and Forwarding

SCION is the first clean-slate Internet architecture designed to provide route control, failure isolation, and explicit trust information for end-to-end communication. As a path-based architecture, SCION end-hosts learn about available network path segments and combine them into end-to-end paths, which are carried in packet headers. By design, SCION offers transparency to end hosts with respect to the path a packet travels through the network. This has numerous applications related to trust, compliance, and also privacy. With a better understanding of the geographic and legislative context of a path, users can for instance choose trustworthy paths that best protect their privacy, or avoid the need for privacy-intrusive and expensive CDNs by selecting resources closer to them. SCION is the first decentralised system to offer this kind of transparency and control to users of the network.
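A toy illustration of geography-aware path choice (made-up data, not SCION's actual data structures or API): given candidate paths annotated with the countries they traverse, keep only those avoiding jurisdictions the user does not trust, then pick the fastest.

```python
# Illustrative only: the path records and attributes below are assumptions.
paths = [
    {"id": "p1", "countries": ["CH", "DE", "NL"], "latency_ms": 40},
    {"id": "p2", "countries": ["CH", "US", "NL"], "latency_ms": 35},
]

def select_path(candidates, avoid):
    # Drop any path that traverses a country in the avoid set,
    # then prefer the lowest-latency remaining path.
    allowed = [p for p in candidates if not set(p["countries"]) & avoid]
    return min(allowed, key=lambda p: p["latency_ms"]) if allowed else None

print(select_path(paths, avoid={"US"}))   # -> the path via CH, DE, NL
```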

>> Read more about Geographic tagging and discovery of Internet Routing and Forwarding

GNU Guix

GNU Guix is a universal functional package manager and operating system which respects the freedom of computer users. It focuses on bootstrappability and reproducibility to give users strong guarantees on the integrity of the full software stack they are running. It supports atomic upgrades and roll-backs which make for an effectively unbreakable system. This project aims to enhance multiple facets; the three main goals are: (1) distributed package distribution (e.g. over IPFS), (2) composable and programmable user configurations / services (a way to replace "dotfiles" by modules that can be distributed and serve a wide audience), (3) broadened accessibility via, among others, a graphical user interface for installation / package management.

>> Read more about GNU Guix

GNU Name System

Today, the starting point of any discovery on the Internet is the Domain Name System (DNS). DNS suffers from security and privacy issues. The GNU project has developed the GNU Name System (GNS), a fully decentralized, privacy-preserving and end-to-end authenticated name resolution protocol. In this project, we will document the protocol on a bit-level (RFC-style) and create a second independent implementation against the specification. Furthermore, we will simplify the installation by providing proper packages that, when installed, automatically integrate the GNS logic into the operating system.

>> Read more about GNU Name System

Handling Data from IPv6 Scanning

Scanning is the state of the art for discovering hosts on the Internet. Today's scanning relies on IPv4 and simply probes all possible addresses. But global IPv6 adoption will render brute-forcing useless due to the sheer size of the IPv6 address space, and demands more sophisticated ways of target generation. Our team developed such an approach that allows probing all subnets in the currently deployed IPv6 Internet within reasonable time. Positive responses are however scarce in the IPv6 Internet; thus, we include error messages in our analysis as they provide meaningful insight into the current deployment status of networks. First experiments covering only parts of the Internet were promising, with at least 5% of our probes triggering error messages. However, a full scan would lead to approx. 10^14 responses amounting to petabytes of data, and demands an adequate data handling solution. In this project, we will develop a data storage and analysis solution for high-speed IPv6 scanning. It will process the large amount of received data concurrently with scanning, and provide continuous results while scanning for long periods. This effort enables full scans of the IPv6 Internet.
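A quick back-of-the-envelope check of that volume figure; the per-response storage size used below is an assumption made only for illustration.

```python
# Rough arithmetic: 10^14 responses at ~100 stored bytes each is on the
# order of ten petabytes (the 100 bytes/response figure is an assumption).
responses = 10**14
bytes_per_response = 100
petabytes = responses * bytes_per_response / 10**15
print(f"{petabytes:.0f} PB")   # -> 10 PB
```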

>> Read more about Handling Data from IPv6 Scanning

IN COMMON

IN COMMON emerged as a transnational European collective from a network of non-profit actors to identify, promote, and defend the Commons. We decided to start a common pool for Information Technologies with the aim of creating, maintaining, and sharing with the public geo-localized data that belongs to our constituents, and of articulating citizen movements around a free, public and common platform to map and act together for the Commons. IN COMMON forms a cooperative data library that provides collective maintenance to ensure data is always accurate.

>> Read more about IN COMMON

ipfs-search.com

ipfs-search.com is a Free and Open Source (FOSS) search engine for directories, documents, videos and music on the Interplanetary Filesystem (IPFS), supporting the creation of a decentralized web where privacy is possible, censorship is difficult, and the internet can remain open to all.

>> Read more about ipfs-search.com

librarian

Search engines are the default way of finding information on the internet. Although there is a host of search engines for users to choose from - from library catalogs to cooking portals - there is currently only a small number of dominant search engines that practically decide who finds what on the internet. This situation has the following disadvantages: 1) by designing their algorithms, these dominant search engines influence our world view, 2) the huge amounts of user data they record create severe risks of data leaks and misuse, and finally 3) search engines can misuse their market power to gain advantages in other lines of business (e.g. the mobile phone market).

Federated web search is a technology where users connect to a so-called broker which forwards their search request to suitable search engines and combines the results. Using federated search lessens the risks of a few dominant search engines: it shows a blend of search results created by different algorithms, it prevents any single search engine from recording the data of individual users, and its search results are usually more diverse. Still, for federated web search to become widely used, it faces the following challenges: 1) while exploiting user behaviour is known to improve search effectiveness, brokers exploiting this data also risk leaks and misuse, 2) as brokers typically serve many users, they are not able to include search engines for personal content, such as email, social media or cloud storage, because the public broker cannot know the user's credentials to access these services, and finally 3) brokers consider the same base set of search engines for every user, while considering a more focused set of engines could improve search results, given the diversity of users.

To address these challenges while avoiding the disadvantages of dominant search engines, this project will investigate a radical change to the federated search architecture: users run a broker on their own computer using a browser plugin. In this architecture the broker can safely analyze the user's behaviour to improve search results, as the data is accumulated on a per-user basis on disconnected computers. Furthermore, the search requests forwarded to search engines use the user's credentials and can thus access search engines for personal data, such as email. Finally, starting from sensible defaults, each user can configure their broker to their individual needs.
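A minimal sketch of the local-broker idea under simplifying assumptions: a component running on the user's machine fans a query out to several engines in parallel and interleaves the merged results. The engine functions are placeholders, not real services.

```python
# Illustrative sketch only; the engines below stand in for, e.g., a public
# web engine and a personal mail index reachable with the user's credentials.
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

def public_web_engine(query):
    return [f"web result {i} for '{query}'" for i in range(3)]

def personal_mail_engine(query):
    return [f"mail result {i} for '{query}'" for i in range(2)]

def local_broker(query, engines):
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    # Round-robin interleaving so no single engine dominates the ranking.
    return [r for group in zip_longest(*result_lists) for r in group if r]

print(local_broker("open access journals", [public_web_engine, personal_mail_engine]))
```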

>> Read more about librarian

Librecast Live

The Librecast Live project contributes to decentralizing the Internet by enabling multicast. Multicast is a major network capability for a secure, decentralized and private-by-default Next Generation Internet. The original design goals of the Internet do not match today's privacy and security needs, and this is evident in the technologies in use today. There are many situations where multicast can already be deployed on the Internet, but also some where it cannot. This project will build transitional protocols and software to extend the reach of multicast and enable easy deployment by software developers. Among other things it will produce a C library and proof-of-concept code using a tunneling method to make multicast available to the entire Internet, regardless of upstream support. We will then use these multicast libraries, WebRTC and the W3C-approved ActivityPub protocol to build a live streaming video service similar to twitch.tv. This will be a complement to the existing decentralised Mastodon and PeerTube projects, and will integrate with these services using ActivityPub. By doing so we can bring live video streaming services to these existing decentralised userbases and demonstrate the power of multicast at the same time. Users will be able to chat and comment in realtime during streaming (similar to YouTube live streaming). This fills an important gap in the Open Source decentralised space. All video and chat messages will be transmitted over encrypted channels.
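For context, this is what plain IPv4 multicast looks like with nothing but the standard library; the group address and port are arbitrary example values, and Librecast's contribution is making this capability reachable even where the upstream network does not support it (the sketch below is illustrative, not Librecast code).

```python
# Minimal IPv4 multicast sender/receiver using only the Python standard library.
import socket
import struct

GROUP, PORT = "239.1.1.1", 5007   # example multicast group and port

def send(message: bytes):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(message, (GROUP, PORT))

def receive() -> bytes:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # Join the multicast group on all interfaces.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock.recv(1024)
```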

>> Read more about Librecast Live

Lizard

The Lizard project aims to develop a common protocol for end-to-end encrypted social applications using Tor as the underlying transport mechanism, with the addition of store-and-forward servers discovered through the Tor hidden service directory. The protocol takes care of confidentiality and anonymity concerns, and adds mechanisms for easily synchronising application-level state on top. All communications are done "off the grid" using Tor, but identities can be publicly attested to using existing social media profiles. Using a small marker in your social profiles, you can signal to other Lizard users that they can transparently message you over Lizard instead. By taking care of these common discovery and privacy concerns in one easy-to-use software suite, we hope that more applications will opt for end-to-end encryption by default without compromising on anonymity.

>> Read more about Lizard

Mailpile Search Integration

Mailpile is an e-mail client and personal e-mail search engine, with a strong focus on user autonomy and privacy. This project, "Mailpile Search Integration", will adapt and enhance Mailpile so other applications can make use of Mailpile's built-in search engine and e-mail store. This requires improving Mailpile in three important ways: First, the project will add fine-grained access control, so the user can control which data is and isn't exposed. Second, enabling remote access will be facilitated, allowing a Mailpile running on a personal device to communicate with applications elsewhere on the network (such as smartphones, or services in "the cloud"). And finally, the interoperability functions themselves (the APIs) need to be defined (building on existing standards wherever possible), implemented and documented.

>> Read more about Mailpile Search Integration

Minedive

The minedive project is building several components. The first, minedive, is a browser extension that aims to allow users to search the web while preserving their anonymity and privacy. The second is an open source reference implementation of its rendez-vous server. minedive instances connect to each other (via WebRTC data channels) forming a two-layered P2P network. The lower layer (L1) provides routing, the upper layer (L2) provides anonymous and encrypted communication among peers acting as a MIX network. This architecture guarantees that peers which know your IP address (L1) do not know your search data (L2) and vice versa. A central (websocket) rendez-vous server is needed to find and connect with L1 peers, and to exchange keys with L2 peers, but no search goes through it. We are running a default server which can be overridden by users who want to run their own (using our reference implementation or a custom one). Users can also set the extension to pick peers from a given community (identified by an opaque tag). Currently all requests are satisfied by letting L2 peers return results from the first page of mainstream search engines (as they see it, in an attempt to escape the search bubble). While this will stay as a fallback, we plan to implement web crawling on peers, with keyword extraction from URLs in local bookmarks and history and ranking with open algorithms, being transparent with users about which techniques are used and open to suggestions.

>> Read more about Minedive

Mynij

People feel lost when their connection to the internet is cut. All of a sudden, they cannot search for some reference or quickly look up something online. At the other end, hundreds of millions of servers are 'always on', awaiting users to come online. Of course, this is neither very resilient nor economic. And it is also not necessary. In the 60s, computers used to occupy a large room. Nowadays, with smartphones, they fit in your hand. A complete copy of the Web (10 PB) already fits on 100 SSDs of 100 TB each, occupying a volume similar to that of an original IBM PC. A partial copy of the Web optimised for a single person will thus soon fit on a smartphone.

Mynij believes that Web search will eventually run offline for legal, technical and economic reasons. This is why it is building a general purpose Web search engine that runs offline and fits into a smartphone. It can provide fast results with better accuracy than online search engines. It protects privacy and freedom of expression against recent forms of digital censorship. It reduces the cost of online advertising for small businesses. It brings search algorithms and information presentation under end-user control. And you control its availability: as long as you have a copy and a working device, it can work.

>> Read more about Mynij

neuropil

Neuropil is an open-source decentralized messaging layer that focuses on security and privacy by design. Persons, machines, and applications first have to identify their respective partners and/or content before real information can be sent. The discovery is handled internally and is based on so-called "intent messages" that are secured by cryptographic primitives. This project aims to create distributed search engine capabilities based on neuropil that enable the discovery and sharing of information with significantly higher levels of trust and privacy, and with more control over the search content for data owners, than today's standard.

As of now, large search engines have implemented "crawlers" that constantly visit webpages and categorize their content. The only way to somehow influence the information that is used by search engines is a file called "robots.txt". Other algorithms are only known to the search engine provider. By using a highly standardized "intents" format that protects the real content of users, this model is reversed: data owners define the searchable public content. As an example, we seek to integrate the neuropil messaging layer with its extended search capabilities into a standard web server, so that it becomes one actor that handles and maintains the search index contents of participating data owners. By using the neuropil messaging layer it is thus possible to build a distributed search engine database that is able to contain and reveal any kind of information in a distributed, concise and privacy-preserving manner, without the need for any central search engine provider.

>> Read more about neuropil

Next

Next is a new type of web browser designed to empower users to find and filter information on the Internet. Web browsers today largely compete on rendering performance, all whilst maintaining similar UIs. The common UI they employ is easy to learn, though unfortunately it is not effective for traversing the internet due to its limited capabilities. This presents itself as a problem when a user is trying to navigate the large amounts of data on the Internet and in their open tabs. To deal with this problem, Next offers a set of powerful tools to index and jump around one's open tabs, through search results and the wider Internet. For example, Next offers the ability for the user to filter and process their open tabs by semantic content search. Because each workflow and discipline is unique, the real advantage of Next is in its fully programmable and open API. The user is free to modify Next in any way they wish, even whilst it is running.

>> Read more about Next

Nextcloud

The internet helps people to work, manage, share and access information and documents. Proprietary cloud services from large vendors like Microsoft, Google, Dropbox and others cannot offer the privacy and security guarantees users need. Nextcloud is a 100% open source solution where all information can stay on premises, with the protection users choose themselves. The Nextcloud Search project will solve the last remaining open issue, which is unified, convenient and intelligent search and discoverability of data. The goal is to build a powerful but user-friendly interface for search across the entire private cloud. It will be possible to filter by date, type, owner, size, keywords, tags and other metadata. The backend will offer indexing and searching of file-based content, as well as integrated search for other content like text chats, calendar entries, contacts, comments and other data. It will integrate with the private search capabilities of Searx. As a result users will have the same powerful search functionality they know and like elsewhere, but respecting their privacy and strict regulations like the GDPR.

>> Read more about Nextcloud

openEngiadina

The openEngiadina project wants to explore the crossover between a Linked Data repository and a social network - a semantic social network. openEngiadina started off as a platform for creating, publishing and using open local knowledge. The data is structured as Linked Data, allowing semantic queries and intelligent discovery of information. The ActivityPub protocol enables decentralized creation and federation of such structured data, so that local knowledge can be created by independent actors in a certain area (e.g. a music association publishes concert location and timing). The project aims to develop a backend enabling such a platform, research ideas for user interfaces and strengthen the ties between the Linked Data and decentralized social networking communities.

>> Read more about openEngiadina

Openki.net

Openki is an interface between technology and culture. It provides an interactive web platform developed with the goal of removing barriers to universal education for all. The platform makes it simple to organise and manage "peer-to-peer" courses. The platform can be self-hosted, and integrates with OpenStreetMap. At the moment Openki is focused on facilitating learning groups and workshops. The project will improve the tool so it can be used not only to organise courses (with the collaboration of many different actors, in a more participatory way) but much more broadly: for bottom-up project initiation, for grassroots organizations and for facilitating societal dialogue.

>> Read more about Openki.net

P2Pcollab

This project is working towards a more decentralized, privacy-preserving, collaborative internet based on the end-to-end principle, where users engage in peer-to-peer collaboration and have full control over their own data. This enables them to collaborate on, publish and subscribe to content in a decentralized way, to discover and disseminate content based on collaborative filtering, and to search all subscribed and discovered content locally and offline. The project is researching and developing P2P gossip-based protocols and implementing them as composable libraries and lightweight unikernels, with a focus on privacy, security, robustness, and scalability.

>> Read more about P2Pcollab

Pixelfed

Pixelfed is an open source and decentralised photo sharing platform, in the same vein as services like Instagram. The twist is that you can run the service yourself, or pick a reliable party to run it for you. Who better to trust with your privacy and the privacy of the people that follow you? The magic behind this is the ActivityPub protocol - which means you can comment, follow, like and share from other Pixelfed servers around the world as if you were all on the same website. Timelines are in chronological order, and there is no need to track users or sell their data. The project has many features including Discover, Hashtags, Geotagging, Photo Albums, Photo Filters and a few still in development like Ephemeral Stories. The goals of the project include solidifying the technical base, adding new features, and designing and building a mobile app that is compatible with Mastodon apps like Fedilab and Tusky.

>> Read more about Pixelfed

Plaudit

Plaudit is open source software that collects endorsements of scholarly content from the academic community, and leverages those to aid the discovery and rapid dissemination of scientific knowledge. Endorsements are made available as open data. The NGI Search & Discovery Grant will be used to simplify the re-use of endorsement data by third parties by exposing them through web standards.

>> Read more about Plaudit

PoliFLW EU

PoliFLW is an interactive online platform that allows journalists and citizens to stay informed, and keep up to date with the growing group of political parties and politicians relevant to them - even those whose opinions they don't directly share. The prize-winning political crowdsourcing platform makes finding hyperlocal, national and European political news relevant to the individual far easier. By aggregating the news political parties share on their websites and social media accounts, PoliFLW is a time-saving and citizen-engagement enhancing tool that brings the internet one step closer to being human-centric. In this project the platform will add the news shared by parties in the European Parliament and national parties in all EU member states, showcasing what it can mean for access to information in Europe. There will be a built-in translation function, making it easier to read news across country borders. PoliFLW is a collaborative environment that helps to create more societal dialogue and better informed citizens, breaking down political barriers.

>> Read more about PoliFLW EU

Practical Decentralised Search and Discovery

Internet search and service discovery are invaluable services, but are reliant on an oligopoly of centralised services and service providers, such as the internet search and advertising companies. One problem with this situation is that global internet connectivity is required to use these services, precisely because of their centralised nature. For remote and vulnerable communities, stable, affordable and uncensored internet connectivity may simply not be available. Prior work with mesh technology clearly shows the value of connecting local communities, so that they can call and message one another, even in the absence of connectivity to the outside world. The project will implement a system that allows such isolated networks to also provide search and advertising capabilities, making it easier to find local services, and ensuring that local enterprises can promote their services to members of their communities without losing capital from their communities in the form of advertising costs. The project will then trial this system with a number of pilot communities, in order to learn how to make such a system best serve its purpose.

>> Read more about Practical Decentralised Search and Discovery

Private Searx

Searx is a popular meta-search engine letting people query third-party services to retrieve results without giving away personal data. However, there are other sources of information stored privately, either on the computers of users themselves or on other machines in the network that are not publicly accessible. To share it with others, one could upload the data to a third-party hosting service. However, there are many cases in which it is unacceptable to do so, for privacy reasons (including the GDPR) or in the case of sensitive or classified information. This issue can be avoided by storing and indexing data on a local server. By adding offline and private engines to searx, users can search not only the internet, but also their local network, from the same user interface. Data can be conveniently available to anyone without giving it away to untrusted services. The new offline engines will let users search the local file system, open source indexers and databases, all from the searx UI.
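To illustrate what an "offline engine" can be, here is a minimal stand-in that searches local text files and returns results in a searx-like shape; this is an assumption for illustration, not searx's actual engine interface.

```python
# Illustrative local filesystem "engine"; not the real searx engine API.
from pathlib import Path

def filesystem_engine(query: str, root: str = "."):
    results = []
    for path in Path(root).rglob("*.txt"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        if query.lower() in text.lower():
            results.append({"title": path.name, "url": path.resolve().as_uri()})
    return results

# Example (the path is hypothetical):
print(filesystem_engine("invoice", root="/home/user/documents"))
```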

>> Read more about Private Searx

SCION-Swarm

With the amount of downloadable resources such as content and software updates available over the Internet increasing year over year, it turns out not all content has someone willing to serve all of it up eternally for free for everyone. And in other cases, the resources concerned are not meant to be public, but do need to be available in a controlled environment. In such situations users and other stakeholders themselves need to provide the necessary capacity and infrastructure in another, collective way.

This of course creates new challenges. Unlike a website you can follow a link to or find through a standard search engine, and which you typically only have to vet once for security and trustworthiness, the distributed nature of such a system makes it difficult for users to find the relevant information in a fast and trustworthy manner. One of the essential challenges of information management and retrieval in such a system is locating data items in a way that keeps communication complexity scalable and achieves high reliability even in the presence of adversaries. More specifically, if a provider has a particular data item to offer, where shall the information be stored such that a requester can easily find it? Moreover, if a user is interested in particular information, how do they discover it and how can they quickly find the actual location of the corresponding data item?

The project aims to develop a secure and reliable decentralized storage platform enabling fast and scalable content search and lookup going beyond existing approaches. The goal is to leverage the path-awareness features of the SCION Internet architecture to use network resources efficiently in order to achieve a low search and lookup delay while increasing the overall throughput. The challenge is to select suitable paths considering those performance requirements, and potentially combining them into a multi-path connection. To this end, we aim to design and implement optimal path selection and data placement strategies for a decentralized storage system.

>> Read more about SCION-Swarm

Search and Displace

The goal of this project is to establish a workflow and toolchain which can address the problem of mass search and displacement for document content where the original documents come in a range of forms, including a wide variety of digital document formats, both binary and more modern compressed XML forms, and potentially even encompassing older documents where the only surviving form is printed or even handwritten. The term "displacement" is meant to encompass actions taken on the discovered content that go beyond straight replacement, including content tagging and redaction, as well as more complex contextual and user-refined replacement on an iterative basis. It is assumed that this process will be a server application with documents uploaded as needed, on either an individual or bulk upload basis. The solution will be built in a modular fashion so that future deployments can deploy and/or modify only the parts needed. In practical terms this involves the creation of an open source toolchain that facilitates searching for private and confidential content inside documents, for instance attachments to email messages or documents that are to be published on a website. The tool can subsequently be used for the secure and automated redaction of sensitive documents. Building this as a modular solution enables it to be used "standalone" with a simple GUI, via the command line, or embedded within third-party systems such as document management systems, content management systems and machine learning systems. In addition, a modular approach will facilitate the use of the solution both with different languages (natural and programming) and with different specialities, e.g. government archives, winning tenders, legal contracts, court documents etc.
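A toy sketch of the "search and displace" step on plain text, under the assumption that a redaction marker is an acceptable displacement; the patterns are examples only, and the real project targets many document formats beyond plain text.

```python
# Illustrative pattern-based redaction; not the project's toolchain.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "iban":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(redact("Contact jane.doe@example.org, IBAN NL91ABNA0417164300."))
```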

>> Read more about Search and Displace

searx

Searx (/sɜːrks/) is a free metasearch engine, available under the GNU Affero General Public License version 3, with the aim of protecting the privacy of its users. Across all categories, Searx can fetch and combine search results from more than 80 different engines. This includes major commercial search engines like Bing, Google, Qwant, DuckDuckGo and Reddit, as well as site-specific searches such as Wikipedia and Archive.is. Searx is a self-hosted web application, meaning that every user can run it for themselves and others - and add or remove any features they want. Meanwhile, numerous publicly accessible instances are hosted by volunteer organizations and individuals alike. The project will consolidate the many suggestions and feature requests from users and operators into the first full-blown release (1.0) of Searx, as well as spend the necessary engineering effort on making the technology ready for even wider deployment.

>> Read more about searx

searx

Searx is a popular meta-search engine with the aim of protecting the privacy of its users. In the typical use case, a few users trust one instance. However, a third-party service can easily fingerprint users through the IP address of the searx instance and the users' queries. The project aims to create a searx federation to solve this issue. First, a protocol needs to be defined to allow the instances to discover each other. Then, each instance will be able to proxy the HTTPS requests through other instances, so the user only has to trust one instance. Also, each instance will spread requests to other instances according to their response times, and make sure that IP addresses are used evenly, or at least as evenly as possible. To ensure the latter, the statistics page will be enhanced and made available through an API that other instances will use. The federation will make sure that bots can't abuse this pool of IP addresses.
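One simple way to spread requests according to response times (a sketch of the idea, not the project's actual protocol) is to pick a peer instance with probability inversely proportional to its measured latency, so faster instances absorb more traffic while the whole pool of IP addresses stays in use.

```python
# Illustrative weighted selection; instance URLs and timings are made up.
import random

def pick_instance(response_times: dict) -> str:
    instances = list(response_times)
    weights = [1.0 / response_times[i] for i in instances]   # faster => likelier
    return random.choices(instances, weights=weights, k=1)[0]

times = {
    "https://searx.example.org": 0.4,   # seconds
    "https://searx.example.net": 0.9,
    "https://searx.example.com": 1.6,
}
print(pick_instance(times))
```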

>> Read more about searx

SensifAI

Billions of users manually upload their captured videos and images to cloud storage services such as Dropbox, Google Drive and Apple iCloud straight from their camera or phone. Their private pictures and video material are subsequently stored unprotected somewhere else on some remote computer, in many cases in another country with quite different legislation. Users depend on the tools from these service providers to browse their archives of often thousands and thousands of videos and photos in search of some specific image or video of interest. The direct result of this is continuous exposure to cyber threats like extortion and an intrinsic loss of privacy towards the service providers. There is a perfectly valid user-centric approach possible for dealing with such confidential materials, which is to encrypt everything before uploading anything to the internet. At that point the user may be a lot safer, but would from then on have a hard time locating any specific videos or images in their often very large collection. What if smart algorithms could describe the pictures for you and recognise who is in them, and you could store this information and use it to conveniently search and share? This project develops an open source smart-gallery app which uses machine learning to recognize and tag all visual material automatically - and on the device itself. After that, the user can do what she or he wants with the additional information and the original source material. They can save them to local storage, using the tags for easy search and navigation. Or offload the content to the internet in encrypted form, and use the descriptions and tags to navigate this remote content. Either option makes images and videos searchable while fully preserving user privacy.
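A minimal sketch of the "tag locally, upload only ciphertext" idea, assuming the third-party cryptography package for the encryption step; the tagging function is a placeholder standing in for the on-device recognition model.

```python
# Illustrative only: tag on the device, encrypt before anything leaves it.
from cryptography.fernet import Fernet

def tag_image(image_bytes: bytes):
    # Placeholder for on-device machine-learning recognition.
    return ["beach", "sunset", "alice"]

def prepare_upload(image_bytes: bytes, key: bytes):
    tags = tag_image(image_bytes)                   # stays local for search
    ciphertext = Fernet(key).encrypt(image_bytes)   # only this leaves the device
    return tags, ciphertext

key = Fernet.generate_key()
tags, blob = prepare_upload(b"...raw image data...", key)
print(tags, len(blob))
```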

>> Read more about SensifAI

Software vulnerability discovery

nixpkgs-update automates the updating of software packages in the nixpkgs software repository. It is a Haskell program. In the last year, about 5000 package updates initiated by nixpkgs-update were merged. This project will focus on two improvements: one, developing infrastructure so that nixpkgs-update can run continuously on dedicated hardware to deliver updates as soon as possible, and two, integrating with CVE systems to report CVEs that are addressed by proposed updates. I believe these improvements will increase the security of nixpkgs software and the NixOS operating system based on nixpkgs.

>> Read more about Software vulnerability discovery

Sonar: a modular peer-to-peer search engine for the next-generation web

Sonar is a project to research and build a toolkit for decentralized search. Currently, most open-source search engines are designed to work on centralized infrastructure. This proves to be problematic when working within a decentralized environment. Sonar will try to solve some of these problems by making a search engine share its indexes incrementally over a P2P network. Thereby, Sonar will provide a base layer for the integration of full-text search into peer-to-peer/decentralized applications. Initially, Sonar will focus on integration with a peer-to-peer network (Dat) to expose search indexes securely in a decentralized structure. Sonar will provide a library that allows creating, sharing, and querying search indexes. A user interface and content ingestion pipeline will be provided through integration with the peer-to-peer archiving tool Archipel.

>> Read more about Sonar: a modular peer-to-peer search engine for the next-generation web

StreetComplete

The project will make collecting data for OpenStreetMap easier and more efficient. OpenStreetMap is the best source of information for general purpose search engines that need geographic data about the locations and properties of various objects. The objects range from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches and so on. A search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and is already used for many purposes. Improving OSM increases the quality of services using open data rather than proprietary datasets kept as a trade secret by established companies.
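As an illustration of how such a query can be answered from OpenStreetMap data, here is a sketch that asks the public Overpass API for wheelchair-accessible greengrocers near a point; the endpoint, tags and coordinates are assumptions about a typical setup and are not part of StreetComplete itself.

```python
# Illustrative Overpass API query; requires the third-party "requests" package.
import requests

query = """
[out:json];
node["shop"="greengrocer"]["wheelchair"="yes"](around:2000,52.37,4.89);
out;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
for element in resp.json().get("elements", []):
    tags = element.get("tags", {})
    print(tags.get("name", "unnamed"), element["lat"], element["lon"])
```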

>> Read more about StreetComplete

The Open Green Web

The world wide web has become a mainstay of our modern society, but it is also responsible for a significant use of natural resources. Over the last ten years, The Green Web Foundation (TGWF) has developed a global database of around 1000 hosters in 62 countries that deliver green hosting to their customers, to help speed a transition away from a fossil fuel powered web. This has resulted in roughly 1.5 billion lookups since 2011 - through its browser-based plugins, manual checks on the TGWF website, and its API, provided by an open source platform. But what if you want to take things one step further? This project will create the world's first search engine with ethical filtering, which will exclusively show green hosted results. In addition to giving a new choice of search engine to environmentally conscious web users, all the code and data will be open sourced. This creates a reference implementation for wider adoption across the industry of search providers, increasing demand and visibility around how we power the web. The project builds upon the open source search engine Searx, and will collaborate with the developers of that search tool to make "green" search an optional feature for all installs of Searx.
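The filtering step itself is conceptually simple: keep only results whose host appears in the green-hosting database. The sketch below uses a made-up stand-in lookup set; in the real project this check would be backed by TGWF's database and API.

```python
# Illustrative green filter; the host list is a stand-in dataset, not TGWF data.
from urllib.parse import urlparse

GREEN_HOSTS = {"example.org", "solar-hosting.example"}

def is_green(url: str) -> bool:
    return urlparse(url).hostname in GREEN_HOSTS

def green_filter(results):
    return [r for r in results if is_green(r["url"])]

results = [
    {"title": "A", "url": "https://example.org/page"},
    {"title": "B", "url": "https://coal-powered.example/page"},
]
print(green_filter(results))   # only the green-hosted result remains
```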

>> Read more about The Open Green Web

Transparency Toolkit

Transparency Toolkit is building a decentralized hosted archiving service that allows journalists, researchers, and activists to create censorship-resistant searchable document archives from their browser. Users can upload documents in many different file formats, run web crawlers to collect data, and manually contribute research notes from a usable interface. The documents are then OCRed (when needed) and indexed in a searchable database. Transparency Toolkit provides a variety of tools to help analyze and understand the documents with text mining, searching/filtering, and manual collaborative analysis. Once users are ready, they can make some or all of the documents available in a public searchable archive. These archives will be automatically mirrored across multiple instances of the software and the raw data will be stored in a distributed fashion.

>> Read more about Transparency Toolkit

variation graph (vgteam)

Vgteam is pioneering privacy-preserving variation graphs, which make it possible to capture complex models and aggregate data resources with formal guarantees about the privacy of the individual data sources from which they were constructed. Variation graphs relate collections of sequences together as walks through a graph. They are most commonly applied to genomic data, where they support the compression and query of very large collections of genomes. In this project, we will apply formal models of differential privacy to build variation graphs which do not leak information about the individuals whose genomes were used to build them. The same techniques allow us to extend these models to include phenotype and health information, maximizing their utility for biological research and clinical practice without risking the privacy of participants who shared their data to build them. These tools are not limited to genomes, and will allow the production of privacy-preserving representations of collections of any sensitive data that can be represented in a variation graph form, including collections of personal writing, web browsing histories, or the trajectories of individuals and vehicles through transportation networks.
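For readers unfamiliar with differential privacy, the basic building block is noise calibrated to a query's sensitivity; the sketch below shows the standard Laplace mechanism applied to a simple count (illustrative values only, not the project's implementation, which operates on graph structures).

```python
# The Laplace mechanism: add Laplace(sensitivity/epsilon) noise to a query
# result to make its release epsilon-differentially private.
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample a two-sided exponential: Laplace(0, scale).
    u = random.random()                      # in [0, 1), so 1 - u is in (0, 1]
    magnitude = -scale * math.log(1.0 - u)   # Exponential with mean "scale"
    return magnitude if random.random() < 0.5 else -magnitude

def private_count(true_count: int, sensitivity: float = 1.0, epsilon: float = 0.5) -> float:
    return true_count + laplace_noise(sensitivity / epsilon)

# e.g. "how many genomes in the graph contain this variant?"
print(private_count(1234))
```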

>> Read more about variation graph (vgteam)

WebXray Discovery

WebXray intends to build a filter extension for the popular and privacy-friendly meta-search engine Searx that will show users which third-party trackers are used on the sites in their results pages. Users are given full transparency about which tracker is operated by which company, and will be able to filter out sites that use particular trackers. This filter tool will be built on the unique ownership database WebXray maintains of tracking companies that collect personal data of website visitors.

Mapping the ownership of tracking companies which sell behavioural profiles of individuals is critical for all privacy and trust-enhancing technologies. Considerable scrutiny is given to the large players who conduct third-party tracking and advertising, whilst little scrutiny is given to the large numbers of smaller companies who collect and sell unknown volumes of personal data. Such collection is unsolicited, with invisible beneficiaries. The ease and speed of corporate registration provides the opportunity for data brokers to mitigate their liability when collecting data profiles. We must therefore establish a systematic database of data broker domain ownership.

The filter extension that will be the output of the project will make this ownership database visible and actionable to end users, and will curate the crowdsourced data and add it to the current database of ownership (which is already comprehensive, containing detailed information on more than 1,000 ad tech tracking domains).

>> Read more about WebXray Discovery

YaCy Grid SaaS

YaCy Grid Search-as-a-Service creates document crawling and indexing functionality for everyone. Users of this new platform will be able to create their custom search portal by defining their own document corpus. Such a service is valuable as a privacy or branding tool, but also allows scientific research and annotation of semantic content. User-group specific domain knowledge can be organized for custom applications such as fueling artificial intelligence analysis. This should be a benefit for private persons, journalists, scientists and large groups of people in communities like universities and companies. Instances of the portal should be able to support themselves financially: there is turn-key infrastructure to handle payments for crawling/indexing volumes as a periodic subscription, while search requests are free for everyone. The portal will consist of free software, and users can download the portal software itself together with the acquired search index data - so everyone can start running a portal for themselves whenever they want.

>> Read more about YaCy Grid SaaS

Calls

Send in your ideas.
Deadline December 1st, 2019.

 