
NGI Zero Discovery

Projects enabling search and discovery

This page contains a concise overview of projects funded by NLnet foundation that belong to NGI Zero Discovery (see the thematic index). There is more information available on each of the projects listed on this page - all you need to do is click on the title or the link at the bottom of the section on each project to read more. If a description on this page is a bit technical and terse, don't despair — the dedicated page will have a more user-friendly description that should be intelligible for 'normal' people as well. If you cannot find a specific project you are looking for, please check the alphabetic index or just search for it (or search for a specific keyword).

Now, what about NGI Zero Discovery? Well, it is an ambitious grant programme led by NLnet as part of the Next Generation Internet initiative, which focuses on search, discovery and discoverability. You could see these as the trinity of relevant search:

  • in order to be able to search
  • you first need to discover as many relevant items as possible within the desired domain
  • meaning that everything needs to be discoverable (i.e. it needs to be made available and be accessible, but also the right structure and metadata need to be in place for everything to be properly indexed and categorised)

In practical terms you traverse these three in reverse: everything that is discoverable through a set of mechanisms can be discovered, allowing users to search within the bucket of discovered things to hopefully find whatever they need.

The projects within this fund so far are quite diverse: some focus on discoverability standards like ActivityPub or RELOAD, or on specific domains like open hardware or three-dimensional virtual objects, while others focus on ethical search filters, security updates and software vulnerabilities, or on different aspects and challenges of building a search service, like crawling, multimodal search, address search across different languages, and linked data - and many more...

Check them out and use them in whatever way you need - everything is free and open source so you can study, use, modify and share them with anyone you want.

While NGI Zero Discovery is no longer accepting new proposals, if you have an important idea that deserves to be funded - why not look at our other funds? We are always looking for great ideas!


Interested in applying for a grant yourself? Check our active theme funds, such as NGI Zero Commons Fund, NGI Mobifree, NGI Fediversity or NGI TALER. Applications to this particular fund are currently closed and no new projects are accepted for now. Donate to help us fund more projects like these.

OCCRP Aleph disambiguation — OCCRP Aleph: disambiguating different people and companies

Aleph is an investigative data platform that searches and cross-references global databases with leaks and public sources to find evidence of corruption and trace criminal connections. The project will improve the way that Aleph connects data across different data sources and how it ranks recommendations and searches for reporters. Our goal is to establish a feedback loop where users train a machine learning system that will predict if results showing a person or company refer to the same person or company. If successful this means journalists can conduct more efficient research and investigations, finding key information more quickly and wasting less time trawling through irrelevant documents and datasets.

>> Read more about OCCRP Aleph disambiguation

AREXERA Crawler — C++ based web crawler

The AREXERA web crawler dates back to the early 2000s, when AREXERA GmbH (formerly TECOMAC GmbH) wrote it as part of a toolset to run public search engines like Seekport in Germany and some other European countries. The AREXERA crawler is written in C++ and was designed from the ground up for speed. The crawler supports the common features, like TLS support, robots.txt, politeness rules and WARC file output. The tool was in full production use until the company went out of business, and subsequently development stopped for a while. Recently the code resurfaced, and AREXERA was reborn as a free and open source project. Recent first tests still showed promising performance compared to other widely used crawlers. The aim of the project is to bring the crawler up to date with modern requirements and clean up the code, so it can be properly benchmarked with a representative workload - after all, high crawling speed means faster throughput and a lower power consumption per fetched web page.

>> Read more about AREXERA Crawler

Babelia — Search engine and crawler in Scheme

Babelia is a privacy-friendly, decentralized, open source, and accessible search engine. Search has been an essential part of knowledge acquisition from the dawn of time, whether through antique lexicographically ordered filing cabinets or today's computer-based wonders such as Google or Bing. It serves everything from casual needs, such as cooking, keeping up with the news or a regular dose of cat memes, to professional work such as scientific research. Search is, and will remain, an essential daily-use tool, and steers human progress forward.

Babelia aims to replace the use of privateer search engines with a search engine that is open, hence under the control of the commons. Babelia wants to be an easy to install, easy to use, easy to maintain, no-code, personal search engine that can scale to billions of documents, beyond a terabyte of text data, for under €100 a month per Babelia instance.

>> Read more about Babelia

Blink RELOAD — Secure P2P real-time communications with RELOAD

The REsource LOcation And Discovery specification (RELOAD) is a standard produced by the IETF to (as the name indicates) describe how people can search within a local network to discover other people and devices with which they can then exchange video and voice calls, send messages etc. Why make every discovery depend on the availability of a global DNS system, if you are actually near each other...

Blink is a mature open source real-time communication application that can be used on different operating systems, based on the IETF SIP standard. It offers audio, video, instant messaging and desktop sharing. Blink RELOAD aims to implement RELOAD (RFC 7904), which describes a peer-to-peer network that allows participants to discover each other and to communicate using the IETF SIP protocol. This offers an alternative discovery mechanism, one that does not rely on server infrastructure, in order to allow participants to connect with each other and communicate. In addition, the RELOAD specification describes means by which participants can store, publish and share information, in a way that is secure and fully under the control of the user, without a third party controlling the sharing process or the information being shared.

>> Read more about Blink RELOAD

Bonfire Search & Discovery — Improving search and discoverability in the Fediverse

Bonfire is a modular ecosystem for federated networks. The project creates interoperable toolkits that people can use to easily build their own apps to meet their specific needs. Users are then free to interact with multiple people and groups using these apps hosted on their own device, regardless of what federated software these other people use. Federated topics within the Bonfire ecosystem can consist of a hashtag, a category in a taxonomy, a location, etc. This enables users to find a topic they are interested in, see everything that was tagged with that (publicly or in their network), and follow it to receive any new tagged content. This will be interoperable with existing fediverse apps like Mastodon without requiring extra development on their end, and will create a decentralised graph of topics that can help relevant information flow from instance to instance.

All content on a Bonfire instance (including remote content coming in via follows or federated topics) will also be aggregated in a local search index with which the user can search their own data, information from people or groups they follow, as well as content from topics or locations they are interested in from around the fediverse. This search will happen locally on their device (which is a plus for privacy), with results appearing instantly while typing a query, and being able to filter the results (e.g., by object or activity type, hashtags, topics, or language). Every line of Bonfire’s code is available to be used or forked, in a collection of libraries that can be assembled and re-assembled to create all kinds of full-featured apps. One example is Bonfire's mutual aid extension where users can post and search for requests and offers across different instances according to topic and/or geographical location.

>> Read more about Bonfire Search & Discovery

Castopod — Podcasting in the fediverse

Castopod is an open-source podcast hosting solution for everyone, that can connect to the Fediverse through the W3C ActivityPub standard (Pixelfed, Mastodon, Pleroma…). Castopod is user friendly, and allows for easy discovery everywhere. Whether you are a beginner, an amateur or a professional, you will get everything you need: you can create, upload, publish, manage server subscriptions (WebSub embedded server). You can allow users to listen to your podcast directly, but just as easily connect to commercial directories (Apple, Google, Spotify…).

Take back control: interact with your audience on your platform (like, share, comment), the social network IS the podcast. In addition to supporting W3C ActivityPub, you can also export to proprietary social networks (Twitter, Instagram, Youtube, Facebook). Castopod is easily hosted on any PHP/MySQL server: unzip it and you and other podcasters are ready to broadcast professionally.

>> Read more about Castopod

Castopod Mobile — User-friendly mobile podcasting application

Castopod Mobile is a free and open-source mobile podcast player application (GPL v3). It is intended to be installed on your mobile phone (iOS, Google Android, /e/…). You can install it from F-Droid or from your usual app store, or you may compile it yourself for your own needs. Castopod Mobile is a two-in-one application: a podcast player and a Fediverse client. It serves three purposes. First, to provide a mobile application that takes advantage of ActivityPub features for podcasts (the ones that Castopod Server provides, for instance). Second, to reduce the complexity of the Fediverse ecosystem during onboarding: account creation currently prevents many users from joining the Fediverse because it is difficult to guess where to begin. Third, to provide a podcast application template for communities who want to build and manage their ecosystem from beginning (with your own private Castopod Server) to end (with your own Castopod Mobile based application).

>> Read more about Castopod Mobile

Discover and move your coins by yourself — A safe way to explore and work with cryptocurrency forks

The technologies behind cryptocurrencies are probably harder to understand than those of any other network, even for technical experts, and this is especially true of bitcoin-based networks. Most users, even those familiar with the technology for years, have to rely on wallets or run and sync full nodes. Experience shows that they usually get lost at some point, especially when those wallets dictate the use of new "features" such as BIP39 and the like, multisig, segwit and bech32. Most users don't understand where their coins are and on which addresses, what the format of these addresses is, what their seeds are and what they need to unlock their coins. This situation pushes users to give their private keys to dubious services, resulting in the loss of all of their coins. The alternative is to let exchanges manage their coins, which removes their agency and puts them at risk. The goal of this project is to correct this situation by allowing people to simply discover where their coins are and what their addresses are, whatever features are used. It will allow them to discover their addresses from one coin to another, rediscover their seed if they lost a part of it, sign and verify address ownership, discover public keys from private keys and create their hierarchical deterministic addresses: in short, all the tools needed to discover and check everything related to their coins, for any bitcoin-based network. In addition, it allows them to create their transactions themselves and send them to the networks, or just check them. The tool is a standalone, secure, open source web app that runs inside the browser and must be used offline. It is a browserified version of a Node.js module, which those with the technical knowledge can also use or modify directly.

>> Read more about Discover and move your coins by yourself

Connect by Name — Library for easy connection setup

Connect by Name will be a C library providing an interface that allows a software developer to set up internet connections from an application in the most private and secure manner using well-established and open standards. The interface provided to the software developer will be as simple as “Connect to a service on a domain name” and be flexible enough to fit with different programming paradigms and environments. The library will facilitate composability with other systems and will be extensible with future standards. Our goal is to lower the barrier for developing high-quality software and thereby improve the security and privacy of end users.

>> Read more about Connect by Name

Conzept encyclopedia — An alternative encyclopedia

The Conzept encyclopedia is an attempt to create an encyclopedia for the 21st century: a modern topic-exploration tool based on Wikipedia, Wikidata, the Open Library, Archive.org, YouTube, the Global Biodiversity Information Facility and many other information sources. It is a semantic web app built for fun, education and research. Conzept allows you to explore any of the millions of topics on Wikipedia from many different angles - such as science, art, digital books and education - both as a defined semantic entity ("thing") and as a string. In addition, client-side topic classification allows for fast, higher-level logic throughout the whole user experience. Conzept also has a uniquely integrated user interface, which gives you a single well-designed view of all this information (in any of the 300+ Wikipedia languages), without cognitive overload.

>> Read more about Conzept encyclopedia

Record Federation for Corteza Clouds — Data federation over ActivityPub

Corteza is a low code platform for building cloud-based web applications. This is typically for private, records-based management purposes (e.g. case management, insurance claims processing, public sector management applications, CRM, ERP), but the uses can also be public if required. It has a modular architecture and its data layer, presentation layer and automation layer can each be treated individually. Corteza Record Federation makes innovative use of the ActivityPub standard to describe how content from the Corteza data layer can be broadcast across large federations of Corteza clouds. All data types, simple or compound, entire records and entire data models are supported.

Whether it be energy, finance, health, education or smart cities, many industries need to share complex data in real-time or near real-time, while preserving the digital sovereignty of a large number of disparate actors, protecting the privacy of user data and acknowledging the law of whichever territories they find themselves operating in. Corteza Record Federation allows for the creation of private networks of decentralised “mini-clouds”, all self-hosted and controlled by their owners, where this data exchange can happen as efficiently as, and more effectively than, on any single centralised cloud.

>> Read more about Record Federation for Corteza Clouds

Corteza Discovery — (Geo)search and discovery within federated services

Corteza Discovery will turn Corteza into a search-oriented architecture. Corteza is an open source Low Code Application Development solution for building records-based management systems. It can be used in a wide array of applications, from Urban Data Platforms for smart city management to business applications and CRM. Corteza is capable of many-to-many data federation, and WCAG 2.0 accessibility is an objective across all components of the solution.

Advanced, permissioned search will be implemented locally, within federations and between federations. Standards-oriented geolocation and mapping will be supported across the platform. The ultimate goal is to create a compelling, modern and friendly UX for users/citizens - yet based on federated, high-utility Low Code applications which have been specifically designed for purposes of data collection, organisation and portability. Search features such as tokenisation, lemmatisation and "more like this" functionality will enrich user interaction.
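To make two of the search features named above concrete, here is a minimal Python sketch of tokenisation and a "more like this" ranking. It is not Corteza code: the function names, the sample records and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration, and lemmatisation is left out because it needs a language-specific dictionary.

```python
import re


def tokenise(text):
    """Lower-case the text and split it into ASCII word tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(doc_id, docs, limit=3):
    """Rank the other records by token overlap with the record `doc_id`."""
    base = tokenise(docs[doc_id])
    scored = [
        (jaccard(base, tokenise(text)), other)
        for other, text in docs.items()
        if other != doc_id
    ]
    # Highest score first; ties are broken by record id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:limit] if score > 0]


# Hypothetical records, as a citizen-reporting application might store them.
records = {
    "r1": "Broken street light on Main Street",
    "r2": "Street light flickering near Main Square",
    "r3": "Request for a new bicycle parking rack",
}
similar = more_like_this("r1", records)
```

Here `similar` contains only `r2`, because `r3` shares no tokens with `r1`. A production search engine would typically use weighted term statistics rather than plain set overlap.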

From any point of user interaction with any search, to developers building new applications to be searched, Corteza aims to set a standard for inclusive design.

>> Read more about Corteza Discovery

Privacy Infrastructure for Corteza Federations — Allow users to locate and browse their private data wherever

The project summary for this project is not yet available. Please come back soon!

>> Read more about Privacy Infrastructure for Corteza Federations

ArtistHub — Allow creative artists to gain visibility and build reputation on the web

The Artist Hub is a progressive web app developed by The Creative Passport MTU, that allows users - Music makers - to connect different data sources and display their feeds all in the same global wall arranged in chronological order. Music makers will be able to create a custom fan page on a self-hostable server where all their music and related content can be placed and shared with their fans.

The underlying architecture for subscribing to and receiving posts/updates from connected services will be built using ActivityPub. The idea behind this architecture is a free and open-source way for music makers to share their content without needing to post to a number of different websites and social media and for fans to have the freedom to choose their platform of choice for engaging with that content.

We will use ActivityPub to aggregate data from a number of platforms. This will enable us to offer support for video (using PeerTube), audio (using Funkwhale), images (using PixelFed) and text (using Mastodon).

>> Read more about ArtistHub

DeltaBot — Social discovery over mail-based chat

Why should humans be the only ones to search for new content that is relevant to them, if bots can be made to do the same on their behalf? The DeltaBot project will research and develop decentralized, e2e-encrypting and socially trustworthy bots for Delta Chat (https://delta.chat). Bots will bridge with messaging platforms like IRC and Matrix, offer media archiving for their users and provide ActivityPub and RSS/Atom integration to allow users to discover new content. Our project aims not only to provide well tested and documented chat bots in Python but also to help others write and deploy their own custom bots. Bots will perform e2e-encryption by default and we'll explore seamless ways to resist active MITM attacks.

>> Read more about DeltaBot

Extend EFI support in BSDs — Bring automated firmware update to BSDs

UEFI/EFI support covers boot integrity and as such has become a structural part of Linux, Windows, and other OS-es. There are a number of relevant operating systems however that are not able to benefit from this technical capability just yet. This project would fill that gap by extending EFI support to OpenBSD, NetBSD, and DragonflyBSD. This will allow proper hardware initialization as well as additional security features within those open source operating systems.

>> Read more about Extend EFI support in BSDs

EDeA — A forge suitable for open hardware development

The short version: EDeA is a novel approach to allow exploration of and improve discovery within the open hardware ecosystem - in order to help make open hardware designs and components discoverable and reusable.

At this moment in time, pretty much everything surrounding open hardware development is manual. Beyond just typing something into a generic search engine there isn't really suitable tooling available to search across what already exists. Accessible and usable distributions, collaboration tools and version control are what drove the free and open source software revolution; now open hardware needs to take the same leap forward.

Open hardware electronics projects are growing in numbers, thanks to crowdfunding, a strong developer community, and sophisticated open source electronic design automation (EDA) tools like KiCad. There is a logical association between the circuit schematic and the printed circuit board (PCB) layout, but the two are handled by separate programs, and therefore one can’t simply copy-paste design blocks. In 2020 it is still next to impossible to reuse proven parts of different designs without needless reimplementation. By leveraging KiCad’s pcbnew and eeschema scripting, a new way of building modular, reusable electronics opens up. We are creating a catalog and community portal for discovery and development of proven circuit modules: power management, signal conditioning, data conversion, micro-controllers, etc.

>> Read more about EDeA

AEAP — Automated e-mail address porting to a new provider

There is no search for email addresses, like there was in the long-gone days of the phone book. Once an old contact disappears (e.g. moves jobs, changes provider), even though you may have exchanged many emails with that person, you cannot discover which new email address(es) go(es) with that old contact.

The Automated E-mail Address Porting project (AEAP) wants to allow you to find the new email addresses of these existing email contacts. The project will research and develop the porting of an e-mail address to a new provider. We will implement, document, user-test and release a porting mechanism for Delta Chat, a leading end-to-end encryption mail client. Users can decide they want to use a new provider by entering credentials for a new e-mail address. The outcome of the AEAP project will be Delta Chat Desktop, Android and iOS releases to all app stores, providing seamless porting of e-mail addresses. Changing an e-mail provider will not depend on the consent of the existing one. GMail and various other "free e-mail" provider lock-in strategies will be weakened, also through the e2e-encryption that our AEAP effort spearheads.

>> Read more about AEAP

Email for expert news — Keep up to date with a flow of publications

Full text search can help locate text within a certain corpus, but it doesn't help much with staying up to date with the continuous development of a certain field. Ingesting the daily flood of potentially relevant publications is time-consuming, and so sharing and delegating effort makes a lot of sense. Bims (Biomed News) and NEP (New Economics Papers) are long standing projects in this vein, based on PubMed and RePEc, respectively. They are early examples of expertise sharing systems that deliver digests - human curated sets of the most relevant new publications. Dedicated experts filter the flow of incoming publications in different domains, allowing everyone to stay up to date with the latest developments through publicly available periodic reports on a variety of topics.

This project aims to build a new software tool to allow users to subscribe to these reports across different fields of interest. Subscribers get a fully personalised report, meaning they will not have to deal with distractions such as duplicate items. The software aims to be generic, so it may be applied to any serial data of records formatted in a structured way.

>> Read more about Email for expert news

The search for ethical Apps — Create custom, self-hostable app stores for Android(-like) OS-es

Once you own a smartphone, you will often want to install apps to add functionality. In some cases there isn't much choice, like when you as a citizen need to use digital services provided by your government and these are exclusively available through apps. Pre-configured vendor app stores such as the Google Play store and the Apple App store actually require you to agree to privacy-unfriendly terms of service and introduce tracking behaviour - even if you are only going to be installing ethical apps that themselves are open source and privacy-friendly. On top of that, these app "warehouses" contain a confusing amount of lookalike and dishonest applications that take advantage of naive consumers. Sending users into an app jungle with hundreds of thousands of apps that often resemble each other leaves users unprotected. In fact, in many cases the whole idea of a "store" doesn't make sense - like when an app is paid for by public funding.

So why not create alternative mechanisms that give easy and convenient access to apps and do not force citizens to sign contracts with commercial third parties? This project will create custom app distribution mechanisms based on F-Droid, allowing anyone to curate a set of applications and distribute these to users directly - without them having to sign away any rights to third parties.

>> Read more about The search for ethical Apps

FairSync — Simplify aggregation and discovery of places and events

How can we make it possible to search across different maps and lists of events maintained by different organisations? By connecting them, of course! FairSync develops and collects best practices to synchronize maps and events and to federate messengers and identities active in the global movement for sustainability. System integrators are faced with fast evolving APIs and protocols when they try to discover and connect systems and make search easier.

We will work on master-master replication frameworks for metadata-enriched data sets and test them with platform providers for sustainability affairs. One approach is the "lazy master scheme": a common update propagation strategy where changes on a primary copy are first committed at the master node, and afterwards the secondary copy is updated in a separate transaction at slave nodes.
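The lazy master scheme described above can be sketched in a few lines of Python. This is an in-memory illustration only, not FairSync code: the class names, the pending queue and the sample event are assumptions, and a real system would run both steps as database transactions over a network.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A replica holding a key-value copy of the shared data set."""
    name: str
    data: dict = field(default_factory=dict)


@dataclass
class LazyMaster:
    """Primary copy that commits locally first and propagates later."""
    primary: Node
    secondaries: list
    pending: list = field(default_factory=list)

    def write(self, key, value):
        # Transaction 1: commit on the primary copy only.
        self.primary.data[key] = value
        # Remember the change so it can be propagated afterwards.
        self.pending.append((key, value))

    def propagate(self):
        # Transaction 2 (separate, later): apply the queued changes,
        # in commit order, to every secondary copy.
        for key, value in self.pending:
            for node in self.secondaries:
                node.data[key] = value
        applied = len(self.pending)
        self.pending.clear()
        return applied


primary = Node("map-a")
replica = Node("map-b")
cluster = LazyMaster(primary, [replica])

cluster.write("event:42", {"title": "Repair cafe", "city": "Utrecht"})
stale = "event:42" not in replica.data  # the replica has not seen the write yet
cluster.propagate()
in_sync = replica.data == primary.data
```

Between `write` and `propagate` the secondary copy is stale, which is the trade-off of lazy propagation: writes return quickly, but readers of a secondary may briefly see old data.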

We will try to advance such immediate update propagation in this project using protocols such as ActivityPub or the InCommon API. Federation of identities will be managed with SAML or OAuth2 protocols with fairlogin as a common identity provider.

>> Read more about FairSync

searx — Federating self-hosted search hubs

Searx is a popular meta-search engine that aims to protect the privacy of its users. In the typical use case, a few users trust one instance. However, third-party services can easily fingerprint those users using the IP address of the searx instance and the users' queries. The project aims to create a searx federation to solve this issue. First, a protocol needs to be defined to allow the instances to discover each other. Then, each instance will be able to proxy the HTTPS requests through other instances, so the user only has to trust one instance. Each instance will also spread the requests across other instances according to their response time, and make sure that IP addresses are used evenly, or at least in the best possible way. To ensure the latter, the statistics page will be enhanced and made available through an API that other instances will use. The federation will make sure that bots can't abuse this pool of IP addresses.
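One way to picture spreading requests by response time while keeping usage balanced is a weighted random choice. The sketch below is not the searx federation protocol: the `pick_instance` function, the `response_time` and `requests_sent` fields, the weighting formula and the example URLs are all assumptions for illustration. It assumes response times are greater than zero.

```python
import random


def pick_instance(stats, rng=random):
    """Pick a peer; faster and less-used peers are more likely to be chosen.

    `stats` maps an instance URL to a dict with the hypothetical fields
    `response_time` (seconds, > 0) and `requests_sent` (count so far).
    """
    urls = list(stats)
    weights = [
        1.0 / (stats[u]["response_time"] * (1 + stats[u]["requests_sent"]))
        for u in urls
    ]
    choice = rng.choices(urls, weights=weights, k=1)[0]
    stats[choice]["requests_sent"] += 1
    return choice


# Hypothetical peers, as they might be read from another instance's statistics API.
peers = {
    "https://searx-a.example": {"response_time": 0.2, "requests_sent": 0},
    "https://searx-b.example": {"response_time": 0.4, "requests_sent": 0},
    "https://searx-c.example": {"response_time": 2.0, "requests_sent": 0},
}

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(300):
    pick_instance(peers, rng)

counts = {url: s["requests_sent"] for url, s in peers.items()}
```

With this weighting a slow peer still receives some traffic, so no single IP address carries every query, but fast peers carry most of the load. Dropping `response_time` from the formula would instead push towards strictly even use.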

>> Read more about searx

First Classify Documents — Categorise different types of official documents

With governments all over the world turning to digital filing systems, millions of paper files still wait to be digitized. One major challenge in this process is a structured approach to classifying and ordering documents. It is an unfortunate fact that many public documents are bitmap images of texts. For instance, tenders are published digitally but the actual resulting contracts are not published in a way that allows them to be indexed and queried - which hinders civil society in their ability to access these documents. Open source OCR software needs to become better to get good results with this. This project developed a system for models to distinguish between different types of official documents, able to classify state documents according to structure, keywords, document name, word and page count, metadata and context.

>> Read more about First Classify Documents

Folksonomy engine for the food ecosystem — Data modelling by the community

Everybody is interested in the food they eat, in many different respects, ranging from taste, cost, ingredients and nutrition to its impact on health, the environment and society. We also happen to have many different names for the same food, the way we prepare it and other properties - sometimes only used very locally. That means it is not always easy for everyone to effectively search open data sets like Open Food Facts. Open Food Facts - sometimes referred to as the "Wikipedia for food products" - is the biggest open food database in the world.

The Folksonomy engine for the food ecosystem created within this project will unleash an ocean of new data and uses regarding food. Citizens, researchers, journalists, professionals, artists, communities, and innovators will be able to define and add new properties of their choice to food products on Open Food Facts for their own use or to enrich the shared knowledge. Open Food Facts already feeds hundreds of data reuses. Thousands more will become possible thanks to the new user defined properties.
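As an illustration of what user-defined properties on products could look like as a data structure, here is a small Python sketch. It does not reflect the actual Folksonomy engine or the Open Food Facts API: the class, its methods, the property names and the placeholder barcodes are all hypothetical.

```python
from collections import defaultdict


class Folksonomy:
    """Free-form (property, value) tags attached to product barcodes."""

    def __init__(self):
        self._tags = defaultdict(dict)        # barcode -> {property: value}
        self._by_property = defaultdict(set)  # property -> {barcode, ...}

    def tag(self, barcode, prop, value):
        # Normalise the property name so "Origin:Region" and
        # "origin:region" end up as the same property.
        prop = prop.strip().lower()
        self._tags[barcode][prop] = value
        self._by_property[prop].add(barcode)

    def properties(self, barcode):
        """Return a copy of all user-defined properties of one product."""
        return dict(self._tags.get(barcode, {}))

    def find(self, prop, value=None):
        """List products that carry a property, optionally with a given value."""
        prop = prop.strip().lower()
        hits = self._by_property.get(prop, set())
        if value is None:
            return sorted(hits)
        return sorted(b for b in hits if self._tags[b][prop] == value)


tags = Folksonomy()
tags.tag("0000000000001", "Packaging:Recyclable", "yes")
tags.tag("0000000000001", "origin:region", "Brittany")
tags.tag("0000000000002", "packaging:recyclable", "no")
recyclable = tags.find("packaging:recyclable", "yes")
```

The second index, from property to barcodes, is what makes community-defined properties searchable without anyone having to extend a fixed schema first.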

>> Read more about Folksonomy engine for the food ecosystem

ForgeFed — Federation for software collaboration tools

When you are searching for new software to use, you will have to visit many different software forges - like GitLab, Codeberg or Sourcehut. There isn't really a tool to search for anything across the boundaries of these different software forges.

ForgeFed aims to define a vocabulary and a protocol for decentralized communication and federation of websites used for hosting and collaboration on version control repositories, issue tracking and project management. Typical examples of such websites are code forges such as GitLab and Gitea instances (and centralized services like GitHub), but the idea also applies to applications like collaborative civic planning, publishing of creative writing (such as prose and poetry) and more. ForgeFed is to be designed as an extension of ActivityPub, and web apps implementing it would be joining the Fediverse. The world of repo and project hosting would switch from the centralized model of GitHub (and the lonely disconnected websites running GitLab or Gitea etc.) into a network of federating websites, creating a global decentralized community. The project will publish a set of specifications and guides for implementing the federation protocol, and work with existing projects and communities to refine and finalize the specifications and implement ForgeFed federation.

>> Read more about ForgeFed

Funkwhale — ActivityPub-driven audio streaming and sharing

Funkwhale is a free, decentralized and open-source audio streaming and sharing platform, built on top of the ActivityPub protocol. It enables users to create communities of interest around music and audio content in general, listen to their private music library or distribute their own productions on the network. Each Funkwhale pod, or server, can communicate with other pods to exchange audio content, metadata or for user interactions. In this project, Funkwhale will improve the publication experience for creators, release its first stable version, and improve content discovery inside the platform through better sharing and search mechanisms. We will also continue research and development for Retribute, a community wealth sharing platform meant to support creators on Funkwhale or any other platform.

>> Read more about Funkwhale

GNU Name System — Authenticated naming system for the internet from GNU project

Today, the starting point of any discovery on the Internet is the Domain Name System (DNS). DNS suffers from security and privacy issues. The GNU project has developed the GNU Name System (GNS), a fully decentralized, privacy-preserving and end-to-end authenticated name resolution protocol. In this project, we will document the protocol on a bit-level (RFC-style) and create a second independent implementation against the specification. Furthermore, we will simplify the installation by providing proper packages that, when installed, automatically integrate the GNS logic into the operating system.

>> Read more about GNU Name System

GNU social — Modernizing the original FOSS Social Network

GNU social is a free social networking platform, easily self-hostable and highly accessible, that enables both private and public decentralized communications. With NLnet NGI Zero's support, the project is undergoing a change of main focus from microblogging to groups and tags. With this, GNU social will be a space for communities where users can express their passions and explore new ones. Users will be able to immerse themselves in easily filterable content relevant to their interests, and to create and join communities. It's hard to pinpoint an existing alternative service that promotes the same level of functionality in terms of tagging, filtering and connecting with people that share common interests, especially considering the available degree of accessibility, customization and expansion via plugins.

>> Read more about GNU social

Tooling to improve security and trust in GNU Guix — Contextual software vulnerability discovery

GNU Guix is a universal functional package manager and operating system which respects the freedom of computer users. It focuses on bootstrappability and reproducibility to give the users strong guarantees on the integrity of the full software stack they are running. It supports atomic upgrades and roll-backs which make for an effectively unbreakable system. This project aims to automate software vulnerability scanning of packaged software to protect users against possibly dangerous code.

>> Read more about Tooling to improve security and trust in GNU Guix

Geolexica reverse — Reverse Semantic Search and Ontology Discovery via Machine Learning

Ever forgotten a specific word but could describe its meaning? Internet search engines more often than not return unrelated entries. The solution is reverse semantic search: given an input of the meaning of the word (search phrase), provide an output with dictionary words that match the meaning. The key to accurate reverse search lies in the machine’s ability to understand semantics. We employ deep learning approaches in natural language processing (NLP) to enable better comparison of meanings between search phrases and word definitions, with the aim of significantly increasing accuracy. The project outcome will be employed on Geolexica as a pilot application and testbed for evaluation. The ability to identify entities with similar semantics facilitates ontology discovery in the Semantic Web and in Technical Language Processing (TLP).
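As a rough illustration of the reverse-search idea (phrase in, ranked dictionary words out), here is a minimal sketch. It deliberately swaps the deep learning comparison described above for a plain bag-of-words cosine similarity, and the mini-dictionary, the function names and the `top_n` parameter are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-dictionary; a real system would draw on Geolexica's definitions.
DEFINITIONS = {
    "altitude": "height of a point above mean sea level",
    "meridian": "line of constant longitude running from pole to pole",
    "contour": "line on a map joining points of equal height",
}


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reverse_search(phrase, definitions=DEFINITIONS, top_n=3):
    """Rank dictionary words by how closely their definition matches the phrase."""
    query = bag_of_words(phrase)
    scored = [(cosine(query, bag_of_words(d)), word) for word, d in definitions.items()]
    # Keep only words whose definition shares at least one token with the phrase.
    return [word for score, word in sorted(scored, reverse=True) if score > 0][:top_n]
```

Word overlap cannot match paraphrases that use different vocabulary, which is exactly the gap a semantic model is meant to close.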

>> Read more about Geolexica reverse

Federated software forges with Gitea — Use W3C ActivityPub to federate among software forges

Gitea is a popular free and open-source software forge, a solution for code hosting and version control (using Git) that also offers collaborative features like bug tracking, wikis and code review. Unlike proprietary platforms like GitHub, anyone can host the software for themselves and for others - and retain full control and confidentiality over their operations and community. The goal of this project is to add federation features to Gitea by implementing, among others, the W3C ActivityPub standard. This is an important enabler that can be used to implement a distributed search across different software repositories - an important feature for decentralised systems. The project will also verify that the federation implementation proposed for Gitea conforms to the ActivityPub W3C standard as well as the ForgeFed models.

>> Read more about Federated software forges with Gitea

Real time graph database search engine — Live filtering on graph database streams

Based is the world's first open source pub/sub real time graph database. It allows for millions of concurrent connections to changes in data or relationships, and offers built-in features such as authentication, internationalisation, server-side scripts for automation, time-series data, and user management. This saves money, complexity, and maintenance. In this project we will work on a full text indexing engine, that will give developers and end users the ability to query text in real time – and get back any updates in text instantly. The search engine is geared toward working with our database, but is applicable to any database in which users are interested in text search that updates in real time and indexes dynamically.

>> Read more about Real time graph database search engine

The Open Green Web — Ethical meta-search filter on green hosted websites

The world wide web has become a mainstay of our modern society, but it is also responsible for a significant use of natural resources. Over the last ten years, The Green Web Foundation (TGWF) has developed a global database of around 1000 hosters in 62 countries that deliver green hosting to their customers, to help speed a transition away from a fossil fuel powered web. This has resulted in roughly 1.5 billion lookups since 2011 - through its browser based plugins, manual checks on the TGWF website and its API, provided by an open source platform. But what if you want to take things one step further? This project will create the world's first search engine with ethical filtering, that will exclusively show green hosted results. In addition to giving a new choice of search engine to environmentally conscious web users, all the code and data will be open sourced. This creates a reference implementation for wider adoption across industry of search providers, increasing demand and visibility around how we power the web. The project builds upon the open source search engine Searx, and will collaborate with the developers of that search tool to make "green" search an optional feature for all installs of Searx.
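The filtering step described above can be pictured as a post-processing pass over search results. The sketch below is only an illustration under stated assumptions: the set of green domains is a hypothetical in-memory sample rather than the TGWF dataset, the result format (a list of dicts with a `url` key) is invented, and nothing here reflects the actual Searx plugin interface.

```python
from urllib.parse import urlparse

# Hypothetical sample; the real list of green hosters is maintained by TGWF.
GREEN_DOMAINS = {"example.org", "green.example"}


def is_green(url, green_domains=GREEN_DOMAINS):
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in green_domains)


def filter_green(results, green_domains=GREEN_DOMAINS):
    """Keep only results whose URL is served from a green-hosted domain."""
    return [r for r in results if is_green(r["url"], green_domains)]
```

For example, `filter_green([{"url": "https://www.example.org/a"}, {"url": "https://coal.example.com/"}])` keeps only the first entry.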

>> Read more about The Open Green Web

Haketilo/Hydrilla — Browser extension for site customisation

Internauts today have very little control over their web browsing. Many sites are no longer simple documents meant for reading but complex in-browser applications often equipped with facilities to mistreat their users. Haketilo is a browser extension that aims to change this by giving you complete control over the resources your browser loads for websites, starting with JavaScript. One of its features is the ability to replace sites' JavaScript programs with user-supplied ones. There is currently no other browser extension that provides users with a secure and fully free browsing experience of this kind. Haketilo works together with its repository, Hydrilla, which it can query for community-developed custom site resources. Both tools are available as free/libre software under GNU licenses. In addition, the Hydrilla API can also be utilized by independent developers who want to increase the amount of user agency in their products. For greater website compatibility, Haketilo will work alongside other browser extensions that mitigate harmful JS.

>> Read more about Haketilo/Hydrilla

Great scanning and OCR for mobile devices

The aim of this project is to improve scanning and optical character recognition on mobile devices. Currently the cameras of many mobile devices have relatively noisy output whenever lighting conditions are less than optimal. Additionally, it's almost impossible to achieve scans that are distortion free, as mobile devices don't have a surface against which the document being scanned could be reliably pressed. These two problems lead to difficulties in performing optical character recognition over acquired images, as most recognition algorithms require an input that is noise and distortion free. The solution that will be developed by this project will address both of these problems by acquiring multiple scan images from different angles. The same objects can then be matched across the source images, providing two benefits: the noise can be cancelled out and the 3D shape of the document under scan can be derived. Such information can then be used to unfold the document to 2D space and provide a noise and distortion-free image to optical character recognition algorithms. The solution will be implemented taking into account the performance limitations of mobile devices, and a major optimization effort will be spent to achieve an acceptable latency for the complex image processing algorithms.
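To see why several images of the same document help with noise, consider the simplest possible case: frames that have already been matched and aligned pixel for pixel. Averaging them cancels out independent noise. The sketch below assumes that alignment has been done elsewhere and represents each frame as a flat list of intensities; the function name is hypothetical, and the project's actual matching and 3D unfolding steps are not shown.

```python
def average_frames(frames):
    """Average pixel intensities across already-aligned frames of equal size."""
    if not frames:
        raise ValueError("need at least one frame")
    if len({len(frame) for frame in frames}) != 1:
        raise ValueError("all frames must have the same number of pixels")
    count = len(frames)
    # zip(*frames) groups the values observed for each pixel position.
    return [sum(pixels) / count for pixels in zip(*frames)]
```

With three noisy readings `[10, 20]`, `[14, 16]` and `[12, 24]` of the same two pixels, the averaged frame is `[12.0, 20.0]`.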

>> Read more about Great scanning and OCR for mobile devices

Hubzilla — Federated social networking environment

Hubzilla is one of the most mature stacks within the so-called Fediverse, and is able to run different protocols such as ActivityPub, Diaspora and Zot. Hubzilla provides powerful tools for communities and individuals to help organise themselves, while providing a possibility to interact with each other. It is a decentralised identity, communications and permissions framework built using common webserver technology. The software features many useful apps to enable discussions, event organisation, file sharing etc. with built-in internet-wide access control. With Hubzilla you don't have an account on a server, you own an identity that you can take with you across the network.

With the help of the NGI Zero grant, the new version of the zot protocol (zot6) will be implemented as the primary communication protocol and the UX/UI will be improved to lower the entry barrier for less experienced computer users. And of course you can easily search your Hubzilla server for topics, users, fora and tags.

>> Read more about Hubzilla

ipfs-search.com — Search engine for the Interplanetary File System

ipfs-search.com is a Free and Open Source (FOSS) search engine for directories, documents, videos and music on the Interplanetary File System (IPFS), supporting the creation of a decentralized web where privacy is possible, censorship is difficult, and the internet can remain open to all.

>> Read more about ipfs-search.com

Icebreaker — Gemini centric viewpoint of coding issues and bug tracking

Modern software projects not only require source code repository management but also tools to plan projects and solve technical problems. Closed source solutions and online commercial services may be convenient, but create significant concerns around control, autonomy and privacy - and they skew discoverability. Icebreaker believes in decentralised approaches which keep the coding repo separate from the project management repo. In terms of cooperation and teamwork, this helps to encourage new, flexible and dynamic approaches. These expectations are met through the minimalism of the Gemini protocol and its terse markup format, Gemtext. It is modern because it is easy to understand; accessible to interact with (whether as a consumer or a contributor); and treats privacy as a foremost priority.

Icebreaker's flagship project, gLean, provides building blocks for navigating and interpreting one or more Gemini content sources (with settings, rulesets, and regex magic). (Non core) modules provide output in alternative formats, including Kanban boards. Creators will control their issue trackers. Creators' terms. Creators' conditions. 'Off-the-shelf' solutions can't compete against gLean's tailored approaches. FOSS communities can choose workflows that match their technical requirements, while supporting autonomy and adhering to their ethical values.

>> Read more about Icebreaker

IN COMMON — Public platform to map and act together for the Commons

IN COMMON emerged as a transnational European collective from a network of non-profit actors to identify, promote, and defend the Commons. We decided to start a common pool for Information Technologies with the aim to create, maintain, and share with the public geo-localized data that belong to our constituents and to articulate citizen movements around a free, public and common platform to map and act together for the Commons. IN COMMON forms a cooperative data library that provides collective maintenance to ensure data is always accurate.

>> Read more about IN COMMON

In-document search — Interoperable Rich Text Changes for Search

There is a relatively unexplored layer of metadata inside the document formats we use, such as Office documents. This metadata makes it possible to answer queries like: show me all the reports with edits made within a timespan, by a certain user or by a group of users. Or: show me all the hyperlinks inside documents pointing to a web resource that is about to be moved. Or: list all presentations that contain this copyrighted image. Such embedded information could be better exposed to and used by search engines than is now the case. The project expands the ODF toolkit library to dissect file formats, and will potentially have a very useful side effect of maturing the understanding of document metadata at large and for collaborative editing of documents in particular.
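The first example query (reports edited within a timespan, by certain users) could look roughly like this once change metadata has been extracted from the documents. This is a sketch under assumptions: the `Change` record and the `documents_edited` function are hypothetical, and the extraction itself, which is where a library such as the ODF toolkit would come in, is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Change:
    """One tracked edit, as it might be extracted from a document (hypothetical shape)."""
    document: str
    author: str
    when: date


def documents_edited(changes, start, end, authors=None):
    """Return sorted names of documents with an edit in [start, end], optionally by given authors."""
    return sorted({
        c.document
        for c in changes
        if start <= c.when <= end and (authors is None or c.author in authors)
    })
```

Passing `authors={"alice"}` narrows the result to documents that Alice edited in the period; leaving it as `None` matches edits by anyone.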

>> Read more about In-document search

Practical Tools to Build the Context Web — Declarative setup of P2P collaboration

In a nutshell, the Perspectives project makes collaboration behaviour reusable, and workflows searchable. It provides the conceptual building blocks for co-operation, laying the groundwork for a federated, fully distributed infrastructure that supports endless varieties of co-operation and reuse. The declarative Perspectives Language allows a model to translate instantly into an application that supports multiple users to contribute to a shared process, each with her own unique perspective.

The project will extend the existing Alpha version of the reference implementation into a solid Beta, with useful models/apps, aspiring to community adoption to further the growth of applications for citizen end users. Furthermore, necessary services such as a model repository will be provided. This will bring Perspectives out of the lab, and into the field. For users, it will provide support in well-known IDEs for the modelling language, providing syntax colouring, go-to definition and autocomplete.

Real life is an endless affair of interlocking activities. Likewise, Perspectives models of services can overlap and build on common concepts, thus forming a federated conceptual space that allows users to move from one service to another as the need arises in a most natural way. Such an infrastructure functions as a map, promoting discovery, decreasing dependency on explicit search. However, rather than being an on-line information source to be searched, such as the traditional Yellow Pages, Perspectives models allow their users (individuals and organisations alike) to interact and deal with each other on-line. Supply-demand matching in specific domains (e.g. local transport) integrates readily with such an infrastructure. Other patterns of integrating search with co-operation support form a promising area for further research.

>> Read more about Practical Tools to Build the Context Web

Indigenous — Indieweb mobile clients

Indigenous is a collection of native, web and desktop applications which allow you to engage with the Internet as you do on social media sites, but post it all on your own website. Use the built-in reader to read and respond to posts across the internet. Indigenous doesn't track or store any of your information; instead, you choose a service you trust or host it yourself. Posts are collected on your website or a service which supports W3C Microsub, while writing posts uses the W3C Micropub specification. Popular services that support both are WordPress, Micro.blog and Drupal, with more coming soon.

>> Read more about Indigenous

Interpeer — Collaboration infrastructure with near real-time p2p data synchronization

The Interpeer Project's purpose is to research and develop novel peer-to-peer technologies for open and distributed software architectures. The goal is to enable serverless modes of operation for collaborative software with rich feature sets equal to or surpassing centralized client-server architectures. For that reason, the initial focus lies on facilitating the extreme end of the use case spectrum with very low latency and high bandwidth requirements, as exemplified by peer-to-peer video communications in quality as close to 4k resolution as possible. When that initial goal is reached, the project focus will shift to other collaborative applications of the technology.

>> Read more about Interpeer

Inventaire — Wikidata-based social sharing of reading experiences

The Inventaire Project is an effort to move forward on the front of accessing information on resources using libre software powered by open knowledge. This ideal is being materialized in the form of inventaire.io, a libre book sharing webapp, inviting everyone to make the inventory of their physical books, declare what they want to do with it (giving, sharing, selling), as well as who should be able to see it (shared publicly through e.g. ActivityPub, or only visible by your friends and groups).

To power those inventories with structured bibliographic data, inventaire.io is also playing the role of a Wikidata-federated open and contributive bibliographic database, extending wikidata.org data with Wikidata-compatible entities (CC0, shared data schema) tailored to our needs, but ready to be pushed to Wikidata when the data contributor deems it appropriate. This linked open data architecture allows users to build their inventories on a huge open knowledge graph, that we believe will, in time, offer exceptional discovery capabilities. This project addresses many features, such as improved privacy settings, accessibility, creating publisher collections and data federation.

>> Read more about Inventaire

Inventaire recommender — Book recommendations in Inventaire

The Inventaire Project is an effort to move forward on the front of accessing information on resources using libre software powered by open knowledge. This ideal is being materialized in the form of inventaire.io, a libre book sharing webapp, inviting everyone to make the inventory of their physical books, declare what they want to do with it (giving, sharing, selling), as well as who should be able to see it (shared publicly through e.g. ActivityPub, or only visible by your friends and groups).

To power those inventories with structured bibliographic data, inventaire.io is also playing the role of a Wikidata-federated open and contributive bibliographic database, extending wikidata.org data with Wikidata-compatible entities (CC0, shared data schema) tailored to our needs, but ready to be pushed to Wikidata when the data contributor deems it appropriate. This linked open data architecture allows users to build their inventories on a huge open knowledge graph, that we believe will, in time, offer exceptional discovery capabilities. Now that this first base of inventories and contributive bibliographic data has reached a certain level of maturity, we want to start moving forward on the next challenges: introduce curation and recommendation mechanisms, improve search tools, offer finer privacy settings, and move forward on decentralization.

>> Read more about Inventaire recommender

Irdest — Local P2P mesh discovery of devices and users

How can you search for wireless devices near you to interact with, without other infrastructure present? The Irdest project allows devices such as laptops and smartphones to create wireless mesh networks over Bluetooth and direct WiFi connections, rather than relying on internet access via mobile networks and traditional internet service providers. It decentralises the routing and peering mechanisms used to connect people together, to allow users to have more control over their digital lives. In addition to this, direct circuits in an Irdest network are end-to-end encrypted, meaning that data privacy is built into the protocol at a fundamental level.

>> Read more about Irdest

Karrot — Save and share food waste

Karrot started as a free and open-source tool to support grassroots initiatives that save and share food waste, but it has been gradually re-designed to become a more general purpose tool to support various groups of people in their face-to-face activities on a local, autonomous, solidarity-driven and voluntary basis. Some of its defining features are the self-assignment of tasks, full transparency of members' actions and no admin roles, using a trust-based system instead. In order to better support the diverse ways in which people self-organize and practice commoning, this project will further develop features focused on the needs of end users through a participatory design process. We will work with the themes of collective agreements, role assignment and going beyond group boundaries for organising, which includes exploring options for federating. The same way we envision the software to be used, we will continue to work for the governance and organisation of the Karrot project itself to be community-driven, transparent and democratic.

>> Read more about Karrot

Kazarma — Bridge ActivityPub and Matrix realms

Matrix-Appservice-CommonsPub is a bridge between two decentralized protocols: Matrix and ActivityPub. The development includes polishing CommonsPub, an Elixir generic ActivityPub implementation, and creating an Elixir library to build Matrix bridges. We will first focus on private messages between Matrix users and users of an ActivityPub-enabled platform, like PeerTube or Funkwhale, then explore the possibilities of synchronizing ActivityPub feeds (e.g. "toots" feeds) in Matrix. The bridge comes as an easy-to-deploy, secure and scalable solution.

>> Read more about Kazarma

Keyoxide — Self-hostable identity proofs with bidirectional linking verification

How do you discover which other online accounts across different services and service providers actually belong to the same person? Keyoxide is a secure, privacy-friendly and decentralized platform to manage online identities, uncompromisingly driven by what the user herself wants to share.

Keyoxide is a new type of service to allow proving linked account ownership on a variety of platforms. Keyoxide leverages existing and battle-tested cryptographic primitives. The goal is to give users more control over their online presence, independent from dominant internet actors - without in fact having to depend on any centralised services or third parties. The project will improve the usability of the current Keyoxide, and its emerging underlying technology (Decentralized OpenPGP Identity Proofs). More service providers will be added and additional tools to provide proofs will be developed, to create a smooth and easy onboarding process for less tech-savvy people.
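The bidirectional linking mentioned in the title can be sketched as a two-way check: the key must point at the account, and the account must point back at the key. The code below is a simplified illustration, not Keyoxide's implementation. The `verify_claim` function, the `key_claims` mapping and the `fetch_profile` callable are all hypothetical, and real proofs involve service-specific formats and cryptographic verification of the key material that are not modelled here.

```python
def verify_claim(fingerprint, claimed_url, key_claims, fetch_profile):
    """Check both directions of an identity claim.

    key_claims maps a key fingerprint to the account URLs that key claims.
    fetch_profile is a caller-supplied function returning the public profile
    text of an account URL (hypothetical; a real verifier would fetch it).
    """
    # Direction 1: the key must list the account among its claims.
    if claimed_url not in key_claims.get(fingerprint, ()):
        return False
    # Direction 2: the account's public profile must mention the key.
    profile_text = fetch_profile(claimed_url)
    return fingerprint.lower() in profile_text.lower()
```

Requiring both directions is what stops someone from simply pasting another person's fingerprint into their own profile: the key holder never claimed that account, so the first check fails.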

>> Read more about Keyoxide

Collabora Online and LibreOffice — Improved visual document search for cloud service

Today it’s usually easier to use a search engine for information than find it locally, which is not optimal from a digital sovereignty point of view. Part of the problem is that we lack good open source tools to provide context and graphical search of local documents. Existing tools present plain-text lists for search results, which means people with good graphical memory find information slower. We think it’s a huge opportunity to show the context of search hits in a graphical form to find information faster. Technically, this will mean taking an existing file synchronization and sharing (FSS) solution that hosts your documents on-site, then improving LibreOffice to index content in documents with their context. We will build a secure REST API on top of this in Collabora Online which provides good performance. Finally we will integrate with a search engine, e.g. Apache Solr, to create a proof-of-concept search page that allows searching in all documents hosted in an FSS solution. This will serve as an example of how to integrate our solution into other projects like Nextcloud.

>> Read more about Collabora Online and LibreOffice

lemmur — A Lemmy mobile client

Lemmur is a multi-platform client for Lemmy - a federated link aggregator. It aims to bring the fediverse to the hands of regular people by providing a seamless experience across different instances. Currently lemmur implements the majority of functionalities provided by Lemmy, making it competitive with existing social media apps. In this project lemmur will expand to support more Quality of Life features such as live comment updates and notifications with websockets, caching, a theming system, and custom feeds. Additionally lemmur will expand its and Lemmy's reach by internationalizing the whole app, creating adaptive UI for different platforms, and creating an onboarding experience that will work as an introduction to both lemmur and the fediverse. Lastly lemmur will continue improving the seamless instance experience, reducing the need to change instances to a minimum.

>> Read more about lemmur

Lemmy — ActivityPub for link aggregation

Lemmy is an open-source, easily self-hostable link aggregator that you can use to share and discover interesting new ideas - and discuss them with the world. It's designed to work in the Fediverse, and communicate natively with other ActivityPub services, such as Mastodon, Funkwhale and PeerTube.

Lemmy aims to create a decentralized alternative to widely used proprietary services like Reddit. For a link aggregator, this means a user registered on one server can subscribe to communities on any other server, and have discussions with users registered elsewhere. The front page of popular link aggregators is where many people get their daily news, so Lemmy has the potential to help alter the social media landscape.

>> Read more about Lemmy

Lemmy Federation — Lemmy Federation and ActivityPub compliance

Lemmy is an open-source, easily self-hostable link aggregator that you can use to share and discover interesting new ideas - and discuss them with the world. It's designed to work in the Fediverse, and communicate natively with other ActivityPub services, such as Mastodon, Funkwhale and PeerTube.

Lemmy aims to create a decentralized alternative to widely used proprietary services like Reddit. For a link aggregator, this means a user registered on one server can subscribe to communities on any other server, and have discussions with users registered elsewhere. The front page of popular link aggregators is where many people get their daily news, so Lemmy has the potential to help alter the social media landscape. In this project, the team focuses on standards compliance, interoperability, internationalisation features, private communities and improving moderation.

>> Read more about Lemmy Federation

XMPP-ActivityPub gateway — XMPP, ActivityPub and E2EE Pubsub

XMPP (aka Jabber) is the vendor-neutral internet standard for instant messaging. ActivityPub is a web standard for federated social networking, used in software like Mastodon, Pleroma, PeerTube, Pixelfed and Funkwhale. The project consists of two components. The first is an ActivityPub-XMPP gateway bridging these protocols, enabling ActivityPub users to access XMPP blogs, comments and other features, and vice versa. The second adds state-of-the-art end-to-end encryption (E2EE) for PubSub and filesharing, which entails proposing a new XMPP standard which can provide a secure way to publish, retrieve and subscribe to all sorts of data over XMPP.

The project is built on Libervia (previously known as "Salut à Toi"), a communication ecosystem based on XMPP. Libervia offers several interfaces (web, desktop, mobile, command line, text UI) and explores the XMPP protocol beyond instant messaging. Libervia features chat, blogging, file sharing, photo albums, events, forums, etc. Libervia's goal is to develop an all-in-one, easy to use "familial and personal social network", i.e. a tool to communicate with the people close to you securely - and that lets your personal data stay within your control (as it should be).

>> Read more about XMPP-ActivityPub gateway

LibreOffice P2P — Encrypted collaborative editing in the browser

LibreOffice Online is the online version of the popular open source office application, and a leading implementation of the ISO/IEC 26300 OpenDocument Format standard. During the project this free software application will be modified so it can run fully client-side inside a regular browser - meaning you can view and edit office documents without an install required. This provides the technical foundations to support true P2P editing of complex office documents. The ability to remove the entire dependency on a server means that document collaboration is moving towards zero-knowledge implementations – where no single point of architectural failure exists and no data is required to sit unencrypted on a non-user owned (or trusted) server instance. The improved LibreOffice Online will be able to provide end-to-end encryption – both for the peer-to-peer use case, as well as securely keeping documents encrypted when at rest. That means data is safe when the user is disconnected, whether it is stored on an untrusted server or in the local Web storage.

>> Read more about LibreOffice P2P

Librecast Live — Live streaming with multicast

The Librecast Live project contributes to decentralizing the Internet by enabling multicast. Multicast is a major network capability for a secure, decentralized and private by default Next Generation Internet. The original design goals of the Internet do not match today's privacy and security needs, and this is evident in the technologies in use today. There are many situations where multicast can already be deployed on the Internet, but also some where it cannot. This project will build transitional protocols and software to extend the reach of multicast and enable easy deployment by software developers. Amongst others it will produce a C library and POC code using a tunneling method to make multicast available to the entire Internet, regardless of upstream support. We will then use these multicast libraries, WebRTC and the W3C-approved ActivityPub protocol to build a live streaming video service similar to twitch.tv. This will be a complement to the existing decentralised Mastodon and PeerTube projects, and will integrate with these services using ActivityPub. By doing so we can bring live video streaming services to these existing decentralised userbases and demonstrate the power of multicast at the same time. Users will be able to chat and comment in realtime during streaming (similar to YouTube live streaming). This fills an important gap in the Open Source decentralised space. All video and chat messages will be transmitted over encrypted channels.

>> Read more about Librecast Live

LinkedDataHub — Framework to handle Linked Data at scale

LinkedDataHub is a Knowledge Graph explorer, or in technical terms, a rich Linked Data client combined with a personal RDF dataspace (triplestore). It provides a number of features for end-users: browsing Linked Data, cloning RDF resources to the personal dataspace, searching and querying SPARQL endpoints, creating collections from SPARQL queries, editing remote and local RDF documents, creating and transcluding structured content with visualizations of SPARQL results, charts etc. LinkedDataHub is a standalone product as well as a framework – its data-driven architecture allows extension and customization at every level from the APIs up to the UI.

We expect LinkedDataHub to become a go-to tool for end-users working with Linked Data and SPARQL: researchers, data scientists, domain experts – regardless of whether they work in the digital humanities, life-sciences or any other domain. We strive to provide an unparalleled Knowledge Graph user experience that is enabled by the RDF stack, with the focus on discovery, exploration and personalization.

>> Read more about LinkedDataHub

MaDada — Using LinkedData to improve FOI processes

MaDada is a free open source platform that simplifies and opens up public access to data and information held by the French government. Making use of the Freedom Of Information (FOI) law, the platform guides citizens in filing requests, but also acts as an open data archive and platform for right-to-know or transparency campaigns, by publishing the whole process: the request history, the resulting correspondence, and the data obtained through it. Launched in October 2019 by Open Knowledge Foundation France members, MaDada has helped 250+ users make over 1200 FOI requests to French public bodies, and is beginning to play an important role in addressing right-to-know, transparency and open government issues.

MaDada is based on the open source software Alaveteli (https://alaveteli.org), which has been adapted and deployed to more than 25 countries in 20 different languages and jurisdictions. Alaveteli offers efficient functions for users to request and manage FOI requests. The NLnet funding will help the project develop and improve discovery and search features of public bodies on madada.fr and Alaveteli software - for instance, in France alone there are more than 60,000 public authorities. This will take advantage of existing digital commons such as Wikidata, and open standards such as schema.org and DCAT.

>> Read more about MaDada

Mailpile Search Integration — Personal email search engine

Mailpile is an e-mail client and personal e-mail search engine, with a strong focus on user autonomy and privacy. This project, "Mailpile Search Integration", will adapt and enhance Mailpile so other applications can make use of Mailpile's built-in search engine and e-mail store. This requires improving Mailpile in three important ways: First, the project will add fine-grained access control, so the user can control which data is and isn't exposed. Second, remote access will be facilitated, allowing a Mailpile running on a personal device to communicate with applications elsewhere on the network (such as smartphones, or services in "the cloud"). And finally, the interoperability functions themselves (the APIs) need to be defined (building on existing standards wherever possible), implemented and documented.

>> Read more about Mailpile Search Integration

Mangaki — Advanced group recommendations

Within a set of search results, what should you do to find the optimal solution for not just a single user but a group? Mangaki is building an open source library for privacy-preserving group recommendations of items. While many content providers suggest recommendations at a personal level, these are often directed to a single user, or are restricted to a generic “family” category. Whenever, say, a group of friends wants to watch a movie, it is often hard to decide what to watch, because people can have really different tastes.

Recommendations are also very privacy-sensitive. A straightforward way might be to share our complete viewing history, but that certainly can lead to embarrassing and awkward situations. So how can we collectively compute a list of relevant items without disclosing all of our data unencrypted? The Mangaki project is making an open source library for group recommendations that works in a scalable and distributed way.

>> Read more about Mangaki

Mastodon - groups, filtering, moderation — Group support with ActivityPub

Mastodon is a decentralized open-source social network built on the ActivityPub protocol. It allows users to launch their own instances of social networks, while allowing the instances to connect over the Fediverse. The project foresees the development of groups, advanced filtering, and improved moderation functionality. Groups functionality gives users the option to communicate with a smaller subset of their connections; improved moderation functionality will give admins a toolkit to efficiently deal with reported cases, e.g. with batch actions; advanced filtering adds more sophisticated ways to filter posts.

>> Read more about Mastodon - groups, filtering, moderation

Mepo — Lightweight mobile map search

Mepo is a fast, simple, and hackable OSM map viewer for desktop Linux and mobile Linux devices (like the Pinephone, Librem 5, and postmarketOS devices) and both environments' various user interfaces (Wayland and X inclusive). Mepo works both offline and online, features a minimalist interface compatible with touch, mouse and keyboard, and offers a UNIX-philosophy inspired underlying design, exposing a powerful command language called mepolang that can be scripted to provide and customize functionality such as bounding-box search scripts, bookmarks, routing, and more.

>> Read more about Mepo

Practical Decentralised Search and Discovery — Search and discovery inside mesh/adhoc networks

Internet search and service discovery are invaluable services, but are reliant on an oligopoly of centralised services and service providers, such as the internet search and advertising companies. One problem with this situation is that global internet connectivity is required to use these services, precisely because of their centralised nature. For remote and vulnerable communities, stable, affordable and uncensored internet connectivity may simply not be available. Prior work with mesh technology clearly shows the value of connecting local communities, so that they can call and message one another, even in the absence of connectivity to the outside world. The project will implement a system that allows such isolated networks to also provide search and advertising capabilities, making it easier to find local services, and ensuring that local enterprises can promote their services to members of their communities, without requiring the loss of capital from their communities in the form of advertising costs. The project will then trial this system with a number of pilot communities, in order to learn how to make such a system best serve its purpose.

>> Read more about Practical Decentralised Search and Discovery

Meta-Press.es — A press search engine in your browser

Meta-Press.es is a press search engine, in the shape of a browser add-on. When using it, everything happens between the user's computer and the queried newspapers. Using Meta-Press.es, no data is sent to any third party (including our servers). We're not asking the users to believe that we respect their privacy; it's a matter of verifiable fact that we do. That means there is no single point of failure, of surveillance or of censorship.

>> Read more about Meta-Press.es

Meta-Press.es — Retrieve news feeds and search locally

Meta-Press.es is an add-on (in the standard WebExtension format) which gives super powers to your web browser. Meta-Press.es equips your browser with the capacity to query hundreds of online press sources in a few seconds and get you the relevant results. It is a drop-in replacement for centralised services like Google News, and in addition helps you to create press reviews (via selection and export of results from automated searches).

Using Meta-Press.es, it's your web browser that does the work, without any middleman between information sources and you. Your privacy is respected even against the ad or social trackers of the newspapers (as those mechanisms aren't triggered by Meta-Press.es searches). Unlike its news portal competitors, Meta-Press.es transparently shows what was queried and what was not - and you can choose your own information sources (via source selection filters and even source selection pick-up). Everything happens directly on the user device and under control of the user, avoiding single points of censorship and in support of Freedom of the Press and media diversity.

>> Read more about Meta-Press.es

Mobilizon — Find, create and organize events

Mobilizon is a free, libre and federated groups and events management platform. Most proprietary social media platforms collect behavioral data and social graphs by hosting groups and events management tools (such as Facebook events, MeetUp, etc.). This can become a problem, even more so when your group works on topics like activism, raising awareness and empowering citizens. Mobilizon allows for a federation of interconnected hosts that, by design, decentralizes data concentration while permitting interactions between users across the federation. This group and event management tool has been designed by asking about and considering the needs of mobilized citizens. It includes features that have since been implemented by mainstream social media as well (multiple profiles for each account), and does not reproduce mechanisms driven by the attention economy. As such, Mobilizon is not a social media platform: it does not pander to egos, but focuses on being a toolkit to manage communities. On top of the event publishing tool, it features a group discussion tool (akin to a minimalist forum), a group page management tool (that can be used as a one-page website), a group public and private posts tool (similar to a blog), and a group link directory (to organize links to online documents, resources, etc.). With this grant, Framasoft aims to improve Mobilizon's search results (within an instance as well as throughout the federation) and recommendations. We also want to help people find groups and events close to their interests or their location, as well as allow them to import their events from other platforms when possible (Facebook, MeetUp, etc.).

>> Read more about Mobilizon

MoboSearch — Providing an alternative view on the Android App ecosystem

Mobile phones play a major role in our society, yet they still suffer from severe limitations in how they handle apps. As a result, most people are unaware of the dangers of privacy leaks and are typically offered very constrained search capabilities within one single source of information, the app store. MoboSearch is a new search engine and information portal for apps, empowering users beyond the existing app stores. The system exposes privacy and security information, like app permissions, and gives users new, easy and flexible search capabilities that allow them to make an informed choice and that increase people's awareness. Openness and interoperability ensure that the system can offer and receive data, so as to cooperatively enable a better and healthier app ecosystem.

>> Read more about MoboSearch

Mynij — Portable indexing and search engine for mobile

People feel lost when their connection to the internet is cut. All of a sudden, they cannot search for some reference or quickly look up something online. At the other end, hundreds of millions of servers are 'always on', awaiting the user to come online. Of course, this is neither very resilient nor economic. And it is also not necessary. In the 60s, computers used to occupy a large room. Nowadays, with smartphones, they fit in your hand. A complete copy of the Web (10 PB) already fits on 100 SSDs of 100 TB occupying a volume similar to an original IBM PC. A partial copy of the Web optimised for a single person will thus soon fit on a smartphone.

Mynij believes that Web search will eventually run offline for legal, technical and economic reasons. This is why it is building a general purpose Web search engine that runs offline and fits into a smartphone. It can provide fast results with better accuracy than online search engines. It protects privacy and freedom of expression against recent forms of digital censorship. It reduces the cost of online advertising for small businesses. It brings search algorithms and information presentation under end-user control. And you control its availability: as long as you have a copy and a working device, it can work.

>> Read more about Mynij

NEFUSI — NEFUSI: A novel NEuroFUzzy approach for semantic SImilarity assessment

The challenge of determining the degree of semantic similarity between two expressions of a textual nature has become increasingly important in recent times. The great importance it has in many modern computing areas and the latest advances in neural computation have made the solutions better. NEFUSI (which stands for "NEuroFUzzy approach for semantic SImilarity assessment") aims to go a step further with the design and development of a novel neurofuzzy approach for semantic textual similarity based on neural networks and fuzzy logics. We intend to benefit from the outstanding capabilities of the latest neural models to work with text and, at the same time, from the possibilities that fuzzy logic offers to aggregate and decode numerical values in a personalized way. In this way, the project will build an approach intended to effectively determine the degree of semantic similarity of textual expressions with high accuracy in a wide range of scenarios concerning Search and Discovery.

>> Read more about NEFUSI

Namecoin: ZeroNet and Packaging — Make ZeroNet work with Namecoin

Namecoin provides a decentralized naming system and trust anchor. Its flagship use-case is a decentralized top-level domain (TLD) which is the cornerstone of a domain name system that is resistant to hijacking and censorship. Among other things, this provides a decentralized trust anchor for Public Key Infrastructure that does not require third party trust. It operates independently of the DNSSEC root trust chain, and can thus offer additional security under some circumstances. ZeroNet is a decentralized web-like network of peer-to-peer users, which provides an alternative to Tor hidden services. In the project, ZeroNet will be adapted to support a local Namecoin client, and provide additional assurances such as a Host Header-like mechanism to protect users from spoofing. Namecoin will be used as a human-readable naming layer for Tor onion services and ZeroNet sites. This eliminates the user problem of pseudorandom, unmemorable website addresses for onion services and ZeroNet sites, which can facilitate phishing attacks.

>> Read more about Namecoin: ZeroNet and Packaging

Namecoin: Core Infrastructure — Alternative domain name system

Namecoin is a blockchain project that provides a decentralized naming system and trust anchor. Our flagship use-case is a decentralized top-level domain (TLD) which is the cornerstone of a domain name system that is resistant to hijacking and censorship. This project is meant to improve the security and usability of core components of Namecoin.

>> Read more about Namecoin: Core Infrastructure

neuropil — Privacy by design P2P search including IoT

Neuropil is an open-source decentralized messaging layer that focuses on security and privacy by design. Persons, machines, and applications first have to identify their respective partners and/or content before real information can be sent. The discovery is handled internally and is based on so-called "intent messages" that are secured by cryptographic primitives. This project aims to create distributed search engine capabilities based on neuropil that enable the discovery and sharing of information with significantly higher levels of trust and privacy, and with more control over the search content for data owners, than today's standard.

As of now, large search engines have implemented "crawlers" that constantly visit webpages and categorize their content. The only way to somehow influence the information that is used by search engines is by using a file called "robots.txt". Other algorithms are only known to the search engine provider. By using a highly standardized "intents" format that protects the real content of users, this model is reversed: data owners define the searchable public content. As an example, we seek to integrate the neuropil messaging layer with its extended search capabilities into a standard web server, so that it becomes one actor that handles and maintains the search index contents of participating data owners. By using the neuropil messaging layer it is thus possible to build a distributed search engine database that is able to contain and reveal any kind of information in a distributed, concise and privacy-preserving manner, without the need for any central search engine provider.
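To make concrete how coarse the existing robots.txt mechanism is, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent and the example.org URLs are made up for illustration; nothing here is part of neuropil.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of an example site: it can only allow or
# disallow whole path prefixes per crawler, nothing finer-grained.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
public_ok = parser.can_fetch("ExampleBot", "https://example.org/blog/post")
private_ok = parser.can_fetch("ExampleBot", "https://example.org/private/notes")
print("blog post crawlable:", public_ok)
print("private notes crawlable:", private_ok)
```

The site owner can say which paths may be crawled, but has no say over what is extracted from those pages or how it is ranked, which is the gap the "intents" approach described above aims to close.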

>> Read more about neuropil

Nextcloud — Unified and intelligent search within private cloud data

The internet helps people to work, manage, share and access information and documents. Proprietary cloud services from large vendors like Microsoft, Google, Dropbox and others cannot offer the privacy and security guarantees users need. Nextcloud is a 100% open source solution where all information can stay on premise, with the protection users choose themselves. The Nextcloud Search project will solve the last remaining open issue, which is unified, convenient and intelligent search and discoverability of data. The goal is to build a powerful but user friendly user interface for search across the entire private cloud. It will be possible to select data by date, type, owner, size, keywords, tags and other metadata. The backend will offer indexing and searching of file based content, as well as integrated search for other content like text chats, calendar entries, contacts, comments and other data. It will integrate with the private search capabilities of Searx. As a result, users will have the same powerful search functionalities they know and like elsewhere, but with respect for their privacy and for strict regulations like the GDPR.

>> Read more about Nextcloud

Nominatim — Multi-lingual support in address search

Nominatim is an open-source geographic search engine (geocoder). It makes use of the data from OpenStreetMap to build up a database and API that allow searching for any place on earth and looking up addresses for any given geographic location. It is used as the main search engine on the OpenStreetMap website, where it serves millions of requests per day, but it can also be installed locally. You can easily set it up for a small country on your laptop. Nominatim has always aimed to be usable world-wide for any place in any language. To that end it has used generic, language-agnostic algorithms that assume a uniform data model. This has served us especially well while the OpenStreetMap database was in its early stages of development and changing fast. Now that it has matured, it is time to further improve the search experience by taking into account the particularities of different languages and the different practices when it comes to geographic addressing. We aim to restructure the part of the software that parses the place names and search queries to make it more configurable and make it easier to take into account languages and regional peculiarities.

>> Read more about Nominatim

Nyxt — A programmable browser with advanced search integration

Nyxt is a new type of web browser designed to empower users to find and filter information on the Internet. Web browsers today largely compete on rendering performance, all whilst maintaining similar UIs. The common UI they employ is easy to learn, though unfortunately it is not effective for traversing the internet due to its limited capabilities. This presents itself as a problem when a user is trying to navigate the large amounts of data on the Internet and in their open tabs. To deal with this problem, Nyxt offers a set of powerful tools to index and jump around one's open tabs, through search results and the wider Internet. For example, Nyxt offers the ability for the user to filter and process their open tabs by semantic content search. Because each workflow and discipline is unique, the real advantage of Nyxt is in its fully programmable and open API. The user is free to modify Nyxt in any way they wish, even whilst it is running.

>> Read more about Nyxt

Nyxt — Browser integration of federated, distributed platforms

Nyxt is a new type of web browser designed to empower users to find and filter information on the Internet. The information available to browsers is limited by the protocols they understand; the languages they speak. Most browsers only speak HTTP(S), a protocol designed for client/server interactions.

In its latest generation, Nyxt plans to open up access to an Internet beyond HTTP, a larger, more decentralized Internet. The new versions of Nyxt will feature support for XMPP, ActivityPub, and IPFS. Together, these decentralized technologies will power much of the next generation of Internet technologies, and Nyxt will speak their language!

>> Read more about Nyxt

Open Know-How Search — Search Open Hardware Projects

Open Know-How Search is a project to create a search engine for open source hardware designs. We are building a modern, clean and accessible search experience for makers. Our index will span the entire internet and all existing ways to share designs. Users and platforms will be able to make use of the Open Know-How metadata standard to help get their projects into the index and surface those that are in advanced stages of development and worth looking at and attempting to re-build. The front page and top results in the search will be a useful resource to someone looking for a new open source hardware project to build and contribute to.

>> Read more about Open Know-How Search

OSF Crawler Cooperation — Support Infrastructure for Open Search initiatives

The Open Search Foundation (OSF) attempts to build a European mainstream search engine alternative, under European regulations like privacy and fair participation. Our project builds on the foundations of that OSF search engine-to-be, in an attempt to combine existing crawling efforts of OSF participants. This is implemented on the real internet scale: petabytes of data, billions of webpages, a hundred million websites with terabytes of communication between the components per day. The scale and regulations call for a concept which has not been implemented before. Existing web-search related projects are invited to contribute their ideas to our larger concept, which could become not just an alternative to Google Search but could also have many other uses, even in early stages.

>> Read more about OSF Crawler Cooperation

OpenStreetMap Speed Limits — Infer default speed limits for better quality OpenStreetMap-based routing

OpenStreetMap (OSM) is the world's largest open geodata set, created and maintained collaboratively by millions of users. Of course there are many other purposes beyond creating a map, for instance finding the best route from A to B. Such usage needs to take into account incomplete data, as coverage of speed limits varies greatly across OSM. Currently, only about 12% of roads in OSM have speed limits set. However, default legal speed limits can often be inferred from other data, such as whether the road is within an urban zone, whether the carriageway is segregated, how many lanes it has, whether it is paved etc.

The goal of this project is to extract the default speed limits for different road and vehicle types for all state legislations, map these to OSM and provide these in a machine-readable form so that it can be consumed by open source routing software such as GraphHopper, Valhalla or OSRM. Further, a reference implementation that interprets this data will be provided.
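The idea of inferring a default limit from other road attributes can be sketched as an ordered rule table. This is only an illustration: the rule set, the numeric limits and the `infer_speed_limit` helper are placeholders invented here, not real legal data and not the project's data format. Only the tag keys (`maxspeed`, `highway`, `surface`) are taken from common OSM tagging.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tags = Dict[str, str]
# A rule pairs a predicate over (tags, is_urban) with a limit in km/h.
Rule = Tuple[Callable[[Tags, bool], bool], int]

# Placeholder rules for an imaginary jurisdiction; first match wins.
RULES: List[Rule] = [
    (lambda tags, urban: urban, 50),
    (lambda tags, urban: tags.get("highway") == "motorway", 120),
    (lambda tags, urban: tags.get("surface") == "unpaved", 60),
    (lambda tags, urban: True, 90),  # fallback for other rural roads
]


def infer_speed_limit(tags: Tags, urban: bool) -> Optional[int]:
    """Return the explicit limit if tagged, else the first matching default."""
    explicit = tags.get("maxspeed", "")
    if explicit.isdigit():  # only plain km/h values are handled in this sketch
        return int(explicit)
    for predicate, limit in RULES:
        if predicate(tags, urban):
            return limit
    return None


print(infer_speed_limit({"highway": "residential"}, urban=True))
print(infer_speed_limit({"highway": "motorway", "maxspeed": "100"}, urban=False))
print(infer_speed_limit({"highway": "track", "surface": "unpaved"}, urban=False))
```

A routing engine would need one such table per legislation and vehicle type, which is why a shared machine-readable form is useful.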

>> Read more about OpenStreetMap Speed Limits

Offen — Privacy-respecting site analytics

Transparently handling data in the open creates mutual trust: Offen is a fair web analytics software that gives users insights into the data they are generating by giving them access to the same suite of analytics tools site operators themselves are using. One unique aspect of Offen is requiring user consent before collecting any data. Especially in countries that are governed by GDPR and its siblings this is a real world requirement for many websites. This is not only about collecting data, but also about embedding third party content or similar.

Usage metrics come with explanations about their meaning, relevance, usage and possible privacy implications, and also detail which kinds of data are not being collected. Users can expect full transparency and are encouraged to make autonomous and informed decisions regarding the use of their data, and operators are enabled to collect needed usage statistics while fully respecting their users' privacy and data. No user data is collected until the user has explicitly opted in. All data can be deleted either selectively or in its entirety by the users.

>> Read more about Offen

Omnom — Self-hosted bookmarking and snapshotting with search

Omnom is a webpage bookmarking and snapshotting service. It consists of two parts: a web application that stores and serves the snapshots, and a browser add-on to create and save bookmarks. Snapshots created by Omnom are searchable, secure and exact copies of the rendered webpages, even with front-end heavy sites which require multiple actions to reach the relevant content. Omnom also provides functionality to tag bookmarks and highlight key information, so you can organize and efficiently search your bookmarks and snapshots.

Omnom is self-hosted free software which can handle multiple users with their own private and publicly visible bookmarks and snapshots. Public bookmarks are available in various formats to support feed creation or programmatic processing.

>> Read more about Omnom

Personal Food Facts — Privacy protecting personalized information about food

Open Food Facts is a collaborative database containing data on 1 million food products from around the world, in open data. This project will allow users of our website, our mobile app and our ecosystem of 100+ mobile apps to get personalized search results (food products that match their personal preferences and diet restrictions based on ingredients, allergens, nutritional quality, vegan and vegetarian products, kosher and halal foods etc.) without sacrificing their privacy by having to send those preferences to us.
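One way to picture personalization without sharing preferences is to fetch generic results and filter them on the user's own device. The sketch below assumes exactly that; the `Product` and `Preferences` types and their fields are hypothetical and do not reflect the Open Food Facts data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Product:
    """Hypothetical, simplified product record as received from a server."""
    name: str
    allergens: Set[str] = field(default_factory=set)
    vegan: bool = False


@dataclass
class Preferences:
    """Preferences that stay on the user's device and are never uploaded."""
    avoid_allergens: Set[str] = field(default_factory=set)
    vegan_only: bool = False


def filter_locally(products: List[Product], prefs: Preferences) -> List[Product]:
    """Keep only products compatible with the locally stored preferences."""
    return [
        p for p in products
        if not (p.allergens & prefs.avoid_allergens)
        and (p.vegan or not prefs.vegan_only)
    ]


results = [
    Product("oat drink", allergens={"gluten"}, vegan=True),
    Product("almond bar", allergens={"nuts"}, vegan=True),
    Product("cheese", allergens={"milk"}, vegan=False),
    Product("lentil soup", vegan=True),
]
mine = filter_locally(results, Preferences(avoid_allergens={"nuts"}, vegan_only=True))
print([p.name for p in mine])
```

Because the server only ever sees the generic query, it learns nothing about the user's diet or allergies.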

>> Read more about Personal Food Facts

Open Hospitality Network — Federated hospitality with ActivityPub

Hospitality is part of human tradition, practiced long before any software infrastructure existed. People share with others their homes, and exchange life’s stories and adventures - often without even mention of money. The internet age allowed hosts and travelers from all around the world to find each other more easily, and spontaneous communities emerged online. Nowadays, many hospitality exchange platforms exist which help travelers and hosts find each other.

Open Hospitality Network wants to unify hospitality exchange communities into one federated system conveniently serving travelers and hosts. We envision a variety of platforms to exist, united in diversity, where each of them is built around their own unique culture, yet they all communicate with each other in federation. We'd like them together to create a resilient ecosystem outlasting any particular founders and exchange platforms. Following a collaborative process, we are building software from the community for the community, software that on the one hand helps connect existing communities and on the other enables new federated communities to spring up and flourish.

>> Read more about Open Hospitality Network

Openki.net — Make local events and meetups discoverable

How do you discover what you can learn from the people around you? How do you search what other people in the same region have to offer, like a training course or a debating event?

Openki is an interface between technology and culture. It provides an interactive web platform developed with the goal of removing barriers to universal education. The platform makes it simple to organise and manage "peer-to-peer" courses. The platform can be self-hosted, and integrates with OpenStreetMap. At the moment Openki is focused on facilitating learning groups and workshops. The project will improve the tool, so it can be used not only to organise courses (with the collaboration of many different actors, in a more participatory way) but much more broadly, for bottom-up project initiation, for grassroots organizations and for facilitating societal dialogue.

>> Read more about Openki.net

Owncast — ActivityPub powered Livecasting

Owncast is a self-hosted, open source live streaming platform for people to easily host and manage their own live streams. It has become an increasingly popular option for many people to break away from the large centralized services. The project will add Fediverse (ActivityPub) integration in order to provide better means of discovery, increase engagement, and to have interoperability with other applications. The goal is for Owncast to become a fully fledged member of the Fediverse, focusing on people's streams being discovered with existing timelines and search indexes. This would allow people, for instance, to contribute comments directly from their own ActivityPub-powered website or from ActivityPub-powered link aggregators like Lemmy.

>> Read more about Owncast

P2Pcollab — Decentralised social search and discovery

This project is working towards creating a more decentralized, privacy-preserving, collaborative internet based on the end-to-end principle where users engage in peer-to-peer collaboration and have full control over their own data, enabling them to collaborate on, publish & subscribe to content in a decentralized way, as well as to discover & disseminate content based on collaborative filtering, while allowing local, offline search of all subscribed & discovered content. The project is researching & developing P2P gossip-based protocols and implementing them as composable libraries and lightweight unikernels with a focus on privacy, security, robustness, and scalability.

>> Read more about P2Pcollab

PRESC Classifier Copies Package — Implementing Machine Learning Copies as a Means for Black Box Model Evaluation and Remediation

The ubiquitous use over the Internet, and in particular in search engines, of often proprietary black-box machine learning models and APIs in the form of Machine Learning as a Service makes it very difficult to control and mitigate their potential harmful effects (such as lack of transparency, privacy safeguards, robustness, reusability or fairness). Machine Learning Classifier Copying allows us to build a new model that replicates the decision behaviour of an existing one without needing to know its architecture or to have access to the original training data. A suitable copy makes it possible to audit the already deployed model, mitigate its shortcomings, and even introduce improvements, without the need to build a new model from scratch, which requires access to the original data.

This project aims to implement a practical solution of this innovative technique into PRESC, an existing free software tool for the evaluation of machine learning classifiers, so that classifier copies are automated and can be easily created by developers using machine learning, in order to reuse, evaluate, mitigate and improve black-box models, ensure a personal data privacy safeguard into their machine learning models, or for any other application.
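The copying idea can be sketched in a few lines: query the black box on synthetic points, then fit a new model on those answers. Everything below is an assumption for illustration: the `black_box` stand-in, the uniform sampling and the 1-nearest-neighbour copy are choices made here, not the PRESC implementation or its API.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Model = Callable[[Point], int]


def black_box(point: Point) -> int:
    """Stand-in for a deployed model that can only be queried (hypothetical)."""
    x, y = point
    return 1 if x + y > 1.0 else 0


def sample_points(n: int, rng: random.Random) -> List[Point]:
    """Draw synthetic points uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def build_copy(oracle: Model, n_samples: int, rng: random.Random) -> Model:
    """Label synthetic points with the oracle and return a 1-NN copy."""
    labelled = [(p, oracle(p)) for p in sample_points(n_samples, rng)]

    def copy_model(query: Point) -> int:
        nearest = min(
            labelled,
            key=lambda item: (item[0][0] - query[0]) ** 2
            + (item[0][1] - query[1]) ** 2,
        )
        return nearest[1]

    return copy_model


def agreement(a: Model, b: Model, points: List[Point]) -> float:
    """Fraction of points on which both models give the same label."""
    return sum(1 for p in points if a(p) == b(p)) / len(points)


rng = random.Random(0)
copy_model = build_copy(black_box, 500, rng)
fidelity = agreement(black_box, copy_model, sample_points(200, rng))
print(f"agreement with the black box: {fidelity:.2f}")
```

Note that no original training data is used anywhere: the copy learns only from the black box's answers, and the agreement score gives a rough measure of how faithful the copy is.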

>> Read more about PRESC Classifier Copies Package

The PeARS app — Building low-resource Web search applications from cognitive models

It is widely believed that Web search engines require immense resources to operate, making it impossible for individuals to explore alternatives to the dominant information retrieval paradigms. The PeARS project aims at changing this view by providing search tools that can be used by anyone to index and share Web content on specific topics. The focus is specifically on designing algorithms that will run on entry-level hardware, producing compact but semantically rich representations of Web documents. In this project, we will use a cognitively-inspired algorithm to produce queryable representations of Web pages in a highly efficient and transparent manner. The proposed algorithm is a hashing function inspired by the olfactory system of the fruit fly, which has already been used in other computer science applications and is recognised for its simplicity and high efficiency. We will implement and evaluate the algorithm on the task of document retrieval. It will then be integrated into a Web application aimed at supporting the growing practice of 'digital gardening', allowing users to research and categorise Web content related to their interests, without requiring access to centralised search engines.
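As a rough sketch of what a fruit-fly-inspired hash looks like in general (a sparse random projection followed by winner-take-all), consider the following. The dimensions, the fan-in, `top_k` and the function names are assumptions chosen for illustration and are not taken from the PeARS code.

```python
import random
from typing import List


def make_projection(input_dim: int, output_dim: int, fan_in: int,
                    seed: int = 0) -> List[List[int]]:
    """Each output unit listens to `fan_in` randomly chosen input dimensions."""
    rng = random.Random(seed)
    return [rng.sample(range(input_dim), fan_in) for _ in range(output_dim)]


def fly_hash(vector: List[float], projection: List[List[int]],
             top_k: int) -> List[int]:
    """Project the input sparsely, then keep only the top_k strongest units."""
    activations = [sum(vector[i] for i in idxs) for idxs in projection]
    winners = sorted(range(len(activations)),
                     key=lambda j: activations[j], reverse=True)[:top_k]
    code = [0] * len(activations)
    for j in winners:
        code[j] = 1  # winner-take-all: a sparse binary code
    return code


# Hypothetical 8-dimensional term-weight vector for one Web page.
page = [0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0, 0.0]
projection = make_projection(input_dim=8, output_dim=32, fan_in=3)
code = fly_hash(page, projection, top_k=4)
print(code)
```

The resulting code is a short, sparse binary vector, so comparing two documents reduces to counting shared set bits, which is cheap enough for entry-level hardware.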

>> Read more about The PeARS app

PeerDB Search — Search for semantic and full-text data

PeerDB Search is an opinionated but flexible open source search system incorporating best practices in search and user interfaces and experience to provide intuitive, fast, and easy to use search over both full-text data and semantic data exposed as facets. The goal of the user interface is to allow users without technical knowledge to easily find results they want, without having to write queries. The system will also allow multiple data sources to be used and merged together. As a demonstration PeerDB will deploy a public instance as a search service for Wikipedia articles and Wikidata data.

>> Read more about PeerDB Search

PeerTube — A decentralised streaming video platform

PeerTube is a free, libre and federated video platform. Video is a very popular class of content and now accounts for a significant share of internet traffic, but the choice of hosting has a lot of implications - if you send your viewers to some proprietary platform because you want to avoid cost, what happens after they watch your video? And who watches them watch? PeerTube allows for a federation of interconnected hosts (so more choice of videos wherever you go to see them) while containing the risk of exposing users to profiling, algorithmic pressure that favors extreme content, censorship and other negative aspects of centralised services like YouTube or Vimeo. PeerTube implements the ActivityPub standard and works with peer-to-peer distribution - and therefore viewing. This means no slowing down when a video suddenly goes viral, and much lower distribution costs thanks to shared bandwidth. PeerTube aims to make it easier to host videos on the server side, while remaining practical, ethical and fun on the Internet user side. In this project, Framasoft will work on PeerTube 4.0 with interesting new features such as better search, live streaming, channel customisation and improved accessibility.

>> Read more about PeerTube

Extending PeerTube — Adding advanced search capabilities to PeerTube

This project aims to extend PeerTube to support the availability, accessibility, and discoverability of large-scale public media collections on the next generation internet. Although PeerTube is technically capable of supporting the distribution of large public media collections, the platform currently lacks practical examples and extensive documentation to achieve this in a timely and cost-efficient way. This project will function as a proof-of-concept that will showcase several compelling improvements to the PeerTube software by [1] developing and demonstrating the means needed to this end by migrating a large corpus of open video content, [2] implementing trustworthy open licensing metadata standards for video publication through the PeerTube platform, and [3] emphasizing the importance of accompanying subtitle files by recommending ways to generate them.

>> Read more about Extending PeerTube

peermaps — Peer to peer cartography

Peermaps is a p2p, offline-friendly way to distribute, view, and embed map data. Instead of fetching data from a centralized tile provider, you fetch data from other peers on the network. Right now we have all of OpenStreetMap processed into a 100GB archive in our p2p spatial database and rendering formats and seeded to hyperdrive and ipfs. This data is hooked up to a proof-of-concept web map viewer.

For this grant, we will build on our proof-of-concept to release a user-oriented map viewer as a web application with search functionality on peermaps.org along with a developer-oriented tool to embed web maps in an iframe. In addition to (p2p) web development, this project will involve research on peer queries for offline and online location-based search, optimizations to the spatial database and p2p layer, and WebGL graphics improvements, in order to produce a usable p2p mapping alternative.

>> Read more about peermaps

A Distributed Software Stack For Co-operation — Facilitating easy ad hoc cooperation

Perspectives aims to be to co-operation what ActivityPub is to social networks. It provides the conceptual building blocks for co-operation, laying the groundwork for a federated, fully distributed infrastructure that supports endless varieties of co-operation. The declarative Perspectives Language allows a model to translate instantly into an application that lets multiple users contribute to a shared process, each with her own unique perspective. The project builds a reference implementation of the distributed stack that executes these models of co-operation, and makes the information concerned searchable.

Real life is an endless affair of interlocking activities. Likewise, Perspectives models of services can overlap and build on common concepts, thus forming a federated conceptual space that allows users to move from one service to another as the need arises in a most natural way. Such an infrastructure functions as a map, promoting discovery and decreasing dependency on explicit search. However, rather than being an on-line information source to be searched, such as the traditional Yellow Pages, Perspectives models allow their users (individuals and organisations alike) to interact and deal with each other on-line. Supply-demand matching in specific domains (e.g. local transport) integrates readily with such an infrastructure. Other patterns of integrating search with co-operation support form a promising area for further research.

>> Read more about A Distributed Software Stack For Co-operation

PixelDroid — Share and browse photos in the fediverse with a mobile app

PixelDroid is an Android client for Pixelfed, the federated image sharing platform based on W3C ActivityPub. Our goal is to bring the Pixelfed platform to Android and provide a mobile user experience that excites. We aim to provide feature-parity with the Pixelfed web client as well as add additional features - like image and video editing, capturing and uploading directly from the app. During the project we will also make it easy to use multiple accounts, even across different instances. Additionally, we want to contribute to the Pixelfed API with testing and additional documentation.

>> Read more about PixelDroid

Pixelfed Live — Live streaming and other Pixelfed enhancements

Pixelfed is an open source and decentralised photo sharing platform, in the same vein as services like Instagram. The twist is that you can yourself run the service, or pick a reliable party to run it for you. Who better to trust with your privacy and the privacy of the people that follow you? The magic behind this is the ActivityPub protocol - which means you can comment, follow, like and share from other Pixelfed servers around the world as if you were all on the same website. Timelines are in chronological order, and there is no need to track users or sell their data. The platform has many features including Discover, Hashtags, Geotagging, Photo Albums, Photo Filters and a few still in development like Ephemeral Stories. After supporting development of social discovery and a mobile app, NGI Zero funds this project to add a much requested live streaming feature to Pixelfed.

>> Read more about Pixelfed Live

Pixelfed — ActivityPub driven decentralised photo sharing platform

Pixelfed is an open source and decentralised photo sharing platform, in the same vein as services like Instagram. The twist is that you can yourself run the service, or pick a reliable party to run it for you. Who better to trust with your privacy and the privacy of the people that follow you? The magic behind this is the ActivityPub protocol - which means you can comment, follow, like and share from other Pixelfed servers around the world as if you were all on the same website. Timelines are in chronological order, and there is no need to track users or sell their data. The project has many features including Discover, Hashtags, Geotagging, Photo Albums, Photo Filters and a few still in development like Ephemeral Stories. The goal of the project is among others to solidify the technical base, add new features and design and build a mobile app that is compatible with Mastodon apps like Fedilab and Tusky.

>> Read more about Pixelfed

Plaudit — Make good science discoverable through endorsements

Plaudit is open source software that collects endorsements of scholarly content from the academic community, and leverages those to aid the discovery and rapid dissemination of scientific knowledge. Endorsements are made available as open data. The NGI Search & Discovery Grant will be used to simplify the re-use of endorsement data by third parties by exposing them through web standards.

>> Read more about Plaudit

Poliscoops — Make political news and online debate accessible

PoliFLW is an interactive online platform that allows journalists and citizens to stay informed, and keep up to date with the growing group of political parties and politicians relevant to them - even those whose opinions they don't directly share. The prize-winning political crowdsourcing platform makes finding hyperlocal, national and European political news relevant to the individual far easier. By aggregating the news political parties share on their websites and social media accounts, PoliFLW is a time-saving and citizen-engagement enhancing tool that brings the internet one step closer to being human-centric. In this project the platform will add the news shared by parties in the European Parliament and national parties in all EU member states, showcasing what it can mean for access to information in Europe. There will be a built-in translation function, making it easier to read news across country borders. PoliFLW is a collaborative environment that helps to create more societal dialogue and better informed citizens, breaking down political barriers.

>> Read more about Poliscoops

PrivateRecSys — Privacy-Friendly Recommendation System

The use of recommender systems has grown significantly in recent years, with users receiving personalised recommendations ranging from products to buy and news to read to movies to watch and people to follow. At the same time, recommender systems have become extremely effective revenue drivers for online business. However, producing personalised recommendations requires collecting users' data, which makes conventional recommenders effective at the cost of users' privacy. The PrivateRecSys project aims to develop an open-source toolkit for delivering accurate recommendations while respecting users' privacy. The toolkit will consist of novel privacy-preserving recommender approaches, which modify state-of-the-art recommender approaches by applying the principles of differential privacy, homomorphic encryption and federated learning.
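
To give a flavour of one of the principles named above, here is a minimal sketch of differential privacy applied to item popularity counts, using the standard Laplace mechanism. It is not taken from the project's toolkit; the function names, the fixed catalogue and the parameter values are assumptions for illustration only.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_item_counts(histories, catalogue, epsilon, max_items, rng):
    """Noisy per-item counts; adding or removing one user shifts the true counts by at most max_items in total."""
    allowed = set(catalogue)
    counts = {item: 0 for item in catalogue}
    for items in histories:
        # Deduplicate and truncate (here simply in sorted order) to bound each user's contribution.
        kept = sorted(set(items) & allowed)[:max_items]
        for item in kept:
            counts[item] += 1
    scale = max_items / epsilon  # L1 sensitivity divided by the privacy budget
    return {item: count + laplace_noise(scale, rng) for item, count in counts.items()}

rng = random.Random(7)
histories = [["a", "b"], ["a"], ["a", "c", "b"]]
noisy = private_item_counts(histories, ["a", "b", "c"], epsilon=1.0, max_items=2, rng=rng)
print(noisy)
```

A smaller `epsilon` adds more noise and gives stronger privacy, at the cost of less accurate popularity signals for the recommender.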

>> Read more about PrivateRecSys

Private Searx — Add private resources to the open source Searx metasearch engine

Searx is a popular meta-search engine letting people query third party services to retrieve results without giving away personal data. However, there are other sources of information stored privately, either on the computers of users themselves or on other machines in the network that are not publicly accessible. To share it with others, one could upload the data to a third party hosting service. However, there are many cases in which it is unacceptable to do so, because of privacy reasons (including GDPR) or in case of sensitive or classified information. This issue can be avoided by storing and indexing data on a local server. By adding offline and private engines to searx, users can search not only on the internet, but also on their local network from the same user interface. Data can be conveniently available to anyone without giving it away to untrusted services. The new offline engines would let users search local file systems, open source indexers and databases, all from the UI of searx.

>> Read more about Private Searx

Re-isearch — Vectorise text with a flexible unit of retrieval

Project re-isearch: a novel multimodal search and retrieval engine using mathematical models and algorithms different from the all-too-common inverted index (popularized by Salton in the 1960s). The design allows it to have no limits on the frequency of words, term length, number of fields or complexity of structured data, and even to support overlap, where fields or structures cross each other's boundaries (common examples are quotes, lines/sentences, biblical verses and annotations). Its model enables a completely flexible unit of retrieval and modes of search.

Initial project outcome: a freely available and completely open-source (and multiplatform) C++ library, bindings for other languages (such as Python) and some reference sample code using the library in some of these languages.

>> Read more about Re-isearch

Great OCR for SANE — Integrate OCR capabilities into open source scanning tools

We have become dependent on search engines, allowing us to locate a document using some specific words across billions of webpages. However, not every document is born digital, and some reach the web only in an indirect way. And users with, for instance, visual disabilities cannot read documents that are 'just' pixels.

The SANE project is a collection of open-source scanner drivers and related software. SANE tools allow the users to convert their documents, photos and any other similar material from a completely unsearchable and non-discoverable analog form into a digital representation, which can be easily shared and distributed.

The SANE-OCR project enables users to close the gap right at the stage when physical documents are converted from their incoming "analog" form to a searchable digital form - using a completely open-source stack. Whereas the traditional result of scanning is just the visual image (essentially a photo), the output here additionally contains the text recognized using optical character recognition (OCR). This yields documents which are searchable and discoverable.

>> Read more about Great OCR for SANE

SCION-RAINS — RAINS, Another Internet Naming Service (or, a DNS alternative)

RAINS (which recursively stands for RAINS, Another Internet Naming Service) is an alternative name resolution protocol that has been designed with the aim to provide an ideal naming service for the SCION Internet architecture. SCION is one of the most ambitious and realistic alternative Internet architectures currently in play, and has interesting traits such as route control, failure isolation, multipath capabilities and explicit trust information for end-to-end communication.

The RAINS architecture is simple but effective. While it resembles the architecture of DNS, it also benefits from being a clean-slate design and provides security across all TLDs, where DNS with DNSSEC fails to provide such capabilities across the board. RAINS, unlike DNS, has no relative clocks: the DNS TTL is replaced by absolute validity timestamps on the signature. All records are signed.
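
The difference between a relative TTL and absolute validity timestamps can be sketched as follows. The record layout and field names are hypothetical and do not reflect the RAINS wire format, and signature verification is deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SignedAssertion:
    """Hypothetical record whose validity window is covered by its signature."""
    name: str
    value: str
    valid_since: int  # Unix time in seconds
    valid_until: int  # Unix time in seconds

def is_valid(assertion, now):
    """Absolute check: any cache or client can evaluate it from the record alone."""
    return assertion.valid_since <= now <= assertion.valid_until

def ttl_expired(ttl_seconds, received_at, now):
    """DNS-style relative check, for contrast: depends on when this cache received the record."""
    return now - received_at >= ttl_seconds

record = SignedAssertion("example.ch", "2001:db8::1", valid_since=1_000, valid_until=2_000)
print(is_valid(record, now=1_500), is_valid(record, now=2_500))
```

With absolute timestamps the answer does not depend on how long a record has been sitting in intermediate caches, only on the current time.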

>> Read more about SCION-RAINS

SCION-Pathdiscovery — Secure and reliable decentralized storage platform

With the amount of downloadable resources such as content and software updates available over the Internet increasing year over year, it turns out not all content has someone willing to serve all of it up eternally for free for everyone. And in other cases, the resources concerned are not meant to be public, but do need to be available in a controlled environment. In such situations users and other stakeholders themselves need to provide the necessary capacity and infrastructure in another, collective way.

This of course creates new challenges. Unlike a website you can follow a link to or find through a standard search engine and which you typically only have to vet once for security and trustworthiness, the distributed nature of such a system makes it difficult for users to find the relevant information in a fast and trustworthy manner. One of the essential challenges of information management and retrieval in such a system is the location of data items in a way that the communication complexity remains scalable and a high reliability can be achieved even in case of adversaries. More specifically, if a provider has a particular data item to offer, where shall the information be stored such that a requester can easily find it? Moreover, if a user is interested in a particular piece of information, how do they discover it and how can they quickly find the actual location of the corresponding data item?

The project aims to develop a secure and reliable decentralized storage platform enabling fast and scalable content search and lookup going beyond existing approaches. The goal is to leverage the path-awareness features of the SCION Internet architecture to use network resources efficiently in order to achieve a low search and lookup delay while increasing the overall throughput. The challenge is to select suitable paths considering those performance requirements, and potentially combining them into a multi-path connection. To this end, we aim to design and implement optimal path selection and data placement strategies for a decentralized storage system.

>> Read more about SCION-Pathdiscovery

Geographic tagging of Routing and Forwarding — Geographic tagging and discovery of Internet Routing and Forwarding

SCION is the first clean-slate Internet architecture designed to provide route control, failure isolation, and explicit trust information for end-to-end communication. As a path-based architecture, SCION end-hosts learn about available network path segments, and combine them into end-to-end paths, which are carried in packet headers. By design, SCION offers transparency to end hosts with respect to the path a packet travels through the network. This has numerous applications related to trust, compliance, and also privacy. By better understanding the geographic and legislative context of a path, users can for instance choose trustworthy paths that best protect their privacy, or avoid the need for privacy-intrusive and expensive CDNs by selecting resources closer to them. SCION is the first to have such a decentralised system offer this kind of transparency and control to users of the network.

>> Read more about Geographic tagging of Routing and Forwarding

SWH package manager Data Ingestion — Add Package managers to Software Heritage

Software Heritage's ambition is to collect, preserve, and share all software that is publicly available in source code form. In this project we improve the SWH scanner tool which compares any set of files with the SWH archive. This is very useful for detecting license violations or security issues. The goal of the project is to take the scanner from a research prototype to a widely available and usable tool. This involves work around its packaging, user interface, robustness and performance. We will be re-purposing the advanced graph-comparison algorithm from the Mercurial DVCS to minimize the load on the SWH archive. We will also expand the list of supported source code origins by creating new listers and loaders for Maven, Go, Packagist, RubyGems, Bower, CPAN and pub.dev/Dart package managers.

>> Read more about SWH package manager Data Ingestion

Storing Efficiently Our Software Heritage — Faster retrieval within Software Heritage

Software Heritage (https://www.softwareheritage.org) is the single largest collection of software artifacts in existence. But how do you store this in a way that you can find something fast enough, taking into account that these are billions of files with a huge spread in file sizes? "Storing Efficiently Our Software Heritage" will build a web service that provides APIs to efficiently store and retrieve the 10 billion small objects that today comprise the Software Heritage corpus. It will be the first implementation of the innovative object storage design that was created in early 2021. It has the ability to ingest the SWH corpus in bulk: it makes building search indexes an order of magnitude faster, helps with mirroring etc. The project is the first step towards a more ambitious and general purpose undertaking that makes it possible to store, search and mirror hundreds of billions of small objects.

>> Read more about Storing Efficiently Our Software Heritage

Adera — Relevant scientific research results

The project summary for this project is not yet available. Please come back soon!

>> Read more about Adera

SEARXR — Virtual reality for web search

SearXR brings a beautiful, privacy-respecting search to 2D and 3D devices. Why? Because searching on alternative devices (VR headsets, conference presentation screens) is not always easy or private. SearXR aims to provide alternative search interfaces which are more appropriate for VR, AR and big screens. SearXR aims to progressively enhance these search experiences: better screen layout, privacy, and WebXR compatibility. All features are based on user preferences and available hardware. Built upon SearX and W3C's WebXR technology, it will enable everybody to search, or add XR features to their SearX instance. Whether it be state of the art headsets, or a 65” screen: pointing the browser to a SearXR instance will immediately launch a wonderful, privacy-respecting search experience.

>> Read more about SEARXR

searx — A privacy-respecting, hackable metasearch engine

Searx (/sɜːrks/) is a free metasearch engine, available under the GNU Affero General Public License version 3, with the aim of protecting the privacy of its users. Across all categories, Searx can fetch and combine search results from more than 80 different engines. This includes major commercial search engines like Bing, Google, Qwant, DuckDuckGo and Reddit, as well as site-specific searches such as Wikipedia and Archive.is. Searx is a self hosted web application, meaning that every user can run it for themselves and others - and add or remove any features they want. Meanwhile, numerous publicly accessible instances are hosted by volunteer organizations and individuals alike. The project will consolidate the many suggestions and feature requests from users and operators into the first full-blown release (1.0) for Searx, as well as spend the necessary engineering effort in making the technology ready for even wider deployment.

>> Read more about searx

Dynamic indexing for real time graph database — Provide faster query results through algorithmic preprocessing

Based is an open source real time data platform with a suite of features that help developers build more performant applications faster and with more flexibility. It’s built on a self-developed real time graph database and the WebSocket protocol to ensure performance and scaling.

One of the features is an automatic indexing system that keeps track of frequently performed queries by monitoring a set of (real time) parameters and assigning values to queries, that in turn inform which parts of the graph to index. This index has to work with the Based real time graph database and optimise its performance, which means the index also has to be aware of any changes in schema structure or updates in indexed data. This is achieved through the existing subscription engine in Based. Our hope is that this project can lay the groundwork for more efficient indexing systems for all graph databases.
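
The general idea of scoring frequently performed queries and deciding what to index can be illustrated with the sketch below. It is not Based's implementation: the class, the decay and threshold parameters, and the representation of a query as a tuple of field names are assumptions made for this example.

```python
from collections import defaultdict

class QueryTracker:
    """Hypothetical tracker that scores queries by recent frequency."""

    def __init__(self, decay=0.5, threshold=2.0):
        self.decay = decay
        self.threshold = threshold
        self.scores = defaultdict(float)

    def record(self, query_fields):
        """Call whenever a query touching these fields is executed."""
        self.scores[tuple(query_fields)] += 1.0

    def tick(self):
        """Periodically age all scores so that stale queries lose their index."""
        for key in self.scores:
            self.scores[key] *= self.decay

    def hot_queries(self):
        """Queries whose score currently justifies maintaining an index."""
        return sorted(k for k, s in self.scores.items() if s >= self.threshold)

    def affected_by(self, changed_field):
        """Indexed queries that must be refreshed when a field's data or schema changes."""
        return [k for k in self.hot_queries() if changed_field in k]

tracker = QueryTracker()
for _ in range(3):
    tracker.record(("user", "age"))
tracker.record(("user", "name"))
print(tracker.hot_queries())
print(tracker.affected_by("age"))
```

In a real system the `affected_by` step would be driven by change notifications, which is the role the description assigns to the existing subscription engine.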

>> Read more about Dynamic indexing for real time graph database

SensifAI — AI driven image tagging

Billions of users manually upload their captured videos and images to cloud storage services such as Dropbox, Google Drive and Apple iCloud straight from their camera or phone. Their private pictures and video material are subsequently stored unprotected somewhere else on some remote computer, in many cases in another country with quite different legislation. Users depend on the tools from these service providers to browse their archives of often thousands and thousands of videos and photos in search of some specific image or video of interest. The direct result of this is continuous exposure to cyber threats like extortion and an intrinsic loss of privacy towards the service providers. There is a perfectly valid user-centric approach possible in dealing with such confidential materials, which is to encrypt everything before uploading anything to the internet. At that point the user may be much safer, but from then on would have a hard time locating any specific videos or images in their often very large collection. What if smart algorithms could describe the pictures for you and recognise who is in them, so that you can store this information and use it to conveniently search and share? This project develops an open source smart-gallery app which uses machine learning to recognize and tag all visual material automatically - and on the device itself. After that, the user can do what she or he wants with the additional information and the original source material. They can save them to local storage, using the tags for easy search and navigation. Or offload the content to the internet in encrypted form, and use the descriptions and tags to navigate this remote content. Either option makes images and videos searchable while fully preserving user privacy.

>> Read more about SensifAI

Simmel — A wearable contact tracing beacon/scanner

Simmel is a platform that enables COVID-19 contact tracing while preserving user privacy. It is a wearable hardware beacon and scanner which can broadcast and record randomized user IDs. Contacts are stored within the wearable device, so you retain full control of your trace history until you choose to share it.

The Simmel design is open source, so you are empowered to audit the code. Furthermore, once the pandemic is over, you are able to recycle, re-use, or securely destroy the device, thanks to the availability of hardware and firmware design source.

The contact tracing algorithm is programmed using CircuitPython, to facilitate ease of code audit and community participation. The Simmel project does not endorse a specific contact tracing platform, but it is inherently not compatible with contact tracing proposals that rely on the constant upload of data to the cloud.

>> Read more about Simmel

Software Heritage — Collect, preserve and share the source code of all software ever written

Software Heritage is a non-profit, multi-stakeholder initiative with the stated goal to collect, preserve and share the source code of all software ever written, ensuring that current and future generations may discover its precious embedded knowledge. This ambitious mission requires proactively harvesting from a myriad of source code hosting platforms over the internet, each one having its own protocol, and coping with a variety of version control systems, each one having its own data model. This project will among other things help ingest the content of over 250,000 open source software projects that use the Mercurial version control system that will be removed from the Bitbucket code hosting platform in June 2020.

>> Read more about Software Heritage

Solid Application Interoperability

The Solid Application Interoperability specification details how Agents in the Solid ecosystem can read, write, and manage data stored in a Solid pod using disparate Applications, either individually or in collaboration with other Agents. Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone's Pod, they control which people and applications can access it. Solid was initiated and is currently led by the inventor of the World Wide Web, Sir Tim Berners-Lee. Solid Application Interoperability provides a clear way to create intuitive data boundaries and higher level patterns to manage access to that data following the principle of least privilege. The specification is accompanied by a primer and sample implementations.

>> Read more about Solid Application Interoperability

Solid-NextCloud app — Bridge Nextcloud to Solid

This project connects the world of Solid with the world of Nextcloud. The aim is to develop an open source Nextcloud app that turns a Nextcloud server into a spec-compliant Solid server. It gives every user a WebID profile and allows Solid apps to store data on the user's Nextcloud account. It also exposes some of the user's existing Nextcloud data like contacts and calendar events as Solid user data, so that Solid apps can interact with the user's Nextcloud data, and allows the user to manage which Solid apps can access which specific aspects of the user's personal data. We will make our implementation compatible with the latest version of the Solid spec (including DPoP tokens and the WebSockets AUTH command), and contribute the surface tests we create for this as a well-documented independent test-suite, for other Solid server implementers to benefit from. We will also publish a stand-alone version of our PHP components, which can run independently of Nextcloud.

>> Read more about Solid-NextCloud app

Solid-Search — Queries in a pod

Solid-Search aims to provide an open source module that adds full-text search functionality to Solid pods. Solid is an emergent specification initiated by the inventor of the World Wide Web, Sir Tim Berners-Lee. Solid aims to decentralize the web by decoupling applications from databases by introducing Solid Pods (personal online datastores that are in full control of the data owner). Having a way to search through your personal data on your Solid Pod is a must-have for the project to become truly successful. However, this requires technology that does not exist yet: a full-text search interface that works with schema-less RDF data. In order to maximize adoption and retain a modular, open approach, we will standardize the way in which data changes are described. By doing so, it will be relatively easy to introduce new search / query systems (such as search by location). The project will create the open source search back-end, improve linked data synchronisation specs, link the module to two Solid implementations, create a front-end for end-users, and write a tutorial for adding data sources.

>> Read more about Solid-Search

Solid Application Interoperability

The Solid Application Interoperability specification details how Agents in the Solid ecosystem can read, write, and manage data stored in a Solid pod using disparate Applications, either individually or in collaboration with other Agents. Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone's Pod, they control which people and applications can access it. Solid was initiated and is currently led by the inventor of the World Wide Web, Sir Tim Berners-Lee. Solid Application Interoperability provides a clear way to create intuitive data boundaries and higher level patterns to manage access to that data following the principle of least privilege. This follow-up project focuses on implementing the Authorization Agent service in TypeScript. It will also work on the SAI specification, which needs to provide more details on how the agent who receives an access grant gets updated when the access grant is replaced by a new one. The Authorization Agent service will also implement the server-to-server subscription type developed in the Solid Authentication panel.

>> Read more about Solid Application Interoperability

Sonar: a modular peer-to-peer search engine — Modular peer-to-peer search engine

Sonar is a project to research and build a toolkit for decentralized search. Currently, most open-source search engines are designed to work on centralized infrastructure. This proves to be problematic when working within a decentralized environment. Sonar will try to solve some of these problems by making a search engine share its indexes incrementally over a P2P network. Thereby, Sonar will provide a base layer for the integration of full-text search into peer-to-peer/decentralized applications. Initially, Sonar will focus on integration with a peer-to-peer network (Dat) to expose search indexes securely in a decentralized structure. Sonar will provide a library for creating, sharing, and querying search indexes. A user interface and content ingestion pipeline will be provided through integration with the peer-to-peer archiving tool Archipel.

>> Read more about Sonar: a modular peer-to-peer search engine

sourcehut — Graph query support for software development platform

SourceHut is a free-software platform providing infrastructure for free-software projects, providing hosted repositories, mailing lists, bug trackers, real-time chat tools, and continuous integration infrastructure, among other services, and facilitating collaboration and project discovery via a federated project index. SourceHut focuses on performance, accessibility, and robustness, and since 2018 has provided a reliable platform supporting the thousands of FOSS projects that depend on its services. The NLnet project will expand the integration between SourceHut services, and between SourceHut and independently operated third-party services, primarily through the development of a comprehensive federation of GraphQL APIs.

>> Read more about sourcehut

Spritely — Capability based petname system

Users are currently caught between two worlds of identity solutions: prepackaged centralized identity silos (which also tend to be very phishing-vulnerable) and more decentralized naming systems that awkwardly separate the experience of secure connections from identity. What if instead users could have an experience where decentralized naming was a natural outgrowth of using the application? Spritely is a laboratory project to advance the decentralized social web founded by authors of the popular ActivityPub federated social web protocol. Spritely's approach to decentralized naming systems is to implement a "petnames system", where local meaning is given to "petnames" to otherwise non-human-meaningful decentralized identifiers (such as a hash of cryptographic key material). An important part of this design is that decentralized naming flows should be a natural part of use of the program.

Petnames tend to resemble local contacts in a "contact list", but petnames on their own do not provide a sufficient way to discover, meet, and come to trust new contacts. A complete petname system also provides "edge names": for example "CWebber=>JessicaTallon" would show JessicaTallon as an "edge name" proposed by the petname CWebber. Our system also provides support for contacts introduced in a context with no existing relationships; these are called "self-proposed names" and are rendered in a way distinct from petnames and edge names. This has been under-implemented in existing petname systems; since Spritely is implementing decentralized communication systems, this will be a full implementation of a petname system (including edge names and self-proposed names) in an ergonomic manner that can also be applied to other decentralized systems. In addition to a specification, the project will deliver a usable chat application plus contact list.
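The three kinds of names described above can be pictured with a small data structure. The following Python sketch is purely illustrative and is not Spritely's design: the class `AddressBook`, its fields, the rendering rules and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AddressBook:
    """One user's local view of names (hypothetical illustration)."""

    # Petname chosen by the local user -> opaque identifier
    # (for example a hash of cryptographic key material).
    petnames: Dict[str, str] = field(default_factory=dict)
    # Introducer identifier -> {name the introducer proposes: target identifier}.
    edges: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Identifier -> name that the identified party proposes for itself.
    self_proposed: Dict[str, str] = field(default_factory=dict)

    def describe(self, ident: str) -> str:
        """Render an identifier, preferring locally assigned names."""
        # 1. A petname assigned by the local user.
        for name, known in self.petnames.items():
            if known == ident:
                return name
        # 2. An edge name proposed by a contact who already has a petname.
        for petname, introducer in self.petnames.items():
            for proposed, target in self.edges.get(introducer, {}).items():
                if target == ident:
                    return f"{petname}=>{proposed}"
        # 3. A self-proposed name, rendered so it cannot be mistaken
        #    for a petname or an edge name.
        if ident in self.self_proposed:
            return f'"{self.self_proposed[ident]}" (self-proposed)'
        # 4. No name known: fall back to the raw identifier.
        return ident
```

With `petnames = {"CWebber": "id-aaa"}` and `edges = {"id-aaa": {"JessicaTallon": "id-bbb"}}`, calling `describe("id-bbb")` yields `CWebber=>JessicaTallon`, matching the notation used in the text.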

>> Read more about Spritely

StreetComplete — Fix open geodata with OpenStreetMap

The project will make collecting data for OpenStreetMap easier and more efficient. OpenStreetMap is the best source of information for general-purpose search engines that need geographic data about locations and properties of various objects. The objects vary from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches, etc. The search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and already used for many purposes. Improving OSM increases the quality of services using open data rather than proprietary datasets kept as a trade secret by established companies.

>> Read more about StreetComplete

StreetComplete — Collaborative editing in OpenStreetMap

StreetComplete is a mobile app that makes it easy and fun to contribute to OpenStreetMap while out and about. OpenStreetMap is the largest open data community about maps, and the go-to source for free geographic data when doing a location-based search. This project focuses on making the collection of data to be used in a search more powerful and efficient. More specifically, the main goals are to make it possible to collect more data through an easy interface and to add a new view that makes it more efficient to complete and keep up to date certain types of data, such as house numbers or cycleways.

>> Read more about StreetComplete

StreetComplete UX — Improve usability of StreetComplete

OpenStreetMap is the best source of information for general-purpose search engines that need geographic data about locations and properties of various objects. The objects vary from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches, etc. The search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and already used for many purposes.

The project will make collecting open data for OpenStreetMap easier and more efficient, and lower the threshold for contribution by improving usability and accessibility. Any user should be able to help improve OpenStreetMap data, simply by downloading the app from F-Droid or Google Play and mapping as they walk.

>> Read more about StreetComplete UX

URL Frontier 2.0 — Enterprise features for URLFrontier

URLFrontier provides a crawler-neutral API and service implementation for a crawl frontier, which can power various web crawlers independently of their implementation language and scalability. This API defines the operations that a web crawler typically performs when communicating with a web frontier, e.g. get the next N URLs to crawl, update the information about URLs already processed, change the crawl rate for a particular hostname, get the list of active hosts, get stats, etc. The aim of this project is to turn what is currently a working piece of software (the result of an earlier grant from NGI Zero Discovery) into an enterprise-grade solution. The improvements will mainly concern the service implementation, e.g. monitoring/reporting, clustering/discovery and robustness/resilience. The project will improve the usability of the system by adding configurable logging and metrics reporting, improve the performance of the service for very large volumes of data by adding efficient parallelization across multiple nodes, and improve the overall robustness through more graceful failure modes and more efficient restarts.

>> Read more about URL Frontier 2.0

URL Frontier — Develop an API between web crawler and frontier

Discovering content on the web is possible thanks to web crawlers. Luckily, there are many excellent open source solutions for this; however, most of them have their own way of storing and accessing the information about the URLs. The aim of the URL Frontier project is to develop a crawler-neutral API for the operations that a web crawler performs when communicating with a web frontier, e.g. get the next URLs to crawl, update the information about URLs already processed, change the crawl rate for a particular hostname, get the list of active hosts, get statistics, etcetera. It aims to serve a variety of open source web crawlers, such as StormCrawler, Heritrix and Apache Nutch.
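To make the listed frontier operations concrete, here is a toy in-memory frontier in Python. It is a sketch under stated assumptions only: the class `InMemoryFrontier` and all of its method names are hypothetical and do not follow the project's actual gRPC schema, and politeness is reduced to a simple per-host delay.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class InMemoryFrontier:
    """Toy crawl frontier; names are illustrative, not the real API."""

    def __init__(self, default_delay: float = 1.0):
        self.queues = defaultdict(deque)  # hostname -> URLs waiting to be crawled
        self.known = {}                   # URL -> status string
        self.delay = defaultdict(lambda: default_delay)  # hostname -> seconds between fetches
        self.next_allowed = defaultdict(float)           # hostname -> earliest next fetch time

    def put_url(self, url: str) -> bool:
        """Add a newly discovered URL; return False if it was already known."""
        if url in self.known:
            return False
        self.known[url] = "queued"
        self.queues[urlsplit(url).hostname].append(url)
        return True

    def get_urls(self, n: int, now: float) -> list:
        """Return up to n URLs, at most one per host, respecting host delays."""
        batch = []
        for host, queue in self.queues.items():
            if len(batch) == n:
                break
            if queue and now >= self.next_allowed[host]:
                url = queue.popleft()
                self.known[url] = "in_flight"
                self.next_allowed[host] = now + self.delay[host]
                batch.append(url)
        return batch

    def update(self, url: str, status: str) -> None:
        """Record the outcome for a URL that has been processed."""
        self.known[url] = status

    def set_delay(self, host: str, seconds: float) -> None:
        """Change the crawl rate for one hostname."""
        self.delay[host] = seconds

    def active_hosts(self) -> list:
        """List hosts that still have URLs waiting."""
        return [host for host, queue in self.queues.items() if queue]

    def stats(self) -> dict:
        queued = sum(len(queue) for queue in self.queues.values())
        return {"known": len(self.known), "queued": queued}
```

A crawler would loop over `get_urls`, fetch each URL, call `update` with the result and `put_url` for every outlink. Keeping this state behind a neutral interface is what would let different crawlers share one frontier.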

The outcomes of the project are to design a gRPC schema then provide a set of client stubs from the schema as well as a robust reference implementation and a validation suite to check that implementations behave as expected. The code and resources will be made available under Apache License as a sub-project of crawler-commons, a community that focuses on sharing code between crawlers. One of the objectives of URL Frontier is to involve as many actors in the web crawling community as possible and get real users to give continuous feedback on our proposals.

>> Read more about URL Frontier

variation graph (vgteam) — Privacy enhanced search within e.g. genome data sets

Vgteam is pioneering privacy-preserving variation graphs, which make it possible to capture complex models and aggregate data resources with formal guarantees about the privacy of the individual data sources from which they were constructed. Variation graphs relate collections of sequences together as walks through a graph. They are traditionally applied to genomic data, where they support the compression and query of very large collections of genomes.

But there are many types of sensitive data that can be represented in a variation graph form, including geolocation trajectory data - the trajectories of individuals and vehicles through transportation networks. Epidemiologists can, for instance, use a public database of personal movement trajectories to do geophylogenetic modeling of a pandemic like SARS-CoV2. The idea is that one cannot see individual movements, but rather large-scale flows of people across space, which are essential for understanding the likely places where an outbreak might spread. This information is essential for understanding, at the scientific and political level, how best to act in case of a pandemic, now and in the future.

The project will apply formal models of differential privacy to build variation graphs which do not leak information about the individuals whose data was used to construct them. For genomes, the techniques allow us to extend the traditional models to include phenotype and health information, maximizing their utility for biological research and clinical practice without risking the privacy of participants who shared their data to build them. For geolocation trajectory data, people can share data in the knowledge that their social graph is not exposed. The tools themselves are not limited to the above use cases, and open the doors to many other types of applications both online (web browsing histories, social media usage) and offline.
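As a rough illustration of how differential privacy can be combined with walks through a graph, the Python sketch below releases noisy per-edge traversal counts using the textbook Laplace mechanism. This is a generic technique chosen for illustration and is not claimed to be vgteam's method; the function names, the `public_edges` parameter and the truncation rule are all assumptions.

```python
import random
from collections import Counter


def edge_counts(walks, max_edges):
    """Count, per edge, how many walks traverse it.

    Each walk is cut to its first max_edges steps and every edge is counted
    at most once per walk, so adding or removing one individual's walk changes
    the counts by at most max_edges in total (the L1 sensitivity).
    """
    counts = Counter()
    for walk in walks:
        edges = list(zip(walk, walk[1:]))[:max_edges]
        counts.update(set(edges))
    return counts


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_edge_counts(walks, public_edges, epsilon, max_edges, rng=None):
    """Release an epsilon-differentially-private count for every public edge.

    public_edges is assumed to be known independently of the private data
    (for example a road network), so noise is added to zero counts as well.
    """
    rng = rng or random.Random()
    true_counts = edge_counts(walks, max_edges)
    scale = max_edges / epsilon  # Laplace scale = sensitivity / epsilon
    return {edge: true_counts.get(edge, 0) + laplace(scale, rng)
            for edge in public_edges}
```

Smaller values of `epsilon` give stronger privacy but noisier counts, so aggregate flows over many individuals stay visible while any single trajectory is masked.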

>> Read more about variation graph (vgteam)

Web Annotation — Building blocks for interoperable annotation systems

The idea of web annotation is to support the creation and exchange of annotations on any visited page; thereby enabling people to make, share, and discover corrections, rebuttals, side-notes, or other contextually relevant resources. Using the W3C’s Web Annotation standard, and contributing to the incubating Apache Annotator project, this project works on modules and tools that facilitate a diverse ecosystem of interoperable annotation systems.

>> Read more about Web Annotation

WebXray Discovery — Expose tracking mechanism in search hubs

WebXray intends to build a filter extension for the popular and privacy-friendly meta-search engine Searx that will show users what third-party trackers are used on the sites in their results pages. Full transparency of what tracker is operated by what company is provided to users, who will be able to filter out sites that use particular trackers. This filter tool will be built on the unique ownership database WebXray maintains of tracking companies that collect personal data of website visitors.

Mapping the ownership of tracking companies which sell behavioural profiles of individuals is critical for all privacy and trust-enhancing technologies. Considerable scrutiny is given to the large players who conduct third-party tracking and advertising, whilst little scrutiny is given to large numbers of smaller companies who collect and sell unknown volumes of personal data. Such collection is unsolicited, with invisible beneficiaries. The ease and speed of corporate registration provides the opportunity for data brokers to mitigate their liability when collecting data profiles. We must therefore establish a systematic database of data broker domain ownership.

The filter extension that will be the output of the project will make this ownership database visible and actionable to end users. The project will also curate crowdsourced data and add it to the current database of ownership (which is already comprehensive, containing detailed information on more than 1,000 ad tech tracking domains).

>> Read more about WebXray Discovery

XWiki — Bring wiki capabilities into the Fediverse

XWiki is a modern and extensible open source wiki platform. Up until now, XWiki had been focusing on providing the best collaboration experience and features to its users. We're now taking this to the next level by having XWiki be part of the larger federation of collaboration and social software (a.k.a. fediverse), thus allowing users to collaborate externally. XWiki is embracing the W3C ActivityPub specification. Specifically, we're implementing the server part of the specification, to be able to both view activity and content happening in external services inside XWiki itself and to make XWiki's activity and content available from these other services too. A specific but crucial use case is to allow content collaboration between different XWiki servers, sharing content and activity.

>> Read more about XWiki

WikiRate Insights — Transforming WikiRate ESG Platform User Experience to Maximise Reliable Data Insights

For too long actionable data about the behavior of companies has been hidden behind the paywalls of commercial data providers. As a result only those with sufficient resources were able to advocate and shape improvements in corporate practice. Since launching in 2016, WikiRate.org has become the world’s largest open source registry of ESG (Environmental, Social, and Governance) data with nearly 1 million data points for over 55,000 companies. Through the open data platform anyone can systematically gather, analyze and discuss publicly available information on company practices, joining current debates on corporate responsibility and accountability.

By bringing this information together in one place, and making it accessible, comparable and free for all, we aim to provide society with the tools and evidence it needs to spur corporations to respond to the world's social and environmental challenges. Homing in on the usability of the platform, this project will tackle some of the most crucial barriers for users when it comes to gathering and extracting the data, whilst boosting reuse of the open source platform for other purposes.

>> Read more about WikiRate Insights

WikiRate Insights 2 — Dedicated text search architecture for environmental, social and corporate governance platform

The project summary for this project is not yet available. Please come back soon!

>> Read more about WikiRate Insights 2

WordPress ActivityPub — Bring ActivityPub social networking to the widely used WordPress

WordPress ActivityPub is a plugin that allows your site users to interact with other users in the fediverse. Currently the plugin supports Follows by remote users, sending out public posts to followers, and receiving remote users' public Comments on local posts. This project will develop features allowing for a richer and more typical social experience with Direct messages, Followers-only posts, and Threaded comments to and from the fediverse. Moderation tools will be included and user privacy features will also be developed.

>> Read more about WordPress ActivityPub

XWiki ActivityPub — First class ActivityPub support in XWiki

XWiki is a modern and extensible open source wiki platform. XWiki is the first wiki that is part of the larger federation of collaboration and social software (a.k.a. fediverse), allowing users to collaborate externally. XWiki is embracing the W3C ActivityPub specification. Specifically, we're implementing the server part of the specification, to be able to both view activity and content happening in external services inside XWiki itself and to make XWiki's activity and content available from these other services too. A specific but crucial use case is to allow content collaboration between different XWiki servers, sharing content and activity.

>> Read more about XWiki ActivityPub

YaCy Grid SaaS

YaCy Grid Search-as-a-Service makes document crawling and indexing functionality available to everyone. Users of this new platform will be able to create their custom search portal by defining their own document corpus. Such a service is an advantage as a privacy or branding tool, but also allows scientific research and annotation of semantic content. User-group specific domain knowledge can be organized for custom applications such as fueling artificial intelligence analysis. This should benefit, for example, private persons, journalists, scientists and large groups of people in communities like universities and companies. Instances of the portal should be able to support themselves financially: there is turn-key infrastructure to handle payments for crawling/indexing volumes as a periodic subscription, while search requests are free for everyone. The portal will consist of free software, and users can download the portal software itself together with the acquired search index data - so everyone can start running a portal for themselves whenever they want.

>> Read more about YaCy Grid SaaS

dweb-search — Index DHT based distributed webs

dweb-search is a Free and Open Source (FOSS) search engine for directories, documents, videos and music on the Interplanetary Filesystem (IPFS), supporting the creation of a decentralized web where privacy is possible, censorship is difficult, and the internet can remain open to all. This project implements a publicly accessible IPFS thumbnail service and creates a UI specifically to explore music or videos.

>> Read more about dweb-search

elRepo.io - Resilient, distributed content sharing — Resilient, human-centered, distributed content sharing and discovery.

In this project AlterMundi and NetHood collaborate to develop a critical missing part in decentralized and distributed p2p systems: content search. More specifically, this project will implement advanced search for elRepo.io, the self-hosted and distributed culture-sharing platform currently under active development by AlterMundi and partners. Search functionalities will expand on the already proven coupling of the libxapian searching and indexing library and turtle routing. The distributed search functionality will be implemented to be flexible and modular. It will become the meeting point of three complementary threads of ongoing work: Libre technology and tools for building Community Networks (LibreRouter & LibreMesh), fully decentralized, secure and anonymous Friend2Friend software (Retroshare), and a transdisciplinary participatory methodology for local applications in Community Networks (netCommons).

>> Read more about elRepo.io - Resilient, distributed content sharing

fediverse.space — Find your way in the Fediverse

Fediverse.space is a tool for understanding decentralized social networks, and searching through them. The fediverse, or federated universe, is the set of social media servers, hosted by individuals across the globe, forming a libre and more democratic alternative to traditional social media. When displaying these servers in an intuitive visualization, clusters quickly emerge. For instance, servers with the same primary language will be close to each other. There are more subtle groupings, too: topics of discussion, types of users (serious vs. ironic), and political leanings all play a role. fediverse.space aims to be the best tool for understanding and discovering communities on this emerging social network.

>> Read more about fediverse.space

fwupd — Automatic Firmware updates for BSD operating systems

Security holes in the equipment we run are discovered all the time, and firmware is continuously upgraded as a result. But how do users discover what they need to upgrade to protect themselves? The goal of the "fwupd/LVFS integration in the BSD distributions" project is to reuse the effort done by the fwupd/LVFS project and make it available in BSD-based systems as well. fwupd has been available on Linux-based systems since 2015. It is an open-source daemon for managing the installation of firmware updates from the LVFS. The LVFS (Linux Vendor Firmware Service) is a secure portal which allows hardware vendors to upload firmware updates. Over the years, some major hardware vendors (e.g. Dell, HP, Intel, Lenovo) have been uploading their firmware images to the LVFS so they can later be installed on Linux-based systems. Integrating fwupd into BSD-based systems would allow reusing the well-established infrastructure so more users can take advantage of it.

>> Read more about fwupd

Handling Data from IPv6 Scanning — Scanning tools for scaling up IPv6 scans

Scanning is the state-of-the-art way to discover hosts on the Internet. Today’s scanning relies on IPv4 and simply probes all possible addresses. But global IPv6 adoption will render brute-forcing useless due to the sheer size of the IPv6 address space, and demands more sophisticated ways of target generation. Our team developed such an approach, which makes it possible to probe all subnets in the currently deployed IPv6 Internet within reasonable time. Positive responses are however scarce in the IPv6 Internet; thus, we include error messages in our analysis as they provide meaningful insight into the current deployment status of networks. First experiments covering only parts of the Internet were promising, and at least 5% of our probes trigger error messages. However, a full scan would lead to approx. 10^14 responses, causing petabytes of data, and demands an adequate data handling solution. In this project, we will develop a data storage and analysis solution for high-speed IPv6 scanning. It will process the large amount of received data concurrently with scanning, and provide continuous results while scanning for long periods. This effort enables full scans of the IPv6 Internet.
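A quick back-of-the-envelope calculation shows why brute force stops working and why data handling becomes the bottleneck. The Python sketch below uses two figures that are assumptions for illustration only and do not come from the project: a scan rate of one million probes per second and 50 bytes stored per response.

```python
PROBES_PER_SECOND = 1_000_000  # assumed scan rate, not a project figure
BYTES_PER_RESPONSE = 50        # assumed stored record size, not a project figure
SECONDS_PER_YEAR = 365 * 24 * 3600


def brute_force_seconds(address_bits, rate=PROBES_PER_SECOND):
    """Time needed to probe every address in a 2**address_bits space."""
    return 2 ** address_bits / rate


ipv4_minutes = brute_force_seconds(32) / 60
ipv6_years = brute_force_seconds(128) / SECONDS_PER_YEAR
# 10^14 responses is the estimate quoted in the text; 10^15 bytes = 1 PB.
storage_petabytes = 10 ** 14 * BYTES_PER_RESPONSE / 10 ** 15

print(f"IPv4 brute force: about {ipv4_minutes:.0f} minutes")   # about 72 minutes
print(f"IPv6 brute force: about {ipv6_years:.1e} years")       # about 1.1e+25 years
print(f"10^14 responses:  about {storage_petabytes:.0f} PB")   # about 5 PB
```

Under these assumptions the whole IPv4 space fits into roughly an hour of probing, while the IPv6 space would take on the order of 10^25 years, which is why target generation has to replace exhaustive enumeration.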

>> Read more about Handling Data from IPv6 Scanning

Minedive — P2P search over webRTC

The minedive project is building two components. The first is minedive itself, a browser extension that aims to let users search the web while preserving their anonymity and privacy. The second is an open source reference implementation of its rendez-vous server. minedive instances connect to each other (via WebRTC data channels), forming a two-layered P2P network. The lower layer (L1) provides routing; the upper layer (L2) provides anonymous and encrypted communication among peers acting as a MIX network. This architecture guarantees that peers which know your IP address (L1) do not know your search data (L2) and vice versa. A central (websocket) rendez-vous server is needed to find and connect with L1 peers, and to exchange keys with L2 peers, but no search goes through it. We are running a default server which can be overridden by users who want to run their own (using our reference implementation or a custom one). Users can also set the extension to pick peers from a given community (identified by an opaque tag). Currently all requests are satisfied by letting L2 peers return results from the first page of mainstream search engines (as they see it, in an attempt to escape the search bubble). While this will stay as a fallback, we plan to implement web crawling on peers, doing keyword extraction from URLs in local bookmarks and history and ranking with open algorithms, being transparent with users about which techniques are used and open to suggestions.

>> Read more about Minedive

Software vulnerability discovery — Automating discovery of software update and vulnerabilities

nixpkgs-update automates the updating of software packages in the nixpkgs software repository. It is a Haskell program. In the last year, about 5000 package updates initiated by nixpkgs-update were merged. This project will focus on two improvements: One, developing infrastructure so that the nixpkgs-update can run continuously on dedicated hardware to deliver updates as soon as possible, and Two, integrating with CVE systems to report CVEs that are addressed by proposed updates. I believe these improvements will increase the security of nixpkgs software and the NixOS operating system based on nixpkgs.

>> Read more about Software vulnerability discovery

openEngiadina — Platform for creating, publishing and using open local knowledge

OpenEngiadina is developing a platform for open local knowledge - a mashup between a semantic knowledge base (like Wikipedia) and a social network using the ActivityPub protocol. openEngiadina is being developed with small municipalities and local organizations in mind, and wants to explore the intersection of Linked Data and social networks - a 'semantic social network'.

openEngiadina started off as a platform for creating, publishing and using open local knowledge. The structured data allows for semantic queries and intelligent discovery of information. The ActivityPub protocol enables decentralized creation and federation of such structured data, so that local knowledge can be created by independent actors in a certain area (e.g. a music association publishes concert location and timing). The project aims to develop a backend allowing such a platform, research ideas into user interfaces and strengthen the ties between the Linked Data and decentralized social networking communities.

>> Read more about openEngiadina

Privacy Preserving Disease Tracking — Research into contact tracing privacy

In case of a pandemic, it makes sense to share data to track the spread of a virus like SARS-CoV2. However, that very same data, when gathered in a crude way, is potentially very invasive to privacy - and in politically less reliable environments can be used to map out the social graph of individuals and severely threaten civil rights and a free press. Unless the whole process is transparent, people might not be easily convinced to collaborate.

The PPDT project is trying to build a privacy-preserving contact tracing mechanism that can notify users if they have come in contact with potentially infected people. This should happen in a way that is as privacy-preserving as possible. We want to have the following properties: the users should be able to learn if they got in touch with infected parties, ideally only that - unless they opt in to share more information. The organisations operating servers should not learn anything besides who is infected, ideally not even that. The project builds a portable library that can be used across different mobile platforms, and a server component to aggregate data and send this back to the participants.

>> Read more about Privacy Preserving Disease Tracking

Search and Displace — Find and redact privacy sensitive information

The goal of this project is to establish a workflow and toolchain which can address the problem of mass search and displacement for document content where the original documents are in a range of forms, including a wide variety of digital document formats, both binary and more modern compressed XML forms, and potentially even encompassing older documents where the only surviving form is printed or even handwritten. The term "displacement" is meant to encompass actions taken on the discovered content that are beyond straight replacement, including content tagging and redaction, as well as more complex contextual and user-refined replacement on an iterative basis. It is assumed that this process will be a server application with documents uploaded as needed, on either an individual or bulk upload basis. The solution would be built in a modular fashion so that future deployments could deploy and/or modify only the parts needed. In practical terms this involves the creation of an open source tool chain that facilitates searching for private and confidential content inside documents, for instance attachments to email messages or documents that are to be published on a website. The tool can subsequently be used for the secure and automated redaction of sensitive documents. Building it as a modular solution allows it to be used “standalone” with a simple GUI, via the command line, or embedded within third-party systems such as document management systems, content management systems and machine learning systems. In addition, a modular approach will facilitate the use of the solution both with different languages (natural and programming) and different specialities, e.g. government archives, winning tenders, legal contracts, court documents, etc.
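The difference between plain replacement, tagging and redaction can be shown on already-extracted plain text. The Python sketch below is a minimal illustration and not the project's toolchain: the function `displace`, its action names and the simplified e-mail pattern are hypothetical, and real documents would first need format conversion or OCR.

```python
import re

# Deliberately simplified e-mail pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

ACTIONS = ("replace", "tag", "redact")


def displace(text, pattern, action, label="EMAIL"):
    """Apply one displacement action to every match of pattern in text."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    def handle(match):
        found = match.group()
        if action == "redact":
            # Blank out the content but keep its length.
            return "#" * len(found)
        if action == "tag":
            # Keep the content and mark it up for later review.
            return f"<{label}>{found}</{label}>"
        # "replace": substitute a neutral placeholder.
        return f"[{label}]"

    return pattern.sub(handle, text)


sample = "Contact jane.doe@example.org for details."
print(displace(sample, EMAIL, "replace"))  # Contact [EMAIL] for details.
print(displace(sample, EMAIL, "tag"))
print(displace(sample, EMAIL, "redact"))
```

Keeping the matcher (`pattern`) separate from the action mirrors the modular approach described above: a different detector or a different action can be swapped in without touching the rest.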

>> Read more about Search and Displace

Free Software Vulnerability Database — A resource to aggregate software updates

"Using Components with Known Vulnerabilities" is one of the OWASP Top 10 Most Critical Web Application Security Risks. Identifying such vulnerable components is currently hindered by data structures and tools that are (1) designed primarily for commercial/proprietary software components and (2) too dependent on the National Vulnerability Database (from the US Dept. of Commerce). With the explosion of Free and Open Source Software (FOSS) usage over the last decade, we need a new approach in order to efficiently identify security vulnerabilities in the FOSS components that are the basis of every modern software system and application. And that approach should be based on open data and FOSS tools. The goal of this project is to create new FOSS tools to aggregate software component vulnerability data from multiple sources, organize that data with a new standard package identifier (Package URL or PURL) and automate the search for FOSS component security vulnerabilities. The expected benefits are to contribute to the improved security of software applications with open tools and data available freely to everyone and to lessen the dependence on a single foreign governmental data source or a few foreign commercial data providers.
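A Package URL has the general shape `pkg:type/namespace/name@version?qualifiers#subpath`, which is what makes it usable as a common key across data sources. The Python sketch below splits such a string into its parts. It is a simplified illustration and not the project's code: the names `PackageURL` and `parse_purl` are hypothetical, and percent-decoding and type-specific normalisation rules are left out.

```python
from typing import Dict, NamedTuple, Optional


class PackageURL(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def parse_purl(purl: str) -> PackageURL:
    """Split a Package URL into components (simplified, no percent-decoding)."""
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a Package URL: {purl!r}")
    # Strip the optional parts from right to left.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    path, at, version = rest.rpartition("@")
    if not at:  # no "@": rpartition left everything in the last slot
        path, version = rest, ""
    pkg_type, _, remainder = path.strip("/").partition("/")
    namespace, _, name = remainder.rpartition("/")
    if not pkg_type or not name:
        raise ValueError(f"missing type or name: {purl!r}")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = value
    return PackageURL(pkg_type.lower(), namespace or None, name,
                      version or None, qualifiers, subpath or None)
```

With identifiers parsed this way, advisories from different sources could be grouped under the same `(type, namespace, name)` key and then compared by version.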

>> Read more about Free Software Vulnerability Database