Data and AI

This page contains a concise overview of projects funded by NLnet foundation that belong to Data and AI (see the thematic index). There is more information available on each of the projects listed on this page - all you need to do is click on the title or the link at the bottom of the section on each project to read more. If a description on this page is a bit technical and terse, don't despair — the dedicated page will have a more user-friendly description that should be intelligible for 'normal' people as well. If you cannot find a specific project you are looking for, please check the alphabetic index or just search for it (or search for a specific keyword).

AI-VPN — Local machine-learning-based analysis of VPN traffic

Our security decreases significantly when we are outside our offices. Current VPNs encrypt our traffic, but they do not protect our devices from attacks or detect if there is an infection. The AI-VPN project proposes a new solution that joins the VPN setup with a local AI-based IPS. The AI-VPN implements a state-of-the-art machine-learning-based Intrusion Prevention System in the VPN, generating alerts and blocking malicious connections automatically. The user is given a summary of the device's traffic, showing detected malicious patterns, leaked private data and security alerts, in order to protect and educate users about their security status and any risks they are exposed to.

>> Read more about AI-VPN

OCCRP Aleph disambiguation — OCCRP Aleph: disambiguating different people and companies

Aleph is an investigative data platform that searches and cross-references global databases with leaks and public sources to find evidence of corruption and trace criminal connections. The project will improve the way that Aleph connects data across different data sources and how it ranks recommendations and searches for reporters. Our goal is to establish a feedback loop where users train a machine learning system that will predict whether results showing a person or company refer to the same person or company. If successful, this means journalists can conduct more efficient research and investigations, finding key information more quickly and wasting less time trawling through irrelevant documents and datasets.
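
To make the feedback loop concrete: a pairwise disambiguation model of this kind can be pictured as a classifier over cheap similarity features between two records, trained on user-confirmed matches. The sketch below is a minimal illustration using scikit-learn; the feature set, record fields and training pairs are invented for the example and are not Aleph's actual pipeline.

```python
# Hypothetical sketch of pairwise entity matching: score whether two records
# refer to the same company, learning from user-labelled feedback pairs.
from difflib import SequenceMatcher

from sklearn.linear_model import LogisticRegression

def features(a: dict, b: dict) -> list:
    """Cheap similarity features between two entity records."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_country = float(a.get("country") == b.get("country"))
    same_type = float(a.get("type") == b.get("type"))
    return [name_sim, same_country, same_type]

# Feedback pairs labelled by users: 1 = same entity, 0 = different.
pairs = [
    ({"name": "ACME Holdings Ltd", "country": "gb", "type": "company"},
     {"name": "Acme Holdings Limited", "country": "gb", "type": "company"}, 1),
    ({"name": "ACME Holdings Ltd", "country": "gb", "type": "company"},
     {"name": "Acme Consulting GmbH", "country": "de", "type": "company"}, 0),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
model = LogisticRegression().fit(X, y)

# Probability that a new candidate pair refers to the same company.
candidate = features(
    {"name": "ACME Holdings", "country": "gb", "type": "company"},
    {"name": "Acme Holdings Ltd.", "country": "gb", "type": "company"},
)
print(model.predict_proba([candidate])[0][1])
```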

>> Read more about OCCRP Aleph disambiguation

Atomic Data — Typesafe handling of LinkedData

Atomic Data is a modular specification for sharing, modifying and modeling graph data. It uses links to connect pieces of data, and therefore makes it easier to connect datasets to each other - even when these datasets exist on separate machines. Atomic Data is especially suitable for knowledge graphs, distributed datasets, semantic data, p2p applications, decentralized apps and linked open data. It is designed to be highly extensible, easy to use, and to make the process of domain-specific standardization as simple as possible. It is type-safe linked data (a strict subset of RDF), which is also fully compatible with regular JSON. In this project, we'll work on the MIT-licensed atomic-server and atomic-data-browser, which are a graph database server and a modular web GUI that enable users to model, share and edit atomic data. We'll add functionality, improve stability and testing, improve documentation and create materials that help developers get started.
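
To illustrate the data model (as a rough sketch, not the atomic-server API): every property of a resource is itself a URL with a declared datatype, which is what makes the data both type-safe and plain-JSON compatible. The property URLs and datatype table below are illustrative.

```python
# Illustration only (not atomic-server's API): an Atomic Data resource is a
# set of (property URL -> typed value) pairs, so every key is dereferenceable
# and every value can be checked against the property's declared datatype.
resource = {
    "@id": "https://example.com/people/alice",
    "https://atomicdata.dev/properties/name": "Alice",          # string
    "https://atomicdata.dev/properties/createdAt": 1640995200,  # timestamp
}

# A minimal type check against (hypothetical) property definitions:
datatypes = {
    "https://atomicdata.dev/properties/name": str,
    "https://atomicdata.dev/properties/createdAt": int,
}
for prop, value in resource.items():
    if prop != "@id":
        assert isinstance(value, datatypes[prop]), f"type error on {prop}"
```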

>> Read more about Atomic Data

Conzept encyclopedia — An alternative encyclopedia

The Conzept encyclopedia is an attempt to create an encyclopedia for the 21st century. A modern topic-exploration tool based on: Wikipedia, Wikidata, the Open Library, Archive.org, YouTube, the Global Biodiversity Information Facility and many other information sources. A semantic web app built for fun, education and research. Conzept allows you to explore any of the millions of topics on Wikipedia from many different angles - such as science, art, digital books and education - both as a defined semantic entity ("thing") as well as a string. In addition, client-side topic classification allows for a fast, higher-level logic throughout the whole user experience. Conzept also has a uniquely integrated user interface, which gives you a single well-designed view of all this information (in any of the 300+ Wikipedia languages), without cognitive overload.

>> Read more about Conzept encyclopedia

Dat Private Network — Private storage in DAT

The dat private network is a self-hosted server that is easy to deploy on cloud or home infrastructure. Key features include a web-based control panel for administration by non-developers, as well as on-disk encryption. These no-knowledge storage services will ensure backup and high availability of distributed datasets, while also providing trust that unauthorized third parties won’t have access to content.

By creating a turnkey backup solution, we’ll be able to address two of our users’ most pressing questions about dat: who serves my data when I’m offline, and how do I archive and secure important files? The idea for this module came from the community, and reflects a dire need in the storage space -- no-knowledge backup and sync across devices. A properly-designed backup service will provide solutions to both of these questions, and will do so in a privacy-preserving way.

This deliverable will put resources into bringing this work to a production-ready state, primarily through development towards updates that make use of the latest performance and security updates from the dat ecosystem, such as NOISE support. We plan to maintain the socio-technical infrastructure through an open working group that creates updates for the network as it matures.

>> Read more about Dat Private Network

DATALISP — Universal data interchange format using canonical S-expressions

As society goes digital, the need for thorough fundamentals becomes more prominent. Datalisp is a laboratory for decentralized collaboration built on a few well-understood ideas which imply a certain architecture. The central thesis of datalisp is: "If we agree to use a theoretically sound data interchange format then we will be able to efficiently express increasingly complicated coordination problems", but in order to move the web to a different encoding we will need incentives on our side. A substantial improvement in user experience is needed, and we aim to provide it. Ultimately our goal is to give peers the tools they need to protect themselves, and others, by collaboratively measuring the legitimacy of information and, locally, by assessing whether data can be trusted as code or whether it requires user attention. Datalisp is the convergence point for all these tools (none of which is named "datalisp") rather than a language; join us in figuring out how to reach it!

>> Read more about DATALISP

Encoding for Robust Immutable Storage (ERIS) — Encrypted and content-addressable data blocks

The Encoding for Robust Immutable Storage (ERIS) is an encoding of content into a set of uniformly sized, encrypted and content-addressed blocks, as well as a short identifier (a URN). The content can be reassembled from the encrypted blocks only with this identifier (the read capability). ERIS is a form of content-addressing: the identifier of some encoded content depends on the content itself and is independent of the physical location where the content is stored (unlike content addressed by URLs). This enables content to be replicated and cached, making systems relying on the content more robust.

Unlike other forms of content-addressing (e.g. IPFS), ERIS encrypts content into uniformly sized blocks for storage and transport. This allows peers without access to the read capability to transport and cache content without being able to read the content. ERIS is defined independent of any specific protocol or application and decouples content from transport and storage layers.
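
The principle can be shown in a few lines: split content into fixed-size blocks, encrypt each block, and address it by the hash of its ciphertext, so that the references plus the key together form the read capability. The Python sketch below is a toy illustration only; the actual ERIS encoding uses ChaCha20 and Blake2b and builds a tree of reference blocks rather than a flat list.

```python
# Toy illustration of content-addressed block encryption using only the
# standard library; not the real ERIS algorithm.
import hashlib
import secrets

BLOCK_SIZE = 1024  # ERIS specifies uniform block sizes (1 KiB or 32 KiB)

def encode(content: bytes, key: bytes):
    """Split content into encrypted blocks addressed by ciphertext hash."""
    store, refs = {}, []
    for i in range(0, len(content), BLOCK_SIZE):
        block = content[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        # Toy stream cipher: XOR with a hash-derived keystream.
        stream = hashlib.shake_256(key + i.to_bytes(8, "big")).digest(BLOCK_SIZE)
        ciphertext = bytes(a ^ b for a, b in zip(block, stream))
        ref = hashlib.blake2b(ciphertext, digest_size=32).digest()
        store[ref] = ciphertext  # storage peers see only opaque, uniform blocks
        refs.append(ref)
    return store, refs  # refs + key together act as the read capability

store, refs = encode(b"hello, robust immutable storage", secrets.token_bytes(32))
print(len(store), all(len(c) == BLOCK_SIZE for c in store.values()))
```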

The project will release version 1.0.0 after handling feedback from a security audit, provide implementations in popular languages to facilitate wider usage (e.g. a C library, a JS library on NPM), perform a number of core integrations into various transport and storage layers (e.g. GNUNet, HTTP, CoAP, S3), and deliver Block Storage Management (quotas, garbage collection and synchronization for caching peers).

>> Read more about Encoding for Robust Immutable Storage (ERIS)

Etebase - protocol and encryption enhancements — Redesign EteSync protocol and encryption scheme

Etebase is an open-source and end-to-end encrypted software development kit and backend. Think of it as a tool that developers can use to easily build encrypted applications. Etebase is the new name for the protocol that powers EteSync, an open source, end-to-end encrypted, and privacy respecting sync solution for contacts, calendars, notes, tasks and more across all major platforms.

Many people are well aware of the importance of end-to-end encryption. This is evident from the increasing popularity of end-to-end encrypted messaging applications. However, in today's cloud-based world, there is much more (just as important!) information that is simply left exposed and unencrypted, without people even realising. Calendar events, tasks, personal notes and location data ("find my phone") are a few such examples. This is why the overarching goal of Etebase is to enable users to end-to-end encrypt all of their data.

While the Etebase protocol served EteSync well, there are a number of improvements that could be made to better support EteSync's current and long-term requirements, as well as enabling other developers to build a variety of encrypted applications.

>> Read more about Etebase - protocol and encryption enhancements

EteSync - iOS application — Encrypted synchronisation for calendars, address books, etc.

EteSync is an open source, end-to-end encrypted, and privacy respecting sync solution for contacts, calendars and tasks with more data types planned for the future. It's currently supported on Android, the desktop (using a DAV adapter layer) where it seamlessly integrates with existing apps, and on the web for easy access from everywhere.

Many people are well aware of the importance of end-to-end encryption. This is evident from the increasing popularity of end-to-end encrypted messaging applications. However, in today's cloud-based world, there is much more (just as important!) information that is simply left exposed and unencrypted, without people even realising. Calendar events, tasks, personal notes and location data ("find my phone") are a few such examples. This is why the overarching goal of EteSync is to enable users to end-to-end encrypt all of their data.

The purpose of this project is to create an EteSync iOS client which will seamlessly integrate with the rest of the system and let the many currently uncatered-for iOS users securely sync their data.

>> Read more about EteSync - iOS application

Explain — Deep search on open educational resources

The Explain project aims to bring open educational resources to the masses. Many disparate locations of learning material exist, but as of yet there isn’t a single place which combines these resources to make them easily discoverable for learners. Using a broad array of deep content metadata extraction techniques developed in conjunction with the Delft University of Technology, the Explain search engine indexes content from a wide variety of sources. With this search engine, learners can then discover the learning material they need through a fine-grained topic search or through uploading their own content (e.g. exams, rubrics, excerpts) for which they require additional educational resources. The project focuses on usability and discoverability of resources.

>> Read more about Explain

First Classify Documents — Categorise different types of official documents

The project summary for this project is not yet available. Please come back soon!

>> Read more about First Classify Documents

Geolexica reverse — Reverse Semantic Search and Ontology Discovery via Machine Learning

Ever forgotten a specific word but could describe its meaning? Internet search engines more often than not return unrelated entries. The solution is reverse semantic search: given an input of the meaning of the word (the search phrase), provide an output with dictionary words that match the meaning. The key to accurate reverse search lies in the machine’s ability to understand semantics. We employ deep learning approaches in natural language processing (NLP) to enable better comparison of meanings between search phrases and word definitions. Accuracy will be significantly increased. The project outcome will be employed on Geolexica as a pilot application and testbed for evaluation. The ability to identify entities with similar semantics facilitates ontology discovery in the Semantic Web and in Technical Language Processing (TLP).
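
The mechanism boils down to embedding both dictionary definitions and the user's description into a shared vector space and ranking by similarity. A minimal sketch follows; the embedding model and the three-entry glossary are placeholders, not Geolexica's actual pipeline.

```python
# Sketch of reverse semantic search: rank dictionary entries by how close
# their definitions are to the user's description of the forgotten word.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

glossary = {
    "ephemeral": "lasting for a very short time",
    "ubiquitous": "present or found everywhere",
    "serendipity": "finding something good without looking for it",
}
terms = list(glossary)
definition_vecs = model.encode([glossary[t] for t in terms], convert_to_tensor=True)

query = "a lucky discovery made by accident"
query_vec = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_vec, definition_vecs)[0]
best = max(range(len(terms)), key=lambda i: float(scores[i]))
print(terms[best])  # -> "serendipity"
```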

>> Read more about Geolexica reverse

In-document search — Interoperable Rich Text Changes for Search

There is a relatively unexplored layer of metadata inside the document formats we use, such as Office documents. This makes it possible to answer queries like: show me all the reports with edits made within a given timespan, by a certain user or by a group of users. Or: show me all the hyperlinks inside documents pointing to a web resource that is about to be moved. Or: list all presentations that contain this copyrighted image. Such embedded information could be better exposed to and used by search engines than is now the case. The project expands the ODF Toolkit library to dissect file formats, and will potentially have a very useful side effect of maturing the understanding of document metadata at large, and for collaborative editing of documents in particular.
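
As a taste of the metadata layer involved: an ODF document is a ZIP archive whose content.xml marks hyperlinks as text:a elements, so the outbound links of a document can be listed with the standard library alone. This sketch is independent of the ODF Toolkit the project actually extends.

```python
# List all hyperlinks inside an OpenDocument file: ODF documents are ZIP
# archives, and hyperlinks live in content.xml as <text:a xlink:href="...">.
import sys
import zipfile
import xml.etree.ElementTree as ET

TEXT_A = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}a"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def hyperlinks(path: str) -> list:
    with zipfile.ZipFile(path) as odf:
        root = ET.fromstring(odf.read("content.xml"))
    return [el.get(XLINK_HREF) for el in root.iter(TEXT_A)]

if __name__ == "__main__":
    for url in hyperlinks(sys.argv[1]):
        print(url)
```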

>> Read more about In-document search

Practical Tools to Build the Context Web — Declarative setup of P2P collaboration

In a nutshell, the Perspectives project makes collaboration behaviour reusable, and workflows searchable. It provides the conceptual building blocks for co-operation, laying the groundwork for a federated, fully distributed infrastructure that supports endless varieties of co-operation and reuse. The declarative Perspectives Language allows a model to be translated instantly into an application that lets multiple users contribute to a shared process, each with her own unique perspective.

The project will extend the existing Alpha version of the reference implementation into a solid Beta, with useful models/apps, aspiring to community adoption to further the growth of applications for citizen end users. Furthermore, necessary services such as a model repository will be provided. This will bring Perspectives out of the lab, and into the field. For modellers, it will provide support for the modelling language in well-known IDEs, with syntax colouring, go-to definition and autocomplete.

Real life is an endless affair of interlocking activities. Likewise, Perspectives models of services can overlap and build on common concepts, thus forming a federated conceptual space that allows users to move from one service to another as the need arises in a most natural way. Such an infrastructure functions as a map, promoting discovery and decreasing dependency on explicit search. However, rather than being an online information source to be searched, such as the traditional Yellow Pages, Perspectives models allow their users (individuals and organisations alike) to interact and deal with each other online. Supply-demand matching in specific domains (e.g. local transport) integrates readily with such an infrastructure. Other patterns of integrating search with co-operation support form a promising area for further research.

>> Read more about Practical Tools to Build the Context Web

LiberaForms — End-to-End Encrypted Forms

Cloud services that offer handling of online forms are widely used by schools, associations, volunteer organisations, civil society, and even families to publish questionnaires and collect the results. While these cloud services (such as Google Forms and Microsoft Forms) can be quite convenient to create forms with, for the constituency which has to fill out these forms such practices can actually be very invasive, because forms may not only include personal details such as name, address, gender or age, but also more intimate questions including medical details, political information and lifestyle background. In many situations there is a power asymmetry between the people creating the form and the users that have to supply the data through that form. Often there is significant time pressure. No wonder that users feel socially coerced to comply and hand over their data, even though they might be perfectly aware that their own data might be used against them.

LiberaForms is a transparent alternative to proprietary online forms that you can easily host yourself. In this project, LiberaForms will add end-to-end encryption with OpenPGP, meaning that the data is encrypted on the client device and only the final recipient of the form data can read it (and not just anyone with access to a server). Also, the team will add real-time collaboration on forms, in case users need to fill out forms together.
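
The end-to-end principle is that the response is encrypted to the form owner's public key before it leaves the client, so the server only ever stores ciphertext. A minimal sketch of that flow with the python-gnupg wrapper follows; LiberaForms itself encrypts in the browser, and the key, recipient and upload call here are placeholders.

```python
# Sketch of the OpenPGP flow: encrypt the form response to the owner's
# public key on the client; the server never sees plaintext.
import json
import gnupg

gpg = gnupg.GPG(gnupghome="/tmp/demo-keyring")  # assumes this keyring exists
# ... and that the owner's public key was imported earlier, e.g.:
# gpg.import_keys(open("owner-public-key.asc").read())

def submit_to_server(ciphertext: str) -> None:
    """Placeholder for the form platform's upload call."""
    print(ciphertext[:64], "...")

answers = {"name": "Jane Doe", "allergies": "penicillin"}
encrypted = gpg.encrypt(json.dumps(answers), ["owner@example.org"])
assert encrypted.ok, encrypted.status
submit_to_server(str(encrypted))  # only ciphertext leaves the device
```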

>> Read more about LiberaForms

LinkedDataHub — Framework to handle Linked Data at scale

LinkedDataHub is a Knowledge Graph explorer, or in technical terms, a rich Linked Data client combined with a personal RDF dataspace (triplestore). It provides a number of features for end-users: browsing Linked Data, cloning RDF resources to the personal dataspace, searching and querying SPARQL endpoints, creating collections from SPARQL queries, editing remote and local RDF documents, and creating and transcluding structured content with visualizations of SPARQL results, charts etc. LinkedDataHub is a standalone product as well as a framework – its data-driven architecture allows extension and customization at every level, from the APIs up to the UI.

We expect LinkedDataHub to become a go-to tool for end-users working with Linked Data and SPARQL: researchers, data scientists, domain experts – regardless of whether they work in the digital humanities, life-sciences or any other domain. We strive to provide an unparalleled Knowledge Graph user experience that is enabled by the RDF stack, with the focus on discovery, exploration and personalization.

>> Read more about LinkedDataHub

LumoSQL — Create more reliable, distributed embedded databases

The most widely-used database (SQLite) is not as reliable as it could be, and is missing essential features like encryption and safe usage in networked environments. Billions of people unknowingly depend on SQLite in their applications for critical tasks throughout the day, and this embedded database is used in many internet applications - including some core internet and technology infrastructure. This project wants to create a viable alternative ('rip and replace'), using the battle-tested LMDB produced by the LDAP community. This effort will also allow a number of other shortcomings to be addressed, and make many applications more trustworthy and, by adding cryptography, also more private. Given the wide range of use cases and heavy operational demands of this class of embedded databases, a serious effort is needed to execute this plan in a way where users can massively switch. The project will test extensively, and will validate its efforts with a number of critical applications.

>> Read more about LumoSQL

MaDada — Using LinkedData to improve FOI processes

MaDada is a free, open source platform that simplifies and opens up the process of access by the general public to data and information held by the French government. Making use of the Freedom of Information (FOI) law, the platform guides citizens in filing requests, but also acts as an open data archive and a platform for right-to-know and transparency campaigns, by publishing the whole process: the request history, the resulting correspondence, and the data obtained through it. Launched in October 2019 by Open Knowledge Foundation France members, MaDada has helped 250+ users make over 1200 FOI requests to French public bodies, and is beginning to play an important role in right-to-know, transparency and open government efforts.

MaDada is based on the open source software Alaveteli (https://alaveteli.org), which has been adapted and deployed in more than 25 countries, in 20 different languages and jurisdictions. Alaveteli offers efficient functions for users to file and manage FOI requests. The NLnet funding will help the project develop and improve discovery and search features for public bodies on madada.fr and in the Alaveteli software - for instance, in France alone there are more than 60,000 public authorities. This will take advantage of existing digital commons such as Wikidata, and open standards such as schema.org and DCAT.

>> Read more about MaDada

NEFUSI — NEFUSI: A novel NEuroFUzzy approach for semantic SImilarity assessment

The challenge of determining the degree of semantic similarity between two expressions of a textual nature has become increasingly important in recent times. Its great importance in many modern computing areas, combined with the latest advances in neural computation, has steadily improved the available solutions. NEFUSI (which stands for "NEuroFUzzy approach for semantic SImilarity assessment") aims to go a step further with the design and development of a novel neurofuzzy approach for semantic textual similarity based on neural networks and fuzzy logic. We intend to benefit from the outstanding capabilities of the latest neural models to work with text and, at the same time, from the possibilities that fuzzy logic offers to aggregate and decode numerical values in a personalized way. In this way, the project will build an approach intended to effectively determine the degree of semantic similarity of textual expressions with high accuracy in a wide range of scenarios concerning Search and Discovery.

>> Read more about NEFUSI

Nextcloud — Unified and intelligent search within private cloud data

The internet helps people to work, manage, share and access information and documents. Proprietary cloud services from large vendors like Microsoft, Google, Dropbox and others cannot offer the privacy and security guarantees users need. Nextcloud is a 100% open source solution where all information can stay on premise, with the protection users choose themselves. The Nextcloud Search project will solve the last remaining open issue: unified, convenient and intelligent search and discoverability of data. The goal is to build a powerful but user-friendly interface for search across the entire private cloud. It will be possible to filter by date, type, owner, size, keywords, tags and other metadata. The backend will offer indexing and searching of file-based content, as well as integrated search for other content like text chats, calendar entries, contacts, comments and other data. It will integrate with the private search capabilities of Searx. As a result, users will have the same powerful search functionality they know and like elsewhere, while respecting their privacy and strict regulations like the GDPR.

>> Read more about Nextcloud

OpenStreetMap Speed Limits — Infer default speed limits for better quality OpenStreetMap-based routing

OpenStreetMap (OSM) is the world's largest open geodata set, created and maintained collaboratively by millions of users. Of course there are many other purposes beyond creating a map, for instance finding the best route from A to B. Such usage needs to take into account incomplete data, as coverage of speed limits varies greatly across OSM. Currently, only about 12% of roads in OSM have speed limits set. However, default legal speed limits can often be inferred from other data, such as whether the road is within an urban zone, whether the carriageway is segregated, how many lanes it has, whether it is paved, etc.

The goal of this project is to extract the default speed limits for different road and vehicle types across national and state legislations, map these to OSM and provide them in a machine-readable form so that they can be consumed by open source routing software such as GraphHopper, Valhalla or OSRM. Furthermore, a reference implementation that interprets this data will be provided.
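
Conceptually, the deliverable is a machine-readable rule table plus an interpreter that matches a road's OSM tags against the rules of its jurisdiction. Below is a toy version; the rule format and values are simplified illustrations, not the project's actual schema.

```python
# Toy interpreter for default speed limits: match a road's OSM tags against
# per-country rules (first matching rule wins). Values are illustrative.
RULES = {
    "DE": [
        ({"highway": "motorway"}, None),  # no general limit
        ({"lit": "yes"}, 50),             # urban default
        ({}, 100),                        # rural default
    ],
    "FR": [
        ({"highway": "motorway"}, 130),
        ({}, 80),
    ],
}

def default_maxspeed(country: str, tags: dict):
    for condition, limit in RULES[country]:
        if all(tags.get(k) == v for k, v in condition.items()):
            return limit
    return None

print(default_maxspeed("DE", {"highway": "residential", "lit": "yes"}))  # 50
print(default_maxspeed("FR", {"highway": "motorway"}))                   # 130
```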

>> Read more about OpenStreetMap Speed Limits

Personal Food Facts — Privacy protecting personalized information about food

Open Food Facts is a collaborative database containing data on 1 million food products from around the world, as open data. This project will allow users of our website, mobile app and our 100+ mobile apps ecosystem to get personalized search results (food products that match their personal preferences and diet restrictions based on ingredients, allergens, nutritional quality, vegan and vegetarian products, kosher and halal foods etc.) without sacrificing their privacy or having to send those preferences to us.
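
The privacy approach is simply to evaluate the dietary profile on the device: download generic product data, filter locally, upload nothing. A schematic sketch follows; the field names loosely follow Open Food Facts records, and the matching logic is hypothetical.

```python
# Schematic client-side filter: the dietary profile never leaves the device;
# only generic product data is downloaded.
profile = {
    "avoid_allergens": {"en:peanuts", "en:gluten"},
    "require_labels": {"en:vegan"},
}

def acceptable(product: dict) -> bool:
    allergens = set(product.get("allergens_tags", []))
    labels = set(product.get("labels_tags", []))
    return (not allergens & profile["avoid_allergens"]
            and profile["require_labels"] <= labels)

products = [
    {"product_name": "Choc bar", "allergens_tags": ["en:peanuts"], "labels_tags": []},
    {"product_name": "Oat drink", "allergens_tags": [], "labels_tags": ["en:vegan"]},
]
print([p["product_name"] for p in products if acceptable(p)])  # ['Oat drink']
```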

>> Read more about Personal Food Facts

PRESC Classifier Copies Package — Implementing Machine Learning Copies as a Means for Black Box Model Evaluation and Remediation

The ubiquitous use over the Internet, and in particular in search engines, of often proprietary black-box machine learning models and APIs in the form of Machine Learning as a Service makes it very difficult to control and mitigate their potential harmful effects (such as lack of transparency, privacy safeguards, robustness, reusability or fairness). Machine Learning Classifier Copying allows us to build a new model that replicates the decision behaviour of an existing one without knowing its architecture or having access to the original training data. A suitable copy makes it possible to audit the already deployed model, mitigate its shortcomings, and even introduce improvements, without the need to build a new model from scratch, which would require access to the original data.

This project aims to implement this innovative technique as a practical solution within PRESC, an existing free software tool for the evaluation of machine learning classifiers, so that classifier copies are automated and can easily be created by developers using machine learning, in order to reuse, evaluate, mitigate and improve black-box models, build personal data privacy safeguards into their machine learning models, or for any other application.
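
In its simplest form, classifier copying samples synthetic points, labels them by querying the black box, and fits a transparent surrogate to those labels. Below is a bare-bones sketch with scikit-learn, standing in for PRESC's automated pipeline.

```python
# Bare-bones classifier copy: query a black box on synthetic samples and fit
# a transparent surrogate to its answers; no original training data needed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Pretend this is a deployed black box that we can only query.
X_orig, y_orig = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_orig, y_orig)

# 1. Sample synthetic points covering the input domain (no access to X_orig).
rng = np.random.default_rng(0)
X_synth = rng.uniform(low=-4, high=4, size=(5000, 4))

# 2. Label them by querying the black box.
y_synth = black_box.predict(X_synth)

# 3. Fit an auditable copy to the black box's decisions.
copy = DecisionTreeClassifier(max_depth=5).fit(X_synth, y_synth)

# Agreement ("fidelity") between copy and original on fresh queries.
X_test = rng.uniform(low=-4, high=4, size=(1000, 4))
print((copy.predict(X_test) == black_box.predict(X_test)).mean())
```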

>> Read more about PRESC Classifier Copies Package

PeerDB Search — Search for semantic and full-text data

PeerDB Search is an opinionated but flexible open source search system incorporating best practices in search and user interfaces and experience to provide intuitive, fast, and easy to use search over both full-text data and semantic data exposed as facets. The goal of the user interface is to allow users without technical knowledge to easily find results they want, without having to write queries. The system will also allow multiple data sources to be used and merged together. As a demonstration PeerDB will deploy a public instance as a search service for Wikipedia articles and Wikidata data.

>> Read more about PeerDB Search

Poliscoops — Make political news and online debate accessible

PoliFLW is an interactive online platform that allows journalists and citizens to stay informed, and keep up to date with the growing group of political parties and politicians relevant to them - even those whose opinions they don't directly share. The prize-winning political crowdsourcing platform makes finding hyperlocal, national and European political news relevant to the individual far easier. By aggregating the news political parties share on their websites and social media accounts, PoliFLW is a time-saving and citizen-engagement-enhancing tool that brings the internet one step closer to being human-centric. In this project the platform will add the news shared by parties in the European Parliament and national parties in all EU member states, showcasing what it can mean for access to information in Europe. There will be a built-in translation function, making it easier to read news across country borders. PoliFLW is a collaborative environment that helps to create more societal dialogue and better informed citizens, breaking down political barriers.

>> Read more about Poliscoops

Private Searx — Add private resources to the open source Searx metasearch engine

Searx is a popular meta-search engine letting people query third party services to retrieve results without giving away personal data. However, there are other sources of information stored privately, either on the computers of users themselves or on other machines in the network that are not publicly accessible. To share it with others, one could upload the data to a third party hosting service. However, there are many cases in which it is unacceptable to do so, for privacy reasons (including the GDPR) or in the case of sensitive or classified information. This issue can be avoided by storing and indexing data on a local server. By adding offline and private engines to searx, users can search not only the internet, but also their local network, from the same user interface. Data can be conveniently available to anyone without giving it away to untrusted services. The new offline engines will let users search the local file system, open source indexers and databases, all from the searx UI.

>> Read more about Private Searx

PyCM — Evaluate the performance of ML algorithms

The outputs and results of machine learning algorithms are usually in the form of confusion matrices. PyCM is an open source Python library for evaluating, quantifying, and reporting the results of machine learning algorithms systematically. PyCM provides a wide range of confusion matrix evaluation metrics to process and evaluate the performance of machine learning algorithms comprehensively. This open source library allows users to compare different algorithms in order to determine the optimal one based on their preferences and priorities. In addition, the evaluation can be reported in different formats. PyCM has been widely used as a standard and reliable post-processing tool in reputable open source AI projects like TensorFlow Similarity, Google's SCAAML, torchbearer, and CLaF.
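
Typical usage is only a few lines: build a ConfusionMatrix from actual and predicted label vectors, then read the metrics off it. A small example based on PyCM's documented interface:

```python
from pycm import ConfusionMatrix

actual = [0, 1, 2, 2, 1, 0, 1, 2]
predicted = [0, 1, 2, 1, 1, 0, 2, 2]

cm = ConfusionMatrix(actual_vector=actual, predict_vector=predicted)
print(cm.Overall_ACC)  # overall accuracy
print(cm.F1)           # per-class F1 scores
cm.print_matrix()      # the confusion matrix itself
```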

>> Read more about PyCM

SCION-Pathdiscovery — Secure and reliable decentralized storage platform

With the amount of downloadable resources such as content and software updates available over the Internet increasing year over year, it turns out that not all content has someone willing to serve it up eternally, for free, for everyone. And in other cases, the resources concerned are not meant to be public, but do need to be available in a controlled environment. In such situations users and other stakeholders need to provide the necessary capacity and infrastructure themselves, in another, collective way.

This of course creates new challenges. Unlike a website you can follow a link to or find through a standard search engine, and which you typically only have to vet once for security and trustworthiness, the distributed nature of such a system makes it difficult for users to find the relevant information in a fast and trustworthy manner. One of the essential challenges of information management and retrieval in such a system is locating data items in a way that keeps communication complexity scalable and achieves high reliability even in the presence of adversaries. More specifically, if a provider has a particular data item to offer, where shall the information be stored such that a requester can easily find it? Moreover, if a user is interested in a particular piece of information, how do they discover it and how can they quickly find the actual location of the corresponding data item?

The project aims to develop a secure and reliable decentralized storage platform enabling fast and scalable content search and lookup going beyond existing approaches. The goal is to leverage the path-awareness features of the SCION Internet architecture to use network resources efficiently in order to achieve a low search and lookup delay while increasing the overall throughput. The challenge is to select suitable paths considering those performance requirements, and potentially combining them into a multi-path connection. To this end, we aim to design and implement optimal path selection and data placement strategies for a decentralized storage system.

>> Read more about SCION-Pathdiscovery

Geographic tagging of Routing and Forwarding — Geographic tagging and discovery of Internet Routing and Forwarding

SCION is the first clean-slate Internet architecture designed to provide route control, failure isolation, and explicit trust information for end-to-end communication. As a path-based architecture, SCION end-hosts learn about available network path segments and combine them into end-to-end paths, which are carried in packet headers. By design, SCION offers transparency to end hosts with respect to the path a packet travels through the network. This has numerous applications related to trust, compliance, and also privacy. With a better understanding of the geographic and legislative context of a path, users can for instance choose trustworthy paths that best protect their privacy, or avoid the need for privacy-intrusive and expensive CDNs by selecting resources closer to them. SCION is the first decentralised system to offer this kind of transparency and control to users of the network.

>> Read more about Geographic tagging of Routing and Forwarding

SES - SimplyEdit Spaces — SimplyEdit Spaces - collaborative presentations

SimplyPresent allows users to collaboratively create and deliver good-looking presentations using CRDTs through Hyper Hyper Space - another project supported by NGI Assure. SimplyPresent is itself built on top of the open source SimplyEdit tool, adding advanced user-friendly presentation features. SimplyPresent allows team members to live-edit a presentation and the presenter notes while the presentation is being given, and to control the presentation from any phone without complicated setup: all that is needed on the presenting system or with remote viewers is a URL which will sync through Hyper Hyper Space.

>> Read more about SES - SimplyEdit Spaces

SOLID Data Workers — Toolkit to ingest data into SOLID

Solid Data Workers is a toolkit to leverage the Solid platform (an open source project led by Tim Berners-Lee) into a viable, convenient, open and interoperable alternative to privacy-hungry data silos. The aim is to use Solid as a general purpose storage for all of the user's private information, giving it a linked-data meaning to enrich the personal graph and provide a first-class semantic web experience. The project involves a PHP and a NodeJS implementation of the "Data Workers" toolkit to ease the "semantification" of the data collected from external services (building SPARQL queries, metadata retrieval and storage, relationship creation...), sample software components to import existing data into the semantic graph and keep it synchronized with back-end sources (primarily: emails and calendars), and a proof-of-concept application to showcase the potential of the semantic web applied to personal linked data. As Solid may be self-hosted or hosted by third-party providers, Solid Data Workers may be attached to any of those instances and to different back-end services.

>> Read more about SOLID Data Workers

SWH package manager Data Ingestion — Add Package managers to Software Heritage

The project summary for this project is not yet available. Please come back soon!

>> Read more about SWH package manager Data Ingestion

Storing Efficiently Our Software Heritage — Faster retrieval within Software Heritage

Software Heritage (https://www.softwareheritage.org) is the single largest collection of software artifacts in existence. But how do you store this in a way that lets you find something fast enough, taking into account that these are billions of files with a huge spread in file sizes? "Storing Efficiently Our Software Heritage" will build a web service that provides APIs to efficiently store and retrieve the 10 billion small objects that today comprise the Software Heritage corpus. It will be the first implementation of the innovative object storage design drawn up in early 2021. It has the ability to ingest the SWH corpus in bulk: it makes building search indexes an order of magnitude faster, helps with mirroring, etc. The project is the first step towards a more ambitious and general-purpose undertaking to store, search and mirror hundreds of billions of small objects.

>> Read more about Storing Efficiently Our Software Heritage

SensifAI — AI driven image tagging

Billions of users manually upload their captured videos and images to cloud storage services such as Dropbox, Google Drive and Apple iCloud straight from their camera or phone. Their private pictures and video material are subsequently stored unprotected somewhere else on some remote computer, in many cases in another country with quite different legislation. Users depend on the tools from these service providers to browse their archives of often thousands and thousands of videos and photos in search of some specific image or video of interest. The direct result of this is continuous exposure to cyber threats like extortion and an intrinsic loss of privacy towards the service providers. There is a perfectly valid user-centric approach possible in dealing with such confidential materials, which is to encrypt everything before uploading anything to the internet. At that point the user may be a lot safer, but from then on would have a hard time locating any specific videos or images in their often very large collection. What if smart algorithms could describe the pictures for you, recognise who is in them, and let you store this information and use it to conveniently search and share? This project develops an open source smart-gallery app which uses machine learning to recognize and tag all visual material automatically - and on the device itself. After that, the user can do what she or he wants with the additional information and the original source material. They can save them to local storage, using the tags for easy search and navigation. Or offload the content to the internet in encrypted form, and use the descriptions and tags to navigate this remote content. Either option makes images and videos searchable while fully preserving user privacy.

>> Read more about SensifAI

Software Heritage — Collect, preserve and share the source code of all software ever written

Software Heritage is a non-profit, multi-stakeholder initiative with the stated goal to collect, preserve and share the source code of all software ever written, ensuring that current and future generations may discover its precious embedded knowledge. This ambitious mission requires proactively harvesting a myriad of source code hosting platforms over the internet, each with its own protocol, and coping with a variety of version control systems, each with its own data model. Amongst other things, this project will help ingest the content of over 250,000 open source software projects that use the Mercurial version control system and that will be removed from the Bitbucket code hosting platform in June 2020.

>> Read more about Software Heritage

Peer-to-Peer Access to Our Software Heritage — Access Software Heritage data via IPFS DHT

Peer-to-Peer Access to Our Software Heritage (SWH × IPFS) is a project aimed at supporting Software Heritage's mission to build a universal source code archive and preserve it for future generations, by leveraging IPFS's capabilities to share and replicate the archive in a decentralized, peer-to-peer manner. The project will build a bridge between the existing Software Heritage (SWH) API and the IPFS network to transparently serve native IPFS requests for SWH data. In the short term, this allows users of IPFS to form their own Content Distribution Network for SWH data. Longer term, we hope this will serve as a foundation for a decentralized network of copies that, together, ensure that the loss of no single repository, however large, results in the permanent destruction of any part of our heritage. The end product would be a perfect application of IPFS's tools and a step in the direction of a decentralized internet services infrastructure.

>> Read more about Peer-to-Peer Access to Our Software Heritage

Solid Application Interoperability

The project summary for this project is not yet available. Please come back soon!

>> Read more about Solid Application Interoperability

Secure User Interfaces (Spritely) — Usability of decentralised social media

Spritely is a project to advance the federated social network by adding richer communication and privacy/security features to the network. This particular sub-project aims to demonstrate how user interfaces can and should play an important role in user security. The core elements necessary for secure interaction are shown through a simple chat interface which integrates a contact list as an easy-to-use implementation of a "petname interface". Information from this contact list is integrated throughout the implementation in a way that helps reduce phishing risk, aids discovery of other users, and requires no centralized naming authority. As an additional benefit, this project will demonstrate some of the asynchronous network programming features of the Spritely development stack.

>> Read more about Secure User Interfaces (Spritely)

StreetComplete — Fix open geodata with OpenStreetMap

The project will make collecting data for OpenStreetMap easier and more efficient. OpenStreetMap is the best source of information for general purpose search engines that need geographic data about the locations and properties of various objects. The objects vary from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches and so on. A search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and already used for many purposes. Improving OSM increases the quality of services using open data rather than proprietary datasets kept as a trade secret by established companies.

>> Read more about StreetComplete

StreetComplete — Collaborative editing in OpenStreetMap

StreetComplete is a mobile app that makes it easy and fun to contribute to OpenStreetMap while out and about. OpenStreetMap is the largest open data community about maps, and the go-to source for free geographic data when doing a location-based search. This project focuses on making the collection of data to be used in a search more powerful and efficient. More specifically, the main goals are to add the possibility to collect more data with an easy interface and to add a new view in which it will be more efficient to complete and keep up-to-date certain types of data, such as house numbers or cycleways.

>> Read more about StreetComplete

StreetComplete UX — Improve usability of StreetComplete

OpenStreetMap is the best source of information for general purpose search engines that need geographic data about the locations and properties of various objects. The objects vary from cities and other settlements to shops, parks, roads, schools, railways, motorways, forests, beaches and so on. A search engine can use the data to answer queries such as "route to nearest wheelchair accessible greengrocer", "list of national parks near motorways" or "London weather". The full OpenStreetMap dataset is publicly available under an open license and already used for many purposes.

The project will make collecting open data for OpenStreetMap easier and more efficient, and lower the threshold for contribution by improving usability and accessibility. Any user should be able to help improve OpenStreetMap data, simply by downloading the app from F-Droid or the Google Play store and mapping as they walk.

>> Read more about StreetComplete UX

TypeCell — CRDT-based collaborative block-based editor

TypeCell aims to make software development more open, simple and accessible. TypeCell integrates a live-programming environment as a first-class citizen in an end-user block-based document editor, forming an open source application platform where users can instantly inspect, edit and collaborate on the software they’re using. TypeCell spans a number of different projects improving and building on top of Matrix, Yjs and Prosemirror to advance local-first, distributed and collaborative software for the web.

>> Read more about TypeCell

URL Frontier — Develop an API between web crawler and frontier

Discovering content on the web is possible thanks to web crawlers, and luckily there are many excellent open source solutions for this; however, most of them have their own way of storing and accessing the information about URLs. The aim of the URL Frontier project is to develop a crawler-neutral API for the operations that a web crawler performs when communicating with a URL frontier, e.g. get the next URLs to crawl, update the information about URLs already processed, change the crawl rate for a particular hostname, get the list of active hosts, get statistics, etcetera. It aims to serve a variety of open source web crawlers, such as StormCrawler, Heritrix and Apache Nutch.

The outcomes of the project are to design a gRPC schema, then provide a set of client stubs generated from the schema as well as a robust reference implementation and a validation suite to check that implementations behave as expected. The code and resources will be made available under the Apache License as a sub-project of crawler-commons, a community that focuses on sharing code between crawlers. One of the objectives of URL Frontier is to involve as many actors in the web crawling community as possible and get real users to give continuous feedback on our proposals.
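
The operations listed above translate naturally into a small service interface. The Python stub below merely illustrates the shape such an API could take; the method names are invented here, and the authoritative contract is the gRPC schema the project will publish.

```python
# Shape of a crawler-neutral frontier API, as an abstract Python interface.
from abc import ABC, abstractmethod

class URLFrontier(ABC):
    @abstractmethod
    def get_urls(self, max_urls: int, key: str = None) -> list:
        """Next URLs to fetch, optionally restricted to one host/queue."""

    @abstractmethod
    def put_urls(self, discovered: list) -> None:
        """Report fetched or newly discovered URLs with their metadata."""

    @abstractmethod
    def set_delay(self, key: str, delay_seconds: float) -> None:
        """Change the crawl rate for a particular hostname."""

    @abstractmethod
    def get_stats(self) -> dict:
        """Counts of queued, in-flight and completed URLs."""

# Any crawler (StormCrawler, Heritrix, Apache Nutch, ...) coded against this
# interface could swap frontier implementations without code changes.
```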

>> Read more about URL Frontier

variation graph (vgteam) — Privacy enhanced search within e.g. genome data sets

Vgteam is pioneering privacy-preserving variation graphs, which make it possible to capture complex models and aggregate data resources with formal guarantees about the privacy of the individual data sources from which they were constructed. Variation graphs relate collections of sequences together as walks through a graph. They are traditionally applied to genomic data, where they support the compression and query of very large collections of genomes.

But there are many types of sensitive data that can be represented in variation graph form, including geolocation trajectory data - the trajectories of individuals and vehicles through transportation networks. Epidemiologists can use a public database of personal movement trajectories to, for instance, do geophylogenetic modeling of a pandemic like SARS-CoV-2. The idea is that one cannot see individual movements, but rather the large-scale flows of people across space that are essential for understanding the likely places where an outbreak might spread. This is essential information for understanding, at the scientific and political level, how to best act in case of a pandemic, now and in the future.

The project will apply formal models of differential privacy to build variation graphs which do not leak information about the individuals whose data was used to construct them. For genomes, the techniques allow us to extend the traditional models to include phenotype and health information, maximizing their utility for biological research and clinical practice without risking the privacy of participants who shared their data to build them. For geolocation trajectory data, people can share data in the knowledge that their social graph is not exposed. The tools themselves are not limited to the above use cases, and open the doors to many other types of applications both online (web browsing histories, social media usage) and offline.
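
For flavour, the classic way to obtain such a guarantee on aggregate data is the Laplace mechanism: perturb each released count with noise scaled to sensitivity/epsilon. The sketch below shows that mechanism applied to per-edge traversal counts and is not vgteam's actual construction.

```python
# Flavour of the Laplace mechanism on aggregate graph counts: each edge's
# traversal count is released with noise scaled to sensitivity/epsilon,
# hiding any single individual's walk.
import numpy as np

rng = np.random.default_rng(0)

def private_counts(edge_counts: dict, epsilon: float, sensitivity: float = 1.0) -> dict:
    scale = sensitivity / epsilon
    return {edge: max(0.0, count + rng.laplace(0.0, scale))
            for edge, count in edge_counts.items()}

true_counts = {("a", "b"): 1042, ("b", "c"): 7, ("b", "d"): 998}
print(private_counts(true_counts, epsilon=0.5))
```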

>> Read more about variation graph (vgteam)

Independent captions and transcript augmentation — Speech-to-text integration for Waasabi

Waasabi is a highly customizable platform for self-hosted video streaming (live broadcast) events. It is provided as a flexible open source web framework that anyone can host and integrate directly into their existing website. By focusing on quick setup, ease of use and customizability Waasabi aims to lower the barrier of entry for hosting custom live streaming events on one's own website, side-stepping the cost, compromises and limitations stemming from using various "batteries-included" offerings, but also removing the hassle of having to build everything from scratch.

In this project the team seeks to integrate tools for transcript augmentation, augmented human captioning and automatic machine-generated captions using open source software based on machine learning and royalty-free training data and models. The primary use case is live captioning for live internet broadcasts (primarily video streaming). With such tools online event organizers will be able to create interactive transcripts and better live captions for their events anytime, anywhere - and without external dependencies.

>> Read more about Independent captions and transcript augmentation

WebXray Discovery — Expose tracking mechanism in search hubs

WebXray intends to build a filter extension for the popular and privacy-friendly meta-search engine Searx that will show users what third party trackers are used on the sites in their results pages. Full transparency of which tracker is operated by which company is provided to users, who will be able to filter out sites that use particular trackers. This filter tool will be built on the unique ownership database WebXray maintains of tracking companies that collect personal data of website visitors.

Mapping the ownership of tracking companies which sell behavioural profiles of individuals is critical for all privacy and trust-enhancing technologies. Considerable scrutiny is given to the large players who conduct third party tracking and advertising, whilst little scrutiny is given to the large numbers of smaller companies who collect and sell unknown volumes of personal data. Such collection is unsolicited, with invisible beneficiaries. The ease and speed of corporate registration provides the opportunity for data brokers to mitigate their liability when collecting data profiles. We must therefore establish a systematic database of data broker domain ownership.

The filter extension that will be the output of the project will make this ownership database visible and actionable to end users, and will allow crowdsourced data to be curated and added to the current database of ownership (which is already comprehensive, containing detailed information on more than 1,000 ad tech tracking domains).

>> Read more about WebXray Discovery

WikiRate Insights — Transforming WikiRate ESG Platform User Experience to Maximise Reliable Data Insights

For too long actionable data about the behavior of companies has been hidden behind the paywalls of commercial data providers. As a result only those with sufficient resources were able to advocate and shape improvements in corporate practice. Since launching in 2016, WikiRate.org has become the world’s largest open source registry of ESG (Environmental, Social, and Governance) data with nearly 1 million data points for over 55,000 companies. Through the open data platform anyone can systematically gather, analyze and discuss publicly available information on company practices, joining current debates on corporate responsibility and accountability.

By bringing this information together in one place, and making it accessible, comparable and free for all, we aim to provide society with the tools and evidence it needs to spur corporations to respond to the world's social and environmental challenges. Homing in on the usability of the platform, this project will tackle some of the most crucial barriers for users when it comes to gathering and extracting the data, whilst boosting reuse of the open source platform for other purposes.

>> Read more about WikiRate Insights

WikiRate Insights 2 — Dedicated text search architecture for environmental, social and corporate governance platform

The project summary for this project is not yet available. Please come back soon!

>> Read more about WikiRate Insights 2

openEngiadina — Platform for creating, publishing and using open local knowledge

OpenEngiadina is developing a platform for open local knowledge - a mashup between a semantic knowledge base (like Wikipedia) and a social network using the ActivityPub protocol. openEngiadina is being developed with small municipalities and local organizations in mind, and wants to explore the intersection of Linked Data and social networks - a 'semantic social network'.

openEngiadina started off as a platform for creating, publishing and using open local knowledge. The structured data allows for semantic queries and intelligent discovery of information. The ActivityPub protocol enables decentralized creation and federation of such structured data, so that local knowledge can be created by independent actors in a certain area (e.g. a music association publishes concert location and timing). The project aims to develop a backend enabling such a platform, research ideas for user interfaces, and strengthen the ties between the Linked Data and decentralized social networking communities.

>> Read more about openEngiadina

Privacy Preserving Disease Tracking — Research into contact tracing privacy

In case of a pandemic, it makes sense to share data to track the spread of a virus like SARS-CoV-2. However, that very same data, when gathered in a crude way, is potentially very invasive to privacy - and in politically less reliable environments it can be used to map out the social graph of individuals and severely threaten civil rights and the free press. Unless the whole process is transparent, people might not be easily convinced to collaborate.

The PPDT project is trying to build a privacy-preserving contact tracing mechanism that can notify users if they have come in contact with potentially infected people. This should happen in a way that is as privacy-preserving as possible. We want the following properties: users should be able to learn if they got in touch with infected parties, and ideally only that - unless they opt in to share more information. The organisations operating servers should not learn anything besides who is infected, ideally not even that. The project builds a portable library that can be used across different mobile platforms, and a server component to aggregate data and send it back to the participants.
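
Decentralised designs in this family (DP-3T being a well-known example) typically derive a chain of daily keys and broadcast short-lived ephemeral IDs computed from them, so that a positive test only requires publishing the day keys of the contagious window. A stripped-down sketch of such a key schedule, with labels and sizes invented for illustration:

```python
# Stripped-down sketch of a DP-3T-style key schedule: a one-way daily key
# ratchet plus short-lived ephemeral IDs derived from it.
import hashlib
import hmac
import secrets

def next_day_key(day_key: bytes) -> bytes:
    """Ratchet: tomorrow's key is a hash of today's, so it can't be reversed."""
    return hashlib.sha256(day_key).digest()

def ephemeral_ids(day_key: bytes, per_day: int = 96) -> list:
    """Short-lived broadcast IDs for one day, derived via HMAC."""
    prf = hmac.new(day_key, b"broadcast-key", hashlib.sha256).digest()
    return [hashlib.sha256(prf + i.to_bytes(2, "big")).digest()[:16]
            for i in range(per_day)]

day0 = secrets.token_bytes(32)
ids_today = ephemeral_ids(day0)
# On a positive test, publishing day0 lets other phones recompute ids_today
# (and, via the ratchet, the IDs of later days in the contagious window),
# but reveals nothing about days before day0.
print(len(ids_today), ids_today[0].hex())
```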

>> Read more about Privacy Preserving Disease Tracking