Grant
Theme fund: NGI0 Discovery
Period: 2019-04 — 2022-10

Transparency Toolkit

A decentralized hosted archiving service with search

This project is archived. Due to circumstances, the project as planned did not take place. This page is left as a placeholder, for transparency reasons and to perhaps inspire others to take up this work.

Transparency Toolkit is building a decentralized hosted archiving service that allows journalists, researchers, and activists to create censorship-resistant searchable document archives from their browser. Users can upload documents in many different file formats, run web crawlers to collect data, and manually contribute research notes from a usable interface. The documents are then OCRed (when needed) and indexed in a searchable database. Transparency Toolkit provides a variety of tools to help analyze and understand the documents with text mining, searching/filtering, and manual collaborative analysis. Once users are ready, they can make some or all of the documents available in a public searchable archive. These archives will be automatically mirrored across multiple instances of the software and the raw data will be stored in a distributed fashion.
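The upload, index, search, and publish steps described above can be illustrated with a small sketch. This is not the project's code: the `Archive` class, its method names, and the simple word-level inverted index are assumptions made for illustration, and OCR is left out (the text is assumed to be already extracted).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Archive:
    """Hypothetical in-memory archive with a word-level inverted index."""

    def __init__(self):
        self.documents = {}            # doc_id -> extracted text
        self.index = defaultdict(set)  # term -> ids of documents containing it
        self.public = set()            # ids the owner has chosen to publish

    def add(self, doc_id, text):
        """Store a document and index every term it contains."""
        self.documents[doc_id] = text
        for term in tokenize(text):
            self.index[term].add(doc_id)

    def publish(self, doc_id):
        """Mark an existing document as part of the public archive."""
        if doc_id not in self.documents:
            raise KeyError(doc_id)
        self.public.add(doc_id)

    def search(self, query, public_only=False):
        """Return sorted ids of documents containing all query terms."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        if public_only:
            hits = hits & self.public
        return sorted(hits)


archive = Archive()
archive.add("memo", "Offshore bank transfer records")
archive.add("note", "Interview notes about the bank")
archive.publish("memo")

print(archive.search("bank"))                    # private view: both documents
print(archive.search("bank", public_only=True))  # public view: published only
```

Keeping the published set separate from the index means the same search path can serve both the private working archive and the public one, which matches the idea of releasing "some or all" of the documents.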

Why does this actually matter to end users?

When you get up in the morning and read a fine piece of investigative news about a financial scandal, you don't really stop to think much about how news is produced and what the human cost of its production is. Every year, dozens of journalists around the world get killed because of what they write and who they talk to. Even in democratic countries, people can run the risk of intimidation and retribution. If you happen to be a courageous journalist writing about corruption, gangs or some other social wrong, protecting your sources is more than a matter of principle - it can be a matter of life and death for all parties concerned. So journalists and other vulnerable groups, such as civil society organisations, need to be very careful.

But at the same time they of course need to collaborate. In investigative reporting, it is often the combined intelligence and data gathering of many people that reveals otherwise invisible or indiscernible patterns. That means people have to deal with significant, if not massive, amounts of documents and data. As a collective, they need to find their way through these materials to discover the information they need. But no conventional search engine can help them, because the resources they have are not all public and could cause real trouble, for instance to whistleblowers inside corrupt institutions, should they leak to the wrong people.

Transparency Toolkit provides journalists, activists and other actors who need to control their communication with a closed-off searchable database within their browser. Users can set up their own database and fill it with documents in various file formats, whose contents can then be analyzed and searched. To make these databases even more resistant to censorship, the archived documents will be stored across various locations to avoid central points of failure.
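One common way to store documents across several locations without a central point of failure is content-addressed replication, sketched below. The source does not say how the project planned to do this, so the `Mirror` class and the `store` and `fetch` helpers are hypothetical, and the mirrors are plain in-memory dictionaries rather than real servers.

```python
import hashlib


class Mirror:
    """Hypothetical storage location holding blobs keyed by their hash."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}
        self.online = True


def store(mirrors, data):
    """Copy the data to every reachable mirror and return its SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    for mirror in mirrors:
        if mirror.online:
            mirror.blobs[digest] = data
    return digest


def fetch(mirrors, digest):
    """Return the data from the first reachable mirror holding an intact copy."""
    for mirror in mirrors:
        if not mirror.online:
            continue
        data = mirror.blobs.get(digest)
        # Recompute the hash so a tampered or corrupted copy is skipped.
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            return data
    raise LookupError(f"no intact copy of {digest} found")


mirrors = [Mirror("a"), Mirror("b"), Mirror("c")]
digest = store(mirrors, b"scanned contract")

mirrors[0].online = False                 # one location is taken down
mirrors[1].blobs[digest] = b"tampered"    # another serves altered data

print(fetch(mirrors, digest))             # still recovered from the third mirror
```

Because the identifier is derived from the content itself, a reader can verify any copy independently and does not need to trust whichever mirror happens to answer.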


This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.