Theme fund: NGI0 Discovery
Start: 2020-12
End: 2022-05

OCCRP Aleph disambiguation

OCCRP Aleph: disambiguating different people and companies

Aleph is an investigative data platform that searches and cross-references global databases with leaks and public sources to find evidence of corruption and trace criminal connections. The project will improve the way that Aleph connects data across different data sources and how it ranks recommendations and search results for reporters. Our goal is to establish a feedback loop where users train a machine learning system that predicts whether results showing a person or company refer to the same person or company. If successful, this means journalists can conduct more efficient research and investigations, finding key information more quickly and wasting less time trawling through irrelevant documents and datasets.
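To make the feedback loop concrete, here is a minimal sketch in Python. It is not Aleph's actual implementation: the record fields (`name`, `country`), the two similarity features, the `MatchModel` class and the learning rate are all assumptions chosen for illustration. It shows the general idea only: a model scores a pair of records, a user confirms or rejects the match, and the model is nudged toward that judgement.

```python
import math
from difflib import SequenceMatcher


def pair_features(a, b):
    """Hypothetical features comparing two records (plain dicts)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_country = 1.0 if a.get("country") and a.get("country") == b.get("country") else 0.0
    return [1.0, name_sim, same_country]  # leading 1.0 acts as the bias term


class MatchModel:
    """Tiny online logistic regression, updated one user judgement at a time."""

    def __init__(self, n_features=3, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, a, b):
        """Estimated probability that records a and b are the same entity."""
        x = pair_features(a, b)
        z = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, a, b, same_entity):
        """Apply one gradient step from a user's yes/no judgement."""
        x = pair_features(a, b)
        error = (1.0 if same_entity else 0.0) - self.predict(a, b)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]


# Illustrative records (invented for this sketch, not real data).
rec_a = {"name": "John Smith", "country": "GB"}
rec_b = {"name": "Jon Smith", "country": "GB"}
rec_c = {"name": "Acme Holdings Ltd", "country": "PA"}

model = MatchModel()
for _ in range(200):
    model.feedback(rec_a, rec_b, same_entity=True)   # user confirms a match
    model.feedback(rec_a, rec_c, same_entity=False)  # user rejects a match
```

After these simulated judgements, `model.predict(rec_a, rec_b)` is higher than `model.predict(rec_a, rec_c)`, so confirmed-looking pairs would rank above rejected-looking ones. A production system would need far richer features and much more feedback than this toy example.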

Why does this actually matter to end users?

When you get up in the morning and read a fine piece of investigative news about a financial scandal, you don't really stop to think much about how news is produced and what the cost of production can really be. Besides the dangers that journalists (increasingly) face simply to do their job, the massive growth of digital information has not made their investigations any easier.

Just like we need intuitive, advanced technology to index and search the internet, journalists need the right tools to search and verify news, leaks and interesting connections. That is precisely what Aleph is trying to provide: connecting data across global databases with leaks and public sources to find possible evidence of corruption and criminal connections. Instead of relying only on manual labor, which is difficult to sustain in a time of digital information and fake news, the project will add machine learning capabilities that predict whether results that show a particular individual or company actually refer to the same person or entity. This will make the lives of journalists a bit easier and let them focus their clever minds on actual reporting.

Run by Organized Crime and Corruption Reporting Project


This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.