Theme fund: NGI0 Discovery
Start: 2021-08
End: 2022-10

Storing Efficiently Our Software Heritage

Faster retrieval within Software Heritage

Software Heritage (SWH) is the single largest collection of software artifacts in existence. But how do you store it so that anything can be found fast enough, given that it holds billions of files with a huge spread in file sizes? "Storing Efficiently Our Software Heritage" will build a web service that provides APIs to efficiently store and retrieve the 10 billion small objects that today comprise the Software Heritage corpus. It will be the first implementation of the innovative object storage design that was drawn up in early 2021. It has the ability to ingest the SWH corpus in bulk, which makes building search indexes an order of magnitude faster and helps with mirroring, among other things. The project is the first step towards a more ambitious, general-purpose undertaking that will make it possible to store, search and mirror hundreds of billions of small objects.

Run by Easter-Eggs

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.