Storing Efficiently Our Software Heritage
Faster retrieval within Software Heritage
Software Heritage (https://www.softwareheritage.org) is the single largest collection of software artifacts in existence. But how do you store this in a way that you can find something fast enough, taking into account that these are billions of files with a huge spread in file sizes? "Storing Efficiently Our Software Heritage" will build a web service that provides APIs to efficiently store and retrieve the 10 billions small objects that today comprise the Software Heritage corpus. It will be the first implementation of the innovative object storage design that was designed early 2021. It has the ability to ingest the SWH corpus in bulk: it makes building search indexes an order of magnitude faster, helps with mirroring etc. The project is the first step to a more ambitious and general purpose undertaking allowing to store, search and mirror hundreds of billions of small objects.
- The project's own website: https://wiki.softwareheritage.org/wiki/A_practical_approach_to_efficiently_store_100_billions_small_objects_in_Ceph
Run by Eeaster-Eggs
This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.