Theme fund: NGI0 Discovery
Start: 2019-12
End: 2022-10
Software Heritage

Collect, preserve and share the source code of all software ever written

Software Heritage is a non-profit, multi-stakeholder initiative with the stated goal to collect, preserve and share the source code of all software ever written, ensuring that current and future generations may discover its precious embedded knowledge. This ambitious mission requires proactively harvesting from myriad source code hosting platforms across the internet, each with its own protocol, and coping with a variety of version control systems, each with its own data model. Among other things, this project will help ingest the content of over 250,000 open source software projects that use the Mercurial version control system and that will be removed from the Bitbucket code hosting platform in June 2020.

Why does this actually matter to end users?

How do you preserve a piece of software for posterity? You might have a box of floppy disks in your attic somewhere, a treasure trove of games and programs you fired up daily in your childhood. Physical media can be a great way to store digital data, but do you know anyone who still has a computer with a floppy drive? Better yet, does your laptop or computer even have a CD drive? The internet can provide a better data archive, but this still requires maintenance: everything that is online needs to be physically stored somewhere, and once data is lost, it is lost forever. So how do you organize the archiving of data and software?

Software Heritage is an organized effort to preserve all the software ever written. The programs we use every day say something about how we interact with our devices and connect with each other through technology. What can our software do, how can we use it, understand it, make it work for us? Preserving these programs is a constant and challenging mission, as software hosting is never a given.

For this project, the software preservation community will focus on making sure certain open source software is saved from the digital black hole, as a major code hosting platform will soon discontinue support for a particular version control system and remove the projects that use it. Software Heritage will make sure these programs are preserved so we can learn from them and make even better and more human software in the future.

Run by Software Heritage

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.