Great OCR for SANE
Integrate OCR capabilities into open source scanning tools
We have become dependent on search engines, which let us locate a document among billions of web pages using just a few specific words. However, not every document is born digital, and some reach the web only indirectly. And users with, for instance, visual disabilities cannot read documents that are 'just' pixels.
The SANE project is a collection of open-source scanner drivers and related software. SANE tools allow users to convert their documents, photos and other similar material from an unsearchable, non-discoverable analog form into a digital representation that can be easily shared and distributed.
The SANE-OCR project enables users to close this gap right at the stage when physical documents are converted from their incoming "analog" form to a searchable digital form, using a completely open-source stack. While the traditional result of scanning is just the visual image (essentially a photo), the output of SANE-OCR additionally contains the text recognized through optical character recognition (OCR). The resulting documents are searchable and discoverable.
- The project's own website: https://gitlab.com/sane-project/frontend/sanescan
Run by Kodo Baitas, MB
This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.