Theme fund: NGI0 Discovery
Start: 2021-04
End: 2022-07
Integrate OCR capabilities into open source scanning tools

We have become dependent on search engines, which let us locate a document among billions of webpages using a few specific words. However, not every document is born digital, and some reach the web only indirectly. And users with, for instance, visual disabilities cannot read documents that are 'just' pixels.

The SANE project is a collection of open-source scanner drivers and related software. SANE tools allow users to convert their documents, photos and similar material from a completely unsearchable and non-discoverable analog form into a digital representation, which can be easily shared and distributed.

The SANE-OCR project enables users to close the gap right at the stage when physical documents are converted from their incoming "analog" form to a searchable digital form, using a completely open-source stack. The traditional result of scanning is just the visual image (essentially a photo); with SANE-OCR, the result additionally contains the text recognized through optical character recognition (OCR). The resulting documents are searchable and discoverable.

Run by Kodo Baitas, MB

