First Classify Documents
Categorise different types of official documents
With governments all over the world turning to digital filing systems, millions of paper files still wait to be digitized. One major challenge in this process is a structured approach to classifying and ordering documents. It is an unfortunate fact that many public documents are bitmap images of texts. For instance, tenders are published digitally but the actual resulting contracts are not published in a way that allows them to be indexed and queried - which hinders civil society in their ability to access these documents. Open source OCR software needs to become better to get good results with this. This project developed a system for models to distinguish between different types of official documents. able to classify state documents according to structure, keywords, document name, word and page count, metadata and context.
Run by Open Knowledge Foundation
This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.