First Classify Documents

Categorise different types of official documents

While governments all over the world are turning to digital filing systems, millions of paper files still wait to be digitized. One major challenge in this process is establishing a structured approach to classifying and ordering documents. Unfortunately, many public documents are published only as bitmap images of text. For instance, tenders are published digitally, but the resulting contracts are not published in a way that allows them to be indexed and queried, which hinders civil society's access to these documents. Open source OCR software needs to improve before it can deliver good results on such material. This project developed a system that lets models distinguish between different types of official documents and classify state documents according to structure, keywords, document name, word and page count, metadata and context.
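To make the listed signals concrete, here is a minimal sketch of how keywords, document name, word and page count, and metadata could be turned into features and a rough label. It is not the project's actual implementation: the `Document` type, the `KEYWORDS` vocabulary, the two category names, and the scoring rule are all assumptions made for illustration, and structure and context are left out entirely.

```python
import re
from dataclasses import dataclass, field

# Hypothetical categories and keyword lists, chosen only for illustration.
KEYWORDS = {
    "contract": {"contract", "agreement", "party", "parties"},
    "tender": {"tender", "bid", "procurement", "deadline"},
}


@dataclass
class Document:
    """Assumed input record: OCR text plus basic file information."""
    name: str
    text: str
    page_count: int
    metadata: dict = field(default_factory=dict)


def extract_features(doc):
    """Turn a document into a flat dictionary of simple features."""
    words = re.findall(r"[a-z]+", doc.text.lower())
    name_tokens = set(re.findall(r"[a-z]+", doc.name.lower()))
    features = {
        "word_count": len(words),
        "page_count": doc.page_count,
        "words_per_page": len(words) / max(doc.page_count, 1),
        "has_author": "author" in doc.metadata,
    }
    for label, vocab in KEYWORDS.items():
        # Keyword hits in the body text and in the file name.
        features[f"kw_{label}"] = sum(1 for w in words if w in vocab)
        features[f"name_{label}"] = len(name_tokens & vocab)
    return features


def classify(doc):
    """Pick the label with the highest keyword score, or "unknown"."""
    features = extract_features(doc)
    scores = {
        label: features[f"kw_{label}"] + 2 * features[f"name_{label}"]
        for label in KEYWORDS
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

In practice the feature dictionary would more likely feed a trained model than a hand-written scoring rule; the rule here only keeps the sketch self-contained.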

Run by Open Knowledge Foundation

This project was funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.
