End: 2002-01


parser generator system for natural languages

The goal of this project is to make the AGFL (Affix Grammars over a Finite Lattice) linguistic parser generator system publicly available as a tool for the development of NLP (Natural Language Processing) applications.

The AGFL formalism for the syntactic description of natural languages was developed at the Department of Software Engineering, University of Nijmegen. It is a formalism in which large context-free grammars can be described in a compact way. AGFLs belong to the family of two-level grammars, along with attribute grammars: a first, context-free level is augmented with set-valued features for expressing agreement between parts of speech.
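As a rough illustration of the two-level idea described above, the following Python sketch models set-valued features as finite sets and agreement as set intersection. It is not AGFL notation and not how the generated parsers work internally; the affix domains, the toy lexicon, and the function names (`agree`, `sentence_number`, `expand`) are all hypothetical and chosen only for this example.

```python
from itertools import product

# Hypothetical affix domains: each is a small finite set of values.
NUMBER = frozenset({"sing", "plur"})
PERSON = frozenset({"first", "second", "third"})

# Toy lexicon (an assumption for illustration): every word carries the
# set of NUMBER values it is compatible with.
LEXICON = {
    "dog": frozenset({"sing"}),
    "dogs": frozenset({"plur"}),
    "sheep": NUMBER,  # ambiguous between singular and plural
    "barks": frozenset({"sing"}),
    "bark": frozenset({"plur"}),
}


def agree(a, b):
    """Return the values that both set-valued features allow."""
    return a & b


def sentence_number(noun, verb):
    """Mimic a rule like  sentence: subject(NUMBER), verb(NUMBER).

    Both positions share one affix, so the values must overlap.
    Returns the shared value set, or None when agreement fails.
    """
    shared = agree(LEXICON[noun], LEXICON[verb])
    return shared or None


def expand(affix_domains):
    """List the plain context-free instances one affix rule stands for."""
    return list(product(*affix_domains))


if __name__ == "__main__":
    print(sentence_number("dog", "barks"))
    print(sentence_number("dogs", "barks"))
    print(sentence_number("sheep", "bark"))
    print(len(expand([NUMBER, PERSON])))
```

In this sketch an ambiguous word such as "sheep" keeps both values until another word narrows them down, and a single rule carrying a NUMBER and a PERSON affix stands for 2 × 3 = 6 plain context-free rules, which is where the compactness comes from.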

The AGFL parser generation system for natural languages generates efficient parsers from AGFL grammars. It includes a lexicon system that is suitable for the large lexica needed in real-life NLP applications.

Natural Language Processing (NLP) is an important enabling technology for future web-based applications: from filtering and narrowcasting to more intelligent search engines and services based on the automatic interpretation of the contents of documents. State-of-the-art web search engines rely mainly on keywords, with linguistic techniques applied to enhance recall. An example is the Linguistix software library, incorporated in commercial search engines such as AltaVista and Ask Jeeves, which performs tagging, lemmatization and fuzzy semantic matching.

Besides individual keywords, some use is also made of phrases, but this is mostly limited to those noun phrases that can easily be extracted. An important step forward in precision is to be expected from the use of more complicated linguistic phrases, including the verb phrase. Progress in this respect is hampered by the lack of natural language parsers that can extract and normalize all phrases suitable for Information Retrieval applications with sufficient speed and precision.

Present day natural language parsing technologies are still of limited value to applications:

  • most sophisticated parsers have been developed for machine translation rather than for retrieval purposes;
  • most parsers are developed using proprietary software, and few are in the public domain, so there is little synergy between projects;
  • parsing speeds are generally low in relation to the speed of the Internet.

That is why there is a need for tools that are available in the public domain, and suitable for the development of efficient parsers for Information Retrieval applications. The AGFL system is such a tool.

Project AGFL
