Grant
Theme fund: NGI0 Commons Fund
Start: 2025-11

Spacylize

Use LLMs to train more efficient and reliable NLP models

Small, task-specific language models remain essential for efficient, interpretable and privacy-preserving NLP, even as large language models dominate the field. Spacylize distills LLM capabilities into compact spaCy models by generating, validating, and iteratively refining training data to improve model performance. The software can be used through a simple command-line interface or as a Python library, so it can be integrated into existing workflows. By automating LLM-based data creation for tasks such as named entity recognition and text classification, Spacylize strengthens the spaCy ecosystem and supports sustainable, open-source NLP development.
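To make the "validating" step concrete, the following is a minimal sketch of how LLM-generated named entity annotations might be checked before they are used as training data. It is not Spacylize's actual API, which the description above does not specify. The function names, the label set, and the hard-coded candidate examples (standing in for LLM output) are all assumptions made for illustration. Spans are given as character offsets `(start, end, label)`.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]          # (start offset, end offset, label)
Example = Tuple[str, List[Span]]     # (text, entity spans)

# Hypothetical label set for the target NER model (an assumption).
ALLOWED_LABELS = {"PERSON", "ORG"}


def is_valid(text: str, spans: List[Span]) -> bool:
    """Check that spans are in bounds, non-empty, non-overlapping,
    free of surrounding whitespace, and use only allowed labels."""
    last_end = 0
    for start, end, label in sorted(spans):
        if label not in ALLOWED_LABELS:
            return False
        # In bounds, non-empty, and starting after the previous span ends.
        if not (last_end <= start < end <= len(text)):
            return False
        # Reject spans with leading or trailing whitespace.
        if text[start:end] != text[start:end].strip():
            return False
        last_end = end
    return True


def filter_examples(candidates: List[Example]) -> List[Example]:
    """Keep only the candidates whose annotations pass validation."""
    return [(text, spans) for text, spans in candidates if is_valid(text, spans)]


# Hard-coded stand-in for LLM-generated annotations (illustrative only).
candidates: List[Example] = [
    ("Ada Lovelace worked with Charles Babbage.",
     [(0, 12, "PERSON"), (25, 40, "PERSON")]),
    ("NLnet funds open source.",
     [(0, 5, "ORG"), (3, 9, "ORG")]),      # overlapping spans
    ("Short text.",
     [(0, 50, "ORG")]),                    # span runs past the end of the text
]

kept = filter_examples(candidates)
```

In an iterative setup, rejected candidates could be sent back to the LLM for correction, and the accepted ones converted into spaCy's training format. Both steps are left out of this sketch.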


    This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).