Grant
Theme fund: NGI0 Commons Fund
Start: 2024-06

Re-isearch Schmate

Extending re-Isearch with a flat vector datatype for embeddings

Schmate is the development name for the next iteration of re-Isearch, which adds vector datatypes for embeddings and supports applications such as retrieval-augmented generation (RAG). Schmate (pronounced "SHMAH-teh") is Yiddish for rag (שמאטע).

In contrast to typical vector stores, the proposed re-Isearch+ will offer a full passage information retrieval system (indexing and retrieval) that combines dense and sparse vectors with document structure. It is dense passage retrieval (DPR) and a whole lot more. It addresses the stumbling blocks of chunking and tightly integrates ingest, tokenisation, a number of alternative vector stores and similarity algorithms. Above all, it uses a novel combination of implicit and explicit document structure to provide better contextual passage retrieval and so solve the problem of misaligned context. This builds on the observation that meaning is also communicated through structure and so needs to be viewed in the context of that structure. Since structure, like words, is intended by the sender (the writer) to be received and understood by the reader, our approach is to exploit the original author's organization of content to determine appropriate passages rather than relying solely on chunks.
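The idea of combining sparse and dense similarity over passages that follow the author's own structure can be illustrated with a small sketch. This is not re-Isearch code and does not use its API: the function names (`split_sections`, `toy_embedding`, `hybrid_search`), the `# heading` markup, the weighting parameter `alpha`, and the sample document are all assumptions made for illustration. Character-trigram counts stand in for a learned embedding model.

```python
import math
import re
from collections import Counter


def split_sections(text):
    """Group each '# heading' line with the body lines that follow it,
    so passages follow the author's structure rather than fixed-size chunks."""
    sections = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# "):
            sections.append({"heading": line[2:], "body": ""})
        elif line and sections:
            sections[-1]["body"] = (sections[-1]["body"] + " " + line).strip()
    return sections


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_vector(text):
    """Sparse side: plain term counts."""
    return Counter(tokens(text))


def toy_embedding(text):
    """Hypothetical stand-in for a learned embedding: character-trigram counts."""
    s = " ".join(tokens(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hybrid_search(query, text, alpha=0.5):
    """Rank sections by alpha * sparse + (1 - alpha) * dense similarity.
    The heading is scored together with its body, so structure adds context."""
    q_sparse, q_dense = sparse_vector(query), toy_embedding(query)
    ranked = []
    for sec in split_sections(text):
        passage = sec["heading"] + " " + sec["body"]
        score = (alpha * cosine(q_sparse, sparse_vector(passage))
                 + (1 - alpha) * cosine(q_dense, toy_embedding(passage)))
        ranked.append((score, sec["heading"]))
    return sorted(ranked, reverse=True)


# Made-up sample document for illustration only.
DOC = """# Indexing
The index stores words together with the document structure.

# Embeddings
Dense vectors capture meaning beyond exact word matches.

# Licensing
The software is released under an open source license.
"""

results = hybrid_search("vector embeddings", DOC)
print(results[0][1])
```

In this toy example the query term "embeddings" appears only in a heading and "vector" only matches "vectors" approximately, so the sparse score relies on the heading while the trigram stand-in picks up the near match in the body. Scoring the heading together with its body is what lets the "Embeddings" section rank first.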

Run by NONMONOTONIC Networks / ExoDAO / Zimmermann & Zimmermann Forschungs GbR


This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429. Additional funding is made available by the Swiss State Secretariat for Education, Research and Innovation (SERI).