re-Isearch Schmate

Extending re-Isearch with a flat vector datatype for embeddings

Schmate is the development name for the next iteration of re-Isearch, which adds vector datatypes for embeddings and targets applications such as retrieval-augmented generation (RAG). Schmate (pronounced "SHMAH-teh") is Yiddish for rag (שמאטע).

In contrast to typical vector stores, the proposed re-Isearch+ is intended to offer a full passage information retrieval system (indexing and retrieval) that combines dense and sparse vectors with document structure. It is dense passage retrieval (DPR) and a whole lot more. It addresses the stumbling blocks of chunking and tightly integrates ingest, tokenisation, a number of alternative vector stores and similarity algorithms. Above all, it uses a novel approach that draws on document structure, both implicit and explicit, to provide better contextual passage retrieval and so address the problem of misaligned context. This builds on the observation that meaning is also communicated through structure and therefore needs to be viewed in the context of that structure. Since the structure, like the words, is intended by the sender (the writer) to be received and understood by the reader, our approach is to exploit the original author's organization of content to determine appropriate passages rather than relying solely on chunks.
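
The difference between fixed-size chunks and author-defined passages can be illustrated with a minimal Python sketch. This is not re-Isearch code or its API: the function names `fixed_chunks` and `structural_passages`, the sample text, and the choice of blank lines as the only structural cue are all assumptions made for illustration.

```python
import re

def fixed_chunks(text, size):
    """Naive chunking: cut every `size` words, ignoring structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def structural_passages(text):
    """Split on blank lines so each passage is a unit the author wrote.

    Blank lines stand in for document structure here (an assumption);
    a real system could use markup, headings or other cues instead.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]

# Hypothetical sample document with two headed sections.
DOC = """Installation

Download the archive and unpack it.

Usage

Run the indexer, then run a query."""

chunks = fixed_chunks(DOC, 6)
passages = structural_passages(DOC)

for chunk in chunks:
    print("chunk:  ", chunk)
for passage in passages:
    print("passage:", passage)
```

In this sample the second fixed-size chunk mixes the end of the installation text with the start of the usage text, which is the kind of misaligned context described above. The structural split keeps each passage within the boundaries the author drew.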