AI Tutor Infrastructure for Universities
Higher education institutions are actively exploring large language models (LLMs) as part of their digital infrastructure. However, standalone systems such as ChatGPT are not designed for institutional use. They operate on general pretraining data, lack access to course-specific materials, and may generate answers that cannot be verified against the curriculum. This creates both pedagogical and regulatory challenges, particularly in European contexts where data governance and transparency are essential.
Recent research indicates that these limitations can be addressed through Retrieval-Augmented Generation (RAG). By connecting LLMs to authoritative sources—such as lecture notes, PDFs, and internal repositories—RAG enables responses that are grounded in verifiable material rather than generated from general knowledge alone. A systematic study (DOI: 10.3390/app15084234) shows that retrieval grounding significantly improves factual accuracy and reduces hallucinations in domain-specific applications. Complementary work such as TutorLLM (https://arxiv.org/abs/2502.15709) demonstrates that combining retrieval with student modeling leads to measurable improvements in learning outcomes and user satisfaction.
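The grounding step described above can be sketched in a few lines. The following is an illustrative toy, not the pipeline from either cited paper: the names (Chunk, retrieve, build_prompt) are hypothetical, and term-overlap scoring stands in for the vector or keyword index a real RAG system would use.

```python
# Minimal RAG sketch: retrieve course-material chunks relevant to a query,
# then assemble a prompt that forces the LLM to answer from those chunks.
from dataclasses import dataclass

@dataclass
class Chunk:
    source: str   # e.g. a lecture-note filename, so answers stay verifiable
    text: str

def score(query: str, chunk: Chunk) -> int:
    # Toy relevance signal: count of query terms appearing in the chunk.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    # Return the k highest-scoring chunks for the query.
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list) -> str:
    # Tag each chunk with its source so the model can cite it.
    context = "\n".join(f"[{c.source}] {c.text}" for c in chunks)
    return (
        "Answer using ONLY the sources below and cite the source tag.\n"
        f"{context}\n\nQuestion: {query}"
    )
```

In a production system the prompt would be sent to the institution's LLM; the key property is that every retrieved chunk carries a source identifier the answer can cite.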
System Architecture
The system is designed for on-premise deployment within university infrastructure. This allows institutions to maintain full control over their data, ensure compliance with GDPR and related frameworks, and integrate directly with internal knowledge sources. Educational materials—including PDF documents, HTML content, and internal repositories—are indexed and made accessible through the search layer.
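As a rough sketch of such a search layer, the toy class below builds an in-memory inverted index over already-extracted plain text. It is an assumption for illustration only: a real deployment would use proper PDF/HTML extractors and a production search engine, and the class and method names are invented here.

```python
# Toy inverted index: maps each term to the set of documents containing it.
from collections import defaultdict

class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # document id -> raw text

    def add(self, doc_id: str, text: str) -> None:
        # Index one document (text assumed already extracted from PDF/HTML).
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set:
        # Conjunctive query: return documents containing every query term.
        sets = [self.postings[t] for t in query.lower().split()
                if t in self.postings]
        return set.intersection(*sets) if sets else set()
```

Keeping this index on premises is what lets the institution control exactly which materials the tutor can see.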
Building Institutional AI Tutors on Verifiable Knowledge
From an educational perspective, this architecture supports a more reliable form of AI-assisted learning. Students receive explanations grounded in their actual course materials, while instructors spend less time answering repetitive questions and can offer consistent access to knowledge across large cohorts. At the institutional level, the combination of search and generation creates a unified interface over distributed educational content.
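One way to picture that unified interface is a federated query layer: each repository exposes the same search signature, and the tutor merges the results with provenance attached. The repository names and the simple concatenating merge policy below are assumptions for illustration.

```python
# Federated search sketch: query several repositories through one interface
# and keep track of which repository each snippet came from.
def federated_search(query: str, repositories: dict) -> list:
    """repositories maps a repo name to a callable(query) -> list of snippets."""
    hits = []
    for name, search in repositories.items():
        for snippet in search(query):
            hits.append({"repo": name, "snippet": snippet})
    return hits
```

A real system would rank the merged hits; the point here is that provenance survives the merge, so generated answers can cite the repository they drew from.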
In this context, the transition from standalone LLMs to retrieval-augmented systems is not simply a technical improvement, but a necessary step toward integrating AI into formal education. Kavunka extends this paradigm by providing a transparent and deployable infrastructure, enabling universities to build AI tutors directly on top of their own knowledge base.