Glossary

What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation: an architecture that grounds a language model in your own documents so answers come from your data, not generic web text.

RAG (Retrieval-Augmented Generation) is an architecture that grounds a language model in your own documents. When a user asks a question, the system first retrieves the most relevant passages from your data (using vector search or keyword search), then feeds those passages into the model along with the question. The answer is drawn from your data, with citations back to the source, rather than from the model's general training data. RAG is the standard pattern for "ask my company knowledge base" applications.

How RAG works

The setup has two phases. Indexing: documents are chunked into passages, each passage is converted to a numerical vector (an embedding), and the vectors are stored in a vector database. Querying: the user's question is embedded with the same model, the database returns the closest matching passages, and those passages plus the question are sent to the language model. The model writes an answer that quotes or paraphrases the retrieved passages.
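The two phases can be sketched in a few lines of Python. This is a toy illustration, not a production setup: a word-count vector stands in for a real embedding model, and a plain in-memory list stands in for a vector database. The document texts and the prompt format are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for an embedding model: a simple word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing phase: chunk documents, embed each chunk, store the vectors.
documents = [
    "Refunds are processed within 14 days of a return request.",
    "Support tickets are answered within one business day.",
]
index = [(embed(chunk), chunk) for chunk in documents]

# Query phase: embed the question with the same model, retrieve the
# closest passage, and build the prompt sent to the language model.
question = "How long do refunds take?"
q_vec = embed(question)
best_passage = max(index, key=lambda item: cosine(q_vec, item[0]))[1]
prompt = f"Answer using only this passage:\n{best_passage}\n\nQuestion: {question}"
```

In a real system the embedding model, chunking strategy, and vector store are each swappable components; the retrieve-then-prompt shape stays the same.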

When RAG is the right tool

RAG fits any case where the answer must come from your content, not the model's general knowledge: SOPs, policies, contracts, support history, technical documentation, sales playbooks. It also fits when the data updates frequently, since you re-index without retraining the model. RAG is cheaper than fine-tuning for most business cases and gives auditable citations back to the source document.

When RAG is not enough

RAG handles "lookup" questions well; it handles reasoning across many documents poorly. If the answer requires synthesizing 50 sources, retrieval will miss some. If the answer requires consistent voice or style transformation across all outputs, fine-tuning may be a better fit. For mathematical or symbolic reasoning, neither RAG nor fine-tuning helps; you need tool use (an agent calling a calculator).
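The tool-use pattern mentioned above can be sketched as a harness that intercepts a calculator call instead of trusting the model's own arithmetic. The `CALL calculator:` marker format is an illustrative assumption, not a real protocol; the safe expression evaluator is a minimal stand-in for a real tool.

```python
import ast
import operator

# Supported arithmetic operators for the toy calculator tool.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr):
    """Safely evaluate a basic arithmetic expression (no eval())."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def run_agent(model_output):
    # Hypothetical harness: when the model emits a tool-call marker,
    # the runtime executes the tool and returns its exact result.
    prefix = "CALL calculator: "
    if model_output.startswith(prefix):
        return calculator(model_output[len(prefix):])
    return model_output

result = run_agent("CALL calculator: 2*(3+4)")  # 14
```

The point is that the arithmetic happens in deterministic code, not in the model, which is why tool use succeeds where RAG and fine-tuning do not.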

Related terms

AI Agent · MojoAI services
