What is RAG, and why do AI systems use it?
RAG (retrieval-augmented generation) is a pattern where the system fetches relevant documents at query time and hands them to the model along with the question, so the answer is grounded in real sources instead of the model’s memory. It exists to fix two problems with a bare language model: it doesn’t know your private or current data, and it makes things up when it doesn’t know. RAG gives the model the right context to read before it answers.
The problem RAG solves
A language model only knows what it saw in training. It can’t see your company’s documents, last week’s data, or anything behind your login. Ask it about those and it will either refuse or, worse, invent a confident answer. RAG closes that gap by giving the model the source material at the moment you ask.
How it works
Three steps. First, your documents are split into chunks, and each chunk is turned into an embedding (a numerical vector that represents its meaning) and stored in a vector database. Second, when a question comes in, the system embeds it the same way and retrieves the chunks whose vectors are closest to the question’s, which serves as a proxy for closeness in meaning. Third, those chunks are added to the prompt, and the model answers using them. The model isn’t recalling the answer; it’s reading it.
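Here is a minimal sketch of those three steps in Python, under loud assumptions: `embed` is a toy word-count vector standing in for a real embedding model, a plain list stands in for the vector database, and chunking is a fixed number of words. All function names are hypothetical. The sketch stops at building the prompt, because sending it to a model depends on which model API you use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    # A real model maps paraphrases to nearby vectors; this only matches shared words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: split documents into fixed-size word chunks and index their vectors.
def chunk(text: str, size: int = 40) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(docs: list[str], size: int = 40) -> list[tuple[str, Counter]]:
    return [(c, embed(c)) for doc in docs for c in chunk(doc, size)]


# Step 2: embed the question the same way and return the k most similar chunks.
def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Step 3: put the retrieved chunks into the prompt the model will read.
def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "Refunds are issued within 14 days of a return being received.",
    "Our office is closed on public holidays.",
]
index = build_index(docs)
question = "How many days until refunds are issued?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because the toy embedding only rewards shared words, it would miss a paraphrase that a learned embedding model would catch; the structure (index, retrieve, assemble the prompt) is the part meant to carry over.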
Why it beats fine-tuning for fresh knowledge
Teaching a model new facts by fine-tuning is slow and expensive, and the result goes stale the moment the data changes. RAG keeps knowledge in a store you can update at any time: add a document and the next query can retrieve it, with no retraining. That makes it the default way to put current, private, or fast-changing information in front of a model.
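To make the “no retraining” point concrete, here is a small sketch, assuming a hypothetical in-memory `InMemoryStore` class in place of a real vector database and the same toy word-count `embed` in place of a real embedding model. Adding knowledge is just embedding the new text and appending it; the next search sees it immediately.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database: a list of (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Updating knowledge means embedding the text and storing it; no model weights change.
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1, min_score: float = 0.0) -> list[str]:
        # Score every stored item, drop anything not above min_score, return the top k.
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.items]
        scored = [pair for pair in scored if pair[0] > min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = InMemoryStore()
store.add("Office hours are 9 to 5 on weekdays.")
question = "When does the new parking policy start?"
before = store.search(question)  # nothing relevant is indexed yet
store.add("The new parking policy starts in June.")
after = store.search(question)   # the new document is retrievable right away
print(before, after)
```

A production store would also need to replace or delete outdated chunks, not just append, so that old and new versions of a fact do not both come back.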
Where it falls short
RAG is only as good as what it retrieves. If the search returns the wrong chunks, the model answers from bad context, and a grounded-looking answer can still be wrong. That’s why retrieval quality, not just the model, decides whether a RAG system works — and why evaluating the retrieval step separately matters.
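One way to evaluate retrieval on its own is to keep a small set of test questions where you already know which chunks hold the answer, then score what the retriever returns before any model sees it. The sketch below assumes such a labeled set exists (the chunk ids are made up) and computes two common ranking metrics: recall@k, the share of relevant chunks that land in the top k results, and mean reciprocal rank (MRR), which rewards putting the first relevant chunk near the top.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the known-relevant chunks that appear in the top-k results.
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    # 1 / rank of the first relevant chunk, or 0 if none was retrieved.
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    # Each run pairs the retriever's ranked chunk ids with the labeled relevant ids.
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical labeled test set: (ranked ids returned, ids that actually answer the question).
runs = [
    (["c3", "c7", "c1"], {"c3"}),        # right chunk ranked first
    (["c2", "c5", "c9"], {"c5", "c8"}),  # one of two relevant chunks, at rank 2
    (["c4", "c6", "c1"], {"c0"}),        # relevant chunk missed entirely
]
result = evaluate(runs, k=2)
print(result)
```

A low score here points at chunking, embeddings, or the choice of k rather than at the model, since no model has been called yet.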
From the conversation
This explainer is drawn from these episodes — each carries its full transcript.