What is agentic RAG, and how is it different from regular RAG?
Traditional RAG runs one fixed retrieve-then-generate step: fetch documents that match the query, stuff them in the prompt, answer. Agentic RAG puts an agent in charge of retrieval — it decides whether to search, reformulates the query, pulls from multiple sources, checks whether what it got is good enough, and retrieves again if it isn't. The difference is a static pipeline versus a control loop.
Traditional RAG: one shot at retrieval
Classic retrieval-augmented generation is a straight line. A query comes in, an embedding search returns the closest chunks, and the model answers using whatever came back. It is fast and cheap, and it works well when the answer lives in one obvious place. Its weakness is that it never second-guesses the retrieval. If the first search misses, the model answers from bad context and you get a confident wrong answer.
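The straight-line flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the corpus `DOCS` is invented, `embed` is a bag-of-words stand-in for a real embedding model, and `generate` is a placeholder where a language model call would go. All names are assumptions made for this example.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical toy corpus, for illustration only.
DOCS = [
    "The refund window is 30 days from the delivery date.",
    "Shipping takes 3 to 5 business days within the country.",
    "Support is available by email on weekdays.",
]

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> List[str]:
    """Return the k chunks closest to the query. There is no second attempt."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for a language model call; it only echoes the prompt."""
    return "[model answer based on]\n" + prompt

def plain_rag(query: str) -> str:
    # One fixed pass: retrieve, build the prompt, answer.
    context = "\n".join(retrieve(query))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```

Note that `plain_rag` never inspects what `retrieve` returned. Whatever ranks first goes into the prompt, which is the weakness the paragraph above describes.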
Agentic RAG: retrieval as a decision
Agentic RAG wraps retrieval in an agent’s reasoning loop. The agent can decide whether a question even needs a lookup, rewrite a vague query into a better one, query several stores or tools, and grade the results before answering. If the context is thin, it retrieves again with a different approach. Retrieval stops being a fixed step and becomes a series of decisions the agent makes toward an answer.
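As a rough sketch of that loop, the example below uses simple rules where a real agent would make model calls. Everything here is an assumption made for illustration: the two stores in `STORES`, the small-talk check in `needs_lookup`, the phrase table in `rewrite`, the word-overlap score, and the `threshold` used for grading. The final generation step is left out so the control flow stays visible.

```python
import re
from typing import Dict, List, Set, Tuple

# Two hypothetical sources; in practice these might be vector stores, APIs, or tools.
STORES: Dict[str, List[str]] = {
    "policies": ["Refunds are accepted within 30 days of delivery."],
    "logistics": ["Standard shipping takes 3 to 5 business days."],
}
SMALL_TALK = {"hi", "hello", "thanks", "thank", "you"}

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def needs_lookup(question: str) -> bool:
    """Stand-in for the agent deciding whether to search at all."""
    return not tokens(question) <= SMALL_TALK

def rewrite(query: str) -> str:
    """Stand-in for a model-written reformulation of a vague query."""
    replacements = {"money back": "refunds", "arrive": "shipping takes"}
    out = query.lower()
    for old, new in replacements.items():
        out = out.replace(old, new)
    return out

def search(query: str) -> List[Tuple[float, str]]:
    """Query every store; return (score, passage) pairs, best first."""
    q = tokens(query)
    hits = []
    for passages in STORES.values():
        for passage in passages:
            score = len(q & tokens(passage)) / len(q) if q else 0.0
            hits.append((score, passage))
    return sorted(hits, reverse=True)

def agentic_rag(question: str, max_rounds: int = 3, threshold: float = 0.15) -> dict:
    trace: List[str] = []
    if not needs_lookup(question):
        trace.append("skip retrieval")
        return {"context": [], "trace": trace}
    query = question
    for _ in range(max_rounds):
        best_score, best_passage = search(query)[0]
        trace.append(f"search {query!r}: best score {best_score:.2f}")
        if best_score >= threshold:      # grade: good enough to answer from
            return {"context": [best_passage], "trace": trace}
        new_query = rewrite(query)       # thin context: try a different query
        if new_query == query:           # no new idea, so stop retrying
            break
        query = new_query
    return {"context": [], "trace": trace}  # give up rather than use weak context
```

Calling `agentic_rag("Can I get my money back?")` scores zero on the first search, rewrites the query, and finds the refund passage on the second round. The overlap score is deliberately crude (a shared stopword can pass it), so a real grader would need to be stricter.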
When the trade-off is worth it
The loop costs more: more model calls, more latency, and more places where something can fail. For simple lookups, plain RAG wins. Agentic RAG earns its keep on multi-part questions, sources that need to be combined, and cases where a wrong answer is expensive enough to justify the extra checking.
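To make the cost side concrete, here is a back-of-the-envelope count of model calls. The accounting is an assumption, not a measurement: it supposes one routing call, then one query-rewrite call and one grading call per retrieval round, then one final generation call. Real systems may batch or skip some of these.

```python
def plain_rag_calls() -> int:
    # One generation call; the embedding lookup is not counted as a model call here.
    return 1

def agentic_rag_calls(rounds: int) -> int:
    # Assumed accounting: 1 routing decision
    # + (1 rewrite + 1 grading) per retrieval round
    # + 1 final generation.
    return 1 + 2 * rounds + 1

for rounds in (1, 2, 3):
    print(rounds, "round(s):", agentic_rag_calls(rounds), "calls vs", plain_rag_calls())
```

Under these assumptions a single-round agentic run already makes four model calls to plain RAG's one, and each extra round adds two more, which is why the loop is hard to justify for simple lookups.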
From the conversation
This explainer is drawn from these episodes — each carries its full transcript.