RAG (Retrieval-Augmented Generation)
Definition
A technique that enhances LLM responses by first retrieving relevant documents from an external knowledge base and including them in the prompt as context.
RAG mitigates two major LLM limitations: outdated training data and hallucination. Instead of relying solely on the model's parametric knowledge (what it absorbed during training), a RAG system retrieves relevant documents from a vector database or search index and injects them into the prompt, grounding the model's response in specific sources. A typical RAG pipeline involves: (1) converting documents into embeddings and storing them in a vector database, (2) embedding the user's query and finding the most similar documents, (3) including the retrieved documents in the LLM prompt, and (4) generating a response with citations to those documents. RAG is a widely used approach for building enterprise AI applications because it can supply up-to-date information, reduces (but does not eliminate) hallucination, and lets companies use their proprietary data without expensive fine-tuning.
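The four pipeline steps can be illustrated with a minimal sketch in Python. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector database, and `generate` is a hypothetical callable representing whatever LLM client is in use. The function names and sample documents are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Step 1: embed each document and store (id, text, vector)."""
    return [(i, text, embed(text)) for i, text in enumerate(docs, start=1)]


def retrieve(index, query, k=2):
    """Step 2: embed the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Step 3: place the retrieved documents in the prompt as numbered sources."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text, _ in hits)
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query, index, generate, k=2):
    """Step 4: pass the grounded prompt to an LLM (hypothetical callable)."""
    return generate(build_prompt(query, retrieve(index, query, k)))


# Invented sample documents, standing in for a proprietary knowledge base.
DOCS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Refund requests need the original order number.",
]

index = build_index(DOCS)
query = "How long is the refund window?"
hits = retrieve(index, query, k=2)
prompt = build_prompt(query, hits)
```

Calling `answer(query, index, generate)` with a real model client in place of `generate` would complete step 4. Numbering the sources in the prompt is one simple way to let the model cite what it used; production systems typically also chunk long documents and use approximate nearest-neighbor search rather than sorting every entry.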
Related Terms
Embedding
A learned dense vector representation that maps discrete data like words, tokens, or items into cont...
Hallucination
When an AI model generates plausible-sounding but factually incorrect, fabricated, or unsupported in...
Large Language Model
A neural network with billions of parameters trained on massive text datasets, capable of understand...
Prompt Engineering
The practice of crafting effective instructions and inputs for AI models to elicit desired outputs, ...