The Knowledge Problem
Every LLM has a fundamental limitation: it only knows what was in its training data. Ask it about your product catalog, and it will either hallucinate an answer or admit it doesn't know.
Fine-tuning helps — the model can learn patterns about your products, your brand voice, and how to handle edge cases. But fine-tuning has a memorization ceiling. A typical LoRA adapter with 174 million trainable parameters can reliably memorize 500-1,000 specific products. Beyond that, details start bleeding together.
| Catalog Size | Fine-tuning Alone | With RAG |
|---|---|---|
| 500 products | Good accuracy | Perfect accuracy |
| 2,000 products | Declining accuracy | Perfect accuracy |
| 10,000 products | Pattern-only | Perfect accuracy |
| 100,000+ products | Same as 10K | Perfect accuracy |
RAG eliminates this ceiling entirely.
How RAG Works
Retrieval-Augmented Generation is a two-step process:
- Retrieve — When a user asks a question, search your database for relevant information
- Generate — Inject the retrieved information into the prompt, then let the LLM generate a response
User: "What red wines do you have under $30?"
↓
[RETRIEVE] Search product database → 5 matching wines
↓
[AUGMENT] Add products to prompt context
↓
Prompt: "Given these products: [Wine A $25, Wine B $28, ...]
Answer the customer's question: What red wines under $30?"
↓
[GENERATE] LLM creates natural response with accurate data
The model never needs to "remember" your catalog. It receives the relevant data fresh with every query.
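The two steps above can be sketched in a few lines. This is a minimal sketch, not a full implementation: `search_products` and `call_llm` are hypothetical stand-ins for your search index and your LLM client.

```python
def answer(question: str, search_products, call_llm, k: int = 5) -> str:
    """Retrieve-augment-generate loop for a product catalog."""
    # RETRIEVE: pull the top-k matching products from the database
    products = search_products(question, limit=k)

    # AUGMENT: inject the retrieved rows into the prompt
    context = "\n".join(f"- {p['name']} {p['price']}" for p in products)
    prompt = (
        f"Given these products:\n{context}\n\n"
        f"Answer the customer's question: {question}"
    )

    # GENERATE: the model answers from fresh data, not from memory
    return call_llm(prompt)
```

Because the catalog data arrives through the prompt, swapping the search backend or the model requires no change to the other half of the pipeline.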
The Search Engine: BM25 vs. Embeddings
The retrieval step needs a search engine. Two main approaches:
BM25 (Keyword Search)
Classic term-frequency matching. If the user says "red wine", it finds products containing those exact words.
Pros: fast (< 1ms), no GPU needed, no model to maintain, deterministic
Cons: misses synonyms, can't understand intent
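For intuition, the whole scorer fits in one function. This is a minimal sketch of Okapi BM25 over pre-tokenized documents; production systems add an inverted index and proper tokenization, but the ranking math is the same.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency: how many docs contain each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (
                f + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores
```

The `k1` and `b` defaults here are the conventional starting points; `b` controls how much long documents are penalized.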
Embedding Search (Semantic)
Converts text to vectors and finds similar meanings. "Something fruity for summer" matches light wines even without exact keyword overlap.
Pros: understands meaning, handles vague queries
Cons: requires an embedding model, slower, less predictable
The Surprising Result
In production testing with structured product data, BM25 consistently outperformed embedding search:
| Metric | BM25 | Embeddings |
|---|---|---|
| Exact product matches | 92% | 78% |
| Price accuracy | 100% | 100% |
| Speed | < 1ms | 15-50ms |
| Infrastructure | None | Embedding model required |
Why? Structured product catalogs have consistent naming conventions. When someone asks about "Cabernet Sauvignon", the product is literally called "Cabernet Sauvignon" in the database. Keyword matching finds it instantly.
Embeddings shine with unstructured data (documents, articles, support tickets) where concepts matter more than exact terms.
Building an Effective RAG Pipeline
Step 1: Multi-field Weighted Indexing
Don't just search product descriptions. Weight fields by importance:
name: 3x weight (most important)
brand: 2x weight
category: 2x weight
style: 2x weight
taste: 1x weight
description: 1x weight
This ensures "Merlot" matches the wine category before matching a description that mentions merlot in passing.
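One simple way to apply these weights is to score each field separately and sum the weighted results. A sketch, assuming a per-field scorer such as the BM25 function from earlier (here `score_field` is left pluggable):

```python
# Field weights from the list above
FIELD_WEIGHTS = {
    "name": 3.0,
    "brand": 2.0,
    "category": 2.0,
    "style": 2.0,
    "taste": 1.0,
    "description": 1.0,
}

def weighted_score(query_tokens, product: dict, score_field) -> float:
    """Sum per-field match scores, scaled by each field's importance."""
    return sum(
        weight * score_field(query_tokens, product.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )
```

With these weights, a hit in `name` counts three times as much as the same hit buried in `description`.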
Step 2: Category Alias Mapping
Users don't always use your exact category names:
"white wine" → search: white wine category
"something sweet" → search: dessert wines, sweet cocktails
"bitter" → search: IPAs, amaro, bitters
"for cooking" → search: cooking wines, olive oils
Map natural language to your taxonomy before searching.
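A sketch of that normalization step, using the aliases above (the table itself is illustrative; yours should mirror your actual taxonomy):

```python
CATEGORY_ALIASES = {
    "white wine": ["white wine"],
    "something sweet": ["dessert wines", "sweet cocktails"],
    "bitter": ["IPAs", "amaro", "bitters"],
    "for cooking": ["cooking wines", "olive oils"],
}

def expand_query(query: str) -> list:
    """Return the raw query plus any catalog categories it maps to."""
    q = query.lower()
    expansions = [query]
    for phrase, categories in CATEGORY_ALIASES.items():
        if phrase in q:
            expansions.extend(categories)
    return expansions
```

Running the expansion before BM25 means keyword search still works even when the user's vocabulary never appears in the catalog.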
Step 3: Rich Context Injection
Don't just pass product names and prices. Include enough context for the model to give helpful answers:
{
    "name": "Château Margaux 2018",
    "price": "$89.99",
    "category": "Red Wine",
    "style": "Full-bodied Bordeaux",
    "taste": "Dark fruit, tobacco, elegant tannins",
    "food_pairing": "Grilled lamb, aged cheese",
    "description": "A structured yet approachable vintage..."
}
This gives the model 60-90 tokens of context per product — enough to make informed recommendations.
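Serializing a product record into prompt context is a one-liner per field. A sketch using the field names from the JSON above, plus a crude token estimate (roughly 4 characters per token for English text) to sanity-check the per-product budget:

```python
def product_context(p: dict) -> str:
    """Flatten one product record into a compact context line."""
    return (
        f"{p['name']} ({p['category']}, {p['price']}). "
        f"Style: {p['style']}. Taste: {p['taste']}. "
        f"Pairs with: {p['food_pairing']}. {p['description']}"
    )

def rough_tokens(text: str) -> int:
    # crude heuristic: ~4 characters per token; use your model's
    # real tokenizer for exact budgeting
    return len(text) // 4
```

Prose-like serialization tends to work better than raw JSON in the prompt: the model reads it the way it reads its training data.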
Step 4: Context Window Management
Most 8B models have 8K-32K token context windows. Budget your context carefully:
| Component | Tokens |
|---|---|
| System prompt | 200-400 |
| RAG results (5 products) | 300-450 |
| Conversation history | 500-2,000 |
| Response space | 200-500 |
| Total | 1,200-3,350 |
Plenty of room for accurate retrieval without hitting context limits.
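The budget check itself is trivial to automate. A sketch using the worst-case figures from the table above:

```python
def fits(budget: dict, window: int) -> bool:
    """True if the summed token budget fits in the model's context window."""
    return sum(budget.values()) <= window

# Worst-case figures from the budgeting table
worst_case = {
    "system_prompt": 400,
    "rag_results": 450,        # 5 products at up to 90 tokens each
    "conversation_history": 2000,
    "response_space": 500,
}
```

Even the worst case (3,350 tokens) leaves most of an 8K window free, which is why retrieving 5 products rather than 50 is usually the right call.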
RAG Solves the Update Problem
Perhaps RAG's biggest advantage: zero-retraining updates.
- New product added? Update the database. RAG finds it immediately.
- Price changed? Update the database. The next query returns the new price.
- Product discontinued? Remove from database. It's gone instantly.
Fine-tuning requires retraining (5 hours) to learn new information. RAG reflects changes in milliseconds.
The Combined Architecture
The optimal production setup uses both fine-tuning and RAG:
| Component | Responsibility |
|---|---|
| Fine-tuning | Brand voice, conversation style, safety guardrails, domain reasoning |
| RAG | Exact product data, prices, availability, specifications |
Fine-tuning teaches the model how to think and talk about your domain. RAG gives it the specific facts to think and talk about.
Together, you get an AI assistant that sounds like an expert and never gets the details wrong.
Getting Started with RAG
- Export your product catalog to JSON (name, price, category, description)
- Build a search index using BM25 (a few lines of code in most languages)
- Write a retrieval function that searches the index and returns top 3-5 matches
- Inject results into the prompt before the user's question
- Test and iterate — add category aliases and field weights as needed
The entire RAG pipeline can be built in a day. No ML expertise required. No additional GPU needed. Just a search index and a well-structured prompt.
This is RAG in a real business context: live pricing, instant product updates, and no retraining.