The Semantic Search Hype
Every AI tutorial tells you to use embedding search. Convert your data to vectors, store them in a vector database, and enjoy "semantic understanding" that keyword search can't match.
But when we deployed both approaches against a real product catalog, the results told a different story.
The Experiment
We tested both search approaches on a production catalog of 2,000+ products across multiple categories:
BM25 Setup:
- Multi-field weighted index (name 3x, brand 2x, category 2x, description 1x)
- Category alias mapping (natural language → taxonomy)
- No ML model required
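The category alias mapping above can be sketched as a simple query-rewrite step. The alias table, taxonomy names, and function below are illustrative assumptions, not the actual production mapping:

```python
# Sketch of category alias mapping: natural-language phrases are
# rewritten to catalog taxonomy terms before the BM25 query runs.
# Aliases and taxonomy names here are made up for illustration.
CATEGORY_ALIASES = {
    "red wine": "wine_red",
    "bubbly": "wine_sparkling",
    "pots and pans": "cookware",
}

def expand_query(query: str) -> str:
    """Append taxonomy terms for any alias found in the query."""
    q = query.lower()
    extra = [tax for alias, tax in CATEGORY_ALIASES.items() if alias in q]
    return " ".join([q] + extra)
```

Because the taxonomy term is appended rather than substituted, the original words still match against names and descriptions.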
Embedding Setup:
- Sentence transformer model (all-MiniLM-L6-v2)
- Product descriptions encoded to 384-dimensional vectors
- Cosine similarity search
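The embedding-side search reduces to a cosine similarity lookup over the encoded vectors. A minimal sketch with NumPy, using random stand-in vectors instead of real all-MiniLM-L6-v2 encodings (which would require the sentence-transformers package and a model download):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k most similar document vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each doc against the query
    return np.argsort(-sims)[:k]

# Stand-in for model.encode(...) output: 4 docs, 384-dim, random.
rng = np.random.default_rng(0)
docs = rng.normal(size=(4, 384))
query = docs[2] + 0.01 * rng.normal(size=384)  # near-duplicate of doc 2
top = cosine_top_k(query, docs, k=1)  # doc 2 should rank first
```

In production the vectors would come from `model.encode(...)`, but the ranking step is exactly this.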
Test queries: 200 real customer questions collected from user testing.
The Results
| Metric | BM25 | Embeddings | Winner |
|---|---|---|---|
| Exact product match (top 3) | 92% | 78% | BM25 |
| Category accuracy | 95% | 88% | BM25 |
| Price in results | 100% | 100% | Tie |
| Handles vague queries | 71% | 85% | Embeddings |
| Speed (per query) | < 1ms | 15-50ms | BM25 |
| Infrastructure needed | None | Embedding model | BM25 |
BM25 won on the metrics that matter most for e-commerce: finding the right product and getting the category right.
Why BM25 Wins for Structured Data
1. Products Have Consistent Names
When a customer asks about "Merlot wine", the product is literally called "Merlot" in the database. There's no semantic gap to bridge.
Query: "Merlot wine"
BM25: Finds "Merlot" (exact match, 100% confidence)
Embed: Finds "Merlot" (0.89 similarity) + "Pinot Noir" (0.85) + "Cabernet" (0.82)
BM25 gives a decisive match. Embeddings blur the boundaries between similar products.
2. Field Weighting Adds Context
With BM25, you can weight fields differently:
"stainless steel cookware"
BM25 with weights:
- name (3x): "Stainless Steel Pan" → HIGH score
- category (2x): "Cookware" → HIGH score
- description (1x): mentions steel → LOW score
Result: Exact product match
Embeddings flatten everything into a single vector, losing this structural information.
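The per-field weighting can be sketched as scoring each field separately and summing the weighted results. The scorer below is a toy term-overlap stand-in for a real per-field BM25 score, and the product fields are illustrative:

```python
FIELD_WEIGHTS = {"name": 3.0, "brand": 2.0, "category": 2.0, "description": 1.0}

def term_overlap(query: str, text: str) -> float:
    """Toy stand-in for a per-field BM25 score: count shared terms."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))

def weighted_score(query: str, product: dict) -> float:
    """Sum per-field scores, weighted by field importance."""
    return sum(w * term_overlap(query, product.get(f, ""))
               for f, w in FIELD_WEIGHTS.items())

pan = {"name": "Stainless Steel Pan", "category": "Cookware",
       "description": "Durable steel pan for everyday cooking"}
score = weighted_score("stainless steel cookware", pan)  # name hits dominate
```

A match in the product name contributes three times as much as the same match in the description, which is exactly the structural signal a single flattened vector loses.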
3. Numbers and Codes Work Correctly
Query: "model X-500"
BM25: Finds product X-500 (exact term match)
Embed: Finds products with similar descriptions (X-500 is meaningless to the embedding model)
BM25 handles SKUs, model numbers, prices, and alphanumeric codes that embeddings treat as noise.
Where Embeddings Win
Embeddings have a clear advantage for vague, intent-based queries:
Query: "something refreshing for a hot day"
BM25: Matches on "refreshing" if it appears in descriptions
Embed: Understands the concept and finds light wines, sparkling water, citrus drinks
Query: "gift for someone who likes cooking"
BM25: Matches on "gift" and "cooking" separately
Embed: Understands the gifting + cooking intent, finds cookware gift sets
If your customers frequently use vague, conversational language, embeddings add value.
The Hybrid Approach
The best production systems use both:
1. BM25 search → top 10 results (fast, precise)
2. If BM25 results < 3 → fallback to embedding search
3. Re-rank combined results by relevance
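The three steps above can be sketched as follows. The search callables are stubs standing in for real BM25 and embedding backends; because scores from the two systems are not directly comparable, this sketch keeps BM25 hits first and appends embedding hits, while real systems often re-rank with reciprocal rank fusion or an LLM:

```python
def hybrid_search(query, bm25_search, embed_search, top_k=10, min_results=3):
    """BM25 first; fall back to embeddings only when BM25 is too sparse.

    Both search callables return (doc_id, score) pairs, best first.
    """
    results = bm25_search(query, top_k)
    if len(results) < min_results:
        seen = {doc_id for doc_id, _ in results}
        results += [(d, s) for d, s in embed_search(query, top_k)
                    if d not in seen]
    return results[:top_k]
```

Because the fallback only fires when BM25 comes up short, the slow embedding path is skipped for the majority of queries.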
But here's the key insight: when you have an LLM in the pipeline, the model itself acts as a re-ranker.
The LLM receives the BM25 results and uses its own understanding to select the most relevant products for the response. You get semantic understanding from the LLM without needing a separate embedding search.
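One way this might look in practice is packing the BM25 hits into the LLM prompt and letting the model do the selection. The prompt wording and product fields below are illustrative assumptions, and the actual LLM call is left out:

```python
def build_rerank_prompt(question: str, products: list) -> str:
    """Format BM25 hits so the LLM can pick the most relevant ones.

    Product dicts with name/category/price keys are an assumed shape.
    """
    lines = [f"{i + 1}. {p['name']} ({p['category']}, {p['price']})"
             for i, p in enumerate(products)]
    return (
        "Customer question: " + question + "\n\n"
        "Candidate products (from keyword search):\n" + "\n".join(lines) + "\n\n"
        "Answer the question using only the most relevant products above."
    )

prompt = build_rerank_prompt(
    "something for searing steaks",
    [{"name": "Cast Iron Skillet", "category": "Cookware", "price": "$39"}],
)
```

The LLM never sees the full catalog, only the handful of candidates BM25 surfaced, so the precision of the keyword stage still matters.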
Cost of Each Approach
| Component | BM25 | Embeddings |
|---|---|---|
| Runtime infrastructure | Zero | Embedding model (0.5-2 GB) |
| Per-query compute | < 1ms CPU | 15-50ms GPU |
| Index update | Instant | Re-encode modified products |
| Maintenance | None | Model version management |
| Dependencies | Standard library | torch, transformers, vector DB |
BM25 adds zero complexity to your stack. It can be implemented in any language using only the standard library. No GPU, no model downloads, no version conflicts.
Implementation: BM25 in 30 Lines
A production BM25 search for product catalogs:
```python
import math
from collections import Counter

class BM25:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = documents
        self.N = len(documents)
        # Average document length, used for length normalization.
        self.avgdl = sum(len(d.split()) for d in documents) / self.N
        # Document frequency: number of documents containing each term.
        self.df = Counter()
        for doc in documents:
            for term in set(doc.lower().split()):
                self.df[term] += 1

    def score(self, query, doc_idx):
        doc = self.docs[doc_idx].lower().split()
        doc_len = len(doc)
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in self.df:
                continue
            # BM25 IDF, with +1 inside the log to keep it non-negative.
            idf = math.log((self.N - self.df[term] + 0.5) / (self.df[term] + 0.5) + 1)
            term_tf = tf.get(term, 0)
            score += idf * (term_tf * (self.k1 + 1)) / (
                term_tf + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            )
        return score

    def search(self, query, top_k=5):
        scores = [(i, self.score(query, i)) for i in range(self.N)]
        return sorted(scores, key=lambda x: -x[1])[:top_k]
```
That's it. No dependencies, no model downloads, no GPU.
When to Use What
| Scenario | Recommendation |
|---|---|
| Structured product catalog | BM25 |
| FAQ / documentation search | Embeddings |
| Multi-language product search | BM25 + aliases |
| Conversational discovery | Embeddings (or LLM re-ranking) |
| Real-time (< 5ms) | BM25 |
| Budget-constrained | BM25 |
| Unstructured knowledge base | Embeddings |
Our Recommendation
Start with BM25. It's simpler, faster, and more accurate for product search. Add multi-field weighting and category aliases for 90%+ accuracy.
Add embeddings later if analytics show customers frequently use vague, intent-based queries that BM25 can't handle.
Let the LLM do the heavy lifting. Your fine-tuned model already has semantic understanding. Feed it BM25 results and let it reason about which products best answer the customer's question.