Research

BM25 vs Embeddings: Why Keyword Search Still Wins for Product Data

ai.rs Jan 29, 2026

The Semantic Search Hype

Every AI tutorial tells you to use embedding search. Convert your data to vectors, store them in a vector database, and enjoy "semantic understanding" that keyword search can't match.

But when we deployed both approaches against a real product catalog, the results told a different story.

The Experiment

We tested both search approaches on a production catalog of 2,000+ products across multiple categories:

BM25 Setup:

  • Multi-field weighted index (name 3x, brand 2x, category 2x, description 1x)
  • Category alias mapping (natural language → taxonomy)
  • No ML model required
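
The alias mapping in the setup above can be as simple as a lookup table that rewrites customer phrasing into the catalog's own taxonomy terms before BM25 scoring. A minimal sketch (the alias entries and `expand_query` name are illustrative, not our production table):

```python
# Hypothetical alias table: customer phrasing -> catalog taxonomy terms.
CATEGORY_ALIASES = {
    "bubbly": "sparkling wine",
    "fizzy water": "sparkling water",
    "pots and pans": "cookware",
}

def expand_query(query: str) -> str:
    """Rewrite known aliases so BM25 matches the catalog's vocabulary."""
    expanded = query.lower()
    for alias, canonical in CATEGORY_ALIASES.items():
        if alias in expanded:
            expanded = expanded.replace(alias, canonical)
    return expanded

expand_query("cheap pots and pans")  # -> "cheap cookware"
```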

Embedding Setup:

  • Sentence transformer model (all-MiniLM-L6-v2)
  • Product descriptions encoded to 384-dimensional vectors
  • Cosine similarity search
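
The retrieval step on the embedding side is just a nearest-neighbor scan under cosine similarity. A toy sketch with hand-made 3-dimensional vectors standing in for the 384-dimensional MiniLM embeddings (the catalog entries and query vector here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d vectors stand in for 384-d all-MiniLM-L6-v2 embeddings.
catalog = {
    "Merlot": [0.9, 0.1, 0.0],
    "Pinot Noir": [0.8, 0.2, 0.1],
    "Stainless Steel Pan": [0.0, 0.1, 0.9],
}
query_vec = [0.9, 0.1, 0.0]
ranked = sorted(catalog, key=lambda name: -cosine(query_vec, catalog[name]))
```

Note how both wines score highly: exactly the "blurred boundaries" behavior discussed below.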

Test queries: 200 real customer questions collected from user testing.

The Results

Metric                         BM25       Embeddings        Winner
Exact product match (top 3)    92%        78%               BM25
Category accuracy              95%        88%               BM25
Price in results               100%       100%              Tie
Handles vague queries          71%        85%               Embeddings
Speed (per query)              < 1ms      15-50ms           BM25
Infrastructure needed          None       Embedding model   BM25

BM25 won on the metrics that matter most for e-commerce: finding the right product and getting the category right.

Why BM25 Wins for Structured Data

1. Products Have Consistent Names

When a customer asks about "Merlot wine", the product is literally called "Merlot" in the database. There's no semantic gap to bridge.

Query: "Merlot wine"
BM25:  Finds "Merlot" (exact match, 100% confidence)
Embed: Finds "Merlot" (0.89 similarity) + "Pinot Noir" (0.85) + "Cabernet" (0.82)

BM25 gives a decisive match. Embeddings blur the boundaries between similar products.

2. Field Weighting Adds Context

With BM25, you can weight fields differently:

"stainless steel cookware"

BM25 with weights:
  - name (3x):     "Stainless Steel Pan" → HIGH score
  - category (2x):  "Cookware" → HIGH score
  - description (1x): mentions steel → LOW score

Result: Exact product match

Embeddings flatten everything into a single vector, losing this structural information.
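
Field weighting can be sketched as a per-field score multiplied by the field's weight. This simplified version uses term overlap in place of a full per-field BM25 pass (the weights match the setup above; the product record and `weighted_score` name are illustrative):

```python
# Field weights from the experiment setup: name 3x, brand 2x, category 2x, description 1x.
FIELD_WEIGHTS = {"name": 3.0, "brand": 2.0, "category": 2.0, "description": 1.0}

def weighted_score(query: str, product: dict) -> float:
    """Simplified multi-field score: weighted term overlap per field."""
    terms = set(query.lower().split())
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = set(product.get(field, "").lower().split())
        score += weight * len(terms & field_terms)
    return score

product = {
    "name": "Stainless Steel Pan",
    "brand": "Acme",
    "category": "Cookware",
    "description": "Durable stainless steel pan for everyday cooking",
}
weighted_score("stainless steel cookware", product)  # -> 10.0
```

A hit in the name field counts three times as much as the same words buried in a description, which is exactly the structural signal a single flat vector loses.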

3. Numbers and Codes Work Correctly

Query: "model X-500"
BM25:  Finds product X-500 (exact term match)
Embed: Finds products with similar descriptions (X-500 is meaningless to the embedding model)

BM25 handles SKUs, model numbers, prices, and alphanumeric codes that embeddings treat as noise.

Where Embeddings Win

Embeddings have a clear advantage for vague, intent-based queries:

Query: "something refreshing for a hot day"
BM25:  Matches on "refreshing" if it appears in descriptions
Embed: Understands the concept and finds light wines, sparkling water, citrus drinks

Query: "gift for someone who likes cooking"
BM25:  Matches on "gift" and "cooking" separately
Embed: Understands the gifting + cooking intent, finds cookware gift sets

If your customers frequently use vague, conversational language, embeddings add value.

The Hybrid Approach

The best production systems use both:

1. BM25 search → top 10 results (fast, precise)
2. If BM25 returns fewer than 3 results → fall back to embedding search
3. Re-rank combined results by relevance
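
The fallback logic above fits in a few lines. In this sketch the two backends are stubs (a real system would call a BM25 index and a vector index), and the final re-ranking step is left to the LLM as described below:

```python
# Hypothetical search backends; a real system calls a BM25 index and a vector index.
def bm25_search(query, top_k=10):
    # Stub: pretend only one query has strong lexical matches.
    results = {"merlot wine": ["Merlot", "Merlot Reserve"]}
    return results.get(query, [])

def embedding_search(query, top_k=10):
    # Stub: a fixed semantic neighborhood.
    return ["Pinot Noir", "Cabernet Sauvignon", "Merlot"]

def hybrid_search(query, min_results=3):
    """BM25 first; fall back to embeddings when lexical recall is too low."""
    hits = bm25_search(query)
    if len(hits) < min_results:
        # Merge, keeping BM25 hits first and dropping duplicates.
        seen = set(hits)
        hits = hits + [p for p in embedding_search(query) if p not in seen]
    return hits
```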

But here's the key insight: when you have an LLM in the pipeline, the model itself acts as a re-ranker.

The LLM receives the BM25 results and uses its own understanding to select the most relevant products for the response. You get semantic understanding from the LLM without needing a separate embedding search.
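
Concretely, "LLM as re-ranker" just means the BM25 hits are formatted into the prompt and the model chooses among them. A minimal sketch of the prompt assembly (the function name, field layout, and wording are illustrative, not our production prompt):

```python
# Hypothetical prompt assembly: the LLM sees BM25 hits and picks the best ones.
def build_rerank_prompt(question: str, products: list) -> str:
    lines = [f"- {p['name']} ({p['category']}, ${p['price']})" for p in products]
    return (
        "Customer question: " + question + "\n\n"
        "Candidate products (from keyword search):\n" + "\n".join(lines) + "\n\n"
        "Pick the products that best answer the question."
    )

prompt = build_rerank_prompt(
    "something for searing steak",
    [{"name": "Stainless Steel Pan", "category": "Cookware", "price": 49.99}],
)
```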

Cost of Each Approach

Component                BM25               Embeddings
Runtime infrastructure   Zero               Embedding model (0.5-2 GB)
Per-query compute        < 1ms CPU          15-50ms GPU
Index update             Instant            Re-encode modified products
Maintenance              None               Model version management
Dependencies             Standard library   torch, transformers, vector DB

BM25 adds zero complexity to your stack. It can be implemented in any language using only the standard library: no GPU, no model downloads, no version conflicts.

Implementation: BM25 in 30 Lines

A production BM25 search for product catalogs:

import math
from collections import Counter

class BM25:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = documents
        self.N = len(documents)
        self.avgdl = sum(len(d.split()) for d in documents) / self.N
        # Document frequency: number of documents containing each term
        self.df = Counter()
        for doc in documents:
            for term in set(doc.lower().split()):
                self.df[term] += 1

    def score(self, query, doc_idx):
        doc = self.docs[doc_idx].lower().split()
        doc_len = len(doc)
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in self.df:
                continue
            # IDF with +1 inside the log to keep scores non-negative
            idf = math.log((self.N - self.df[term] + 0.5) / (self.df[term] + 0.5) + 1)
            term_tf = tf.get(term, 0)
            # k1 controls term-frequency saturation; b controls length normalization
            norm = term_tf + self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
            score += idf * (term_tf * (self.k1 + 1)) / norm
        return score

    def search(self, query, top_k=5):
        scores = [(i, self.score(query, i)) for i in range(self.N)]
        return sorted(scores, key=lambda x: -x[1])[:top_k]

That's it. No dependencies, no model downloads, no GPU.

When to Use What

Scenario                        Recommendation
Structured product catalog      BM25
FAQ / documentation search      Embeddings
Multi-language product search   BM25 + aliases
Conversational discovery        Embeddings (or LLM re-ranking)
Real-time (< 5ms)               BM25
Budget-constrained              BM25
Unstructured knowledge base     Embeddings

Our Recommendation

Start with BM25. It's simpler, faster, and more accurate for product search. Add multi-field weighting and category aliases for 90%+ accuracy.

Add embeddings later if analytics show customers frequently use vague, intent-based queries that BM25 can't handle.

Let the LLM do the heavy lifting. Your fine-tuned model already has semantic understanding. Feed it BM25 results and let it reason about which products best answer the customer's question.
