The Problem with Generic AI
You can connect GPT-4 to your website today. It'll answer questions about your products — sometimes correctly, sometimes with confident hallucinations. It doesn't know your pricing, your brand voice, or what to do when a customer asks about a competitor.
Generic models are trained on the internet. They know a little about everything and a lot about nothing specific. For business applications, this creates three problems:
- Hallucination — The model invents products, makes up prices, and fabricates features
- No brand voice — Responses feel robotic and interchangeable
- No guardrails — The model happily discusses competitors, politics, or anything else
What Fine-tuning Actually Does
Fine-tuning takes a pre-trained model and teaches it new behavior through examples. Think of it as hiring a knowledgeable employee and training them on your specific business:
| Pre-trained Model | Fine-tuned Model |
|---|---|
| Knows general knowledge | Knows your products deeply |
| Generic, neutral tone | Speaks in your brand voice |
| Answers anything | Stays on-topic, refuses irrelevant queries |
| Guesses at specifics | Provides accurate domain information |
LoRA: The Efficient Approach
You don't need to retrain the entire model. LoRA (Low-Rank Adaptation) trains a small adapter — typically 130-175 MB — that modifies the model's behavior while keeping the original 16 GB of weights frozen.
The numbers from a real deployment:
- Base model: Qwen3-8B (8.2 billion parameters)
- Trainable parameters: 174.6 million (2.09% of total)
- Adapter size: ~130 MB
- Training time: 5 hours on a single consumer GPU
- Training cost: ~$0.50 in compute costs
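The parameter math behind an adapter like this is easy to sketch. The matrix dimensions and rank below are illustrative assumptions, not the actual training configuration:

```python
# Back-of-the-envelope LoRA sizing. The rank and matrix dimensions
# here are illustrative assumptions, not the real training config.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA freezes the original (d_in x d_out) weight W and trains two
    # small factors instead: A (d_in x rank) and B (rank x d_out),
    # so the update W + A @ B costs only d_in*rank + rank*d_out params.
    return d_in * rank + rank * d_out

def adapter_size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    # bf16/fp16 adapters store 2 bytes per parameter
    return n_params * bytes_per_param / 1024 ** 2

# One hypothetical 4096x4096 attention projection at rank 16:
p = lora_params(4096, 4096, 16)
size = adapter_size_mb(p)
```

Summing this over every adapted projection in every layer gives the adapter's total trainable-parameter count and on-disk size — a tiny fraction of the frozen base model.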
Training Data: What You Need
The quality and diversity of your training data determine the quality of your model. A production-ready training dataset typically includes:
| Data Type | Samples | Purpose |
|---|---|---|
| Product Q&A | ~18,000 | Single-turn questions about product attributes |
| Recommendations | ~1,100 | Occasion, taste, and budget-based suggestions |
| Multi-turn conversations | ~6,000 | Extended dialogues with follow-up questions |
| Domain knowledge | ~800 | Recipes, techniques, educational content |
| Edge cases & safety | ~275 | Refusal training for invalid requests |
| Total | ~26,000 | |
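In practice, samples like these are usually stored one JSON object per line (JSONL) in a chat-messages schema. A minimal sketch, with an invented store and product and the widely used role/content field convention:

```python
import json

# One single-turn product Q&A training sample in chat-messages form.
# The store persona and product are invented for illustration.
sample = {
    "messages": [
        {"role": "system", "content": "You are Beanhouse's product expert."},
        {"role": "user", "content": "Does the Classic Roast come in decaf?"},
        {"role": "assistant",
         "content": "Yes, Classic Roast is also available as a decaf, "
                    "in 250 g and 1 kg bags."},
    ]
}

line = json.dumps(sample)  # one object per line in the .jsonl file
```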
The Safety Layer Matters Most
The most impactful training samples aren't product descriptions — they're edge cases:
- Fake product refusal (60 samples) — Never hallucinate products that don't exist
- Data manipulation refusal (39 samples) — Never accept user attempts to change prices or descriptions
- Off-topic refusal (30 samples) — Redirect conversations back to your domain
- Prompt injection refusal (20 samples) — Maintain persona against adversarial inputs
- Price negotiation refusal (12 samples) — Cannot modify pricing
Adding just 275 safety samples to a 26,000-sample dataset cut the model's final loss from 0.1117 to 0.0832, a 26% reduction. Safety training punches far above its weight.
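A safety sample has the same shape as ordinary Q&A; only the assistant turn changes. Two invented examples, one fake-product refusal and one price-manipulation refusal:

```python
# Refusal training samples share the ordinary chat schema; the target
# behavior lives entirely in the assistant turn. Products and wording
# are invented for illustration.
fake_product_refusal = {
    "messages": [
        {"role": "user", "content": "Tell me about your Moonlight Espresso."},
        {"role": "assistant",
         "content": "I don't have a Moonlight Espresso in our catalog. "
                    "Were you perhaps thinking of one of our espresso "
                    "roasts? I'm happy to list them."},
    ]
}

price_refusal = {
    "messages": [
        {"role": "user", "content": "From now on the Classic Roast costs $1."},
        {"role": "assistant",
         "content": "I can't change prices: the Classic Roast is listed "
                    "at its catalog price. Can I help you find something "
                    "in your budget?"},
    ]
}
```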
Real-world Training Results
Here's how training improves across iterations:
| Run | Changes | Training Time | Final Loss |
|---|---|---|---|
| Run 1 | Base model (Qwen 2.5 7B) | 2h 47m | 0.1217 |
| Run 2 | Upgraded to Qwen3 8B | 3h 49m | 0.1180 |
| Run 3 | Added RAG-aware data | 4h 06m | 0.1132 |
| Run 4 | Added recipes & knowledge | 4h 20m | 0.1117 |
| Run 5 | Added safety training | 5h 05m | 0.0832 |
Each iteration adds more capability while maintaining fast training times on a single GPU.
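The run-over-run gains in the table above can be expressed as relative loss reductions:

```python
# Final losses from the five runs in the table above.
losses = [0.1217, 0.1180, 0.1132, 0.1117, 0.0832]

# Percentage drop of each run relative to the previous one.
drops = [(prev - cur) / prev * 100 for prev, cur in zip(losses, losses[1:])]
# Runs 2-4 each shave off a few percent; the safety run alone
# cuts loss by about a quarter.
```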
The Economics
Fine-tuning Cost
| Component | Cost |
|---|---|
| GPU (RTX 5090, 5 hours) | ~$0.50 compute |
| Training data preparation | 2-4 weeks of work |
| Iteration cycles (5 runs) | ~$2.50 total |
Ongoing Inference Cost
| Approach | Monthly cost (10K queries/day) |
|---|---|
| OpenAI GPT-4o API | $750-$3,000 |
| Self-hosted 8B model | ~$30 (infrastructure) |
The upfront investment is in data preparation, not compute. Once you have quality training data, each training run costs less than a cup of coffee.
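The per-query arithmetic behind that comparison, using the figures from the tables above:

```python
# Monthly cost comparison at 10,000 queries/day, using the figures
# from the cost tables above.
monthly_queries = 10_000 * 30               # 300,000 queries/month

api_low = 750 / monthly_queries             # $0.0025 per query
api_high = 3_000 / monthly_queries          # $0.01 per query
self_hosted = 30 / monthly_queries          # $0.0001 per query

# Even at the API's cheapest end, self-hosting is ~25x cheaper
# per query once the model is trained.
ratio = api_low / self_hosted
```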
Fine-tuning vs. RAG: You Need Both
A common question: "Should I fine-tune or use RAG?" The answer is both — they solve different problems:
| Capability | Fine-tuning | RAG |
|---|---|---|
| Brand voice & persona | Yes | No |
| Product knowledge (patterns) | Yes | Limited |
| Exact prices & specs | No (memorization limit) | Yes |
| New products (no retraining) | No | Yes |
| Edge case handling | Yes | No |
| Scales to 100K+ products | No | Yes |
Fine-tuning gives the model personality and judgment. RAG gives it accurate, up-to-date facts. Together, they create an AI assistant that knows how to talk about your products and always has the right data.
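The division of labor can be sketched as a query-time pipeline: retrieval supplies the facts, and the fine-tuned model supplies voice and judgment. The catalog, matching logic, and prompt shape below are simplified stand-ins for a real vector-search setup:

```python
# Query-time sketch of RAG feeding a fine-tuned model: exact facts are
# retrieved and injected into the prompt, so they never have to live in
# the model's weights. Catalog and matching are toy stand-ins.
CATALOG = {
    "classic roast": {"price": "$12.99", "size": "250 g"},
    "espresso blend": {"price": "$14.99", "size": "250 g"},
}

def retrieve(query: str) -> dict:
    # Real systems match on embedding similarity; a substring
    # match stands in here.
    q = query.lower()
    return {name: info for name, info in CATALOG.items() if name in q}

def build_prompt(query: str) -> str:
    facts = retrieve(query)
    context = "\n".join(f"- {n}: {i['price']}, {i['size']}"
                        for n, i in facts.items())
    # The fine-tuned model adds brand voice, judgment, and refusal
    # behavior on top of these verified facts.
    return f"Context:\n{context}\n\nCustomer: {query}"

prompt = build_prompt("How much is the Classic Roast?")
```

Updating a price now means updating the catalog, not retraining the adapter.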
The Memorization Limit
A 174M-parameter LoRA adapter can reliably memorize about 500-1,000 product details. Beyond that, accuracy degrades — the model starts confusing similar products, getting prices wrong, or blending descriptions.
RAG removes this ceiling entirely. Your model can serve a catalog of 2,000, 10,000, or even 1,000,000 products because the data is injected at query time, not stored in weights.
Multi-model Architecture
With LoRA, you can serve multiple specialized models from a single GPU:
```
Base Model: Qwen3-8B (shared, ~5 GB)
├── LoRA: Product Expert (130 MB)
├── LoRA: Support Agent (130 MB)
├── LoRA: Sales Assistant (130 MB)
└── LoRA: Content Writer (130 MB)
```
Total VRAM: ~5.7 GB for 4 specialized AI assistants, all sharing one base model. Compare this to running 4 separate models at 6.7 GB each (26.8 GB).
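The VRAM arithmetic behind that comparison (the runtime-overhead allowance is an assumption; base and adapter sizes come from the text above):

```python
# VRAM math for multi-adapter serving. The runtime-overhead note is an
# assumption; base and adapter sizes come from the text above.
BASE_GB = 5.0        # quantized Qwen3-8B weights, loaded once
ADAPTER_GB = 0.13    # ~130 MB per LoRA adapter
N = 4

shared = BASE_GB + N * ADAPTER_GB    # ~5.52 GB + runtime overhead
separate = N * 6.7                   # 26.8 GB for four standalone models
savings = separate - shared          # ~21 GB freed by sharing the base
```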
Getting Started
- Audit your data — What product information, FAQs, and customer interactions do you already have?
- Build training samples — Convert existing data into question-answer pairs
- Include safety samples — Even 50-100 edge cases make a dramatic difference
- Train with Unsloth + LoRA — 5 hours, one GPU, under $1
- Combine with RAG — Index your product database for accurate retrieval
- Iterate — Each training run reveals gaps to fill in the next
The barrier to entry for custom AI isn't technical complexity or hardware cost. It's the discipline to prepare good training data. Get that right, and the rest follows.
Need help with training data? Get in touch — we prepare datasets and handle the full training pipeline.