Ask ChatGPT "what wine with carbonara?" three times. You'll get three different answers. Sometimes it recommends Chardonnay. Sometimes Pinot Grigio. Sometimes it suggests a Pecorino (the wine, not the cheese -- though it has confused those too).
This isn't a bug. It's the fundamental nature of large language models: they generate plausible-sounding text, not deterministic answers. For creative writing, that's a feature. For recommending a specific wine to pair with a specific dish, it's a dealbreaker.
We built something different. The SommelierX Wine DNA algorithm is a deterministic flavour-matching system that calculates wine-food pairings across 17 food dimensions and 19 wine dimensions. Same input, same output. Every time. This post explains why we chose this approach, how it works, and where LLMs still play a role in our stack.
We ran an experiment. We asked GPT-4, Claude, and Gemini the same 10 wine-pairing questions, three times each. The results were illuminating:
| Metric | LLMs (avg) | Wine DNA |
|---|---|---|
| Consistency (same answer 3x) | 23% | 100% |
| Sommelier-validated accuracy | ~70% | 94% |
| Response time | 2-8 seconds | <50ms |
| Cost per query | $0.005-0.03 | ~$0.00001 |
| Explainability | Narrative (variable) | Score breakdown per dimension |
The LLMs weren't bad -- 70% accuracy is respectable. But the inconsistency is the real problem. A user asking the same question twice and getting different answers destroys trust. And in a domain where a professional sommelier's reputation depends on consistency, that's unacceptable.
Instead of asking a language model to generate a recommendation, we built a system that calculates one. Here's the architecture:
Every dish in the system is broken down into its constituent ingredients. A carbonara becomes: pasta (neutral), egg yolk (rich, fatty), pecorino (salty, umami), guanciale (smoky, salty, fatty), black pepper (spicy). Each ingredient has a pre-scored flavour profile across 17 dimensions.
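As a minimal sketch of what an ingredient-level flavour profile might look like (the dimension names, the 0-10 scale, and every score below are illustrative assumptions, not SommelierX's actual data):

```python
# Hypothetical flavour profiles: each ingredient is a fixed-length vector
# over a handful of the 17 food dimensions (names and scores invented).
DIMENSIONS = ["acidity", "sweetness", "umami", "fat", "salt", "spice", "smoke"]

INGREDIENT_PROFILES = {
    "egg_yolk":  {"acidity": 1, "sweetness": 1, "umami": 4, "fat": 8, "salt": 1, "spice": 0, "smoke": 0},
    "pecorino":  {"acidity": 2, "sweetness": 0, "umami": 8, "fat": 6, "salt": 8, "spice": 1, "smoke": 0},
    "guanciale": {"acidity": 1, "sweetness": 1, "umami": 7, "fat": 9, "salt": 7, "spice": 0, "smoke": 5},
}
```

The key property is that every ingredient lives in the same fixed vector space, so dishes can be composed from them arithmetically.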
The ingredient profiles are weighted and combined into a single dish vector. The weighting considers ingredient prominence (is it a main ingredient or a garnish?), cooking method impact (grilling adds smokiness, frying adds richness), and ingredient interactions (tomato + basil = more than the sum of parts).
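The combination step can be sketched as a weighted average plus a cooking-method adjustment. This is a toy version under stated assumptions (a 0-10 scale, invented weights and bumps), not the production weighting:

```python
# Sketch: fold ingredient vectors into one dish vector. Weights stand in
# for ingredient prominence; cooking-method effects are modelled as a
# simple additive bump. All numbers are hypothetical.

def combine(profiles, weights, method_bump=None):
    """Weighted average of ingredient vectors, plus cooking-method offsets."""
    total = sum(weights.values())
    dims = next(iter(profiles.values())).keys()
    dish = {d: sum(profiles[i][d] * weights[i] for i in profiles) / total
            for d in dims}
    for d, bump in (method_bump or {}).items():
        dish[d] = min(10, dish[d] + bump)  # clamp to the 0-10 scale
    return dish

profiles = {
    "egg_yolk":  {"umami": 4, "fat": 8, "salt": 1},
    "guanciale": {"umami": 7, "fat": 9, "salt": 7},
}
dish = combine(profiles,
               weights={"egg_yolk": 2, "guanciale": 1},  # yolk is more prominent
               method_bump={"fat": 1})  # frying adds richness
```

Note that ingredient interactions (the tomato + basil case) would need a pairwise term on top of this average; the sketch stops at prominence and cooking method.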
Every wine style in our database (609 MetaWijn archetypes, representing every meaningful wine style in the world) has a 19-dimension flavour profile scored by our sommelier team. The matching algorithm calculates compatibility across all dimensions simultaneously, producing a match score from 0-100.
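A toy version of the matching step shows the deterministic shape: a pure function from two vectors to a 0-100 score. The real Wine DNA algorithm pairs and weights dimensions differently; this sketch just inverts the mean per-dimension gap:

```python
# Toy match score: closeness between a dish vector and a wine profile,
# mapped onto 0-100. Purely a function of its inputs -- same input,
# same output, every time. Dimension names and scores are hypothetical.

def match_score(dish, wine):
    shared = dish.keys() & wine.keys()
    # mean absolute gap on the 0-10 scale, inverted into a 0-100 score
    gap = sum(abs(dish[d] - wine[d]) for d in shared) / len(shared)
    return round(100 * (1 - gap / 10), 1)

dish = {"acidity": 7, "fat": 8, "umami": 6}
sangiovese = {"acidity": 8, "fat": 6, "umami": 5}
score = match_score(dish, sangiovese)
```

Because the score is arithmetic over stored vectors, two identical queries can never diverge, which is the property the table above measures as 100% consistency.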
The choice between deterministic and generative AI isn't absolute -- it depends on the domain. Here's why wine pairing is firmly in the deterministic camp:
A sommelier who recommends Barolo with venison on Monday should recommend Barolo with venison on Tuesday. Consistency is a quality signal in this domain. Our algorithm provides it by design. LLMs provide it by chance.
When a user asks "why this wine?", we can show the exact dimension-by-dimension breakdown: "This Sangiovese scores 92% because its acidity (8/10) matches the tomato's acidity (7/10), its tannin (6/10) complements the protein density (5/10), and its herbal character (7/10) harmonises with the basil (8/10)."
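Because the score decomposes into per-dimension terms, the explanation is just those terms read back out. A minimal sketch (field and dimension names are assumptions):

```python
# Sketch: the explanation is the score's own ingredients -- each
# dimension's dish value, wine value, and gap (all numbers invented).

def explain(dish, wine):
    return {d: {"dish": dish[d], "wine": wine[d], "gap": abs(dish[d] - wine[d])}
            for d in dish.keys() & wine.keys()}

breakdown = explain({"acidity": 7, "herbal": 8},
                    {"acidity": 8, "herbal": 7})
```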
An LLM generates a plausible-sounding explanation, but it's post-hoc narrative, not causal reasoning. It might say "Sangiovese pairs well because Italian wine goes with Italian food" -- which is a correlation, not a mechanism.
When our sommelier team disagrees with a recommendation, they can examine the exact scores and identify which dimension is miscalibrated. With an LLM, debugging a bad recommendation means... prompting differently and hoping for a better answer.
A pairing calculation takes less than 50 milliseconds and costs effectively nothing. An LLM call takes 2-8 seconds and costs $0.005-0.03. At scale (millions of pairing requests), the cost difference is the difference between a viable business and bankruptcy.
Our mobile app (iOS/Android) can calculate pairings offline using cached wine profiles. No internet required. LLMs require an API call -- no connection, no recommendation. For users browsing wine in a shop with poor connectivity, this matters.
We're not anti-LLM. We use them where they genuinely excel:
When a user pastes a recipe URL, we use an LLM to extract the ingredient list and cooking method from unstructured HTML. This is a perfect LLM task: understanding natural language in varied formats, where small variations in output are acceptable.
Users can photograph their dish, and a vision model identifies the food and its likely ingredients. This feeds into the deterministic algorithm, which then calculates the pairing. LLM for perception, algorithm for recommendation.
When a user types "what wine with my grandmother's Sunday roast with gravy and Yorkshire pudding?", an LLM parses this into structured ingredients that the algorithm can process. The LLM is the interface layer, not the recommendation engine.
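The contract between the two layers can be pictured as a small structured payload that the LLM must emit and the algorithm validates before running. The field names below are illustrative assumptions, not the actual schema:

```python
# Sketch of the LLM-to-algorithm handoff: the LLM's only job is to turn
# free text into structured JSON like this; the payload is validated
# before any deterministic pairing runs. Field names are hypothetical.
import json

llm_output = json.loads("""
{
  "dish": "Sunday roast",
  "ingredients": ["roast beef", "gravy", "yorkshire pudding"],
  "cooking_method": "roasting"
}
""")

def validate(parsed):
    assert isinstance(parsed.get("ingredients"), list) and parsed["ingredients"]
    assert isinstance(parsed.get("cooking_method"), str)
    return parsed
```

Keeping the boundary this narrow means LLM variability can change phrasing in the parse, but never the recommendation logic downstream.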
Our stack looks like this:
```
User Input (text/URL/photo)
            |
            v
       [LLM Layer] -- Parse, extract, identify
            |
            v
Structured Ingredients + Method
            |
            v
 [Wine DNA Algorithm] -- Calculate match scores
            |
            v
Ranked Wine Recommendations (deterministic)
```
This gives us the best of both worlds: the flexibility of LLMs for input processing, and the reliability of a deterministic algorithm for the recommendation itself.
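End to end, the hybrid flow can be sketched in a few lines, with a stub standing in for the LLM parse step. Everything here (profiles, wines, scores) is invented for illustration:

```python
# Hybrid pipeline sketch: a stubbed parse step plays the LLM layer;
# the recommendation itself is pure, deterministic arithmetic.
# All names and numbers below are hypothetical.

def parse_free_text(text):
    # Stub for the LLM layer -- in production this would be a model call.
    return {"ingredients": ["pasta", "guanciale"], "method": "pan-fried"}

PROFILES = {"pasta": {"fat": 2, "salt": 2}, "guanciale": {"fat": 9, "salt": 7}}
WINES = {"Frascati": {"fat": 3, "salt": 2}, "Chianti": {"fat": 6, "salt": 4}}

def recommend(text):
    parsed = parse_free_text(text)
    # average the ingredient vectors into a dish vector
    dish = {d: sum(PROFILES[i][d] for i in parsed["ingredients"])
               / len(parsed["ingredients"])
            for d in ("fat", "salt")}
    # score each wine by closeness, then rank
    scores = {w: 100 - sum(abs(dish[d] - p[d]) for d in dish) * 5
              for w, p in WINES.items()}
    return max(scores, key=scores.get)
```

Only `parse_free_text` is non-deterministic in production; everything after it is a fixed function of its structured input.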
The algorithm is only as good as its data. Every ingredient profile and wine style profile in our system was scored by professional sommeliers with 10+ years of experience. This isn't crowdsourced data or scraped from the internet -- it's expert knowledge, structured into a queryable format.
The database currently contains 609 MetaWijn wine style archetypes, each scored across 19 wine dimensions, alongside ingredient profiles scored across 17 food dimensions.
We've opened the algorithm to developers and integrators:
`npx @sommelierx/mcp-server` -- integrate wine pairing into Claude, GPT, and other AI assistants via the Model Context Protocol.

The MCP server is particularly interesting for developers building AI assistants. Instead of having your LLM hallucinate wine recommendations, you can route the query to a deterministic algorithm and return a validated answer. Best of both worlds.
LLMs are extraordinary tools. They've transformed how we interact with computers, and they play a critical role in our stack. But they're not the right tool for every job.
Wine-food pairing is a domain-specific, precision-dependent, consistency-critical problem. It's exactly the kind of problem where a well-designed deterministic algorithm outperforms a general-purpose language model. Not because the LLM is bad, but because the requirements -- reproducibility, explainability, speed, cost, offline capability -- point to a different architecture.
The lesson generalises: before reaching for an LLM, ask yourself whether your problem requires generation or calculation. If the answer is calculation, you might be better off with a structured algorithm. If the answer is generation, LLMs are unbeatable. And if the answer is "both" -- as it is for us -- build a hybrid.
See how deterministic flavour matching compares to your favourite LLM. Same input, same output, every time.
Try SommelierX Free

LLMs like ChatGPT give different answers every time you ask. Ask "what wine with carbonara" three times and you'll get three different recommendations. For a domain where precision matters -- recommending a specific wine to pair with a specific dish -- this non-determinism is a dealbreaker. Our algorithm gives the same answer every time, because it's calculating, not generating.
The algorithm uses 17 food dimensions and 19 wine dimensions. Food dimensions include acidity, sweetness, umami, bitterness, fat/richness, spice/heat, smoke, herbs, and more. Wine dimensions include similar taste metrics plus tannin structure, fruit intensity, oak influence, and minerality. The match score is calculated across all dimensions simultaneously.
Yes. The SommelierX API is available at docs.sommelierx.com. You can also use our MCP server (npx @sommelierx/mcp-server) to integrate wine pairing into AI assistants and tools. The consumer app is available at app.sommelierx.com.
Every recipe is decomposed into individual ingredients, each with a known flavour profile. Even novel combinations are handled because the algorithm understands ingredient-level properties. A dish the system has never seen before is just a new combination of known flavour vectors -- the math works the same way.