
How We Built a Wine Pairing Algorithm That Outperforms ChatGPT

By SommelierX Team · March 19, 2026 · 9 min read

Ask ChatGPT "what wine with carbonara?" three times. You'll get three different answers. Sometimes it recommends Chardonnay. Sometimes Pinot Grigio. Sometimes it suggests a Pecorino (the wine, not the cheese -- though it has confused those too).

This isn't a bug. It's the fundamental nature of large language models: they generate plausible-sounding text, not deterministic answers. For creative writing, that's a feature. For recommending a specific wine to pair with a specific dish, it's a dealbreaker.

We built something different. The SommelierX Wine DNA algorithm is a deterministic flavour-matching system that calculates wine-food pairings across 17 food dimensions and 19 wine dimensions. Same input, same output. Every time. This post explains why we chose this approach, how it works, and where LLMs still play a role in our stack.

The Problem with LLM Wine Advice

We ran an experiment. We asked ChatGPT-4, Claude, and Gemini the same 10 wine pairing questions, three times each. The results were illuminating:

| Metric | LLMs (avg) | Wine DNA |
|---|---|---|
| Consistency (same answer 3x) | 23% | 100% |
| Sommelier-validated accuracy | ~70% | 94% |
| Response time | 2-8 seconds | <50ms |
| Cost per query | $0.005-0.03 | ~$0.00001 |
| Explainability | Narrative (variable) | Score breakdown per dimension |

The LLMs weren't bad -- 70% accuracy is respectable. But the inconsistency is the real problem. A user asking the same question twice and getting different answers destroys trust. And in a domain where a professional sommelier's reputation depends on consistency, that's unacceptable.

The Wine DNA Approach: Structured Flavour Dimensions

Instead of asking a language model to generate a recommendation, we built a system that calculates one. Here's the architecture:

Step 1: Decompose Every Dish into Ingredients

Every dish in the system is broken down into its constituent ingredients. A carbonara becomes: pasta (neutral), egg yolk (rich, fatty), pecorino (salty, umami), guanciale (smoky, salty, fatty), black pepper (spicy). Each ingredient has a pre-scored flavour profile across 17 dimensions.
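A minimal sketch of how an ingredient with a pre-scored profile might be represented. The five dimension names and all scores below are illustrative assumptions, not the actual 17 dimensions or SommelierX's real data:

```python
from dataclasses import dataclass, field

# Illustrative subset of the 17 food dimensions; the real dimension
# names and scores are maintained by the SommelierX sommelier team.
FOOD_DIMENSIONS = ("acidity", "sweetness", "umami", "fat_richness", "smoke")

@dataclass(frozen=True)
class Ingredient:
    name: str
    profile: dict = field(default_factory=dict)  # dimension -> 0-10 score

    def score(self, dimension: str) -> float:
        # Dimensions absent from the profile default to zero.
        return float(self.profile.get(dimension, 0.0))

# Carbonara components with hypothetical scores for the sketch.
guanciale = Ingredient("guanciale", {"umami": 7, "fat_richness": 9, "smoke": 6})
pecorino = Ingredient("pecorino", {"umami": 8, "fat_richness": 6})
```

Keeping unlisted dimensions at zero means an ingredient only needs scores for the dimensions where it actually contributes flavour.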

Step 2: Calculate the Dish's Composite Flavour Profile

The ingredient profiles are weighted and combined into a single dish vector. The weighting considers ingredient prominence (is it a main ingredient or a garnish?), cooking method impact (grilling adds smokiness, frying adds richness), and ingredient interactions (tomato + basil = more than the sum of parts).
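A sketch of the combination step, assuming a simple weighted average per dimension. The weights stand in for ingredient prominence; the cooking-method and ingredient-interaction adjustments described above are omitted, and all numbers are hypothetical:

```python
def dish_profile(weighted_ingredients, dimensions):
    """Combine (profile, weight) pairs into a single dish flavour vector."""
    total = sum(weight for _, weight in weighted_ingredients)
    return {
        dim: sum(profile.get(dim, 0.0) * weight
                 for profile, weight in weighted_ingredients) / total
        for dim in dimensions
    }

# Hypothetical carbonara: main ingredients weigh more than accents.
carbonara = dish_profile(
    [
        ({"fat_richness": 2}, 3.0),                          # pasta (neutral)
        ({"fat_richness": 8, "umami": 5}, 2.0),              # egg yolk
        ({"umami": 8, "fat_richness": 6}, 1.5),              # pecorino
        ({"umami": 7, "fat_richness": 9, "smoke": 6}, 1.5),  # guanciale
    ],
    ("umami", "fat_richness", "smoke"),
)
```

Because guanciale carries a small weight, its strong smokiness is diluted in the composite rather than dominating the dish vector.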

Step 3: Match Against Wine Style Profiles

Every wine style in our database (609 MetaWijn archetypes, representing every meaningful wine style in the world) has a 19-dimension flavour profile scored by our sommelier team. The matching algorithm calculates compatibility across all dimensions simultaneously, producing a match score from 0-100.
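A toy stand-in for the matching step. The real scoring function is SommelierX's own; this sketch assumes both profiles use 0-10 scales, treats the compared dimensions as shared between food and wine (a simplification, since the real system maps 17 food dimensions onto 19 wine dimensions), and scores 100 minus the mean absolute gap:

```python
def match_score(dish, wine, dimensions):
    """Toy 0-100 compatibility score across shared dimensions."""
    gaps = [abs(dish.get(d, 0.0) - wine.get(d, 0.0)) for d in dimensions]
    return round(100.0 * (1.0 - sum(gaps) / (10.0 * len(gaps))), 1)

dims = ("acidity", "tannin", "fruit_intensity")
dish = {"acidity": 7, "tannin": 5, "fruit_intensity": 6}
wines = {
    "Sangiovese": {"acidity": 8, "tannin": 6, "fruit_intensity": 6},
    "Chardonnay": {"acidity": 5, "tannin": 1, "fruit_intensity": 5},
}
ranking = sorted(wines, key=lambda name: match_score(dish, wines[name], dims),
                 reverse=True)
# ranking -> ["Sangiovese", "Chardonnay"]
```

The point of the sketch is the property, not the formula: the score is a pure function of its inputs, so the same dish and wine always produce the same number.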

The 17 Food Dimensions

The food dimensions cover taste metrics such as acidity, sweetness, umami, bitterness, fat/richness, spice/heat, smoke, and herbs, among others.

The 19 Wine Dimensions

The wine dimensions include similar taste metrics plus structural properties such as tannin structure, fruit intensity, oak influence, and minerality.

Why Deterministic Beats Generative (For This Domain)

The choice between deterministic and generative AI isn't absolute -- it depends on the domain. Here's why wine pairing is firmly in the deterministic camp:

1. Reproducibility

A sommelier who recommends Barolo with venison on Monday should recommend Barolo with venison on Tuesday. Consistency is a quality signal in this domain. Our algorithm provides it by design. LLMs provide it by chance.

2. Explainability

When a user asks "why this wine?", we can show the exact dimension-by-dimension breakdown: "This Sangiovese scores 92% because its acidity (8/10) matches the tomato's acidity (7/10), its tannin (6/10) complements the protein density (5/10), and its herbal character (7/10) harmonises with the basil (8/10)."

An LLM generates a plausible-sounding explanation, but it's post-hoc narrative, not causal reasoning. It might say "Sangiovese pairs well because Italian wine goes with Italian food" -- which is a correlation, not a mechanism.
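A breakdown like the one quoted above can be assembled mechanically from the scores themselves. This sketch uses hypothetical dimension names and values; the report strings are derived from the data, not generated:

```python
def explain(dish, wine, dimensions):
    """Per-dimension report lines derived directly from the scores."""
    lines = []
    for dim in dimensions:
        d, w = dish.get(dim, 0.0), wine.get(dim, 0.0)
        lines.append(f"{dim}: dish {d}/10 vs wine {w}/10 (gap {abs(d - w)})")
    return lines

report = explain(
    {"acidity": 7, "herbal": 8},
    {"acidity": 8, "herbal": 7},
    ("acidity", "herbal"),
)
# report[0] -> "acidity: dish 7/10 vs wine 8/10 (gap 1)"
```

Because each line is computed from the same numbers that produced the match score, the explanation is the reasoning, not a story about it.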

3. Auditability

When our sommelier team disagrees with a recommendation, they can examine the exact scores and identify which dimension is miscalibrated. With an LLM, debugging a bad recommendation means... prompting differently and hoping for a better answer.

4. Cost and Speed

A pairing calculation takes less than 50 milliseconds and costs effectively nothing. An LLM call takes 2-8 seconds and costs $0.005-0.03. At scale (millions of pairing requests), the cost difference is the difference between a viable business and bankruptcy.

5. Offline Capability

Our mobile app (iOS/Android) can calculate pairings offline using cached wine profiles. No internet required. LLMs require an API call -- no connection, no recommendation. For users browsing wine in a shop with poor connectivity, this matters.

Where LLMs Are Brilliant (And We Use Them)

We're not anti-LLM. We use them where they genuinely excel:

Recipe URL Extraction

When a user pastes a recipe URL, we use an LLM to extract the ingredient list and cooking method from unstructured HTML. This is a perfect LLM task: understanding natural language in varied formats, where small variations in output are acceptable.
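One way this hand-off can work is to prompt the LLM for a fixed JSON shape and validate it before the algorithm sees it. The field names here are assumptions for illustration, not the actual SommelierX schema:

```python
import json

# Hypothetical structured output the LLM is asked to return from a
# recipe page; field names are assumed, not the real schema.
llm_output = '''
{"ingredients": ["spaghetti", "egg yolk", "pecorino", "guanciale"],
 "cooking_method": "pan-fried"}
'''

recipe = json.loads(llm_output)

# LLM output can drift, so validate the contract at the boundary:
# missing keys should be rejected, never guessed.
assert {"ingredients", "cooking_method"} <= recipe.keys()
```

Validating at the boundary keeps any LLM variability contained: whatever the model produces, only well-formed structures reach the deterministic side.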

Photo Recognition

Users can photograph their dish, and a vision model identifies the food and its likely ingredients. This feeds into the deterministic algorithm, which then calculates the pairing. LLM for perception, algorithm for recommendation.

Natural Language Queries

When a user types "what wine with my grandmother's Sunday roast with gravy and Yorkshire pudding?", an LLM parses this into structured ingredients that the algorithm can process. The LLM is the interface layer, not the recommendation engine.

The Hybrid Architecture

Our stack looks like this:

User Input (text/URL/photo)
    |
    v
[LLM Layer] -- Parse, extract, identify
    |
    v
Structured Ingredients + Method
    |
    v
[Wine DNA Algorithm] -- Calculate match scores
    |
    v
Ranked Wine Recommendations (deterministic)

This gives us the best of both worlds: the flexibility of LLMs for input processing, and the reliability of a deterministic algorithm for the recommendation itself.
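The flow above can be sketched end to end. Here `parse_dish` is a hypothetical stand-in for the LLM layer (in production it would call a model to extract structured ingredients); everything after it is deterministic, and all wine profiles and scores are invented for the example:

```python
def parse_dish(text):
    """Stub LLM layer: free text in, structured flavour profile out."""
    known = {"carbonara": {"umami": 6.0, "fat_richness": 7.0, "smoke": 2.0}}
    return known[text.strip().lower()]

def recommend(text, wine_styles, dimensions):
    dish = parse_dish(text)

    def score(wine):
        gaps = [abs(dish.get(d, 0.0) - wine.get(d, 0.0)) for d in dimensions]
        return 100.0 * (1.0 - sum(gaps) / (10.0 * len(gaps)))

    # sorted() is stable, so ties resolve by insertion order: the same
    # input always yields the same ranking.
    return sorted(wine_styles, key=lambda n: score(wine_styles[n]), reverse=True)

styles = {
    "Frascati": {"umami": 4.0, "fat_richness": 6.0, "smoke": 1.0},
    "Barolo": {"umami": 5.0, "fat_richness": 3.0, "smoke": 4.0},
}
best = recommend("Carbonara", styles, ("umami", "fat_richness", "smoke"))
# best -> ["Frascati", "Barolo"]
```

Swapping `parse_dish` for a real model call changes nothing downstream: the recommendation side stays a pure calculation over structured inputs.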

The Data: 10+ Years of Sommelier Research

The algorithm is only as good as its data. Every ingredient profile and wine style profile in our system was scored by professional sommeliers with 10+ years of experience. This isn't crowdsourced data or scraped from the internet -- it's expert knowledge, structured into a queryable format.

The database currently contains:

- 609 MetaWijn wine style archetypes, each scored across 19 dimensions by our sommelier team
- Ingredient flavour profiles scored across 17 food dimensions

Try It Yourself

We've opened the algorithm to developers and integrators:

- The SommelierX API, documented at docs.sommelierx.com
- An MCP server (npx @sommelierx/mcp-server) for integrating pairing into AI assistants and tools
- The consumer app at app.sommelierx.com

The MCP server is particularly interesting for developers building AI assistants. Instead of having your LLM hallucinate wine recommendations, you can route the query to a deterministic algorithm and return a validated answer. Best of both worlds.

Conclusion: Right Tool for the Right Job

LLMs are extraordinary tools. They've transformed how we interact with computers, and they play a critical role in our stack. But they're not the right tool for every job.

Wine-food pairing is a domain-specific, precision-dependent, consistency-critical problem. It's exactly the kind of problem where a well-designed deterministic algorithm outperforms a general-purpose language model. Not because the LLM is bad, but because the requirements -- reproducibility, explainability, speed, cost, offline capability -- point to a different architecture.

The lesson generalises: before reaching for an LLM, ask yourself whether your problem requires generation or calculation. If the answer is calculation, you might be better off with a structured algorithm. If the answer is generation, LLMs are unbeatable. And if the answer is "both" -- as it is for us -- build a hybrid.

Try the Wine DNA algorithm

See how deterministic flavour matching compares to your favourite LLM. Same input, same output, every time.

Try SommelierX Free

Frequently Asked Questions

Why not just use ChatGPT for wine pairing?

LLMs like ChatGPT give different answers every time you ask. Ask "what wine with carbonara" three times and you'll get three different recommendations. For a domain where precision matters -- recommending a specific wine to pair with a specific dish -- this non-determinism is a dealbreaker. Our algorithm gives the same answer every time, because it's calculating, not generating.

How many flavour dimensions does the Wine DNA algorithm use?

The algorithm uses 17 food dimensions and 19 wine dimensions. Food dimensions include acidity, sweetness, umami, bitterness, fat/richness, spice/heat, smoke, herbs, and more. Wine dimensions include similar taste metrics plus tannin structure, fruit intensity, oak influence, and minerality. The match score is calculated across all dimensions simultaneously.

Is the SommelierX API available for developers?

Yes. The SommelierX API is available at docs.sommelierx.com. You can also use our MCP server (npx @sommelierx/mcp-server) to integrate wine pairing into AI assistants and tools. The consumer app is available at app.sommelierx.com.

How does the algorithm handle recipes it has never seen?

Every recipe is decomposed into individual ingredients, each with a known flavour profile. Even novel combinations are handled because the algorithm understands ingredient-level properties. A dish the system has never seen before is just a new combination of known flavour vectors -- the math works the same way.