SQL That Reads Between the Lines

We analyzed 11 years of restaurant reviews with DuckDB aggregate functions that call Claude. The entire pipeline is SQL.
2,392
Reviews Analyzed
$3
Total LLM Cost
1
SQL Query
See the Dashboard → 🚜 Query.Farm
Operational themes timeline showing strengths and issues over 11 years
Commander's Palace, New Orleans — 133 months of Yelp reviews turned into an operational intelligence dashboard

What If GROUP BY Could Think?

DuckDB is great at aggregating numbers. SUM, AVG, COUNT — these work on structured data. But what about thousands of free-text reviews?

We built aggregate functions that call Claude during the GROUP BY. Text goes in, structured intelligence comes out:

SELECT month,
       qf_llm_summarize(
           review_text,
           'Analyze for the owner. Return JSON with
            momentum, strengths, issues, actions.'
       )
FROM reviews
GROUP BY month;

That's it. 133 months of reviews → 133 structured analyses with scores, severity-rated issues, and action items. Each month's reviews get accumulated, then Claude reads them all and produces a JSON assessment.

Operational themes timeline showing strengths and issues over 11 years
Every dot is a month. Green = strength, red = issue. Dot size = severity. Grouped by theme — all generated from SQL.

The Hard Part: 700 Labels, One Taxonomy

When you ask an LLM to categorize 133 months of reviews independently, you get ~700 unique labels. "weekend-brunch-slow-pacing", "brunch-service-delays", "sunday-brunch-understaffed" — all the same issue.

Our second aggregate function, qf_llm_distill, fixes this in one call:

SELECT qf_llm_distill(label, [30, 6])
FROM all_labels;

700 labels → 30 categories → 6 themes. Every label mapped. The function chunks, normalizes, consolidates, and validates internally — re-submitting any labels the LLM missed, merging groups that overshoot the target.

Category balance chart showing strengths vs issues by theme
Strengths (green) vs issues (red) by category, grouped into themes that emerged from the data.

The Pattern

It's the same mechanism at every level. Reviews become summaries. Labels become categories. Categories become themes. GROUP BY handles the structure, the LLM handles the meaning.

SQL already knows how to group data. Now the LLM knows how to make sense of each group.

Under the Hood

See the Python behind qf_llm_summarize
class SummarizeFunction(AggregateFunction[SummarizeState]):

    class Meta:
        name = "qf_llm_summarize"

    def update(cls, states, group_ids, text, prompt):
        # Accumulate text. No LLM calls.
        for i in range(len(group_ids)):
            s = states[group_ids[i].as_py()]
            states[gid] = SummarizeState(
                texts=s.texts + "\n---\n" + text[i].as_py(),
                prompt=prompt[i].as_py(),
            )

    def combine(cls, source, target, params):
        # Merge parallel workers
        return SummarizeState(
            texts=target.texts + "\n---\n" + source.texts)

    def finalize(cls, group_ids, states, params):
        # Call Claude — once per group
        results = []
        for gid in group_ids:
            s = states[gid.as_py()]
            resp = anthropic.Anthropic().messages.create(
                model="claude-sonnet-4-20250514",
                messages=[{"content": s.prompt + s.texts}])
            results.append(resp.content[0].text)
        return pa.record_batch({"result": pa.array(results)})

DuckDB handles GROUP BY, parallelism, and memory. Your Python just defines what happens to each group.

See the full pipeline

Step 1: DuckDB reads 5GB of Yelp JSON, samples 20 reviews per month → Parquet

Step 2: GROUP BY month + qf_llm_summarize → 133 structured JSON analyses with momentum, severity-rated strengths/issues, and action items

Step 3: qf_llm_distill(label, [30, 6]) → one SQL call turns ~700 messy labels into a 3-level taxonomy (labels → 30 categories → 6 themes), with validation ensuring every label is mapped

Step 4: Jinja2 renders the HTML dashboard from the JSON. Total: ~$3, ~150 LLM calls.

How qf_llm_distill validates every label is mapped

After each level, the function checks: did every input get a mapping? If labels were skipped, it re-submits just the missing ones with the existing categories as context. If there are too many groups, a consolidation pass merges the smallest. After 3 repair attempts, unmapped items go to "other" with a warning. The guarantee: every input label has a complete path through all levels.

Try It