In February 2024, the Berkeley AI Research Lab published a paper that quietly explained everything. Not “how to build AI” — but why the move from single LLM calls to multi-component systems is inevitable. And once you read it, you see analytics differently.
The paper is called “The Shift from Models to Compound AI Systems.” The lead authors are Matei Zaharia (who created Spark and co-founded Databricks) and Omar Khattab (creator of DSPy). These are serious people.
Their core claim: state-of-the-art AI results are increasingly coming not from better models, but from cleverly engineered systems that combine models with other components.
Why This Is True for Analytics #
The Berkeley team identifies four reasons why compound systems beat monolithic models. All four apply directly to analytics work:
Reason 1: Some tasks are easier to improve via system design.
Their example: a coding model that gets 30% correct on a benchmark. You can spend years and compute making it 35% correct. Or you can build a system that samples 100 solutions, runs unit tests on each, and returns the one that passes — which can get you to 80% with today's models.
The analytics equivalent: an LLM that generates SQL queries that are correct 60% of the time. You can use a better model and get to 65%. Or you can add a query validation step, a reflection loop, and a "test on sample data before running on production" step — and get to 90% with the same model.
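The sample-and-validate pattern is simple to sketch. Below is a minimal, hypothetical version of the "test on sample data before running on production" step: candidate queries (which in practice would come from sampling an LLM several times) are dry-run against a small SQLite copy of the schema, and the first one that executes cleanly wins. The function and table names are illustrative, not from the paper.

```python
import sqlite3

def pick_valid_query(candidates, sample_db):
    """Return the first candidate that runs cleanly on a sample database.

    `candidates` stands in for N sampled LLM generations; `sample_db` is a
    sqlite3 connection holding a small copy of the production schema.
    """
    for sql in candidates:
        try:
            sample_db.execute(sql)  # dry-run on sample data
            return sql
        except sqlite3.Error:
            continue  # validation failed; try the next sample
    return None

# Toy demo: two broken candidates, one good one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE traffic (day TEXT, visits INTEGER)")
candidates = [
    "SELECT visits FROM trafic",        # typo in table name
    "SELECT visits FRM traffic",        # syntax error
    "SELECT SUM(visits) FROM traffic",  # valid
]
print(pick_valid_query(candidates, db))  # SELECT SUM(visits) FROM traffic
```

The validator never touches production; it only proves the query parses and matches the real schema, which is where most LLM-generated SQL fails.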
Reason 2: Systems can be dynamic.
Models are trained on historical data. Your metrics change. New tables get added. APIs change. A system that can retrieve current documentation dynamically will always outperform a model that memorized stale knowledge six months ago.
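What "retrieve current documentation dynamically" looks like in the smallest possible form: build the prompt from the database's catalog at question time, so a table added yesterday is visible today. This is a hypothetical sketch using SQLite's `sqlite_master` catalog; a real system would pull from your warehouse's information schema or docs store.

```python
import sqlite3

def build_prompt(question, conn):
    """Fetch the *current* table list at question time, so the model
    reasons over today's schema rather than memorized, stale state."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = ", ".join(name for (name,) in rows)
    return f"Available tables: {schema}\nQuestion: {question}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (day TEXT, visits INTEGER)")
conn.execute("CREATE TABLE campaigns (id INTEGER, name TEXT)")  # added after any training cutoff
print(build_prompt("Which campaign drove last week's sessions?", conn))
```

The model never has to have "seen" the `campaigns` table; the system hands it the current state on every call.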
Reason 3: Improving control and trust is easier with systems.
“LLMs still hallucinate, but a system combining LLMs with retrieval can increase user trust by providing citations or automatically verifying facts.”
This is the entire value proposition of an analytics agent: not just an answer, but a cited answer with the query visible. Your stakeholders can see exactly what data was used. That’s fundamentally different from an LLM that just says a number.
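One way to make "a cited answer with the query visible" concrete is to have the agent return a structured object rather than a bare string. This is a minimal sketch (the class and fields are my own, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class CitedAnswer:
    answer: str     # the headline statement
    query: str      # the exact SQL that produced it
    tables: tuple   # data sources touched, shown to the stakeholder

    def render(self) -> str:
        return (f"{self.answer}\n"
                f"Query: {self.query}\n"
                f"Sources: {', '.join(self.tables)}")

a = CitedAnswer(
    answer="Traffic rose 12% week over week.",
    query="SELECT SUM(visits) FROM traffic WHERE day >= '2024-02-01'",
    tables=("traffic",),
)
print(a.render())
```

The point is contractual: the answer cannot leave the system without its provenance attached, so verification is a copy-paste away.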
Reason 4: Performance goals vary widely.
A “show me the traffic chart” question can run on a cheap fast model. “Explain this anomaly in the context of our seasonal patterns and current active campaigns” needs a more capable model. A compound system routes intelligently.
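Routing can start as a crude heuristic and grow into a classifier. A minimal sketch (keyword rules and model names are placeholders, not a recommendation):

```python
SIMPLE_KEYWORDS = ("show", "chart", "list", "count")

def route_model(question: str) -> str:
    """Crude router: cheap model for lookup-style questions, capable
    model for anything asking for explanation or reasoning."""
    q = question.lower()
    if q.startswith(SIMPLE_KEYWORDS) and "explain" not in q:
        return "small-fast-model"    # placeholder model name
    return "large-capable-model"     # placeholder model name

print(route_model("Show me the traffic chart"))
print(route_model("Explain this anomaly in the context of our campaigns"))
```

In production you would replace the keyword check with a small classifier, but the shape is the same: the system, not the user, decides how much model each question deserves.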
The Three Challenges They Identify #
The Berkeley team is honest about what’s hard:
- Design space is vast. For RAG alone, the combinations of retriever + reranker + LLM + verification are enormous. There's no standard answer yet.
- Optimization is hard. You can't backpropagate through a SQL query. New tools like DSPy use "textual backpropagation" to optimize prompts end-to-end.
- Operations are harder. How do you monitor a system where one question might generate 12 API calls, 3 LLM invocations, and a code execution step? LLMOps is a new discipline.
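The operations challenge starts with visibility: you cannot monitor what you cannot see. A minimal, hypothetical trace recorder makes every component call within one question inspectable afterwards (the class and event names are illustrative; real systems use tracing frameworks):

```python
from collections import Counter
import time

class Trace:
    """Per-question trace: record every component call so that one
    answer's many API calls and LLM invocations are visible afterwards."""
    def __init__(self):
        self.events = []

    def record(self, component, detail=""):
        self.events.append((time.time(), component, detail))

    def summary(self):
        # How many times each component fired for this one question.
        return Counter(component for _, component, _ in self.events)

trace = Trace()
trace.record("llm", "plan the query")
trace.record("api", "fetch schema")
trace.record("api", "run SQL on sample data")
trace.record("llm", "summarize result")
print(trace.summary())
```

Once every answer carries a trace like this, questions such as "why was this response slow?" or "which step hallucinated?" become queries over the events, which is familiar territory for a data team.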
What This Means for Data Teams #
The BAIR paper ends with a prediction: “Compound AI systems will remain the best way to maximize the quality and reliability of AI applications going forward.”
For data teams, this means:
- Analytics agents are not a passing fad. They are the production-grade pattern.
- The future belongs to teams that can engineer systems, not just use models.
- The bottleneck is no longer compute or model quality — it’s system design and evaluation.
That’s actually good news for data analysts. System design and evaluation? That’s what we’ve been doing for years. We just called it “analytics engineering.”