A few years ago, I built a churn model for a B2B SaaS product. Logistic regression, binary label, 30-day prediction window. It performed fine. The business used it. I moved on.
What bothered me was a question the model couldn’t answer: how long does a customer actually stay?
I need to say something that makes some data analysts uncomfortable: the job is changing. Not disappearing — changing. And the analysts who understand the change will thrive. The ones who don’t will spend the next five years fighting it.
Earlier this year I shipped a pipeline rewrite I’m genuinely proud of. It replaced a 2,200-line SQL monolith — one of those files that everyone’s afraid to touch — with a clean layered architecture that handles 14 products, runs daily, and can be extended by adding a handful of config files.
In January 2024, Hugging Face published a benchmark that most people in the data world missed. They compared open-source LLMs against GPT-3.5 and GPT-4 on agent tasks, using a dataset that requires web search and calculator use — the two fundamental skills of any analytics agent.
In February 2024, the Berkeley AI Research Lab published a paper that quietly explained everything. Not “how to build AI” — but why the move from single LLM calls to multi-component systems is inevitable. And once you read it, you see analytics differently.
In August 2025, Meta published an engineering blog post that changed how I think about analytics agents. It’s called “Creating AI Agent Solutions for Warehouse Data Access and Security,” and it describes a multi-agent system they built for their internal data warehouse.
Monitoring availability metrics at scale creates a familiar problem: you have a time series, you need to know when it drops, and you need to know this automatically — without someone staring at a dashboard.
This post walks through a statistical algorithm I built to do exactly that. It detects dips in any continuous metric (availability, reachability, error rate) and returns precise start and end timestamps for each event. No ML required — just a modified z-score, two rolling windows, and a few transition rules.
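Before the full walkthrough, here is a minimal sketch of that idea. This is my own illustrative code, not the post's implementation: the function names, the threshold of -3.5, the 60-point baseline window, and the 3-point recovery rule are all assumptions standing in for the details described later. It shows the core shape — a modified z-score (median/MAD based, so a dip can't drag its own baseline down the way a mean would) computed against a trailing window, plus simple transition rules that open an event when the score crosses the threshold and close it after a run of normal points.

```python
from statistics import median

def modified_z(x, window):
    """Modified z-score of x against a baseline window (median/MAD, not mean/std)."""
    med = median(window)
    mad = median(abs(v - med) for v in window)
    if mad == 0:
        return 0.0
    return 0.6745 * (x - med) / mad

def detect_dips(series, baseline=60, threshold=-3.5, recover=3):
    """Scan a list of (timestamp, value) points; return (start, end) pairs.

    baseline  - number of trailing points used to estimate the normal level
    threshold - modified z-score below which a point counts as dipping
    recover   - consecutive normal points needed to close an event
    (All three values are illustrative defaults, not the post's tuning.)
    """
    events, start, normal_run = [], None, 0
    for i in range(baseline, len(series)):
        ts, value = series[i]
        window = [v for _, v in series[i - baseline:i]]
        score = modified_z(value, window)
        if start is None:
            if score < threshold:          # transition rule: dip begins
                start, normal_run = ts, 0
        else:
            if score >= threshold:
                normal_run += 1
                if normal_run >= recover:  # transition rule: dip ends
                    events.append((start, series[i - recover + 1][0]))
                    start = None
            else:
                normal_run = 0
    if start is not None:                  # metric never recovered in this data
        events.append((start, series[-1][0]))
    return events
```

On a synthetic availability series hovering around 99.9 with a drop to 90.0 between timestamps 70 and 75, this returns a single event whose start is the first dipping point and whose end is the first point of the recovery run.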
Customer segmentation is one of those problems that sounds straightforward until you actually sit down with the data. In this post I’ll walk through an approach I built for segmenting customers based on their HTTP traffic patterns — the kind of traffic data that tells you not just how much a customer uses a service, but how they use it.
There’s a hard truth hiding in your analytics platform. Let me show you how to find it.
Open your BI tool. Look at the list of dashboards. Find the one that took you — or someone on your team — two weeks to build. The one with the carefully color-coded KPI tiles, the year-over-year comparisons, the trend lines going back 18 months.
In 2020, I was handed a PDF — an ILO working paper titled “Spotting Export Potential and Implications for Employment in Developing Countries” (Cheong, Decreux & Spies, 2018) — and asked to turn it into a working algorithm.
The paper describes a methodology developed by the International Trade Centre to identify a country’s unrealized export opportunities, and then estimate how many jobs realizing those opportunities would create. Across six developing countries. At the product-market-sector level.