A few years ago, I built a churn model for a B2B SaaS product. Logistic regression, binary label, 30-day prediction window. It performed fine. The business used it. I moved on.
What bothered me was a question the model couldn’t answer: how long does a customer actually stay?
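"How long does a customer stay" is a question about time-to-event, and the standard tool for it is a survival curve rather than a binary label. To make that concrete (this is my framing, not something the logistic model did), here is a from-scratch Kaplan-Meier estimator over toy tenure data — note how it keeps the still-active, right-censored customers that a 30-day binary label simply throws away:

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate of S(t) = P(customer still active after t months).

    durations[i]: months customer i has been observed.
    churned[i]:   True if they left (event observed), False if still
                  active at the end of the data (right-censored).
    """
    curve, s = [], 1.0
    for t in sorted(set(durations)):
        n = sum(1 for dur in durations if dur >= t)   # customers at risk at t
        d = sum(1 for dur, ev in zip(durations, churned) if dur == t and ev)
        if d:
            s *= 1 - d / n                            # KM product-limit step
        curve.append((t, s))
    return curve
```

The toy data here is illustrative; in practice a library like lifelines handles ties, confidence intervals, and plotting.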
I need to say something that makes some data analysts uncomfortable: the job is changing. Not disappearing — changing. And the analysts who understand the change will thrive. The ones who don’t will spend the next five years fighting it.
In January 2024, Hugging Face published a benchmark that most people in the data world missed. They compared open-source LLMs against GPT-3.5 and GPT-4 on agent tasks — using a dataset that requires web search and calculator use, the fundamentals of any analytics agent.
In February 2024, the Berkeley AI Research Lab published a paper that quietly explained everything. Not “how to build AI” — but why the move from single LLM calls to multi-component systems is inevitable. And once you read it, you see analytics differently.
Monitoring availability metrics at scale creates a familiar problem: you have a time series, you need to know when it drops, and you need to know this automatically — without someone staring at a dashboard.
This post walks through a statistical algorithm I built to do exactly that. It detects dips in any continuous metric (availability, reachability, error rate) and returns precise start and end timestamps for each event. No ML required — just a modified z-score, two rolling windows, and a few transition rules.
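Before the details, here is the shape of the idea in code. This is a simplified sketch, not the full algorithm: it compresses the two rolling windows into a single trailing baseline window, and the window size, threshold, and minimum-length rule are illustrative placeholders.

```python
import statistics

def modified_z(value, window):
    """Modified z-score: distance of `value` from the window's median,
    in units of MAD. Robust to the very outliers we want to detect."""
    med = statistics.median(window)
    mad = statistics.median([abs(x - med) for x in window])
    if mad == 0:
        return 0.0  # flat window carries no signal; real code might fall back to mean absolute deviation
    return 0.6745 * (value - med) / mad

def detect_dips(series, baseline=30, threshold=-3.5, min_len=2):
    """Scan (timestamp, value) points; return one (start_ts, end_ts) per dip.

    Transition rules: a dip opens when the modified z-score of the current
    point, measured against a trailing `baseline`-sized window, falls below
    `threshold`; it closes when the score recovers, and is kept only if it
    lasted at least `min_len` points.
    """
    events, start_idx = [], None
    for i in range(baseline, len(series)):
        value = series[i][1]
        window = [v for _, v in series[i - baseline:i]]
        z = modified_z(value, window)
        if z < threshold and start_idx is None:
            start_idx = i                           # transition: normal -> dip
        elif z >= threshold and start_idx is not None:
            if i - start_idx >= min_len:            # drop one-point blips
                events.append((series[start_idx][0], series[i - 1][0]))
            start_idx = None
    if start_idx is not None:                       # dip still open at end of data
        events.append((series[start_idx][0], series[-1][0]))
    return events
```

The median/MAD pair is what makes this workable without ML: a dip cannot drag its own baseline down the way it would with a mean and standard deviation.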
Customer segmentation is one of those problems that sounds straightforward until you actually sit down with the data. In this post I’ll walk through an approach I built for segmenting customers based on their HTTP traffic patterns — the kind of traffic data that tells you not just how much a customer uses a service, but how they use it.
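The "how, not how much" distinction comes down to feature engineering before any clustering happens. As a minimal sketch (the bucketing rule and names are my illustration, not the post's actual features): turn each customer's raw request log into a traffic-mix profile, so two customers with the same usage shape land at the same point regardless of volume.

```python
from collections import Counter

def traffic_mix(requests):
    """Profile a customer's HTTP usage as shares per (method, path-prefix)
    bucket. Dividing by total volume separates usage *shape* from usage
    *size* — the "how" rather than the "how much"."""
    counts = Counter(
        (method, path.split("/")[1] if "/" in path else path)
        for method, path in requests
    )
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}
```

Vectors like these can then go into any off-the-shelf clustering step; the point of the sketch is only that the normalization, not the clusterer, is what encodes "how they use it."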