Automatic Dip Detection in Time Series: A Statistical Approach
Monitoring availability metrics at scale creates a familiar problem: you have a time series, you need to know when it drops, and you need to know this automatically — without someone staring at a dashboard.
This post walks through a statistical algorithm I built to do exactly that. It detects dips in any continuous metric (availability, reachability, error rate) and returns precise start and end timestamps for each event. No ML required — just a modified z-score, two rolling windows, and a few transition rules.
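To make the idea concrete before the full walkthrough, here is a minimal Python sketch of the core loop: a modified z-score (the standard median/MAD form) computed against a rolling baseline, with simple enter/exit transition rules. The function names, the window length, and the threshold of -3.5 are illustrative assumptions, not the post's exact implementation, and the two-window design mentioned above is collapsed into a single baseline window for brevity.

```python
# Minimal sketch of dip detection via a modified z-score.
# Assumptions: names, window length, and threshold are illustrative;
# the real algorithm uses two rolling windows and richer transition rules.

from statistics import median

def modified_z(value, window):
    """Modified z-score of `value` against a baseline window,
    using the median and the median absolute deviation (MAD)."""
    med = median(window)
    mad = median(abs(x - med) for x in window)
    if mad == 0:
        # Flat baseline: any deviation is treated as maximally surprising.
        if value < med:
            return float("-inf")
        return float("inf") if value > med else 0.0
    return 0.6745 * (value - med) / mad

def detect_dips(series, baseline_len=60, threshold=-3.5):
    """Return (start, end) index pairs for dips below the baseline.

    Transition rules: enter a dip when the modified z-score falls below
    `threshold`; exit when it recovers. The baseline window is frozen
    while inside a dip so the dip itself doesn't skew the statistics.
    """
    events, start = [], None
    baseline = list(series[:baseline_len])
    for i in range(baseline_len, len(series)):
        z = modified_z(series[i], baseline)
        if start is None and z < threshold:
            start = i                      # dip begins
        elif start is not None and z >= threshold:
            events.append((start, i))      # dip ends; i is first recovered sample
            start = None
        if start is None:                  # only advance baseline outside dips
            baseline.pop(0)
            baseline.append(series[i])
    if start is not None:                  # dip still open at end of series
        events.append((start, len(series) - 1))
    return events

# Example: a flat availability signal with one injected dip.
signal = [99.9] * 100
signal[70:85] = [92.0] * 15
print(detect_dips(signal))  # -> [(70, 85)]
```

Freezing the baseline during a dip is the key design choice in this sketch: without it, a long outage would bleed into the rolling statistics and the detector would start treating the degraded level as normal.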