Monitoring availability metrics at scale creates a familiar problem: you have a time series, you need to know when it drops, and you need to know this automatically — without someone staring at a dashboard.
This post walks through a statistical algorithm I built to do exactly that. It detects dips in any continuous metric (availability, reachability, error rate) and returns precise start and end timestamps for each event. No ML required — just a modified z-score, two rolling windows, and a few transition rules.
The Problem #
A “dip” in a time series sounds easy to define: the value goes down. But in practice:
- Metrics fluctuate constantly — you don’t want to fire on every small wobble
- Some drops are so brief they’re noise (a single anomalous minute)
- Some recoveries are partial — the metric bounces back briefly before dropping again
- The absolute threshold that matters varies by day, because the baseline isn’t constant
A naive threshold (if value < 0.999, it's a dip) breaks quickly. You either miss real events or drown in false alarms.
The Assumptions #
The algorithm makes two working assumptions:
- The distribution is stationary — the metric doesn’t have predictable time-of-day patterns where lower values are expected. If your metric does have those patterns, you’d extend this with a time-varying baseline.
- Under normal conditions, the metric is roughly normally distributed — a bell curve around a stable central value, with dips appearing as strong negative deviations.
Both hold well for high-level availability metrics aggregated over many requests.
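Both assumptions are easy to sanity-check on your own data. As a toy illustration (entirely synthetic values; numpy and pandas assumed), a stationary, roughly normal metric produces z-scores that rarely stray far below the baseline:

```python
import numpy as np
import pandas as pd

# Hypothetical: one day of minute-level availability, stationary and roughly normal
rng = np.random.default_rng(42)
metric = pd.Series(0.9995 + rng.normal(0, 0.0001, size=1440))

# Normalise against the median and the standard deviation
z = (metric - metric.median()) / metric.std()

# Under these assumptions, strong negative deviations are rare
print((z < -3).mean())  # a tiny fraction of points
```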
Step 1: Compute a Modified Z-Score #
Rather than comparing against a fixed threshold, we normalise each data point against the current distribution of the signal. This makes the algorithm adaptive — the “what counts as a dip” question is answered relative to the recent behaviour of the metric itself.
For each data point $x_i$:
$$z_i = \frac{x_i - \text{reference\_val}}{\sigma}$$
Where:
- `reference_val` is either the median of the dataset (adaptive) or a fixed SLA value (e.g. 0.99999) — configurable
- $\sigma$ is the standard deviation of the full time-series window (typically 24 hours)
A data point is flagged as a potential dip if $z_i < -1$ — i.e. the value is more than one standard deviation below the reference.
```python
stddev = dip_timeline_reindex.std()
reference_val = (
    dip_timeline_reindex.median() if reference == "median" else reference_val
)
comp = dip_timeline_reindex.sub(reference_val) / stddev
dip_threshold = (comp < -1) * 1  # binary: 1 = possible dip, 0 = normal
```

Using the median as reference (rather than the mean) makes the baseline robust to outliers — a single deep dip doesn’t pull the reference value down and cause the algorithm to miss subsequent events.
Step 2: Filter Out Noise with a Minimum Duration Window #
A single minute below threshold is almost certainly noise. We only want to flag an event as a real dip start if it’s sustained — specifically, if the next min_window minutes (default: 5) are also flagged.
We track two additional signals:
- `shift`: the lagged value of `is_dip` — tells us what the previous minute’s state was
- `roll_sum`: a forward-looking rolling sum over the next `max_window` minutes — tells us what’s coming
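The forward-looking sum is the less obvious of the two: shifting the series backwards by `max_window - 1` before applying a trailing rolling sum turns it into a look-ahead. On a toy series (with `max_window` shortened to 3 for readability):

```python
import pandas as pd

max_window = 3  # shortened for illustration; the default is 15
is_dip = pd.Series([0, 0, 1, 1, 1, 0, 0])

# Previous minute's state
shift = is_dip.shift(1).fillna(0)

# roll_sum[i] = is_dip[i] + ... + is_dip[i + max_window - 1]
# (windows that run past either edge of the series are filled with 0)
roll_sum = is_dip.shift(-max_window + 1).rolling(window=max_window).sum().fillna(0)

print(roll_sum.tolist())  # [0.0, 0.0, 3.0, 2.0, 1.0, 0.0, 0.0]
```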
```python
dip_finder["shift"] = dip_finder["is_dip"].shift(1).fillna(0)
dip_finder["roll_sum"] = (
    dip_finder.is_dip.shift(-max_window + 1)
    .rolling(window=max_window)
    .sum()
    .fillna(0)
)
```

Step 3: Label Dip Transitions #
With these three signals (is_dip, shift, roll_sum), we can precisely label each minute as a dip start, dip end, or neither:
$$\text{start\_end}_i = \begin{cases} \text{“dip\_start”} & \text{if } \text{is\_dip}_i = 1 \wedge \text{shift}_i = 0 \wedge \text{roll\_sum}_i \geq \text{min\_window} \\ \text{“dip\_end”} & \text{if } \text{is\_dip}_i = 0 \wedge \text{shift}_i = 1 \wedge \text{roll\_sum}_i = 0 \\ \text{NaN} & \text{otherwise} \end{cases}$$
In plain English:
- Dip start: the current minute is below threshold, the previous minute was not, and at least `min_window` of the coming `max_window` minutes are also below threshold
- Dip end: the current minute is above threshold, the previous minute was not, and the next `max_window` minutes are all above threshold — meaning we’ve genuinely recovered, not just bounced
The max_window check on dip end (default: 15 minutes) is deliberate. Without it, a brief recovery in the middle of a sustained dip would split it into two separate events, making the duration statistics meaningless.
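The effect is easy to demonstrate on a toy timeline (hypothetical values; `max_window` shortened to 3 for readability). A brief two-minute recovery in the middle of a dip keeps `roll_sum` positive, so the dip-end condition only fires once the recovery is sustained:

```python
import pandas as pd

max_window = 3
# A dip, a brief 2-minute recovery, the dip again, then a real recovery
is_dip = pd.Series([1, 1, 0, 0, 1, 1, 0, 0, 0])

shift = is_dip.shift(1).fillna(0)
roll_sum = is_dip.shift(-max_window + 1).rolling(window=max_window).sum().fillna(0)

# The dip_end condition: above threshold now, below last minute,
# and nothing below threshold in the look-ahead window
dip_end = (is_dip == 0) & (shift == 1) & (roll_sum == 0)

print(dip_end.tolist())
# Only index 6 qualifies; the bounce at index 2 is not labelled an end
```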
```python
dip_finder["start_end"] = dip_finder.apply(
    lambda dip: (
        "dip_end"
        if (dip["is_dip"] == 0) & (dip["shift"] == 1) & (dip["roll_sum"] == 0)
        else (
            "dip_start"
            if (dip["is_dip"] == 1)
            & (dip["shift"] == 0)
            & (dip["roll_sum"] >= min_window)
            else np.nan
        )
    ),
    axis=1,
)
```

Step 4: Remove Consecutive Dip Starts #
In practice, you can get runs of consecutive dip_start labels during a noisy entry into a dip. We only want the first one. This is handled by grouping consecutive identical labels and keeping only the first occurrence:
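The trick used here, cumulative-summing label changes to form group ids, is a standard pandas idiom worth seeing in isolation (toy labels):

```python
import pandas as pd

s = pd.Series(["dip_start", "dip_start", "dip_end", "dip_start"])

# A new group id starts whenever the label changes
group_id = (s != s.shift()).cumsum()

# Rank within each run of identical labels, then keep only the first
rank = s.groupby(group_id).cumcount() + 1
keep = s[rank == 1]

print(keep.tolist())  # ['dip_start', 'dip_end', 'dip_start']
```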
```python
dips_only["consecutive_count"] = (
    dips_only["start_end"]
    .groupby((dips_only["start_end"] != dips_only["start_end"].shift()).cumsum())
    .cumcount() + 1
)
dips_only = dips_only.loc[~(dips_only["consecutive_count"] > 1)].copy()
```

Step 5: Pair Starts and Ends #
Finally, dip_start and dip_end events are paired sequentially. Only valid pairs are retained — a start without a following end (e.g. a dip still ongoing at the end of the observation window) is excluded. Duration is computed in both timedelta and minutes.
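The pairing itself can be sketched as follows. This is a minimal version, not the original implementation: it assumes the surviving labelled events sit in a small `events` frame and that, after deduplication, starts and ends alternate beginning with a start:

```python
import pandas as pd

# Hypothetical labelled events after the dedup step
events = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            ["2024-01-01 10:00", "2024-01-01 10:20", "2024-01-01 12:00"]
        ),
        "start_end": ["dip_start", "dip_end", "dip_start"],
    }
)

starts = events.loc[events.start_end == "dip_start", "ts"].reset_index(drop=True)
ends = events.loc[events.start_end == "dip_end", "ts"].reset_index(drop=True)

# Pair sequentially; a trailing start with no matching end
# (a dip still ongoing) is dropped
n = min(len(starts), len(ends))
dip_start_end_df = pd.DataFrame({"dip_start": starts[:n], "dip_end": ends[:n]})

print(len(dip_start_end_df))  # 1 valid pair; the 12:00 start is still open
```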
```python
dip_start_end_df["duration"] = dip_start_end_df.dip_end - dip_start_end_df.dip_start
dip_start_end_df["duration_min"] = dip_start_end_df["duration"].dt.total_seconds() / 60
```

The Parameters #
The algorithm has five configurable parameters:
| Parameter | Default | What it controls |
|---|---|---|
| `min_window` | 5 min | Minimum sustained duration to call something a dip start |
| `max_window` | 15 min | Minimum recovery duration to call something a dip end |
| `reference` | `"median"` | Whether to normalise against the dataset median or a fixed SLA value |
| `reference_val` | 0.99999 | The SLA value, if `reference = "SLA"` |
| `smooth` | `False` | Apply a rolling average before detection (trades sensitivity for noise reduction) |
The smooth parameter deserves a note: it was initially appealing as a way to eliminate false alarms, but in practice it caused the algorithm to miss short but real dips — the smoothing would average away a 3-minute event entirely. For most use cases, leaving it off and relying on min_window to filter noise is the better approach.
Why Not ML? #
A few reasons this approach was chosen over a machine learning model:
Interpretability. When an alert fires, you can trace exactly why: the z-score was below -1 for more than 5 consecutive minutes, the reference value was X, the standard deviation was Y. There’s no black box.
No training data required. The algorithm works on the current 24-hour window. You don’t need historical labelled examples of dips to get started.
Robustness to distribution shift. If the baseline level of a metric drifts over time (e.g. availability naturally improves as infrastructure scales), the median-based reference value adapts automatically.
Low computational overhead. The entire algorithm is vectorised pandas operations — it runs on minute-level data for a 24-hour window in milliseconds.
What This Enables #
The output is a clean dataframe of (dip_start, dip_end, duration, duration_min) tuples. This feeds naturally into downstream analysis: which contributing entities were responsible for the dip during each window, how severe it was relative to the reference, and whether the pattern matches known failure modes.
The algorithm is written in Python for prototyping but is straightforward to port to Scala for production pipeline integration — all the logic is standard window functions and groupby operations that map directly to Spark or Flink semantics.
The full function signature with all parameters:
```python
def find_dip_start_end(
    dip_timeline: pd.DataFrame,
    min_window: int = 5,
    max_window: int = 15,
    smooth: bool = False,
    reference: str = "median",
    reference_val: float = 0.99999,
    metric: str = "avg_availability",
) -> pd.DataFrame:
    ...
```