A Hidden Markov Framework for Concept Drift Detection and Classification
1 Christ University, Bangalore, Karnataka, 560029, India
2 Department of Computer Science and Engineering, Techno International New Town, Kolkata, 700156, India
Abstract
Machine learning systems deployed in real-world environments frequently encounter non-stationary data streams in which the underlying data-generating distribution shifts over time. This phenomenon, known as concept drift, progressively degrades model performance if left undetected. Existing detection methods largely treat drift as a binary event, ignoring the temporal dynamics and structural diversity of distributional change. In this paper, we present the Hidden Markov Model (HMM)-based drift tracking (HDT) system, a framework that models concept drift as a latent probabilistic process over three hidden states: stable, warning, and drift. The Viterbi algorithm is employed to decode the most probable state sequence from a multivariate observation vector constructed from sliding-window statistical features of the data stream, including the sample mean, variance, Kolmogorov-Smirnov (KS) statistic, and model error rate. Upon detecting a drift event, HDT classifies it into one of two primary structural categories, sudden drift or gradual drift, and further determines whether the event is harmful or benign based on a feature-derived severity criterion. Experiments conducted on the University of California, Irvine (UCI) gas sensor array drift dataset, comprising 13,810 post-initialization observations across ten sensor batches collected over 36 months, demonstrate the system's ability to track drift onset, progression, and recovery in a physically motivated non-stationary stream. The results show 2,575 confirmed drift events, of which 1,021 are classified as harmful and 1,554 as benign. The HDT system offers a principled and interpretable alternative to threshold-based detectors for monitoring deployed machine learning systems in dynamic environments.
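As a rough illustration of the sliding-window observation vector described above, the following minimal sketch computes the four named features for a single univariate window. The function names, the choice of a fixed reference window for the KS statistic, and the 0/1 error encoding are assumptions made for illustration only; they are not taken from the HDT implementation.

```python
from bisect import bisect_right
from statistics import fmean, variance


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The largest gap is always attained at one of the observed sample points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def window_features(reference, window, errors):
    """Return (mean, sample variance, KS statistic, error rate) for one window.

    `reference` is an assumed baseline window; `errors` holds 0/1 flags
    marking the model's mistakes on the current window (hypothetical encoding).
    """
    return (
        fmean(window),
        variance(window),
        ks_statistic(reference, window),
        fmean(errors),
    )


# Illustrative values only, not drawn from the gas sensor dataset.
baseline = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]
features = window_features(baseline, shifted, errors=[0, 1, 1, 1])
```

A window that no longer overlaps the reference yields a KS statistic of 1, while a window identical to the reference yields 0, which is what makes the statistic a useful drift signal alongside the error rate.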
Graphical Abstract

Novelty Statement
This work introduces a Hidden Markov Model (HMM)-based Drift Tracking (HDT) framework that formulates concept drift detection as a probabilistic latent-state inference problem rather than as a conventional threshold-based binary decision.
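To make the latent-state formulation concrete, the sketch below runs log-space Viterbi decoding over three states named stable, warning, and drift. It is a simplified illustration, not the HDT implementation: the observation is reduced to a single scalar (a KS statistic per window), the emissions are assumed Gaussian, and every numeric parameter is hypothetical.

```python
import math

STATES = ("stable", "warning", "drift")


def log_gaussian(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def viterbi(observations, log_start, log_trans, log_emit):
    """Return the most probable state-index path.

    log_emit(state_index, obs) must return the emission log-likelihood.
    """
    n = len(log_start)
    score = [log_start[s] + log_emit(s, observations[0]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit(s, obs))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters, chosen only for illustration.
MEANS = (0.1, 0.4, 0.8)  # assumed typical KS statistic in each state
STD = 0.1
log_start = [math.log(p) for p in (0.80, 0.15, 0.05)]
log_trans = [[math.log(p) for p in row] for row in (
    (0.90, 0.08, 0.02),  # from stable
    (0.20, 0.60, 0.20),  # from warning
    (0.05, 0.15, 0.80),  # from drift
)]


def log_emit(s, obs):
    return log_gaussian(obs, MEANS[s], STD)


ks_stream = [0.08, 0.12, 0.35, 0.45, 0.82, 0.79, 0.11]
path = [STATES[s] for s in viterbi(ks_stream, log_start, log_trans, log_emit)]
```

Unlike a fixed threshold applied to each window independently, the decoded label for a window depends on its neighbours through the transition probabilities, so an isolated noisy window is less likely to be reported as drift.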

