Online Harmonizing Gradient Descent for Imbalanced Data Streams One-Pass Classification

Han Zhou; Hongpeng Yin; Xuanhong Deng; Yuyu Huang

doi:10.24963/ijcai.2023/274

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence

Main Track. Pages 2468-2475. https://doi.org/10.24963/ijcai.2023/274

Many real-world data streams are collected sequentially over time and have skewed class distributions. In this setting, online learning models tend to favor samples from majority classes and make wrong decisions for samples from minority classes. Previous methods try to balance the number of instances across classes or assign asymmetric cost values, and they usually require data buffers to store streaming data or pre-defined cost parameters. This study instead shows that the imbalance of instances can be implied by the imbalance of gradients. We then propose Online Harmonizing Gradient Descent (OHGD) for one-pass online classification. By harmonizing the gradient magnitudes incurred by different classes, the method avoids bias in favor of the majority class. Specifically, OHGD requires no data buffer, extra parameters, or prior knowledge. It also handles imbalanced data streams in the same way as balanced ones, which makes it easy to implement. Under a few common and mild assumptions, the theoretical analysis proves that OHGD enjoys a satisfactory sub-linear regret bound. Extensive experimental results demonstrate its efficiency and effectiveness in handling imbalanced data streams.
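The abstract does not state the OHGD update rule, so the following is only a rough illustration of the general idea of equalizing per-class gradient magnitude in a one-pass learner. It is not the authors' algorithm. The class name `HarmonizedOnlineLogReg`, the running `grad_mass` statistic, the scaling formula, and the learning rate `lr` are all assumptions made for this sketch (OHGD itself is stated to need no extra parameters).

```python
import numpy as np


def sigmoid(z):
    # Clip the logit to keep np.exp numerically stable.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


class HarmonizedOnlineLogReg:
    """Hypothetical one-pass binary classifier; NOT the OHGD rule from the paper."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr  # assumed step size, chosen for illustration only
        # Cumulative gradient magnitude contributed by each class (0 and 1).
        self.grad_mass = np.zeros(2)

    def predict_proba(self, x):
        return sigmoid(self.w @ x + self.b)

    def update(self, x, y):
        """Process one sample (x, y) with y in {0, 1}; no sample is stored."""
        err = self.predict_proba(x) - y  # dL/dz for the logistic loss
        self.grad_mass[y] += abs(err)
        # Assumed heuristic: the class with less accumulated gradient
        # magnitude takes proportionally larger steps.
        scale = self.grad_mass.mean() / max(self.grad_mass[y], 1e-12)
        self.w -= self.lr * scale * err * x
        self.b -= self.lr * scale * err
        return scale
```

Each sample is seen once and then discarded, so the only state beyond the model weights is the two-element `grad_mass` vector. In this sketch a rarely seen class receives a scale above 1, while a frequently seen class is scaled down.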

Keywords:

Data Mining: DM: Mining data streams

Data Mining: DM: Class imbalance and unequal cost

Machine Learning: ML: Classification