Machine Learning for Mining Imbalanced Data
Abstract
Mining imbalanced data, also known as the class imbalance problem, is one of the most challenging tasks in machine learning for data mining applications. Achieving accurate overall performance in imbalanced classification with machine learning techniques is difficult because majority class instances outnumber minority class instances by a large margin. Such unequal class distributions are very common in real-world high-dimensional datasets, where binary classification is more frequent than multi-class classification. Most existing machine learning algorithms focus on classifying majority class instances while ignoring or misclassifying minority class instances. Several techniques have been introduced over the last decades for imbalanced data classification, each with its own advantages and disadvantages. In this paper, we study and compare 12 extensively used imbalanced data classification methods: SMOTE, AdaBoost, RUSBoost, EUSBoost, SMOTEBoost, MSMOTEBoost, DataBoost, Easy Ensemble, BalanceCascade, OverBagging, UnderBagging, and SMOTEBagging, to extract their characteristics and evaluate their performance on 22 imbalanced datasets.
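The difficulty described in the abstract can be illustrated with a minimal sketch. The 95:5 class split and the always-predict-majority "classifier" below are hypothetical, chosen only for illustration, and are not taken from the thesis or its 22 datasets. The sketch shows how a model that ignores the minority class entirely can still report high overall accuracy.

```python
# Hypothetical toy labels: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive instances that were predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)
print(acc, minority_recall)  # prints: 0.95 0.0
```

Under these assumptions the accuracy is 0.95 while the minority-class recall is 0.0, which is why overall accuracy alone can be a misleading measure on imbalanced data.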
Collections
- M.Sc Thesis/Project