dc.contributor.author: Abdullah-All-Tanvir
dc.date.accessioned: 2022-07-30T05:42:49Z
dc.date.available: 2022-07-30T05:42:49Z
dc.date.issued: 2022-07
dc.identifier.uri: http://dspace.uiu.ac.bd/handle/52243/2493
dc.description.abstract: Class-imbalanced datasets are widespread in various fields, including health, security, and banking. When trained on an imbalanced dataset, a standard supervised learning algorithm is biased toward the majority class. In real-life applications, however, the minority class instances are usually of greater interest than the majority class instances. To classify imbalanced datasets, numerous strategies based on sampling methods (undersampling of the majority class and oversampling of the minority class), cost-sensitive learning methods, and ensemble learning have recently been employed in the literature. However, deleting majority-class samples at random using a uniform distribution may result in needless data loss. In this thesis, we propose three cluster-based undersampling methods to prevent unnecessary data loss. In the first, we inject test data into the training data for clustering. In the second, we select 25% of the data close to the centroid and 25% from the boundary. In the last, we clean 50% of the majority data around the minority data. We evaluate our methods on 49 datasets using auROC, auPR, F1-score, and MCC. According to the experimental results, our methods are promising strategies for dealing with severely imbalanced datasets. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Imbalanced Data [en_US]
dc.subject: Cluster Based Methods [en_US]
dc.title: Improving Machine Learning Methods for Handling Data Imbalance Problem [en_US]
dc.type: Thesis [en_US]

