Cluster-Based Under-Sampling with Random Forest for Multi-Class Imbalanced Classification

UIU Institutional Repository


    View/Open
    Md. Yasir Arafat - 012162024.pdf (593.9Kb)
    Date
    2018-02-19
    Author
    Arafat, Md. Yasir
    Abstract
    Multi-class imbalanced classification has emerged as a very challenging research area in machine learning for data mining applications. It occurs when the number of training instances in the majority classes is much higher than the number in the minority classes. Existing machine learning algorithms achieve good accuracy when classifying majority class instances, but tend to misclassify minority class instances. However, the minority class instances often hold the most vital information, and misclassifying them can lead to serious problems. Several sampling techniques combined with ensemble learning have been proposed for binary-class imbalanced classification in the last decade. In this work, we propose a new ensemble learning technique that employs cluster-based under-sampling with the random forest algorithm to classify highly imbalanced multi-class data. The proposed approach clusters the majority class instances and then selects the most informative majority class instances from each cluster to form several balanced datasets. The random forest algorithm is then applied to each balanced dataset, and majority voting is used to classify test (new) instances. We compared the proposed method with popular existing boosting-based methods, namely AdaBoost, RUSBoost, and SMOTEBoost, on 13 benchmark imbalanced datasets. The experimental results show that the proposed cluster-based under-sampling with random forest technique achieves high accuracy in classifying both majority and minority class instances compared with the existing methods.
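    The pipeline outlined in the abstract (cluster each larger class, draw from every cluster to build several balanced datasets, train one model per dataset, combine by majority vote) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation. The abstract does not say how the "most informative" instances are chosen, so a random round-robin draw across clusters is substituted here. The clustering is a plain k-means, and every name below (`kmeans`, `sample_from_clusters`, `balanced_datasets`, `ensemble_predict`) is hypothetical. The base learner is left as any callable that maps an instance to a label; the thesis uses random forest in that role.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ended up empty.
        centers = [centroid(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def sample_from_clusters(clusters, n, rng):
    """Draw n points round-robin from shuffled clusters.

    ASSUMPTION: random selection stands in for the thesis's
    unspecified "most informative" criterion.
    Requires that the clusters hold at least n points in total.
    """
    pools = [rng.sample(c, len(c)) for c in clusters]
    picked = []
    while len(picked) < n:
        for pool in pools:
            if pool and len(picked) < n:
                picked.append(pool.pop())
    return picked


def balanced_datasets(data, k=3, n_sets=5, seed=0):
    """Build n_sets balanced datasets from {label: [points]}.

    Every class is reduced to the size of the smallest class;
    classes already at that size are kept whole.
    """
    rng = random.Random(seed)
    n_min = min(len(pts) for pts in data.values())
    clustered = {
        label: kmeans(pts, min(k, len(pts)), rng)
        for label, pts in data.items() if len(pts) > n_min
    }
    datasets = []
    for _ in range(n_sets):
        rows = []
        for label, pts in data.items():
            chosen = (sample_from_clusters(clustered[label], n_min, rng)
                      if label in clustered else pts)
            rows.extend((p, label) for p in chosen)
        datasets.append(rows)
    return datasets


def ensemble_predict(models, x):
    """Majority vote over models, each a callable x -> label."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

    To use the sketch, train one classifier on each list returned by `balanced_datasets` and pass the resulting prediction functions to `ensemble_predict`. If scikit-learn is available, each of those classifiers could be a `RandomForestClassifier` fitted on one balanced dataset, which would be closer to the setup the abstract describes.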
    URI
    http://dspace.uiu.ac.bd/handle/52243/159
    Collections
    • M.Sc Thesis/Project [151]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
