Active Learning with Clustering for Mining Big Data

UIU Institutional Repository


    Active Learning with Clustering for Mining Big Data

    Active Learning with Clustering for Mining Big Data.pdf (707.1Kb)
    Date
    2019-05-28
    Author
    Ibrahim, Md.
    Masud, Salman
    Rabby, Reza E
    Abstract
    Big data mining has become a key research issue. Extracting knowledge from big data is costly and time-consuming. Because big data contains millions of data points, it is very difficult to build a learning model using machine learning and data mining algorithms. The main problem is fitting the whole dataset into computer memory, which is often impossible. Therefore, we need more scalable, robust, and adaptive learning algorithms. Existing mining algorithms are designed to handle relatively small datasets with a fixed number of class labels. In this paper, we propose a new method that uses clustering techniques to select a small number of training instances, which we consider informative, from a large dataset. We apply the proposed method in an active learning process for classifying big data. Active learning is a machine learning process in supervised learning where an oracle is asked to label unlabelled training instances. Labelling a large number of unlabelled instances is a challenging and difficult task for a human expert. Therefore, finding informative unlabelled training instances is necessary for learning from big semi-supervised data. We collected six benchmark datasets from the UCI Machine Learning Repository and tested our proposed method using the following machine learning algorithms: naive Bayes (NB) classifier, decision tree (DT) classifiers (i.e. C4.5 and CART), Support Vector Machines (SVM), Random Forest, Bagging, and Boosting (AdaBoost).
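    The abstract does not give the details of the selection method, so the following is only a minimal sketch of one common way to combine clustering with active learning, not the authors' algorithm. It assumes plain k-means as the clustering step, treats the instance nearest each centroid as the "informative" one to send to the oracle, and uses a 1-nearest-neighbour rule as a stand-in classifier. The names `kmeans`, `select_queries`, `oracle`, and `predict`, and the toy data, are all hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so it matches the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


def select_queries(points, k, seed=0):
    """Assumed selection rule: pick the point closest to each centroid."""
    centroids, assignment = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return sorted(chosen)


def oracle(point):
    """Hypothetical stand-in for the human annotator."""
    return "A" if point[0] < 5 else "B"


def predict(point, labelled):
    """Illustrative 1-nearest-neighbour classifier over the queried instances."""
    return min(labelled, key=lambda item: math.dist(point, item[0]))[1]


# Toy unlabelled pool: two well-separated 3x3 grids of points.
blob_a = [(float(x), float(y)) for x in range(3) for y in range(3)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
pool = blob_a + blob_b

# Ask the oracle to label only one representative per cluster.
query_idx = select_queries(pool, k=2)
labelled = [(pool[i], oracle(pool[i])) for i in query_idx]

# Classify the remaining pool using just the queried labels.
predictions = [predict(p, labelled) for p in pool]
print(query_idx, labelled)
```

    In this sketch only two of the eighteen pool instances are sent to the oracle. Any of the classifiers named in the abstract could take the place of the 1-nearest-neighbour rule; it is used here only to keep the example dependency-free.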
    URI
    http://dspace.uiu.ac.bd/handle/52243/1147
    Collections
    • B.Sc Thesis/Project [82]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
