Active Learning for Mining Big Data

UIU Institutional Repository

    Active Learning for Mining Big.pdf (2.054Mb)
    Date
    2019-09-22
    Author
    Jahan, Sadia
    Abstract
    Active learning, also known as optimal experimental design, is a process for building a classifier or learning model with fewer training instances in a semi-supervised setting. It is a well-known approach used in many real-life machine learning and data mining applications. Active learning uses a query function and an oracle or expert (e.g., a human or other information source) to label unlabeled data instances and thereby boost the performance of a classifier. Labeling unlabeled data instances is difficult, time-consuming, and expensive. In this paper, we propose an approach based on cluster analysis for selecting informative training instances from a large number of unlabeled data instances (big data), so that a classifier suitable for active learning can be built from fewer training instances. The proposed method partitions the unlabeled big data into several clusters and finds the informative instances in each cluster using the center of the cluster, the nearest neighbors of the center, and randomly selected instances from the cluster. The objective is to find the informative unlabeled instances and have the oracle label them, in order to scale up the classification results of machine learning algorithms applied to big data. We tested the performance of the proposed method on seven benchmark datasets from the UC Irvine Machine Learning Repository using the following five well-known machine learning algorithms: C4.5 (decision tree induction), SVM (support vector machines), Random Forest, Bagging, and Boosting (AdaBoost). The experimental analysis showed that the proposed method improves the performance of classifiers in active learning with fewer training instances.
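    The selection step described in the abstract can be illustrated with a minimal sketch. The abstract only says "cluster analysis", so the use of k-means here is an assumption, as are the function names (`kmeans`, `select_queries`) and the parameters (`k`, `n_neighbors`, `n_random`); none of them come from the thesis. For each cluster, the sketch picks the instance closest to the center, that center's next nearest neighbors, and a few random members, and returns their indices as the instances to send to the oracle.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Index of the closest center for every point.
        return [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign()


def select_queries(points, k=3, n_neighbors=2, n_random=1, seed=0):
    """Return sorted indices of the instances to be labeled by the oracle."""
    rng = random.Random(seed)
    centers, labels = kmeans(points, k, seed=seed)
    chosen = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Members ordered by distance to the cluster center.
        by_distance = sorted(members, key=lambda i: math.dist(points[i], centers[c]))
        # The instance nearest the center plus its next nearest neighbors.
        picked = by_distance[: 1 + n_neighbors]
        # A few random instances from the remainder of the cluster.
        rest = by_distance[1 + n_neighbors:]
        picked += rng.sample(rest, min(n_random, len(rest)))
        chosen.update(picked)
    return sorted(chosen)


if __name__ == "__main__":
    # Synthetic 2-D data: three noisy blobs of 30 points each (illustrative only).
    rng = random.Random(1)
    blobs = [(0, 0), (5, 5), (0, 5)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(30)]
    queries = select_queries(data, k=3)
    print(len(queries), "of", len(data), "instances selected for labeling")
```

    In an actual active learning loop, the returned indices would be passed to the oracle for labeling and the labeled instances used to train one of the classifiers named in the abstract; that step is omitted here.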
    URI
    http://dspace.uiu.ac.bd/handle/52243/1379
    Collections
    • M.Sc Thesis/Project [151]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
