Active Learning with Clustering for Mining Big Data
Abstract
Big data mining has become a key research issue. Extracting knowledge
from big data is costly and time-consuming. Because a big dataset can
contain millions of data points, it is very difficult to build a learning
model using machine learning and data mining algorithms. The main
problem is fitting the whole dataset into computer memory, which is
practically impossible. Therefore, we need more scalable, robust, and
adaptive learning algorithms. The existing mining algorithms are designed
to handle relatively small datasets with a fixed number of class labels.
In this paper, we have
proposed a new method that uses clustering techniques to select a small
number of training instances, which we consider informative, from a
large dataset. We have applied our proposed method in an active learning
process for classifying big data. Active learning is a machine learning
process in supervised learning where an oracle is asked to label the
unlabelled training instances. Labelling a large number of unlabelled
instances is a very challenging task for a human expert. Therefore,
finding informative unlabelled training instances is necessary for
learning from big semi-supervised data. We have collected six benchmark
datasets from the UCI
Machine Learning Repository and tested our proposed method using the
following machine learning algorithms: naive Bayes (NB) classifier,
decision tree (DT) classifiers (i.e., C4.5 and CART), Support Vector
Machines (SVM), Random Forest, Bagging, and Boosting (AdaBoost).