Active Learning with Clustering for Mining Big Data
Rabby, Reza E
Big data mining has become a key research issue. Extracting knowledge from big data is costly and time-consuming. Because such datasets contain millions of data points, it is very difficult to build a learning model with conventional machine learning and data mining algorithms. The main problem is fitting the whole dataset into computer memory, which is often infeasible. We therefore need more scalable, robust, and adaptive learning algorithms. Existing mining algorithms are designed to handle relatively small datasets with a fixed number of class labels. In this paper, we propose a new method that uses clustering techniques to select a small number of training instances, which we consider informative, from a large dataset. We apply the proposed method in an active learning process for classifying big data. Active learning is a machine learning process within supervised learning in which an oracle is asked to label unlabelled training instances. Labelling a large number of unlabelled instances is a very challenging task for a human expert. Finding the informative unlabelled training instances is therefore necessary for learning from big data in a semi-supervised setting. We collected six benchmark datasets from the UCI Machine Learning Repository and tested the proposed method with the following machine learning algorithms: naive Bayes (NB) classifier, decision tree (DT) classifiers (i.e. C4.5 and CART), Support Vector Machines (SVM), Random Forest, Bagging, and Boosting (AdaBoost).
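The general idea of choosing a few instances per cluster and sending only those to the oracle can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or selection criterion the thesis uses, so everything below is an assumption made for illustration: plain k-means, evenly spaced initial centroids, and "the instance nearest each centroid" as the informative one. The names `kmeans`, `select_informative`, and `oracle` are hypothetical and not taken from the thesis.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    n = len(points)
    # Assumption: deterministic, evenly spaced initial centroids.
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        assignments = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def select_informative(points, k):
    """Assumed criterion: pick the instance closest to each cluster centroid."""
    centroids, assignments = kmeans(points, k)
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assignments) if a == c]
        if members:
            selected.append(min(members, key=lambda i: dist(points[i], centroids[c])))
    return sorted(selected)


# Toy unlabelled pool with two well-separated groups (made-up data).
unlabelled = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1),
              (5.0, 5.0), (6.0, 5.0), (5.6, 5.1)]


def oracle(x):
    """Stand-in for the human expert who supplies labels."""
    return "A" if x[0] < 2.5 else "B"


# Only the selected instances are sent to the oracle for labelling.
query_indices = select_informative(unlabelled, k=2)
labelled = [(unlabelled[i], oracle(unlabelled[i])) for i in query_indices]
print(query_indices, labelled)
```

In this sketch only two of the six instances are labelled, and the resulting `labelled` list could then be passed to any of the classifiers named in the abstract. Other selection criteria, such as picking instances near cluster boundaries, would fit the same structure.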
- B.Sc Thesis/Project