Scalable Decision Tree Induction For Mining Big Data
Date
2019-05-22
Author
Sabah, Shabnam
Anwar, Sara Zumerrah Binte
Afroze, Sadia
Sarker, Snigdha
Abstract
Big data mining is one of the most challenging research problems in machine
learning and data mining applications in the present digital era.
Big data is characterized by three V's: (1) volume, a massive amount of data
(too many bytes); (2) velocity, high-speed streaming data (too high a rate);
and (3) variety, data coming from many different sources (too many sources).
Collecting and managing real-life big data is difficult because the data are
too large to be stored together on a single machine. Therefore, advanced
relational database management systems with parallel computing are needed to
handle big data. Mining knowledge from big data with traditional machine
learning and data mining techniques is a major challenge that has attracted
many computational intelligence researchers to this area. In this paper, we
use the decision tree (DT) induction method for mining big data.
Decision tree induction is one of the most popular and well-known supervised
learning techniques: it is a top-down, recursive, divide-and-conquer
algorithm that requires little prior knowledge to construct a classifier.
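
The top-down, recursive, divide-and-conquer structure described above can be
sketched in a few lines of Python. This is a minimal ID3-style illustration
that assumes categorical attributes, information gain as the split criterion,
and data held entirely in memory; the function names (`entropy`, `info_gain`,
`build_tree`, `predict`) and the toy dataset are hypothetical and are not
taken from the thesis.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting on the categorical attribute at index `attr`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Hypothetical ID3-style sketch: returns a class label (leaf)
    or a tuple (attr_index, {attribute_value: subtree})."""
    # Stopping rules: a pure node, or no attributes left -> majority-class leaf.
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    # Divide: partition the records by the chosen attribute's value,
    # then conquer each (non-empty) partition recursively.
    for value in set(row[best] for row in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return (best, branches)

def predict(tree, row, default=None):
    """Follow branches until a leaf; return `default` for unseen values."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree

# Toy usage (hypothetical data): attributes are (outlook, temperature).
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
tree = build_tree(rows, labels, attrs=[0, 1])
print(predict(tree, ("rainy", "mild")))  # -> yes
```

Every recursive call rescans its own partition of `rows`, and the whole
training set must be in memory at the root. This is the kind of access
pattern that makes the classic algorithms hard to apply when the data no
longer fit on one machine.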
Traditional DT algorithms such as Iterative Dichotomiser 3 (ID3), C4.5 (a
successor of ID3), and Classification and Regression Trees (CART) were
generally designed for relatively small datasets that fit in main memory,
so a more scalable decision tree learning approach is needed for mining big
data. In this paper, we build several trees using two scalable decision tree
algorithms, RainForest Tree and the Bootstrapped Optimistic Algorithm for
Tree construction (BOAT), on seven benchmark datasets from the KEEL
repository and the UCI Machine Learning Repository, and we compare the
performance of the RainForest and BOAT algorithms. We also propose a
decision tree merging approach, since merging decision trees is a complex
and challenging task.
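
RainForest's scalability is generally attributed to its AVC-sets
(Attribute-Value-Class counts): for each tree node it aggregates, per
attribute, how often each (attribute value, class label) pair occurs, and
these counts are sufficient to evaluate split criteria such as information
gain without keeping the raw records in memory. The Python sketch below
illustrates that idea under stated assumptions (categorical attributes,
information gain, and hypothetical helper names `build_avc_group`,
`merge_avc_groups`, `gain_from_avc`). It is not the thesis's implementation.
It also shows that AVC counts computed on disjoint data partitions can simply
be summed, which is one possible way to combine work done on separate
machines.

```python
from collections import Counter, defaultdict
import math

def build_avc_group(records, attrs):
    """One scan over (features, label) records.

    Hypothetical helper (assumption): returns {attr: Counter({(value, label): count})},
    i.e. one AVC-set per attribute for the current node's data.
    """
    avc = {a: Counter() for a in attrs}
    for features, label in records:
        for a in attrs:
            avc[a][(features[a], label)] += 1
    return avc

def merge_avc_groups(g1, g2):
    """Combine AVC-groups from disjoint partitions by summing their counts."""
    return {a: g1[a] + g2[a] for a in g1}

def gain_from_avc(avc_set):
    """Information gain of a categorical split, computed from counts only."""
    class_totals = Counter()
    by_value = defaultdict(Counter)
    for (value, label), n in avc_set.items():
        class_totals[label] += n
        by_value[value][label] += n

    def h(counts):
        # Entropy (in bits) of a class-count distribution.
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

    total = sum(class_totals.values())
    remainder = sum(sum(cnt.values()) / total * h(cnt) for cnt in by_value.values())
    return h(class_totals) - remainder

# Toy usage (hypothetical data), split into two "partitions".
records = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
           (("rainy", "mild"), "yes"), (("rainy", "hot"), "yes"),
           (("overcast", "hot"), "yes")]
attrs = [0, 1]
merged = merge_avc_groups(build_avc_group(records[:2], attrs),
                          build_avc_group(records[2:], attrs))
best = max(attrs, key=lambda a: gain_from_avc(merged[a]))
print(best)  # -> 0 (the outlook attribute)
```

Because the split is chosen from the merged counts alone, each partition only
has to be scanned once per node, and the size of an AVC-set depends on the
number of distinct attribute values and classes rather than on the number of
records.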
Collections
- B.Sc Thesis/Project