Big Data Mining in the Presence of Concept Drifting

Siraj, Tabassum; Jannat, Efrana; Rasul, Warafta; Chowdhury, Meher Afroz

View/Open

Big Data Mining in the Presence of Concept Drifting.pdf (4.720Mb)

Date

2019-03-05

Abstract

Concept drift is a demanding research problem in big data mining. In data mining (DM) and machine learning (ML), the "concept" in "concept drift" refers to the relationship between the input features and the class variable. In real-life classification problems, a concept can change, or new concepts may appear, over time. New data mining classification models and methods are proposed regularly to address this problem. In this paper, we investigate concept drift detection and analyse how to build a better method for adapting to newly arriving concepts. We address concept drift in big data mining and present an evolutionary, concept-adaptive, rule-based approach that classifies novel class instances under concept drift. The proposed method clusters the data and extracts decision rules from each cluster for classifying instances of existing classes. To classify a new unlabelled instance, the method first checks whether the instance belongs to one of the existing clusters. If it does not, the instance is treated as a novel class instance. New classification rules are then extracted from the novel instances and added to the existing rules. The method was evaluated on a number of datasets from the UCI (University of California, Irvine) Machine Learning Repository. The results indicate that the proposed approach is an effective and efficient means of detecting novel classes, which at the same time identifies concept drift.
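
The cluster, classify, detect and adapt loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm, the rule learner, or the membership test, so everything here is an assumption. The sketch uses plain k-means with farthest-point seeding, a majority-class label per cluster as a stand-in for extracted decision rules, and a distance threshold (`slack` times the cluster radius) as the membership test. The names `fit_clusters`, `classify`, `add_novel_cluster`, and `slack` are hypothetical.

```python
import math
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


def fit_clusters(points, labels, k):
    """Cluster the data and attach one 'rule' per cluster.

    The rule is only the cluster's majority class, a stand-in for the
    decision rules the abstract mentions.
    """
    centroids, assign = kmeans(points, k)
    model = []
    for c, centre in enumerate(centroids):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue  # skip clusters that ended up empty
        radius = max(math.dist(points[i], centre) for i in idx)
        label = Counter(labels[i] for i in idx).most_common(1)[0][0]
        model.append({"centre": centre, "radius": radius, "label": label})
    return model


def classify(model, x, slack=1.5):
    """Return the label of the closest cluster containing x, or None if novel.

    'Containing' means within slack * radius of the centre (assumed test).
    """
    inside = [m for m in model if math.dist(x, m["centre"]) <= slack * m["radius"]]
    if not inside:
        return None  # x fits no existing cluster: treat as a novel class instance
    return min(inside, key=lambda m: math.dist(x, m["centre"]))["label"]


def add_novel_cluster(model, novel_points, new_label):
    """Adapt: build a cluster (and its rule) from buffered novel instances."""
    centre = tuple(sum(col) / len(novel_points) for col in zip(*novel_points))
    radius = max(math.dist(p, centre) for p in novel_points)
    model.append({"centre": centre, "radius": radius, "label": new_label})


# Toy data: two well-separated classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = fit_clusters(points, labels, k=2)

known = classify(model, (0.4, 0.6))   # falls inside the "a" cluster
novel = classify(model, (5, 20))      # far from both clusters, so None

# Once enough novel instances accumulate, extend the model with a new class.
add_novel_cluster(model, [(5, 20), (5, 21), (6, 20)], "c")
adapted = classify(model, (5.5, 20.5))
```

In this sketch a `None` result plays the role of the novel-class signal, and calling `add_novel_cluster` corresponds to adding new rules to the existing ones. How the new class obtains its label, and how many novel instances must accumulate before adapting, are left open here because the abstract does not specify them.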

URI

http://dspace.uiu.ac.bd/handle/52243/926

Collections

B.Sc Thesis/Project [82]