Anomaly Detection in Blockchain Transactions using Machine Learning with Explainability Analysis
Abstract
In the era of growing cryptocurrency adoption, blockchain-based currencies, led by Bitcoin, have become major players in the digital payment landscape. However, this popularity also brings an array of security challenges, including the need to safeguard against malicious activity. One of the most important of these is detecting anomalous transactions in Bitcoin data, a task that directly affects the trust and security of digital payments. It is also a difficult task, because anomalous Bitcoin transactions are rare compared with normal ones. Although several studies have addressed this problem, a limitation persists: the models' predictions are not explained. This study addresses that limitation by combining eXplainable Artificial Intelligence (XAI) techniques and anomaly rules with tree-based ensemble classifiers. While deep learning techniques have demonstrated their strength in anomaly detection, few studies have explored their potential, particularly in the context of Bitcoin. This study also aims to fill that gap by proposing a 1D Convolutional Neural Network (CNN) model. To understand how our model works and to explain its decisions, we use the SHapley Additive exPlanations (SHAP) method, which measures each feature's contribution to a prediction. We also address the imbalance between anomalous and non-anomalous Bitcoin transaction data by exploring various balancing methods, and we introduce an under-sampling algorithm named XGBCLUS for this purpose, which we compare against commonly used under-sampling and over-sampling techniques. Our experimental results demonstrate that: (i) XGBCLUS improves the True Positive Rate (TPR) and the area under the receiver operating characteristic curve (ROC-AUC) compared with state-of-the-art under-sampling and over-sampling techniques; (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and False Positive Rate (FPR); and (iii) our proposed 1D CNN model achieves higher accuracy while also reducing the FPR.
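Because anomalous Bitcoin transactions are rare, accuracy alone can look excellent for a classifier that never flags anything, which is why the results are reported in terms of TPR, FPR and ROC-AUC. The following is a minimal pure-Python sketch of the standard definitions of these metrics, not the evaluation code used in this study. It assumes binary labels with 1 = anomalous and 0 = normal, computes ROC-AUC as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal transaction (ties count as one half), and uses hypothetical toy data.

```python
from typing import Sequence


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, TPR and FPR for binary labels (assumed: 1 = anomalous, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal txs wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal txs passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (anomaly, normal) pairs ranked correctly.

    Ties count as 0.5. This is O(P * N), which is fine for a sketch but
    not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one anomalous and one normal example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 98 normal transactions and 2 anomalous ones.
    y_true = [0] * 98 + [1] * 2
    always_normal = [0] * 100  # a "classifier" that never flags anything
    print(confusion_rates(y_true, always_normal))
    # prints {'accuracy': 0.98, 'tpr': 0.0, 'fpr': 0.0}
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # prints 0.75
```

In the toy run, the trivial classifier reaches 98% accuracy while detecting no anomalies at all (TPR of 0), which illustrates why balancing techniques such as under-sampling and over-sampling are judged by TPR and ROC-AUC rather than by accuracy alone.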
Collections
- M.Sc Thesis/Project