BABD: A Bitcoin Address Behavior Dataset for Pattern Analysis

Published in IEEE Transactions on Information Forensics and Security, 2023

Recommended citation: "BABD: A Bitcoin Address Behavior Dataset for Pattern Analysis," in IEEE Transactions on Information Forensics and Security, vol. 19, pp. 2171-2185, 2024, doi: 10.1109/TIFS.2023.3347894. https://ieeexplore.ieee.org/document/10375557/

Cryptocurrencies have dramatically increased adoption in mainstream applications in various fields such as financial and online services, however, there are still a few amounts of cryptocurrency transactions that involve illicit or criminal activities. It is essential to identify and monitor addresses associated with illegal behaviors to ensure the security and stability of the cryptocurrency ecosystem. In this paper, we propose a framework to build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled data, which is the largest labeled Bitcoin address behavior dataset publicly available to our knowledge. We also propose a novel and efficient subgraph generation algorithm called BTC-SubGen to extract a k -hop subgraph from the entire Bitcoin transaction graph constructed by the directed heterogeneous multigraph starting from a specific Bitcoin address node. We then conduct 13-class classification tasks on BABD-13 by five machine learning models namely k -nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost, the results show that the accuracy rates are between 93.24% and 97.13%. In addition, we study the relations and importance of the proposed features and analyze how they affect the effect of machine learning models. Finally, we conduct a preliminary analysis of the behavior patterns of different types of Bitcoin addresses using concrete features and find several meaningful and explainable modes.