An Integrated Smart Contract Vulnerability Detection Tool Using Multi-layer Perceptron on Real-time Solidity Smart Contracts

Citation Author(s):
Song Haw Colin Lee
Submitted by:
Song Haw Colin Lee
Last updated:
Fri, 11/24/2023 - 21:27
DOI:
10.21227/f9d0-fk07
Data Format:
License:

Abstract 

Smart contract vulnerabilities have led to substantial disruptions, ranging from the DAO attack
to the recent Poolz Finance arithmetic overflow incident. Historically, the definition of smart contract
vulnerabilities lacked standardization, and even with current advancements in Solidity, malicious
contracts can still be deployed to exploit legitimate ones.
The Abstract Syntax Tree (AST), Opcodes, and Control Flow Graph (CFG) are the intermediate representations
for Solidity contracts. In this paper, we propose an efficient and scalable smart contract vulnerability
detection algorithm that uses all three representations for vulnerability detection based on multipool detection
leveraging Machine Learning (ML) techniques. We use feature vectors from the Opcodes and CFG for
the ML model training. While there are existing works on ML-based approaches for analyzing the control
flow of the smart contract code, these approaches are constrained by (i) the vulnerability detection space, (ii)
significantly varying Solidity smart contract versions, and (iii) the lack of a unified, scalable approach to verify against
the ground truth. Our primary contributions include (i) establishing a standardized pre-processing method for
cleaning smart contract training data, (ii) introducing bugs to create a balanced dataset of flawed files across
Solidity versions using AST, and (iii) standardizing vulnerability identification using the Smart Contract
Weakness Classification (SWC) registry as a common analysis platform. The ML models employed in our
study are Random Forest (RF), XGBoost (XGB), Support Vector Machine (SVM), Multilayer Perceptron
(MLP), and a multi-input model combining MLP and Long Short-Term Memory (LSTM). In this paper, we
obtain an accuracy of up to 91% using real-time smart contracts deployed on the Ethereum blockchain.

Instructions: 

In the bugged dataset, each bugged Solidity file is paired with a bug log in CSV format. The CSV file contains information such as the function name and location.
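
As a rough illustration of how the file and bug log pairs might be loaded, the following Python sketch walks a directory, matches each .sol file to a CSV bug log, and reads the rows. The pairing rule (the bug log shares the contract's file stem and sits next to it) and the function names default_buglog_path and load_bug_pairs are assumptions made for this example, not something the dataset documents. Check the actual file names and pass a different buglog_for function if they differ. No column names are assumed; each row is returned as a dictionary keyed by whatever header the CSV carries.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def default_buglog_path(sol_path: Path) -> Path:
    # Assumption (hypothetical naming rule): the bug log sits next to the
    # contract and shares its stem, e.g. example.sol -> example.csv.
    return sol_path.with_suffix(".csv")


def load_bug_pairs(
    root,
    buglog_for: Callable[[Path], Path] = default_buglog_path,
) -> Dict[str, List[dict]]:
    """Map each bugged .sol file (path relative to root) to its bug log rows."""
    root = Path(root)
    pairs: Dict[str, List[dict]] = {}
    for sol_path in sorted(root.rglob("*.sol")):
        log_path = buglog_for(sol_path)
        if not log_path.is_file():
            # Skip contracts that have no matching bug log.
            continue
        with log_path.open(newline="", encoding="utf-8") as fh:
            # DictReader uses the CSV header row as keys, so no column
            # names are hard-coded here.
            pairs[str(sol_path.relative_to(root))] = list(csv.DictReader(fh))
    return pairs
```

Calling load_bug_pairs on the extracted dataset directory returns a dictionary whose keys are contract paths and whose values are lists of bug log rows, which can then be filtered by function name or location.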