[Online] Comparative Analysis of SQLi Detection Models

ID: 224 Submission ID: 486 Updated Time: 2025-12-28 11:13:49

Start Time: 2025-12-29 18:30 (Asia/Amman)

Duration: 15 min

Session: [S3] Track 3: Privacy, Security for Networks

Abstract

SQL injection (SQLi) remains a common and persistent threat to web applications. Although many SQLi detection techniques have been proposed, most studies evaluate them on a single dataset, so their conclusions are hard to verify under other data conditions and reveal little about how model performance changes with dataset scale and distribution. This study compares machine learning (ML) and deep learning (DL) models on two publicly available SQLi datasets that differ in size and composition.

The ML pipelines use a hybrid representation that combines character-level TF-IDF, word-level TF-IDF computed over the output of a SQL-aware tokenizer, and numeric behavioral indicators. The DL branch applies placeholder-based normalization and token-sequence modeling, covering recurrent networks (Long Short-Term Memory, LSTM, and Gated Recurrent Unit, GRU), attention-based variants, and a Transformer architecture.
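The abstract does not specify the tokenizer, the placeholder scheme, or the behavioral indicators, so the following is only a minimal sketch of what such preprocessing might look like. The token pattern, the `<STR>` and `<NUM>` placeholders, the `KEYWORDS` set, and all function names are hypothetical choices made for illustration, not the author's implementation.

```python
import re

# Hypothetical SQL-aware token pattern (an assumption, not from the paper):
# quoted strings, numbers, identifiers/keywords, multi-character operators
# and comment markers, then any other single non-space symbol.
TOKEN_RE = re.compile("|".join([
    r"'(?:''|[^'])*'",               # complete single-quoted string literal
    r"\d+(?:\.\d+)?",                # integer or decimal number
    r"[A-Za-z_]\w*",                 # identifier or keyword
    r"--|/\*|\*/|<=|>=|<>|!=|\|\|",  # comment markers and operators
    r"\S",                           # fallback: one symbol, e.g. a stray quote
]))

# Illustrative keyword list; a real pipeline would likely use a larger one.
KEYWORDS = {"select", "union", "or", "and", "drop", "insert",
            "update", "delete", "from", "where", "sleep"}


def tokenize(query):
    """Split a query string into SQL-aware tokens."""
    return TOKEN_RE.findall(query)


def normalize(query):
    """Replace literals with placeholders and lowercase everything else."""
    out = []
    for tok in tokenize(query):
        if len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'":
            out.append("<STR>")      # complete string literal
        elif tok[0].isdigit():
            out.append("<NUM>")      # numeric literal
        else:
            out.append(tok.lower())
    return out


def indicators(query):
    """Compute a few hand-picked numeric indicators for one query."""
    tokens = tokenize(query)
    return {
        "length": len(query),
        "n_tokens": len(tokens),
        "n_quotes": query.count("'"),
        "n_comment_markers": sum(t in ("--", "/*", "*/", "#") for t in tokens),
        "n_keywords": sum(t.lower() in KEYWORDS for t in tokens),
    }


if __name__ == "__main__":
    q = "SELECT * FROM users WHERE id = 1 OR '1'='1' --"
    print(normalize(q))
    print(indicators(q))
```

In a setup like this, `tokenize` could be supplied as the custom tokenizer of a word-level TF-IDF vectorizer, the `normalize` output could feed a sequence model, and the `indicators` values could be appended as extra numeric columns. An unterminated quote falls through to the single-symbol rule, so it stays visible as its own token.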

The results show that dataset scale strongly affects the relative performance of the DL models. On the smaller corpus, the LSTM model with multi-head attention performs best among the DL architectures, while several ML models perform at a comparable or higher level. On the larger and more heterogeneous corpus, the Transformer model attains the highest macro-averaged F1 score, 0.9946. Linear Support Vector Classification (LinearSVC) is a consistently robust ML baseline on both datasets. Overall, ML models match or outperform the DL models on the smaller dataset but are surpassed by the best DL model once the dataset becomes larger and more diverse.
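For readers unfamiliar with the reported metric, macro-averaged F1 computes an F1 score for each class separately and then takes their unweighted mean, so the benign and malicious classes count equally regardless of their frequency. The sketch below is a plain-Python illustration with made-up labels. It is not the evaluation code used in the study, and the function name `macro_f1` is hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels (fabricated for illustration): 1 = SQLi, 0 = benign.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(round(macro_f1(y_true, y_pred), 4))
```

In this toy case the minority class scores 2/3 and the majority class 6/7, and the macro average weights them equally, which is why the metric is often preferred when class sizes differ.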

Keywords

SQL injection detection, machine learning, deep learning, LinearSVC, Transformer, TF-IDF, tokenization, web application security

Speaker

Gegentana Altanhuyag

Student, Mongolian University of Science and Technology; Mongolia

Submission Author

Gegentana Altanhuyag, Mongolian University of Science and Technology; Mongolia
