Research on PHP Injection Vulnerability Detection Method Based on Intermediate Language

Zhang Guodong; Liu Zilong; Jin Zhuo; Yao Tianyu; Sun Donghong; Qin Jiawei

Cite this article:

Zhang Guodong, Liu Zilong, Jin Zhuo, Yao Tianyu, Sun Donghong, Qin Jiawei. Research on PHP Injection Vulnerability Detection Method based on Intermediate Language[J]. Journal of Cyber Security, accepted.

Zhang Guodong¹, Liu Zilong¹, Jin Zhuo¹, Yao Tianyu¹, Sun Donghong², Qin Jiawei³
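(1. Shenyang Aerospace University; 2. Tsinghua University; 3. National Computer Network Emergency Response Technical Team/Coordination Center of China)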

Abstract:

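Web applications are growing rapidly in number and are now widely used across many fields. The number of vulnerabilities in them has grown accordingly, and security problems are increasingly prominent. Injection vulnerabilities are the most widespread and destructive threat among Web application vulnerabilities, yet the information extracted by vulnerability detection tools omits some vulnerability-related semantic information and contains a large amount of noise unrelated to the vulnerability, which leads to false positives and false negatives. To address this problem, this paper proposes an intermediate language representation named Alpherg, which retains code information from the source code, extracts only the semantic information related to vulnerabilities, and represents the control flow of the source code. When Alpherg is used for vulnerability feature extraction, the resulting representation discards noise unrelated to vulnerabilities, preserves the contextual information of the source code, is independent in form of the original programming language, and is readable. Using Alpherg for feature extraction, the paper proposes a PHP injection vulnerability detection model based on Bi-LSTM and an attention mechanism. The model uses the ability of Bi-LSTM to handle long-term dependencies to capture contextual relationships in Alpherg's long sequence representations. In addition, an attention mechanism computes the attention distribution at each time step so that the model makes better use of the vulnerability-related information in the Alpherg representation, which improves its detection capability. Alpherg is compared with the outputs of other feature extraction methods, and the results show that it accurately extracts information directly related to vulnerabilities, avoids introducing excessive noise, and preserves the semantic information of the vulnerabilities. The proposed detection model is validated on the SARD dataset, where it reaches a detection accuracy of 98%, higher than the three static detection tools and the PHP-token-based deep learning detection model used for comparison, demonstrating the feasibility and effectiveness of the method.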

Keywords: injection vulnerability detection; deep learning; semantic features of vulnerabilities; code slicing


Received: 2023-02-06; Revised: 2023-05-31

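Funding: Aeronautical Science Foundation; Natural Science Foundation of Liaoning Province; Science and Technology Foundation of the Education Department of Liaoning Province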

Research on PHP Injection Vulnerability Detection Method based on Intermediate Language

Zhang Guodong¹, Liu Zilong¹, Jin Zhuo¹, Yao Tianyu¹, Sun Donghong², Qin Jiawei³

(1. Shenyang Aerospace University; 2. Tsinghua University; 3. National Computer Network Emergency Response Technical Team/Coordination Center of China)

Abstract:

With the rapid growth in the number of Web applications and their wide use in various fields, the number of vulnerabilities in Web applications has also increased, and the security problems have become increasingly significant. Injection vulnerabilities are considered the most widespread and destructive threat among Web application vulnerabilities. However, the information extracted by vulnerability detection tools misses some semantic information related to vulnerabilities and contains a lot of noise data unrelated to them, which leads to false positives and false negatives in vulnerability detection. To solve this problem, this paper proposes an intermediate language representation named Alpherg, which retains the code information in the source code, extracts only the semantic information related to the vulnerability, and represents the control flow information of the source code. When Alpherg is used to extract vulnerability features, the resulting representation discards the noise data unrelated to the vulnerability, retains the context information in the source code, is independent in form of the original programming language, and is readable. Based on features extracted with Alpherg, a PHP injection vulnerability detection model using Bi-LSTM and an attention mechanism is proposed. The model uses Bi-LSTM's ability to handle long-term dependencies to obtain the context relationships in Alpherg's long sequence representation. Furthermore, an attention mechanism is added to the model; by calculating the attention distribution at each time step, it makes better use of the vulnerability-related information in the Alpherg representation and improves the vulnerability detection ability of the model. In this paper, Alpherg is compared with other feature extraction methods. The results show that Alpherg can accurately extract the information directly related to the vulnerability, avoid introducing too much noise, and retain the semantic information of the vulnerability. The proposed vulnerability detection model is verified on the SARD dataset. The results show that the detection accuracy of the model is 98%, which is higher than that of the three static detection tools and the PHP token-based deep learning vulnerability detection model used for comparison, demonstrating the feasibility and effectiveness of this method.
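
The attention step mentioned in the abstract (an attention distribution computed over time steps) can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not give the scoring function, so the sketch assumes plain dot-product scoring against a hypothetical query vector, and the `hidden_states` list simply stands in for per-time-step Bi-LSTM outputs. The function name `attention_pool` is likewise an illustrative assumption.

```python
import math
from typing import List, Tuple


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one sequence.

    hidden_states: one vector per time step (stand-in for Bi-LSTM outputs).
    query: hypothetical vector used to score each step (an assumption here;
    in a trained model it would be a learned parameter).
    """
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one time step")

    # Score each time step with a dot product against the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]

    # Softmax over time steps, shifted by the max score for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy input: three time steps with two-dimensional hidden states.
demo_weights, demo_context = attention_pool(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0]
)
print(demo_weights, demo_context)
```

In this toy input the third time step has the largest dot product with the query, so it receives the largest weight. In a detection model, the context vector would then typically be passed to a classification layer.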

Key words: injection vulnerability detection; deep learning; semantic features of vulnerabilities; code slicing