Research on PHP Injection Vulnerability Detection Method Based on Intermediate Language

ZHANG Guodong; LIU Zilong; YAO Tianyu; JIN Zhuo; SUN Donghong; QIN Jiawei

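(School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China; Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China)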

DOI: 10.19363/J.cnki.cn10-1380/tn.2024.11.08

Received: 2023-02-06; Revised: 2023-05-31

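Funding: This work was supported by the National Key Research and Development Program of China (No. 2022YFB3103901) and the Zhongguancun Laboratory project (No. ZGC-02-20220211).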

Abstract:

As Web applications grow rapidly in number and are deployed across many fields, the number of vulnerabilities they contain has grown accordingly. Injection vulnerabilities are the most widespread and destructive class of Web application vulnerabilities. However, the information extracted by existing vulnerability detection tools omits part of the vulnerability-related semantic information while containing a large amount of noise unrelated to vulnerabilities, which leads to false positives and false negatives. To address this problem, we propose an intermediate language representation named Alpherg, which preserves source code information, extracts only the semantic information related to vulnerabilities, and represents the control flow of the source code. When Alpherg is used for vulnerability feature extraction, the resulting representation discards noise unrelated to vulnerabilities, retains the contextual information of the source code, is independent in form of the original programming language, and remains readable. Building on Alpherg-based feature extraction, we propose a PHP injection vulnerability detection model based on a Bi-LSTM and an attention mechanism. The Bi-LSTM captures the contextual relationships within Alpherg's long sequence representations; the attention mechanism then computes an attention distribution that assigns a weight to each time step, allowing the model to make better use of the vulnerability-related information in the Alpherg representation and improving its detection ability. A comparison of Alpherg with other feature extraction methods shows that Alpherg accurately extracts information directly related to vulnerabilities, avoids introducing excessive noise, and preserves the semantic information of vulnerabilities. The proposed model is evaluated on the SARD dataset, where it achieves a detection accuracy of 98%, higher than that of the three static detection tools and the deep learning detection model based on PHP tokens used for comparison, demonstrating the feasibility and effectiveness of the method.

Key words: injection vulnerability detection; deep learning; semantic features of vulnerabilities; code slicing
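The abstract does not specify the model's layer sizes, the exact attention formula, or how Alpherg sequences are encoded. The following is therefore only a minimal, untrained sketch in pure Python of the general Bi-LSTM plus attention pattern it describes. It assumes additive attention (score_t = v · tanh(W h_t)) over the concatenated forward and backward LSTM states and a sigmoid output for "vulnerable" vs. "not vulnerable". The integer token IDs stand in for an encoded Alpherg sequence, and the class and function names, vocabulary size, dimensions, and random weights are all hypothetical. Training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction; the four gates share a weight matrix over [x; h]."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Rows are stacked as: input gate, forget gate, candidate, output gate.
        self.w = rand_matrix(4 * hidden_dim, input_dim + hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x, h, c):
        z = [a + b for a, b in zip(matvec(self.w, x + h), self.b)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in z[0:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:4 * n]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Return the hidden state after each input vector in xs."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(fwd, bwd, xs):
    forward = fwd.run(xs)
    # Run the backward cell right-to-left, then re-align it with the time steps.
    backward = bwd.run(xs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]  # concatenate per step


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, w_att, v_att):
    """Additive attention: score_t = v . tanh(W h_t), alpha = softmax(scores)."""
    scores = [sum(vk * math.tanh(u) for vk, u in zip(v_att, matvec(w_att, h)))
              for h in states]
    alpha = softmax(scores)
    dim = len(states[0])
    context = [sum(a * h[k] for a, h in zip(alpha, states)) for k in range(dim)]
    return alpha, context


class BiLSTMAttentionClassifier:
    """Untrained toy model: embedding -> Bi-LSTM -> attention -> sigmoid."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=6, att_dim=4, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, embed_dim, rng)
        self.fwd = LSTMCell(embed_dim, hidden_dim, rng)
        self.bwd = LSTMCell(embed_dim, hidden_dim, rng)
        self.w_att = rand_matrix(att_dim, 2 * hidden_dim, rng)
        self.v_att = [rng.uniform(-0.1, 0.1) for _ in range(att_dim)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def predict(self, token_ids):
        """Return (probability of 'vulnerable', attention weight per time step)."""
        xs = [self.embed[t] for t in token_ids]
        states = bilstm(self.fwd, self.bwd, xs)
        alpha, context = attention_pool(states, self.w_att, self.v_att)
        logit = sum(w * c for w, c in zip(self.w_out, context)) + self.b_out
        return sigmoid(logit), alpha


if __name__ == "__main__":
    # Hypothetical token IDs standing in for an encoded Alpherg sequence.
    model = BiLSTMAttentionClassifier(vocab_size=20)
    prob, weights = model.predict([3, 7, 7, 12, 1])
    print(round(sum(weights), 6), 0.0 < prob < 1.0)  # prints: 1.0 True
```

The sketch only makes the data flow explicit. Each time step receives one weight in the attention distribution, and the weighted sum of Bi-LSTM states feeds the classifier. A practical implementation would normally use a deep learning framework, for example PyTorch's `nn.LSTM` with `bidirectional=True`, and would learn the weights from labeled samples.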