Cite this article
  • CHEN Zhaoxuan, ZOU Deqing, LI Zhen, JIN Hai. Intelligent vulnerability detection system based on abstract syntax tree[J]. Journal of Cyber Security, 2020, 5(4): 1-13
Intelligent vulnerability detection system based on abstract syntax tree
CHEN Zhaoxuan1,2, ZOU Deqing1,3,4, LI Zhen1,3, JIN Hai1,2
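(1. National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab (Key Laboratory of the Ministry of Education), Clusters and Grid Computing Lab (Hubei Provincial Key Laboratory), Hubei Engineering Research Center on Big Data Security, Wuhan 430074, China; 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; 3. School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; 4. Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen 518000, China)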
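Abstract:
Automatic detection of source code vulnerabilities is an important research topic. Most existing solutions are based on linear models: they rely on the textual information of source code and ignore its syntactic structure, which loses syntactic and semantic information and misses many vulnerability features. This paper proposes Astor, an intelligent vulnerability detection system based on structured representation that uses the structural information of source code, specifically the abstract syntax tree (AST), to detect vulnerabilities. First, a dataset is constructed that is derived from source code and contains its syntactic structure information, and a depth-first traversal mechanism is proposed to obtain the syntactic representation of the AST. Finally, a neural network model is used to learn the syntactic representation of the AST. To evaluate Astor's performance, several vulnerability detection systems based on structured data and on linear data are compared. The experimental results show that Astor effectively improves vulnerability detection capability and reduces the false negative rate and the false positive rate. The paper further concludes that structured models are better suited to long, information-rich data.
Key words:  vulnerability detection  structured representation  abstract syntax tree  neural network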
DOI:10.19363/J.cnki.cn10-1380/tn.2020.07.01
Received: 2019-12-06  Revised: 2020-04-20
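Funding: This work was supported by the National Natural Science Foundation of China (No. U1936211), the Shenzhen Fundamental Research Program (Discipline Layout) (No. JCYJ20170413114215614), the Guangdong Provincial Science and Technology Plan Project (No. 2017B010124001), and the Guangdong Key-Area Research and Development Program (No. 2019B010139001).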
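The abstract describes obtaining a syntactic representation of an AST by depth-first traversal. The idea can be illustrated with a minimal sketch, assuming Python's built-in `ast` module as a stand-in parser; the page does not state which parser, source language, or node encoding Astor uses, so the function name `dfs_node_types` and the choice of node type names as tokens are hypothetical.

```python
import ast
from typing import List


def dfs_node_types(source: str) -> List[str]:
    """Return the AST node type names of `source` in depth-first pre-order.

    Illustrative only: this uses Python's own parser, not the toolchain
    described in the paper.
    """
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the current node first, then descend into each child in order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


# Toy input: one assignment statement (an assumption for demonstration).
tokens = dfs_node_types("x = a + 1")
print(tokens)
```

A flat sequence like this is one way a tree can be fed to a sequence-oriented neural network while still preserving parent-before-child ordering; other encodings are possible.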
Intelligent vulnerability detection system based on abstract syntax tree
CHEN Zhaoxuan1,2, ZOU Deqing1,3,4, LI Zhen1,3, JIN Hai1,2
(1. National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Clusters and Grid Computing Lab, Big Data Security Engineering Research Center, Wuhan 430074, China; 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; 3. School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; 4. Institute of Huazhong University of Science and Technology, Shenzhen 518000, China)
Abstract:
Automatic detection of source code vulnerabilities is an important research topic. However, most existing solutions are based on linear models. They rely on the textual information of source code but ignore its syntactic structure, which causes a loss of syntactic and semantic information and misses many vulnerability features. This paper proposes Astor, an Abstract Syntax Tree (AST) based system that learns a structured representation of source code and uses that structural information to detect vulnerabilities. First, we present a dataset that is derived from source code and contains its syntactic structure information. In addition, we propose a depth-first information extraction scheme to obtain the syntactic and semantic representation of the AST. In Astor, a neural network based detection system learns the representation of the AST. To evaluate Astor, we compare it with vulnerability detection systems based on structured data and on linear data. The results show that Astor achieves far fewer false negatives and false positives than the other approaches. In addition, this paper concludes that structured models are more suitable for data with rich semantic information.
Key words:  vulnerability detection  structured representation  abstract syntax tree  neural network
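The evaluation in the abstract is stated in terms of false negatives and false positives. As a reminder of how the corresponding rates are usually computed, here is a small sketch using the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The function name `error_rates` and the label vectors are made up for illustration and are not results from the paper.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for binary labels.

    Label 1 means "vulnerable", label 0 means "not vulnerable".
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    # Guard against division by zero when a class is absent.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy data (hypothetical): 4 vulnerable and 4 non-vulnerable samples.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 1, 0, 0, 1]
fpr, fnr = error_rates(labels, preds)
print(f"FPR={fpr:.2f} FNR={fnr:.2f}")
```

A detector that lowers both rates at once, as the abstract reports for Astor, flags fewer safe samples and misses fewer vulnerable ones.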