A Malware Family Classification Model Based on Deep Learning

Zheng Rui (郑锐); Wang Qiuyun (汪秋云); Fu Jianming (傅建明); Jiang Zhengwei (姜政伟); Su Riguga (苏日古嘎); Wang Shuwei (汪姝玮)

Cite this article:

ZHENG Rui, WANG Qiuyun, FU Jianming, JIANG Zhengwei, SU Riguga, WANG Shuwei. A Novel Malware Classification Model Based on Deep Learning[J]. Journal of Cyber Security (信息安全学报), 2020, 5(1): 1-9.

Zheng Rui¹,², Wang Qiuyun², Fu Jianming¹, Jiang Zhengwei²,³, Su Riguga¹,², Wang Shuwei²
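(1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)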

Abstract:

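Malware family classification is an important topic in network security research, and the dynamic execution features of malware accurately reflect its functionality and family attributes. By studying how malware calls the Windows API, we find that malicious behavior depends on both the preceding and the following API calls in a sequence, and that the way a bidirectional LSTM computes its features matches this dependency. We design a deep learning model based on a bidirectional LSTM to model the probabilistic relationship between preceding and following API calls. In experiments the model reaches a test accuracy of 99.28%, which indicates that the proposed combination of models captures the system API call behavior of malware well. To test the classification performance of deep learning methods in more depth, the experiments also include an adversarial-example study: simulated adversarial examples are constructed by randomly inserting API sequences, and the model with its original parameters is evaluated on them. This study shows that the deep learning method is more stable than certain shallow machine learning methods. The experiments offer a reference for the adoption of deep learning techniques in industry.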
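The dependency on both preceding and following API calls can be written in the standard textbook form of a bidirectional LSTM. This is a general sketch, not necessarily the exact architecture used in the paper. It assumes that x_t is a vector representation of the t-th API call in a sequence of length T, and that the two directions use separate LSTM cells (named fwd and bwd here for illustration).

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fwd}}\bigl(x_t,\ \overrightarrow{h}_{t-1}\bigr), \\
\overleftarrow{h}_t  &= \mathrm{LSTM}_{\mathrm{bwd}}\bigl(x_t,\ \overleftarrow{h}_{t+1}\bigr), \\
h_t &= \bigl[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\bigr], \qquad t = 1, \dots, T.
\end{aligned}
```

The forward state summarizes calls 1 through t and the backward state summarizes calls t through T, so the concatenated state h_t carries context from both sides of each API call.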

Keywords: deep learning; malware; family classification; robustness

DOI: 10.19363/J.cnki.cn10-1380/tn.2020.01.01

Received: 2019-09-05; Revised: 2019-12-09

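Funding: This work was supported by the National Natural Science Foundation of China (No. 61972297, No. U1636107), the Foundation Strengthening Program (No. 2017-JCJQ-ZD-043-01-00), and the National Key Research and Development Program of China (No. 2016QY06X1204, No. 2018YFC0824801).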

A Novel Malware Classification Model Based on Deep Learning

ZHENG Rui¹,², WANG Qiuyun², FU Jianming¹, JIANG Zhengwei²,³, SU Riguga¹,², WANG Shuwei²

(1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; 2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract:

Family classification of malicious software is an important issue in computer system security research. The dynamic execution features of malicious software accurately reflect its functionality and family attributes. By studying how malware calls system APIs, we observe that the malicious behavior of malware depends on both the preceding and the following API calls in a sequence, and that a bidirectional LSTM can capture such dependencies. We design a deep learning model based on a bidirectional Long Short-Term Memory (LSTM) network to model the relationship between forward and backward API calls. Experimental results show that the test accuracy reaches 99.28%, indicating that the proposed model combination models the system API call behavior of malicious software well. To further evaluate the classification performance of the deep learning method, we add an adversarial-example experiment: simulated adversarial examples are constructed by randomly inserting adversarial sequences into test samples, and the model with its original parameters is tested on them. The adversarial-example experiment shows that the deep learning model is more robust than some shallow machine learning methods. The experiments in this paper provide a reference for the adoption of deep learning technology in industry.
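The construction of simulated adversarial examples by random insertion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the pool of noise sequences, and the sample API names are hypothetical and are not taken from the paper's dataset. The sketch inserts whole noise sequences at random positions, so the original calls keep their relative order.

```python
import random
from typing import List, Optional, Sequence


def insert_noise_sequences(
    api_sequence: Sequence[str],
    noise_sequences: Sequence[Sequence[str]],
    num_insertions: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of api_sequence with noise sequences inserted at random positions.

    Hypothetical helper: the input is left unchanged, and the original
    calls keep their relative order in the returned list.
    """
    if rng is None:
        rng = random.Random()
    perturbed = list(api_sequence)
    for _ in range(num_insertions):
        noise = rng.choice(noise_sequences)
        # randint is inclusive on both ends, so the noise may also be appended.
        position = rng.randint(0, len(perturbed))
        perturbed[position:position] = noise
    return perturbed


def is_subsequence(original: Sequence[str], perturbed: Sequence[str]) -> bool:
    """Check that every original call still appears in perturbed, in order."""
    remaining = iter(perturbed)
    return all(call in remaining for call in original)


# Illustrative trace and noise pool (assumed values, not from the paper).
trace = ["CreateFileW", "WriteFile", "RegSetValueExW", "CloseHandle"]
noise_pool = [["GetTickCount", "Sleep"], ["GetSystemTime"]]

adversarial = insert_noise_sequences(trace, noise_pool, 3, random.Random(0))
print(adversarial)
print(is_subsequence(trace, adversarial))
```

In an evaluation of this kind, the perturbed traces would be fed to a model trained only on clean traces, so any drop in accuracy reflects sensitivity to the inserted calls rather than retraining effects.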

Key words: deep learning; malicious software; family classification; robustness