A Survey of Malware Detection Based on Deep Learning

YAN Pei, TAN Shunquan, HUANG Jiwu

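(Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China; Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China)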

Abstract:

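Malware is one of the greatest threats to computer and network security. Numerous detection methods and tools already exist. Even so, malware variants iterate rapidly and the number of code samples is growing explosively, so improving the performance of malware detection remains a challenging and actively studied problem in network security. With the development of artificial intelligence, deep learning-based malware detection methods have gradually attracted researchers' attention. These methods use large numbers of neurons to fit data features, which lets them achieve stronger detection performance. Early methods and traditional machine learning methods lack two capabilities that deep learning-based methods have: automatic extraction of code features and support for continuous learning. For this reason, deep learning-based methods have gradually become the mainstream approach to malware detection. This paper reviews and analyzes the state of research on this topic from five perspectives: 1) entropy information-based methods; 2) graph-based methods; 3) computer vision-based methods; 4) natural language processing-based methods; and 5) multi-dimensional feature fusion-based methods. This paper differs from previous surveys in two ways. First, it classifies detection models by their characteristics and summarizes the analysis methods and architectures of each category. Second, it compares the features of natural images and text with those of malware, identifying similarities and differences that may inform future improvements in malware feature extraction. In addition, generative and adversarial techniques are double-edged and can drive improvements in detection models and their performance, so this paper also reviews and analyzes offensive and defensive adversarial techniques in malware detection. At present, malware detection still suffers from weak generalization and robustness, imbalanced datasets, and severe concept drift. These will be the main problems for future research on deep learning-based malware detection. This paper helps researchers understand the basic principles, technical methods, current difficulties and challenges, and future directions of malware detection, and supports further research on and improvement of existing methods.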

Key words: malware detection; computer security; network security; deep learning

DOI: 10.19363/J.cnki.cn10-1380/tn.2025.05.07

Received: 2023-03-06; Revised: 2023-09-01

Funding: This work was supported by the National Key Research and Development Program of China (No. 2020YFB1805400).
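Entropy information-based methods are the first category listed in the abstract. One common form of entropy feature computes the Shannon entropy of a file's raw bytes over sliding windows. This produces a sequence that a 1-D CNN or recurrent model could consume. The sketch below illustrates only that idea. The function names `shannon_entropy` and `entropy_profile` are hypothetical, and so are the window and step sizes (256 and 128 bytes). They are not the configuration of any method surveyed in this paper.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte sequence in bits per byte (range 0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    counts = Counter(chunk)  # iterating bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(data: bytes, window: int = 256, step: int = 128) -> list[float]:
    """Entropy of each sliding window over the file (hypothetical window/step sizes).

    Inputs no longer than one window yield a single value for the whole input.
    """
    if len(data) <= window:
        return [shannon_entropy(data)]
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data) - window + 1, step)]
```

Windows with entropy close to 8 bits per byte often correspond to packed or encrypted regions, and runs of low entropy often correspond to padding. The resulting profile is one plausible input for entropy-based detectors.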

A Survey: Malware Detection Based on Deep Learning

YAN Pei,TAN Shunquan,HUANG Jiwu

Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China; Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China

Abstract:

Malware is one of the greatest threats to computer and network security. Numerous detection methods and tools are available. Even so, the attack scope is expanding, malware variants iterate rapidly, and code samples are growing explosively, so improving the performance of malware detection remains a challenging and active research topic in network security. With the development of artificial intelligence, deep learning-based methods have gradually attracted researchers' attention. These methods use large numbers of neurons to fit data features and thereby achieve stronger detection performance. Early methods and traditional machine learning methods lack two capabilities that deep learning-based methods have: automatic extraction of data features and support for continuous learning. For this reason, deep learning-based methods have gradually become the mainstream approach to malware detection. This paper reviews and analyzes recent work on this topic from five perspectives: 1) entropy information-based methods; 2) graph-based methods; 3) computer vision-based methods; 4) natural language processing-based methods; and 5) multi-dimensional feature fusion-based methods. This paper differs from previous surveys in two ways. First, it classifies detection models according to their characteristics and summarizes the analysis methods and architectures of each category. Second, it compares the features of natural images and text with those of malware, identifying similarities and differences that may inform future improvements in malware feature extraction. In addition, generative and adversarial techniques are double-edged and can drive improvements in detection effectiveness and performance, so this paper also reviews and analyzes offensive and defensive adversarial techniques in malware detection. Malware detection technology currently still suffers from weak generalization and robustness, imbalanced datasets, and severe concept drift. These will be the main problems for future research on deep learning-based malware detection. This comprehensive review helps researchers understand the basic principles, technical methods, current challenges, and future directions of malware detection, and supports further research on and innovation in existing methods.

Key words: malware detection; computer security; network security; deep learning
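Computer vision-based methods, the third category in the abstract, typically render a binary as a grayscale image before applying a CNN. The sketch below shows one simple version of that conversion, in pure Python. It makes three assumptions: each byte maps to one gray level (0 to 255), rows have a fixed width of 256 bytes, and the image is resized with nearest-neighbour sampling to the fixed input size a CNN expects. `bytes_to_grayscale` and `resize_nearest` are hypothetical names. Real systems may instead choose the width from the file size or use other resampling schemes.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels, one byte per gray level.

    The final partial row is zero-padded so every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the last row with zeros
        rows.append(row)
    return rows


def resize_nearest(img: list[list[int]], out_h: int, out_w: int) -> list[list[int]]:
    """Nearest-neighbour resize to a fixed out_h x out_w input size."""
    if not img or not img[0]:
        raise ValueError("image must be non-empty")
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]
```

For example, `bytes_to_grayscale(bytes(range(10)), width=4)` yields three rows, and the last row is padded with two zeros. Resizing that image to 2 x 2 keeps pixels at rows 0 and 1 and columns 0 and 2.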