A Deep Learning Visualization Classification Method for Malicious Code Based on Enhanced Gray Level Co-occurrence Matrix

WANG Jinwei; CHEN Zhengjia; XIE Xue; LUO Xiangyang; MA Bin

Cite this article:

WANG Jinwei, CHEN Zhengjia, XIE Xue, LUO Xiangyang, MA Bin. A Deep Learning Visualization Classification Method for Malicious Code Based on Enhanced Gray Level Co-occurrence Matrix[J]. Journal of Cyber Security, 2025, 10(2): 84-102

WANG Jinwei^1,2,3, CHEN Zhengjia^1,2, XIE Xue^4,5, LUO Xiangyang^6, MA Bin^7
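(1. Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; 3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China; 4. School of Cyberspace Security, University of Science and Technology of China, Hefei 230031, China; 5. China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China; 6. PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China; 7. School of Cyberspace Security, Qilu University of Technology, Jinan 250353, China)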

Abstract:

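As the scale and variety of malicious code grow, traditional malicious code analysis methods, which rely on manual feature extraction, have become time-consuming and error-prone. Meanwhile, malicious code authors keep developing and adopting new techniques to evade these methods, so traditional analysis is no longer adequate. In recent years, malicious code visualization has become a research hotspot because it can display the core features of malicious code in an image. However, current visualization methods have several problems. First, some algorithms have high model training complexity, which leads to long training times and high computational cost. Second, some algorithms consider only the binary-level features of malicious code and may fail to capture higher-level feature information. In addition, most existing algorithms are designed for malicious code family classification and are poorly suited to malicious code type classification. To address these problems, this paper proposes a deep malicious code visualization classification method based on an enhanced gray-level co-occurrence matrix. The method combines the gray-level co-occurrence matrix, which is commonly used in machine learning, with deep learning, avoiding the complexity and difficulty of manual feature extraction. For preprocessing, the paper first converts the malicious code dataset into grayscale images using the Nataraj vectorization method, then extracts gray-level co-occurrence matrices from these images and converts the matrices into grayscale images, and finally applies pixel-value multiplication for image enhancement, which effectively reduces the number of black pixels and increases image brightness. For model design, the paper builds a D-ResNet18 network for grayscale image classification based on the properties of residual connections and dense connections; the model makes full use of the feature information at every level and effectively extracts the core features of malicious code. Experimental results show that the proposed method achieves superior classification performance with high accuracy and fast training, and that its preprocessing is simple, making it suitable for scenarios with strict real-time requirements such as rapid classification of large-scale malicious code samples. More importantly, the method performs well on both malicious code family classification and malicious code type classification: compared with previous methods, accuracy improves by 0.22% and 4.86%, respectively, and the time to train one epoch is reduced by 52.68% and 86.11%, respectively, which demonstrates its practical value.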

Keywords: deep learning; data visualization; malicious code detection and classification; gray-level co-occurrence matrix

DOI: 10.19363/J.cnki.cn10-1380/tn.2025.03.06

Received: 2023-04-28; Revised: 2023-07-26

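Funding: This work was supported by the National Key Research and Development Program of China (No. 2021QY0700); the National Natural Science Foundation of China (No. 62072250, No. 62172435, No. U1804263, No. U20B2065, No. 61872203, No. 71802110, No. 61802212); the Zhongyuan Science and Technology Innovation Leading Talent Project (No. 214200510019); the Natural Science Foundation of Jiangsu Province (No. BK20200750); the Open Foundation of the Henan Key Laboratory of Cyberspace Situation Awareness (No. HNTS2022002); the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX200974); the Open Project of the Guangdong Provincial Key Laboratory of Information Security Technology (No. 2020B1212060078); and the Open Project Fund of the Shandong Provincial Key Laboratory of Computer Networks (No. SDKLCN-2022-05).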

A Deep Learning Visualization Classification Method for Malicious Code Based on Enhanced Gray Level Co-occurrence Matrix

WANG Jinwei^1,2,3, CHEN Zhengjia^1,2, XIE Xue^4,5, LUO Xiangyang^6, MA Bin^7

(1.Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China;2.Department of Computer, Nanjing University of Information Science and Technology, Nanjing 210044, China;3.State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China;4.University of Science and Technology of China, Hefei 230031, China;5.China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China;6.PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China;7.School of Cyberspace Security, Qilu University of Technology, Jinan 250353, China)

Abstract:

With the increase in the scale and variety of malicious code, traditional methods for analyzing malicious code have become time-consuming and error-prone because they require manual feature extraction. Additionally, malicious code authors are continuously researching and using new techniques to evade these traditional methods, rendering them ineffective. In recent years, malicious code visualization has become a research hotspot because it can display the core features of malicious code in images. However, current malware visualization methods have several issues. First, some algorithms have high model training complexity, resulting in longer training times and higher computational costs. Second, some algorithms focus only on the binary-level features of malware and may fail to capture higher-level feature information. Third, existing algorithms are mostly designed for malware family classification, and their applicability to malware type classification is limited. To address these issues, this paper proposes a deep malicious code visualization classification method based on enhanced gray-level co-occurrence matrices. The method combines gray-level co-occurrence matrices, which are commonly used in machine learning, with deep learning, avoiding the complexity and difficulty of manual feature extraction. For preprocessing, this paper first uses the Nataraj vectorization method to transform the malicious code dataset into grayscale images, then extracts the gray-level co-occurrence matrices and converts them into grayscale images. Pixel value multiplication is then applied to enhance each image, effectively reducing the number of black pixels and increasing brightness. For model design, this paper constructs a D-ResNet18 network model for grayscale image classification based on the characteristics of residual connections and dense connections. This model can effectively extract the core features of malicious code by making full use of the feature information in each layer. Experimental results show that the proposed method achieves superior classification performance, with advantages such as high accuracy and fast training speed. Moreover, the preprocessing is simple, which makes the method suitable for scenarios with strict real-time requirements, such as fast classification of large-scale malicious code samples. More importantly, the method demonstrates superior performance in both malware family classification and malware type classification. Compared with previous methods, it improves accuracy by 0.22% and 4.86% on the two tasks, respectively, and reduces the training time per epoch by 52.68% and 86.11%, respectively. These results highlight its practical value.
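The preprocessing steps described in the abstract (bytes to grayscale image, gray-level co-occurrence matrix, multiplicative enhancement) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the fixed image width, the single horizontal offset, the max-normalization used to turn co-occurrence counts into pixels, and the `gain` parameter are all assumptions made here, and the function names are hypothetical. The Nataraj method normally chooses the image width from the file size, which this sketch simplifies away.

```python
import numpy as np


def bytes_to_gray_image(data: bytes, width: int = 256) -> np.ndarray:
    """Read raw bytes as 8-bit pixels and reshape them to a fixed width.

    Assumption: a fixed width is used; trailing bytes that do not fill
    a complete row are dropped.
    """
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = len(buf) // width
    if rows == 0:
        raise ValueError("input is shorter than one image row")
    return buf[: rows * width].reshape(rows, width)


def glcm(image: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 256) -> np.ndarray:
    """Count how often gray level i has gray level j at offset (dy, dx).

    Assumption: one non-negative offset (default: right-hand neighbor).
    """
    h, w = image.shape
    src = image[: h - dy, : w - dx].ravel().astype(np.intp)
    dst = image[dy:, dx:].ravel().astype(np.intp)
    counts = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(counts, (src, dst), 1)  # unbuffered add handles repeated pairs
    return counts


def glcm_to_image(counts: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Scale counts to 0..255, multiply by `gain`, clip, and truncate.

    Assumption: `gain` stands in for the pixel-value multiplication;
    the factor used in the paper is not given in the abstract.
    """
    peak = counts.max()
    if peak == 0:
        return np.zeros(counts.shape, dtype=np.uint8)
    scaled = counts / peak * 255.0 * gain
    return np.clip(scaled, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic stand-in for a binary file: skewed random bytes.
    rng = np.random.default_rng(0)
    sample = rng.choice(256, size=64 * 1024, p=np.r_[0.5, np.full(255, 0.5 / 255)])
    image = bytes_to_gray_image(sample.astype(np.uint8).tobytes(), width=256)
    counts = glcm(image)
    plain = glcm_to_image(counts, gain=1.0)
    enhanced = glcm_to_image(counts, gain=8.0)
    print("black pixels without gain:", int(np.count_nonzero(plain == 0)))
    print("black pixels with gain:   ", int(np.count_nonzero(enhanced == 0)))
```

Under these assumptions, rare byte pairs that would be truncated to zero after normalization survive as dim pixels once the gain is applied, which is one plausible reading of how multiplication reduces the number of black pixels. The resulting 256 x 256 image has a fixed size regardless of the input file length, so it can be fed to a classifier without resizing.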

Key words: deep learning; data visualization; malicious code detection and classification; gray-level co-occurrence matrix