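Abstract (Chinese version):
To address the challenges that current malware detection and classification methods face in feature extraction and detection accuracy, this paper proposes a malware detection and classification method based on API block reconstruction and image representation. First, the categories of APIs called by malware are numbered uniformly, and APIs with the same number in the API instruction sequence are aggregated into the same API block. The API blocks are reordered according to the order in which each API category is first called at runtime, and the number of entries in each API block is recorded as that API category's devotion (its contribution to the software sample). After block reconstruction, the API blocks are arranged in the order in which the software sample calls each API category, and the entries within each block keep the order in which the sample made the individual calls. This ordered block structure helps express the API instruction sequence information as an image. From the reconstructed API instruction sequence, the API codes are extracted as the global feature list, the API devotion as the local feature list, and the API sequential indexes as the temporal feature list; the feature lists are then normalized and zero-padded into feature arrays of uniform size. The API code clearly identifies the API category, the API devotion characterizes how frequently that API is called, and the API sequential index distinguishes the order in which the APIs are called. Next, the three types of feature arrays fill the three channels of an RGB image, generating a three-channel feature image of API code, devotion and sequential index (FimgCDS). Finally, the FimgCDS feature image is fed into a self-built lightweight malware feature image convolutional neural network (MficNN) classifier to detect and classify malware. Experimental results show that the proposed method achieves detection and classification accuracies of 98.66% and 98.35% on two datasets, with high detection and classification performance and speed.
Keywords: malware; classification; API; feature extraction; image representation; RGB image; convolutional neural network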
DOI: 10.19363/J.cnki.cn10-1380/tn.2024.09.05
Received: 2022-08-30; Revised: 2022-12-05
Funding: This work was supported by the National Natural Science Foundation of China (No. U1833107).
|
Malware Detection and Classification Based on API Block Reconstruction and Image Representation |
YANG Hongyu, ZHANG Yupei, ZHANG Liang, CHENG Xiang
School of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China; School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China; School of Information, University of Arizona, Tucson, AZ 85721, USA; School of Information Engineering, Yangzhou University, Yangzhou 225127, China; Jiangsu Engineering Research Center for Knowledge Management and Intelligent Service, Yangzhou 225127, China
Abstract:
To address the challenges that current malware detection and classification methods face in feature extraction and detection accuracy, a malware detection and classification method based on API block reconstruction and image representation is proposed. First, the API categories invoked by malware at runtime are numbered uniformly, APIs with the same code are aggregated into the same API block, the API blocks are reordered according to the order in which each API category is first invoked, and the number of entries in each API block is recorded as the devotion of that API category. After reconstruction, the API blocks are arranged in the order in which the software sample calls each type of API, and the order within each API block is the order in which the software sample calls the individual APIs. The ordered API block sequence helps to represent the API instruction sequence information as an image. The API codes are then extracted as the global feature list, the API devotion as the local feature list, and the API sequential indexes as the temporal feature list, and the feature lists are normalized and zero-padded into feature arrays of uniform size. The API code clearly identifies the API category, the API devotion characterizes how frequently the API is called, and the API sequential index distinguishes the order in which each API is called. Next, the 3 channels of an RGB image are filled with the 3 types of feature arrays to generate the feature image of API code, devotion and sequential index (FimgCDS). Finally, the FimgCDS feature image is fed into a self-built lightweight malware feature image convolutional neural network (MficNN) classifier for malware detection and classification. Experimental results show that the method achieves detection and classification accuracies of 98.66% and 98.35% on the two datasets, respectively, with high detection and classification performance and speed.
Key words: malware; classification; application programming interface; feature extraction; image representation; RGB image; convolutional neural network
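
The block reconstruction and three-channel image construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not fix: one pixel per API call in the reconstructed sequence, scaling each feature list by its maximum into the range 0 to 255, truncating sequences longer than the image, and the mapping of code, devotion and sequential index to the R, G and B channels in that order. The function names and the sample API codes are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def reconstruct_api_blocks(
    calls: List[str], api_codes: Dict[str, int]
) -> List[Tuple[int, List[int]]]:
    """Group calls by API code, keeping blocks in first-invocation order.

    Each block is (code, positions), where positions are the original call
    indexes of that API, so the order inside a block is the call order.
    """
    blocks: Dict[int, List[int]] = {}
    for position, name in enumerate(calls):
        # dicts preserve insertion order, so a block is created when its
        # API category is first called
        blocks.setdefault(api_codes[name], []).append(position)
    return list(blocks.items())


def scale_to_bytes(values: List[int]) -> List[int]:
    """Assumed normalization: scale by the maximum into 0..255 (integer)."""
    peak = max(values, default=0) or 1
    return [255 * v // peak for v in values]


def build_feature_image(
    calls: List[str], api_codes: Dict[str, int], side: int
) -> List[List[Pixel]]:
    """Build a side x side RGB image as nested lists of (R, G, B) tuples."""
    codes: List[int] = []      # global feature: API code
    devotions: List[int] = []  # local feature: entries in the API block
    order: List[int] = []      # temporal feature: original call index
    for code, positions in reconstruct_api_blocks(calls, api_codes):
        for position in positions:
            codes.append(code)
            devotions.append(len(positions))
            order.append(position)

    size = side * side
    channels: List[List[int]] = []
    for feature in (codes, devotions, order):
        scaled = scale_to_bytes(feature)[:size]               # truncate (assumption)
        channels.append(scaled + [0] * (size - len(scaled)))  # zero-pad
    red, green, blue = channels
    return [
        [(red[r * side + c], green[r * side + c], blue[r * side + c])
         for c in range(side)]
        for r in range(side)
    ]


# Hypothetical sample: API names are real Windows calls, codes are made up.
sample_codes = {"CreateFile": 1, "WriteFile": 2, "RegSetValue": 3}
sample_calls = ["CreateFile", "WriteFile", "CreateFile",
                "RegSetValue", "WriteFile", "WriteFile"]
image = build_feature_image(sample_calls, sample_codes, side=3)
```

For the sample above, the reconstructed blocks are (1, [0, 2]), (2, [1, 4, 5]) and (3, [3]): CreateFile is called first, so its block leads, and the last row of the 3 x 3 image is zero padding. In practice the nested lists would be converted to an image array before being passed to a classifier such as MficNN.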