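(School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China)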
Keywords:  malware classification  malware visualization  feature engineering  x-plot  dataset  data augmentation
A Survey on Image Visualization Approaches-based Malware Classification Techniques
QIAN Liping,WANG Dawei
College of Electrical & Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Malware authors now produce malware at scale, in an automated and family-based manner, and commonly use code encryption and obfuscation to evade malware detection systems. The resulting trend has two sides: the number of malware samples is growing rapidly, while deep learning offers increasing potential for mining the implicit features of malware variants. The mainstream approach to malware detection and classification has accordingly shifted from manual feature matching to deep feature mining. The performance of malware classification models often depends on the quality and efficiency of manual feature engineering. Mapping malware to images not only helps alleviate the shortage of expert knowledge in manual feature engineering, but also lets researchers draw directly on achievements in the field of image processing. Malware classification based on image visualization has therefore become an attractive research direction. This survey summarizes research progress in image-visualization-based classification of malicious executables and focuses on methods for generating images from executable files. We systematically summarize the methods for generating an image from a malware binary file, including image size setting, the choice of grayscale or color channels, pixel coordinate projection, and pixel value computation, and give a comparative analysis of feature representation and extraction techniques for visualized malware. Results reported in the surveyed literature show that all of these factors affect the performance of malware classification.
We also summarize the advantages of this approach and the main difficulties and challenges it faces. The advantages include reduced reliance on expert knowledge, better suitability for detecting malware variants, and the ability to draw on a body of research achievements in the field of image processing. The difficulties and challenges lie in locating malicious payloads, model applicability, interpretability, and the lack of high-quality labeled data. Finally, we present several promising directions for future research, such as cognitive DL models, malware knowledge graphs, adversarial malware data augmentation, evaluation benchmarks, and high-quality datasets.
Key words:  malware classification  malware visualization  feature engineering  x-plot  dataset  data augmentation
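
As an illustration of the image-generation factors named in the abstract (image size setting, channel choice, pixel coordinate projection, and pixel value computation), the following is a minimal Python sketch of one simple byte-to-grayscale mapping. It is an assumption made for illustration, not a method taken from the surveyed works: the function name `bytes_to_gray_image`, the fixed `width` parameter, the row-major layout, and the zero padding of the last row are all hypothetical choices.

```python
from typing import List


def bytes_to_gray_image(data: bytes, width: int) -> List[List[int]]:
    """Map a byte sequence to a 2-D grayscale image (rows of 0-255 values).

    Hypothetical helper for illustration only.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Image size: the height follows from the data length and the chosen width.
    height = (len(data) + width - 1) // width
    # Pad the final row with zero bytes so that every row has `width` pixels.
    padded = data + bytes(height * width - len(data))
    # Coordinate projection: byte i lands at row i // width, column i % width.
    # Pixel value: the byte itself, read as an 8-bit gray level (one channel).
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Illustrative bytes only; a real input would be the contents of an executable.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00])
image = bytes_to_gray_image(sample, width=4)
print(image)  # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

In this sketch the width is passed in directly, so the effect of the image size setting can be explored by calling the function with different widths on the same bytes; a color variant would need a separate rule for assigning bytes to channels.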