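Abstract:
Office documents such as Word, Excel, and PowerPoint files have become indispensable to the daily work of government organizations and enterprises. Alongside the convenience they bring, they have introduced serious cybersecurity problems: malicious Office documents are now widely used in phishing attacks and even in APT attacks. In recent years, as deep learning has been applied to fields such as malware detection and intrusion detection, researchers have begun to apply it to malicious Office document detection as well. These efforts, however, all suffer from small sample datasets and poor detection performance, and they lack effective comparison with traditional machine learning detection methods. To address these problems, this paper sets out to explore how deep learning can be applied to malicious Office document detection and what its strengths and weaknesses are. Starting from the idea that documents contain "sensitive data areas", we propose a deep learning detection method for malicious DOC documents and another for malicious XLS documents, both based on these areas, and we run experiments on a dataset built from a large number of malicious and benign documents. The experiments show that the proposed methods significantly improve the detection performance of the model, outperform current machine learning-based detection methods, and can detect many types of malicious DOC and XLS documents, including malicious macro documents, exploit documents, and other kinds of malicious documents. The paper also analyzes in depth the advantages and shortcomings of deep learning detection methods, pointing out directions for extending and deepening the use of deep learning in malicious document detection.
Keywords: malicious documents; Office documents; detection; deep learning; sensitive data areas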
DOI:
Received: 2024-09-18  Revised: 2025-03-06
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Detection of Malicious DOC/XLS Documents Based on Convolutional Neural Networks |
chenxiang, zhangzhen, wangwenbo
(Information Engineering University)
Abstract:
Office documents such as Word, Excel, and PowerPoint files have become an indispensable part of daily work in government organizations and enterprises. While they bring convenience, they also introduce serious cybersecurity risks: malicious Office documents are widely used in phishing attacks and even APT attacks. In recent years, following the application of deep learning to malware detection and intrusion detection, researchers have begun applying it to malicious Office document detection. However, existing work relies on small sample datasets, achieves poor detection performance, and lacks effective comparison with traditional machine learning detection methods. To address these problems, this paper explores how deep learning can be applied to malicious Office document detection and what its advantages and disadvantages are. Based on the idea that documents contain "sensitive data areas", we propose two deep learning detection methods built on these areas, one for malicious DOC documents and one for malicious XLS documents, and evaluate them on a large dataset of malicious and benign documents. Experiments show that the proposed methods significantly improve detection performance, outperform current machine learning-based detection methods, and can detect various types of malicious DOC and XLS documents, including malicious macro documents, exploit documents, and other types of malicious documents. We also analyze in depth the advantages and disadvantages of deep learning detection methods and point out directions for extending deep learning in malicious document detection.
Keywords: malicious documents; Office documents; detection; deep learning; sensitive data areas