基于全局行为特征的未知恶意文档检测

陈祥; 伊鹏; 白冰; 韩伟涛

【打印本页】【下载PDF全文】【View/Add Comment】【Download reader】【 Close 】

本文已被：浏览 7337次下载 5243次	码上扫一扫！
基于全局行为特征的未知恶意文档检测
陈祥,伊鹏,白冰,韩伟涛
分享到：微信更多字体:加大+\|默认\|缩小-
(战略支援部队信息工程大学信息技术研究所郑州中国 450002;之江实验室杭州中国 311121)

摘要:

相比于基于宏的恶意办公文档,基于漏洞利用的恶意办公文档在攻击过程中往往不需要目标交互,能在目标无感的情况下完成攻击,已经成为APT攻击的重要手段,因此检测基于漏洞利用特别是未知漏洞利用的恶意文档对于发现APT攻击具有重要作用。当前的恶意文档检测方法主要围绕PDF文档展开,分为静态检测和动态检测两类,静态检测方法容易被攻击者规避,且无法发现基于远程载荷触发的漏洞利用,动态检测方法仅考虑PDF中JavaScript脚本或文档阅读器进程的行为特征,忽视了针对系统其他进程程序的间接攻击,存在检测盲区。针对上述问题,本文分析了恶意办公文档的攻击面,提出恶意文档威胁模型,并进一步实现一种基于全局行为特征的未知恶意文档检测方法,在文档处理过程中提取全系统行为特征,仅训练良性文档样本形成行为特征库用于恶意文档检测,并引入敏感行为特征用于降低检测误报率。本文在包含DOCX、RTF、DOC三种类型共计522个良性文档上进行训练获取行为特征库,然后在2088个良性文档样本和211个恶意文档样本上进行了测试,其中10个恶意样本为手动构造用于模拟几种典型的攻击场景。实验结果表明该方法在极低误报率(0.14%)的情况下能够检测出所有的恶意样本,具备检测利用未知漏洞的恶意文档的能力,进一步实验表明该方法也能够用于检测针对WPS Office软件进行漏洞利用的恶意文档。

关键词: 恶意文档检测|行为特征|威胁模型|漏洞利用|未知威胁

DOI：10.19363/J.cnki.cn10-1380/tn.2023.09.07

Received:January 11, 2022Revised:April 22, 2022

基金项目:本课题得到国家自然科学基金(No. 62176264)资助。

Unknown Malicious Document Detection Based on Global Behavior Feature

CHEN Xiang,YI Peng,BAI Bing,HAN Weitao

Institute of Information Technology, PLA Strategic Force Information Engineering University, Zhengzhou 450002, China;ZheJiang Lab, Hangzhou 311121, China

Abstract:

Compared with malicious office documents based on macros, malicious office documents based on vulnerability exploitation often do not need target interaction in the attack process, and can complete the attack without target perception. It has become an important means of Advanced Persistent Threat (APT) attack. Therefore, detecting malicious documents based on vulnerability exploitation, especially unknown vulnerability exploitation, plays an important role in discovering APT attacks. The current malicious document detection methods mainly focus on PDF documents. It is mainly divided into two categories: static analysis and dynamic analysis. Static analysis is easy to be evaded by hackers, and can not discovery exploits triggered by remote payload. Dynamic analysis only considers the behaviors of the JavaScript in PDF or document reader’s process, ignoring the indirect attacks against other processes of the system, leads to a detection blind spot. To solve the above problems, we analyze the attack surface of malicious Office documents, come up with a threat model and implement an unknown malicious document detection method based on global behavior feature. In the process of document processing, the whole system behavior features are extracted, and only benign document samples are trained to form a behavior feature database for malicious document detection. In order to reduce false alarm rate, we introduce sensitive behavioral feature in detection. In this paper, 522 benign documents including DOCX, RTF and DOC are trained to obtain the behavior feature database, and then 2088 benign document samples and 211 malicious document samples are tested. Of these, 10 malicious samples are manually crafted to simulate several typical attack scenarios. The experimental results show that this method can detect all malicious samples with a very low false positive rate (0.14%) and is able to detect malicious documents that exploit unknown vulnerabilities. Further experiments show that this method can also be used to detect malicious documents exploiting WPS office software.

Key words: malicious document detection|behavior feature|threat model|vulnerability exploitation|unknown threat