Scalable Detection of Unknown Malware from Millions of Apps

CHEN Kai; WANG Peng; Yeonjoon Lee; WANG Xiaofeng; ZHANG Nan; HUANG Heqing; ZOU Wei; LIU Peng

Cite this article:

CHEN Kai, WANG Peng, Yeonjoon Lee, WANG Xiaofeng, ZHANG Nan, HUANG Heqing, ZOU Wei, LIU Peng. Scalable Detection of Unknown Malware from Millions of Apps[J]. Journal of Cyber Security (信息安全学报), 2016, 1(1): 24-38

CHEN Kai¹,², WANG Peng², Yeonjoon Lee², WANG Xiaofeng², ZHANG Nan², HUANG Heqing³, ZOU Wei¹, LIU Peng³
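(1. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. Indiana University Bloomington, USA; 3. The Pennsylvania State University, USA)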

Abstract:

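Security vetting at the app-market level must be both accurate and scalable. However, current vetting mechanisms are usually inefficient and struggle to cope with new threats. Our study finds that malware authors spread the same piece of malicious code across different apps by repackaging a number of legitimate apps. As a result, malicious code typically shows up as the extra code in apps of common origin (repackaged variants of the same app) and as the shared code among apps of unrelated origin. Based on this finding, we developed a large-scale app vetting system, MassVet, which quickly detects malicious code without knowing its code signatures or behavioral characteristics. Existing detection mechanisms usually rely on complex program analysis, whereas our approach only needs to compare an uploaded app with the apps already on the market, focusing on the differing code among apps that share the same view structure and on the common code among unrelated apps. Once public libraries and legitimately reused code snippets are removed, these differing or common code segments become highly suspicious. We map an app's view structure, or a function's control-flow graph, to a single value and perform DiffCom analysis on that basis. We designed a pipeline-based analysis engine and carried out a large-scale analysis of 1.2 million apps from 33 app markets. Experiments show that our method can vet an app within 10 seconds at a low false-positive rate. In detection coverage, MassVet outperformed the 54 scanners in VirusTotal (including NOD32, Symantec, and McAfee), detecting nearly 100,000 malicious apps, more than 20 of which are zero-day malware, with download counts exceeding one million. These apps also reveal many interesting phenomena, such as the ongoing contest between Google's vetting strategy and malware authors' evasion strategies, which causes some apps removed from Google Play to reappear.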

Keywords: malicious code; MassVet; repackaging; view structure

DOI：

Received: 2015-10-25; Revised: 2015-12-01

Funding: This work was supported by the National Natural Science Foundation of China (Nos. U1536106 and 61100226).

Scalable Detection of Unknown Malware from Millions of Apps

CHEN Kai¹,², WANG Peng², Yeonjoon Lee², WANG Xiaofeng², ZHANG Nan², HUANG Heqing³, ZOU Wei¹, LIU Peng³

(1. State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing 100093, China; 2. Indiana University, Bloomington, USA; 3. The Pennsylvania State University, USA)

Abstract:

An app market's vetting process is expected to be both scalable and effective. However, today's vetting mechanisms are slow and poorly equipped to catch new threats. Based on the key observation that Android malware is typically constructed and disseminated by repackaging legitimate apps with similar malicious components, we developed a new technique, called MassVet, for vetting apps at a massive scale without knowing what the malware looks like or how it behaves. Unlike existing detection mechanisms, which often rely on heavyweight program analysis, our approach simply compares a submitted app with all those already on a market, focusing on the differences between apps that share a similar UI structure (indicating a possible repackaging relation) and on the commonalities among apps that appear unrelated. Once public libraries and other legitimate code reuse are removed, such diff/common program components become highly suspicious. We implemented MassVet on a stream processing engine and evaluated it on 1.2 million apps from 33 app markets around the world, a collection on the scale of Google Play. Our study shows that the technique can vet an app within 10 seconds at a low false detection rate. It also outperformed all 54 scanners in VirusTotal (NOD32, Symantec, McAfee, etc.) in detection coverage, capturing over a hundred thousand malicious apps, including more than 20 likely zero-day malware samples and apps installed millions of times. A close look at these apps brings to light intriguing new observations, e.g., Google's detection strategy and malware authors' countermoves, which cause the mysterious disappearance and reappearance of some Google Play apps.

Key words: malicious code; MassVet; repackaging; view structure
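
The diff/common idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `App` record, the integer `view_id` and method values, and the `diffcom` function are hypothetical names introduced here, and exact equality of values stands in for whatever similarity measure the real system uses. The sketch assumes each app has already been reduced to one value for its view structure and one value per method (summarizing that method's control-flow graph).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class App:
    name: str
    view_id: int         # hypothetical: one value summarizing the app's view structure
    methods: frozenset   # hypothetical: one value per method, summarizing its control-flow graph


def diffcom(apps, library_methods=frozenset()):
    """Return {app name: set of suspicious method values}.

    Diff step: for two apps with the same view_id (possible repackaging),
    flag the methods that only one of them contains.
    Common step: for two apps with different view_ids (seemingly unrelated),
    flag the methods that both contain.
    Methods listed in library_methods (known legitimate reuse) are ignored.
    """
    suspicious = {app.name: set() for app in apps}
    for a, b in combinations(apps, 2):
        if a.view_id == b.view_id:
            # Diff: code present in one repackaging candidate but not the other.
            suspicious[a.name] |= (a.methods - b.methods) - library_methods
            suspicious[b.name] |= (b.methods - a.methods) - library_methods
        else:
            # Common: code shared by apps that otherwise look unrelated.
            common = (a.methods & b.methods) - library_methods
            suspicious[a.name] |= common
            suspicious[b.name] |= common
    return suspicious


# Toy data: method 900 plays the role of a public library, 666 of an injected payload.
apps = [
    App("game", 101, frozenset({1, 2, 3, 900})),
    App("game_repack", 101, frozenset({1, 2, 3, 900, 666})),
    App("flashlight", 202, frozenset({7, 8, 900, 666})),
]
result = diffcom(apps, library_methods=frozenset({900}))
print(result)
```

In this toy run, method 666 is flagged in `game_repack` (it is the extra code relative to `game`) and in `flashlight` (it is shared with an unrelated app), while `game` has nothing flagged. The pairwise loop is quadratic in the number of apps and is only meant to show the logic; vetting at market scale would need an index keyed on these values rather than all-pairs comparison.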