Cite this article:
  • Huang Qingjia, Li Yakai, Jia Xiaoqi, Zhou Qihang, Fu Yuxia, Zhou Mengting, Xie Jing, Du Haichao. Interpretable Graph Neural Network-based Android Malware Detection Method[J]. Journal of Cyber Security, Accepted.


Interpretable Graph Neural Network-based Android Malware Detection Method
Huang Qingjia, Li Yakai, Jia Xiaoqi, Zhou Qihang, Fu Yuxia, Zhou Mengting, Xie Jing, Du Haichao
(Institute of Information Engineering, Chinese Academy of Sciences)
Abstract:
With the widespread adoption of smartphones, the number of Android malware samples has grown rapidly in recent years, posing a serious threat to smartphone users. Academia and industry have widely adopted deep learning-based methods to automate malware detection; among these, methods that apply Graph Neural Networks (GNN) to Function Call Graph (FCG) features have shown excellent accuracy and robustness. However, existing GNN-based detection methods lack interpretability, which makes their detection results difficult to understand and analyze and limits their practical application. Many GNN interpretability methods have emerged in recent years, but they typically focus only on the accuracy of the explanation while ignoring its fidelity, and therefore produce poor explanations of FCGs. To address this problem, this paper proposes an Interpretable Graph neural network-based Android Malware Detection method (IGAMD). IGAMD first decompiles the Android APK to obtain the FCG, then analyzes it further to build Attribute Function Call Graph (AFCG) features. The AFCG is fed into both a GNN classification model and a GNN explanation model to produce a classification result and an explanation. Unlike other GNN interpretability methods, the explanation model considers both the accuracy and the fidelity of its explanations and thus achieves better performance. It identifies the subgraph of a malware's FCG that contributes most to the classification and provides node importance scores for further analysis. Experimental results show that, compared with the three best-performing GNN interpretability methods in prior work, IGAMD produces explanations with higher accuracy and fidelity and can accurately reveal malware behavior patterns. IGAMD also performs strongly on the detection task itself, achieving a recognition accuracy of 96.23%.
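The pipeline's front end (APK decompilation into an FCG) can be reproduced with off-the-shelf tooling. Below is a minimal sketch using androguard and networkx; "app.apk" and the single "external" node attribute are illustrative stand-ins, since this abstract does not specify the exact AFCG feature set, and the call-graph node API may differ across androguard versions.

    # pip install androguard networkx
    from androguard.misc import AnalyzeAPK

    # Decompile the APK and build its function call graph: a networkx.DiGraph
    # whose nodes are androguard method-analysis objects.
    a, d, dx = AnalyzeAPK("app.apk")
    fcg = dx.get_call_graph()

    # Illustrative node attribute only (the paper's AFCG attributes are richer):
    # flag methods defined outside the APK, e.g., framework API calls.
    for node in fcg.nodes:
        fcg.nodes[node]["external"] = float(node.is_external())

    print(fcg.number_of_nodes(), "methods,", fcg.number_of_edges(), "call edges")

The abstract does not describe the classifier's architecture; the following generic two-layer GCN graph classifier in PyTorch Geometric is shown only to make the "GNN classification model over AFCG features" step concrete, with the hidden size and layer count as assumptions.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv, global_mean_pool

    class FCGClassifier(torch.nn.Module):
        """Graph-level benign/malicious classifier over AFCG node features."""
        def __init__(self, num_node_features, hidden=64, num_classes=2):
            super().__init__()
            self.conv1 = GCNConv(num_node_features, hidden)
            self.conv2 = GCNConv(hidden, hidden)
            self.lin = torch.nn.Linear(hidden, num_classes)

        def forward(self, x, edge_index, batch):
            h = F.relu(self.conv1(x, edge_index))   # message passing, layer 1
            h = F.relu(self.conv2(h, edge_index))   # message passing, layer 2
            hg = global_mean_pool(h, batch)         # pool nodes to a graph embedding
            return self.lin(hg)                     # class logits

For the fidelity criterion that the explanation model considers alongside accuracy, a common definition (often called Fidelity+) is the drop in the predicted-class probability when the explanation subgraph is removed: a faithful explanation deletes exactly what the classifier relied on. A sketch, assuming a single-graph PyTorch Geometric Data object with a batch vector:

    import torch
    from torch_geometric.utils import subgraph

    @torch.no_grad()
    def fidelity_plus(model, data, important_nodes):
        """Predicted-class probability drop after deleting the explanation nodes."""
        probs = model(data.x, data.edge_index, data.batch).softmax(dim=-1)
        c = int(probs.argmax())                   # predicted class on the full graph
        keep = torch.ones(data.num_nodes, dtype=torch.bool)
        keep[important_nodes] = False             # drop the explanation subgraph
        edge_index, _ = subgraph(keep, data.edge_index,
                                 relabel_nodes=True, num_nodes=data.num_nodes)
        masked = model(data.x[keep], edge_index, data.batch[keep]).softmax(dim=-1)
        return (probs[0, c] - masked[0, c]).item()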
Key words:  Android malware detection  deep learning  interpretability
DOI:
Submitted: 2023-02-15; Revised: 2023-03-28
Funding: Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences; Beijing Key Laboratory of Network Security and Protection Technology; National Key Research and Development Program of China (Nos. 2019YFB1005201 and 2021YFB2910109); Strategic Priority Research Program of the Chinese Academy of Sciences, Category C (No. XDC02010900); General Program of the National Natural Science Foundation of China (No. 61772078)