Automated App Forensics Based on Multi-Agent

Li Xiaolin; Gao Jian

Cite this article:

Li Xiaolin, Gao Jian. Automated App Forensics Based on Multi-Agent[J]. Journal of Cyber Security, accepted.

(People's Public Security University of China)

Abstract:

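As mobile devices have become ubiquitous, malicious applications on mobile platforms have grown increasingly diverse, and manual digital forensics on them is no longer practical. Fields such as cybercrime investigation and judicial forensic examination face a shortage of APP forensics experts, which makes forensic work difficult to carry out; in response, many researchers have proposed methods that combine static template matching or deep learning with APP forensics. These methods still face challenges: existing APP forensic methods usually serve only as auxiliary tools for human digital forensic analysts and lack the ability to conduct forensics autonomously. Recently, large language models (LLMs) have shown strong capabilities in areas such as machine translation and code writing, and agent technologies built on LLMs have shown strong task-execution abilities. This paper therefore proposes a new approach that combines multi-agent collaboration with automated APP forensics, using the text understanding and tool-calling abilities of LLMs to deliver an efficient, accurate automated APP forensic method that requires no additional training. The method first unpacks the Android application package (APK) under examination, restores its program code, and embeds the code into a vector database, so that the LLM can understand the APP's code and actively search for the code needed for forensics. It also combines static and dynamic forensics: the LLM autonomously performs dynamic analysis on the basis of its understanding of the static information. In addition, the method uses a chain-of-thought prompting strategy to draw out more of the LLM's forensic capability. Most importantly, because electronic evidence must be explainable, this paper designs a dual reflection mechanism for the LLM that improves explainability and reduces hallucinations at only a small additional cost. Even when the development framework used by an application changes or is updated, the method's extensibility and modular design allow the forensic techniques to be updated promptly. To evaluate the method's forensic performance and to fill the gap in APP forensic datasets, this paper builds an automated APP forensics dataset from the evidence materials of recent public digital forensics competitions, publicly available malicious samples from the internet, and real criminal APPs provided by the police. Forensic experiments on this dataset verify the method's effectiveness and demonstrate its advantages in autonomous decision-making and explainability. The method achieves a forensic accuracy of 84.5% with an average forensic time of only 125.3 seconds, whereas the existing framework Quark-engine achieves an accuracy of only 34.5% with an average forensic time of 261.0 seconds. The proposed method not only performs well on digital forensics competition material but also accurately completes low-cost, autonomous extraction of routine electronic evidence from criminal APPs provided by the police.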
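The retrieval step described in the abstract (embed the decompiled code in a vector database, then let the model search for the code a forensic question needs) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the abstract does not name the embedding model, the vector database, or the chunking strategy, so a bag-of-words vector and an in-memory list stand in for them, and the file names and snippets are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split source text into lowercase word tokens, breaking camelCase."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def embed(text):
    """Stand-in 'embedding': a sparse bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory replacement for a real vector database."""

    def __init__(self):
        self._items = []  # (file name, code, vector)

    def add(self, name, code):
        self._items.append((name, code, embed(code)))

    def search(self, query, k=1):
        """Return the names of the k snippets most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [name for name, _, _ in ranked[:k]]


# Invented snippets standing in for decompiled APK code.
store = ToyVectorStore()
store.add(
    "LoginActivity.java",
    'String serverUrl = "https://example.invalid/api/login"; '
    "httpClient.post(serverUrl, password);",
)
store.add(
    "CryptoUtil.java",
    'Cipher cipher = Cipher.getInstance("AES"); '
    "cipher.init(Cipher.ENCRYPT_MODE, secretKey);",
)
store.add(
    "MainActivity.java",
    "setContentView(R.layout.main); button.setOnClickListener(listener);",
)

# A forensic question is turned into a query; the best match would be
# handed to the language model as context.
print(store.search("which AES cipher key encrypts the data"))
print(store.search("server url used for login"))
```

In a real pipeline the bag-of-words vector would be replaced by a learned embedding and the list by an actual vector index; the point of the sketch is only that the model sees a few relevant snippets rather than the whole decompiled code base.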

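Keywords: digital forensics, large language model, retrieval-augmented generation, multi-agent collaboration, large language model hallucination, cybercrime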

DOI：10.19363/J.cnki.cn10-1380/tn.2026.09.01

Received: 2025-02-23; Revised: 2025-05-19

Funding:

Automated App Forensics Based on Multi-Agent

Li Xiaolin, Gao Jian

(People’s Public Security University of China)

Abstract:

With the widespread use of mobile devices, malicious applications on mobile platforms have become increasingly diverse, and manual digital forensics is no longer feasible. Fields such as cybercrime investigation and judicial identification face a shortage of APP forensics experts, which makes forensic work hard to conduct; in response, many researchers have proposed methods that combine static template matching or deep learning with APP forensics. However, these methods face challenges of their own: existing APP forensic approaches are typically only auxiliary tools for human digital forensic analysts and lack autonomous forensic capabilities. Recently, large language models (LLMs) have demonstrated remarkable capabilities in domains such as machine translation and code generation, and agent technologies built on LLMs have shown strong task execution abilities. This paper therefore proposes a novel approach that integrates multi-agent collaboration with automated APP forensics, leveraging the text understanding and tool-calling capabilities of LLMs to provide an efficient, accurate, and training-free method for automated APP forensics. The method first unpacks the Android application package (APK) under investigation, restores its program code, and embeds the code into a vector database so that the LLM can comprehend the APP's code and actively search for the code required for forensic analysis. It also adopts a hybrid of static and dynamic forensics, in which the LLM autonomously performs dynamic analysis based on its understanding of the static information. In addition, the method employs a chain-of-thought prompting strategy to further enhance the forensic capabilities of the LLM. Most importantly, because electronic evidence must be explainable, this paper designs a dual reflection mechanism for the LLM that improves explainability and reduces hallucinations with minimal additional overhead. Even when the development framework used by an application changes or is updated, the method's scalability and modular design allow forensic techniques to be updated promptly. To evaluate the forensic performance of the method and to address the lack of an APP forensic dataset, this paper constructs an automated APP forensics dataset from the evidence materials of recent public digital forensics competitions, publicly available malicious samples from the internet, and real criminal APPs provided by the police. Experiments on this dataset validate the effectiveness of the proposed method and highlight its advantages in autonomous decision-making and explainability. The method achieves a forensic accuracy of 84.5% with an average forensic time of only 125.3 seconds, while the existing framework Quark-engine attains an accuracy of only 34.5% with an average forensic time of 261.0 seconds. The proposed method not only performs well on digital forensics competition material but also accurately extracts routine electronic evidence from criminal APPs provided by the police, autonomously and at low cost.
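For readers who want the reported figures side by side, the short calculation below derives the accuracy gap and the time saving from the four numbers quoted in the abstract. It uses no data beyond those four values; the variable names are illustrative.

```python
# Figures quoted in the abstract.
proposed = {"accuracy_pct": 84.5, "avg_time_s": 125.3}
quark_engine = {"accuracy_pct": 34.5, "avg_time_s": 261.0}

# Absolute accuracy gap, in percentage points.
accuracy_gain_points = proposed["accuracy_pct"] - quark_engine["accuracy_pct"]

# Relative accuracy: how many times more accurate the proposed method is.
accuracy_ratio = proposed["accuracy_pct"] / quark_engine["accuracy_pct"]

# Speed-up factor and relative reduction in average forensic time.
speedup = quark_engine["avg_time_s"] / proposed["avg_time_s"]
time_reduction_pct = 100.0 * (
    quark_engine["avg_time_s"] - proposed["avg_time_s"]
) / quark_engine["avg_time_s"]

print(f"accuracy gain: {accuracy_gain_points:.1f} percentage points")
print(f"accuracy ratio: {accuracy_ratio:.2f}x")
print(f"speed-up: {speedup:.2f}x")
print(f"time reduction: {time_reduction_pct:.1f}%")
```

On these figures the proposed method is 50.0 percentage points more accurate and takes roughly half the average time of Quark-engine.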

Key words: Digital Forensics, Large Language Model, Retrieval Augmented Generation, Multi-Agent Collaboration, Large Language Model Hallucination, Cybercrime