  • Cao Ying, Liang Ruigang, Zhang Runze, Xu Dandan. Decompilation Using Control Structure Recovery Techniques Towards Human Habits (面向人类编程习惯的反编译代码控制结构恢复技术) [J]. Journal of Cyber Security (信息安全学报), accepted.

Decompilation Using Control Structure Recovery Techniques Towards Human Habits
Cao Ying, Liang Ruigang, Zhang Runze, Xu Dandan (xudandan@iie.ac.cn)
(Institute of Information Engineering, Chinese Academy of Sciences)
Decompilers are widely used for software security analysis when source code is unavailable and only binaries can be examined, in tasks such as malware analysis, vulnerability mining, and verification. These tasks often require reverse engineers to analyze binaries in depth, but reading assembly code instruction by instruction is time-consuming and inefficient. Decompilers help reverse engineers extract the semantics of each function in a binary so that critical functions or code segments can be identified quickly, which significantly improves the efficiency of code analysis in reverse engineering. However, despite substantial efforts to improve the readability of control structures in decompiled code, the high-level control statements produced by current decompilers still differ markedly from human-written code, so reverse engineers must spend considerable manual effort analyzing control conditions and logic. This paper leverages the ability of large language models to understand and generate code in a human-aligned way and proposes LLMReStructor, a control structure optimization technique oriented toward human programming habits. Compared with traditional decompilers, LLMReStructor restores control structures to statements that align more closely with human programming habits, taking into account the code's specific function and usage scenario. A comparison with the original source code shows that the control structures restored by LLMReStructor closely resemble the corresponding source. In addition, in surveys assessing the readability of decompiled code from different decompilers, code optimized by LLMReStructor was the most favored by users. The approach integrates large language models into the decompilation process and helps bridge the gap between machine-generated and human-readable code.
Keywords: decompilation, large language model, control structure recovery, readability