面向C++商业软件二进制代码中的类信息恢复技术
杨晋,龚晓锐,吴炜,张伯伦
(中国科学院信息工程研究所 北京 中国 100093;中国科学院大学 网络空间安全学院 北京 中国 100049)
摘要:
采用 C++编写的软件一直是二进制逆向分析中的高难度挑战, 二进制代码中不再保留 C++中的类及其继承信息, 尤其是正式发布的软件缺省开启编译优化, 导致残留的信息也被大幅削减, 使得商业软件(Commercial-Off-The-Shelf, COTS)的 C++二进制逆向分析尤其困难。当前已有的研究工作一是没有充分考虑编译优化, 导致编译优化后类及其继承关系的识别率很低, 难以识别虚继承等复杂的类间关系; 二是识别算法执行效率低, 无法满足大型软件的逆向分析。
本文围绕编译优化下的 C++二进制代码中类及其继承关系的识别技术开展研究, 在三个方面做出了改进。第一, 利用过程间静态污点分析从 C++二进制文件中提取对象的内存布局, 有效抵抗编译优化的影响(构造函数内联); 第二, 引入了四种启发式方法, 可从编译优化后的 C++二进制文件中恢复丢失的信息; 第三, 研发了一种自适应 CFG(控制流图)生成算法, 在极小损失的情况下大幅度提高分析的效率。在此基础上实现了一个原型系统 RECLASSIFY, 它可以从 C++二进制代码中有效识别多态类和类继承关系(包括虚继承)。
实验表明, 在 MSVC ABI 和 Itanium ABI 下, RECLASSIFY 均能在较短时间内从优化后二进制文件中识别出大多数多态类、恢复类关系。在由 15 个真实软件中的 C++二进制文件组成的数据集中(O2 编译优化), RECLASSIFY 在 MSVC ABI 下恢复多态类的平均召回率为 84.36%, 而之前最先进的解决方案 OOAnalyzer 恢复多态类的平均召回率仅为 33.76%。除此之外, 与OOAnalyzer 相比, RECLASSIFY 的分析效率提高了三个数量级。
关键词:  二进制分析  类继承关系恢复  静态污点分析  自适应CFG生成算法
DOI:10.19363/J.cnki.cn10-1380/tn.2022.12.04
Received: June 05, 2020  Revised: June 12, 2020
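Funding: This work was supported by the Beijing Municipal Science and Technology Program project on building a training and support platform for specialized cyberspace attack and defense talent (No. Z181100002718002).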
Class Information Recovery Technology for COTS C++ Binary
YANG Jin,GONG Xiaorui,WU Wei,ZHANG Bolun
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:
Software written in C++ has long been one of the hardest targets in binary reverse engineering. Binary code does not retain C++ classes or their inheritance information, and release builds enable compiler optimization by default, which strips away much of the residual information. This makes reverse analysis of Commercial-Off-The-Shelf (COTS) C++ binaries particularly difficult. Existing work has two shortcomings. First, it does not fully account for compiler optimization, so its recognition rate for classes and their inheritance relationships is low on optimized binaries, and it struggles to identify complex relationships such as virtual inheritance. Second, its recognition algorithms are inefficient and do not scale to the reverse analysis of large software.
This paper studies the recovery of classes and their inheritance relationships from C++ binaries built with compiler optimization, and makes improvements in three respects. First, inter-procedural static taint analysis is used to extract object memory layouts from C++ binaries, which effectively withstands the effects of compiler optimization (constructor inlining). Second, four heuristics are introduced to recover information lost in optimized C++ binaries. Third, an adaptive CFG (control flow graph) generation algorithm is developed that greatly improves analysis efficiency at minimal loss. On this basis, a prototype system, RECLASSIFY, is implemented; it effectively identifies polymorphic classes and class inheritance relationships (including virtual inheritance) in C++ binaries.
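To make the recovery target concrete, the sketch below shows the kind of source-level structure that disappears at compile time: polymorphic classes arranged in a diamond with virtual inheritance. All names here (`Base`, `Left`, `Right`, `Diamond`, `dynamic_name`, `shares_virtual_base`) are hypothetical and are not taken from the paper or from RECLASSIFY. Under both the MSVC and Itanium ABIs, objects of such classes typically carry hidden vtable pointers that constructors write into the object; when a constructor is inlined, those writes appear in the caller instead, which is one reason optimized binaries are harder to analyze.

```cpp
#include <cstring>

// Hypothetical polymorphic hierarchy, for illustration only.
struct Base {
    int id = 0;
    virtual ~Base() = default;
    virtual const char* name() const { return "Base"; }
};

// Both intermediate classes inherit Base virtually, so a Diamond
// object contains exactly one shared Base subobject.
struct Left : virtual Base {
    const char* name() const override { return "Left"; }
};

struct Right : virtual Base {
    const char* name() const override { return "Right"; }
};

// The most-derived class must provide the final overrider of name().
struct Diamond : Left, Right {
    const char* name() const override { return "Diamond"; }
};

// Virtual dispatch: the call goes through the object's vtable, which
// is the main trace of class identity left in a compiled binary.
inline const char* dynamic_name(const Base& b) { return b.name(); }

// With virtual inheritance, the Base reached through Left and the
// Base reached through Right are the same subobject.
inline bool shares_virtual_base(Diamond& d) {
    Base* via_left = static_cast<Left*>(&d);
    Base* via_right = static_cast<Right*>(&d);
    return via_left == via_right;
}
```

For a `Diamond d`, `dynamic_name(d)` yields `"Diamond"` and `shares_virtual_base(d)` holds. A class recovery tool has to infer this hierarchy from vtables and constructor-like code alone, since none of these class names or `struct` declarations survive in a stripped binary.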
Experiments show that, under both the MSVC ABI and the Itanium ABI, RECLASSIFY identifies most polymorphic classes and recovers class relationships from optimized binaries in a relatively short time. On a data set of C++ binaries drawn from 15 real-world software packages (compiled with O2 optimization), RECLASSIFY achieves an average recall of 84.36% for polymorphic classes under the MSVC ABI, whereas OOAnalyzer, the previous state-of-the-art solution, achieves an average recall of only 33.76%. In addition, RECLASSIFY improves analysis efficiency by three orders of magnitude compared with OOAnalyzer.
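The recall figures above can be read as the fraction of ground-truth polymorphic classes that the tool recovers. The following is a minimal sketch of that metric, assuming classes are matched by name; the function `recall` and the matching rule are illustrative assumptions, and the abstract does not state how recovered classes are matched against ground truth or how the per-binary values are averaged.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Recall = |ground truth ∩ recovered| / |ground truth|.
// Matching by class name is an assumption made for this sketch.
inline double recall(const std::set<std::string>& ground_truth,
                     const std::set<std::string>& recovered) {
    if (ground_truth.empty()) {
        return 0.0;  // convention chosen here for an empty ground truth
    }
    std::size_t hits = 0;
    for (const auto& name : ground_truth) {
        if (recovered.count(name) != 0) {
            ++hits;  // this ground-truth class was recovered
        }
    }
    return static_cast<double>(hits) /
           static_cast<double>(ground_truth.size());
}
```

With four ground-truth classes of which three are recovered, `recall` returns 0.75; classes reported by the tool that are not in the ground truth affect precision, not recall.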
Key words:  binary analysis  class inheritance recovery  static taint analysis  adaptive CFG generation algorithm