Binary Comparison Techniques: Applications, Approaches, and Challenges

HU Mengying; WANG Xiaoke; ZHAO Lei

Cite this article:

HU Mengying, WANG Xiaoke, ZHAO Lei. Binary Comparison Techniques: Applications, Approaches, and Challenges[J]. Journal of Cyber Security, 2025, 10(2): 48-66

HU Mengying^1,2, WANG Xiaoke^1,2, ZHAO Lei^1,2
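(1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; 2. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China)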

Abstract:

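Binary comparison identifies the similarities and differences between two binary code fragments by comparing their features. It is widely used in security, including vulnerability search, patch analysis, and malware detection, and each application scenario brings its own technical challenges. Although earlier studies have surveyed and classified binary comparison techniques, they cannot accurately describe the characteristics of these techniques, the specific impact of different challenges on binary code features, or the baselines against which the techniques should be compared. To fill these gaps, a large-scale survey of binary comparison work was conducted. It found that classifying binary comparison techniques by application scenario is not sufficient to describe their characteristics precisely, and that most works do not state their application scenario explicitly. A general descriptive model of binary comparison is therefore proposed, consisting of four dimensions: comparison object, expected goal, technical challenges, and method characteristics. This model describes binary comparison techniques more precisely. The paper then discusses how each technical challenge affects binary code features, specifically how compilation configuration, semantic modification, and code obfuscation affect the syntactic, structural, and semantic features of binary code. In addition, a comparison benchmark for binary comparison techniques is proposed and validated experimentally. The results show that, when selecting a baseline, one should consider whether the methods share the same comparison object, expected goal, and addressed challenges. When these are inconsistent, comparing the methods is not meaningful; when they are consistent, the comparison is more meaningful. Finally, directions for future research are suggested based on these findings.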

Keywords: binary comparison; software security; comparative experiment

DOI: 10.19363/J.cnki.cn10-1380/tn.2025.03.04

Received: 2023-05-14; Revised: 2023-08-07

Funding: This work was supported by the National Natural Science Foundation of China (No. 62172305).

Binary Comparison Techniques: Applications, Approaches, and Challenges

HU Mengying^1,2, WANG Xiaoke^1,2, ZHAO Lei^1,2

(1. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China; 2. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China)

Abstract:

Binary comparison identifies the similarities and differences between two binary code fragments by comparing their features. It is widely used in security, including vulnerability search, patch analysis, and malware detection, and each application scenario poses its own technical challenges. Although previous studies have surveyed and classified binary comparison techniques, they do not accurately describe the characteristics of these techniques, the specific impact of different challenges on binary code features, or the benchmark for comparing the techniques. To address these shortcomings, a large-scale survey of binary comparison work was conducted. It found that classifying binary comparison techniques by application scenario is not sufficient to describe their characteristics accurately, and that most works do not clearly state their application scenario. Therefore, a generic descriptive model for binary comparison was proposed, which consists of four dimensions: the comparison object, the expected goal, the technical challenges, and the method characteristics. This model describes binary comparison techniques more accurately. Furthermore, the impact of the technical challenges on binary code features was discussed, including the impact of compilation configuration, semantic modification, and code obfuscation on the syntactic, structural, and semantic features of binary code. In addition, a benchmark for binary comparison techniques was proposed and verified through experiments. The experimental results showed that, when selecting a comparison benchmark, it is necessary to consider whether the comparison objects, expected goals, and challenges addressed by the different methods are consistent. When they are inconsistent, a comparison between the methods is meaningless; when they are consistent, the comparison is more meaningful. Finally, based on the research findings, directions for further research were suggested.

Key words: binary comparison; software security; comparative experiment
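
As an illustration of the four-dimension descriptive model summarized in the abstract, the following is a minimal Python sketch, not the authors' implementation. The class and function names (`TechniqueProfile`, `is_fair_baseline`), the tool names, and the example values such as "function" and "vulnerability search" are hypothetical. Representing method characteristics as the set of feature kinds a technique uses is likewise an assumption made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Challenge(Enum):
    # The three technical challenges named in the abstract.
    COMPILATION_CONFIGURATION = "compilation configuration"
    SEMANTIC_MODIFICATION = "semantic modification"
    CODE_OBFUSCATION = "code obfuscation"


class Feature(Enum):
    # The three kinds of binary code features named in the abstract.
    SYNTACTIC = "syntactic"
    STRUCTURAL = "structural"
    SEMANTIC = "semantic"


@dataclass(frozen=True)
class TechniqueProfile:
    """One technique described along the four dimensions (hypothetical type)."""
    name: str
    comparison_object: str   # e.g. "function" (illustrative value)
    expected_goal: str       # e.g. "vulnerability search" (illustrative value)
    challenges: frozenset    # set of Challenge members the technique addresses
    features: frozenset      # set of Feature members (assumed encoding)


def is_fair_baseline(a: TechniqueProfile, b: TechniqueProfile) -> bool:
    """Return True when two techniques share comparison object, goal, and challenges.

    This encodes the abstract's conclusion that a comparison is meaningful
    only when these three dimensions are consistent.
    """
    return (
        a.comparison_object == b.comparison_object
        and a.expected_goal == b.expected_goal
        and a.challenges == b.challenges
    )


# Hypothetical profiles; the tool names are placeholders, not real systems.
tool_a = TechniqueProfile(
    "tool_a", "function", "vulnerability search",
    frozenset({Challenge.COMPILATION_CONFIGURATION}),
    frozenset({Feature.SEMANTIC}),
)
tool_b = TechniqueProfile(
    "tool_b", "function", "vulnerability search",
    frozenset({Challenge.COMPILATION_CONFIGURATION}),
    frozenset({Feature.STRUCTURAL}),
)
tool_c = TechniqueProfile(
    "tool_c", "function", "malware detection",
    frozenset({Challenge.CODE_OBFUSCATION}),
    frozenset({Feature.SEMANTIC}),
)
```

Under these assumptions, `is_fair_baseline(tool_a, tool_b)` holds even though the two tools rely on different feature kinds, whereas `tool_a` and `tool_c` differ in goal and addressed challenge and would not be compared against each other.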