Identification and Analysis of Cryptography Algorithms based on Machine Learning

XIA Ruiqi; LI Manman; CHEN Shaozhen

Cite this article:

XIA Ruiqi, LI Manman, CHEN Shaozhen. Identification and Analysis of Cryptography Algorithms based on Machine Learning[J]. Journal of Cyber Security, 2025, 10(1): 143-159

XIA Ruiqi¹, LI Manman², CHEN Shaozhen²
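(1. Department of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China; 2. State Key Laboratory of Cryptography Science and Technology, Beijing 100093, China)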

Abstract:

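Cryptanalysis based on artificial intelligence is currently one of the most closely watched problems in information security, and ciphertext-only identification of cryptographic algorithms using machine learning is an indispensable part of it. The current difficulties are how to select high-quality ciphertext feature indices that improve the performance of identification models, and how to improve identification under non-fixed keys. Building well-performing feature engineering and machine learning models is an ideal approach. This paper carries out cryptographic algorithm identification experiments under random keys using Random Forest, Adaboosting, a fully connected neural network, and other models. It establishes a selection method for the feature-engineering indices based on information entropy and a dimension criterion, optimizes over the many candidate feature indices, and analyzes the experimental phenomena theoretically and in detail. First, the ciphertext randomness indices (NIST SP 800-22) are classified by dimension according to their definitions. The theoretical information entropy of each index is then computed on the basis of this classification, and the indices are ranked and screened by entropy to pick out high-quality indices suited to identifying cryptographic algorithms. Based on the screening results, 9 representative feature indices are selected, and 4 machine learning models are built to identify 7 cryptographic algorithms, including block ciphers and public-key ciphers, encrypted under random keys. The experimental phenomena are analyzed theoretically from the perspectives of the feature indices and the principles of the models, and the theory and experimental results together yield a conclusion on efficient identification of a class of cryptographic algorithms under random keys. Compared with previous related work, this paper achieves efficient ciphertext-only identification of multiple types of cryptographic algorithms under random keys: identification accuracy for the various algorithms improves by 42% to 55%, and the amount of ciphertext data required drops by about 40%. The experimental and theoretical results show that using several high-entropy multi-dimensional indices as feature data gives high accuracy when identifying cryptographic algorithms under random keys.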
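The screening idea in the abstract (compute an entropy for each randomness index, then rank the indices) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the paper computes theoretical entropy values for the NIST SP 800-22 indices, whereas this sketch estimates an empirical Shannon entropy, H = -Σ p_i log2 p_i, from binned feature values. The function names, the 16-bin discretization, the 128-bit block length, and the use of `os.urandom` output as stand-in ciphertext are all assumptions made for illustration.

```python
import math
import os
from collections import Counter


def bits_of(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def monobit_proportion(bits):
    """One-dimensional index: fraction of ones in the whole sequence."""
    return sum(bits) / len(bits)


def block_proportions(bits, block_len=128):
    """Multi-dimensional index: fraction of ones in each full block."""
    n_blocks = len(bits) // block_len
    return [
        sum(bits[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]


def shannon_entropy(values, n_bins=16):
    """Empirical Shannon entropy (in bits) of values in [0, 1] after binning."""
    counts = Counter(min(int(v * n_bins), n_bins - 1) for v in values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_entropy(feature_table):
    """Sort feature names by decreasing empirical entropy of their values."""
    return sorted(
        feature_table,
        key=lambda name: shannon_entropy(feature_table[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical stand-in for ciphertext samples: 200 random 512-byte strings.
    samples = [bits_of(os.urandom(512)) for _ in range(200)]
    table = {
        "monobit": [monobit_proportion(s) for s in samples],
        "first_block": [block_proportions(s)[0] for s in samples],
    }
    for name in rank_by_entropy(table):
        print(name, round(shannon_entropy(table[name]), 3))
```

In a full pipeline the highest-ranked indices would be fed to a classifier as feature vectors. The ranking here depends on the chosen binning, so it should be read as an illustration of the idea rather than a reproduction of the paper's results.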

Keywords: cryptanalysis; machine learning; randomness indices; information entropy; cipher data identification

DOI：10.19363/J.cnki.cn10-1380/tn.2025.01.11

Received: 2022-03-14    Revised: 2022-04-12

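Funding: This work was supported by the Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (No. 2019A08) and by the project "Research on Key Technologies for Evaluating Cryptographic Algorithm Components in Resource-Constrained Environments" (No. 2019427).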

Identification and Analysis of Cryptography Algorithms based on Machine Learning

XIA Ruiqi¹, LI Manman², CHEN Shaozhen²

(1. Department of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China; 2. State Key Laboratory of Cryptography Science and Technology, Beijing 100093, China)

Abstract:

Cryptanalysis based on artificial intelligence is one of the most closely watched problems in information security, and cryptographic algorithm identification using machine learning is a key step in it. The main difficulties at present are how to improve identification under non-fixed keys and how to select effective indices that enhance the performance of identification models. Constructing effective feature engineering and establishing suitable machine learning models is a reasonable approach. In this work, the experiments use Random Forest, Adaboosting, a fully connected neural network, and other models, with ciphertexts produced under random keys. In addition, the feature indices are analyzed theoretically and selected on the basis of information entropy and dimension, which improves the feature engineering, and the experimental phenomena are then analyzed theoretically. First, we classify the indices (NIST SP 800-22) by dimension according to their definitions. Second, following this classification, we calculate and compare the information entropy of the indices and select the high-entropy ones as feature indices suited to cryptographic algorithm identification. The identification experiments then use 9 features, 7 algorithms including block ciphers and public-key ciphers under random keys, and 4 machine learning models. Subsequently, the experimental results are analyzed theoretically from the perspectives of the feature indices and the principles of the models. Finally, a conclusion on effective identification of cryptographic algorithms is drawn from the theoretical analysis and the experiments. The identification accuracy is 42% to 55% higher than in previous work, while the amount of ciphertext data required is about 40% smaller. The experiments and analysis show that multidimensional features with high entropy raise the accuracy of cryptographic algorithm identification.

Key words: cryptanalysis; machine learning; randomness indices; information entropy; cipher identification