摘要: |
后量子密码的发展已经引起各界的广泛关注,硬件实现效率是后量子密码最终标准的重要衡量指标之一。其中基于模误差学习问题(Module Learning With Errors,MLWE)的CRYSTALS-Kyber格密码是NIST第三轮后量子密码标准中最有希望的一种加密方案,可变的公钥矩阵维度参数k将基于MLWE的公钥加密方案的安全性扩展到不同级别,相较于其他格密码方案更具灵活性和安全性。本文首先分析了基于NIST第三轮最新参数q=3329的MLWE的格密码公钥加密方案的算法理论,并针对其中的核心模块—多项式乘法模块提出了两种不同的硬件实现方式。两种多项式乘法硬件实现方式都是采用基于频率抽取的数论变换(Number Theoretic Transform,NTT)算法,使用NTT算法实现多项式乘法降低了传统算法实现的线性复杂度,在硬件结构上能够面对不同应用场景进行优化,因此本文针对NTT算法中循环计算的核心模块提出了两种不同的优化硬件结构。一是面积和执行时间折中的迭代型NTT硬件结构,二是高性能低时延的多路延时转接(Multi-path Delay Commutator)的流水型NTT硬件结构;并且针对于面积时间均衡的迭代型NTT模块设计了一种整体MLWE硬件实现结构。与已有的先进设计相比,本文的流水型NTT结构具备更好的速度性能,在速度上相较于之前的设计分别提升11.64%和59.43%。而对于使用迭代型NTT的MLWE整体实现方案,本文的设计使用了最少的周期和最小的面积时间乘积(Area-Time-Product,ATP),其效率比最新发表的工作的硬件效率实现高2倍左右。 |
关键词: 后量子密码 格密码 数论变换算法 模误差学习问题 |
DOI:10.19363/J.cnki.cn10-1380/tn.2021.11.04 |
Received:September 01, 2021Revised:October 15, 2021 |
基金项目:本课题得到国家重点自然科学基金项目(No.62134002)和青年自然科学基金项目(No.62104107)资助。 |
|
Efficient Hardware Implementation of MLWE Lattice Based Cryptography |
CUI Yijun,YAO Kan,NI Ziying,WANG Chenghua,LIU Weiqiang |
College of Electronics and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China |
Abstract: |
The development of post-quantum cryptography has attracted widespread attention from all walks of life, and its hardware implementation efficiency is also one of the important metrics for the final standard of post-quantum cryptography. Among them, the MLWE-based CRYSTALS-Kyber lattice cipher is the most promising encryption scheme in the Round 3 of NIST post-quantum cipher standards. The variable public key matrix dimension parameter k extends the security of the MLWE-based public key encryption scheme to different levels. Compared with other lattice password schemes, it is more flexible and safer. This paper first analyzes the primes q=3329 in Round 3 of NIST process based on the algorithm theory of the MLWE-based lattice cipher public key encryption scheme and proposes two different implementation methods for the core module—polynomial multiplication module. The two implementation methods are number theoretic transform (NTT) algorithm based on frequency extraction. The use of NTT algorithm to achieve polynomial multiplication reduces the linear complexity, and can be optimized for different application scenarios in the hardware structure. Therefore, this paper proposes two different optimized hardware structures for the core module of cyclic calculation in the NTT algorithm. One is the iterative NTT hardware structure with the compromise of area and execution time, and the other is the pipelined NTT hardware structure with high-performance and low latency multi-path delay commutator (Multi-path Delay Commutator) structure; and designed an overall area-time balanced MLWE hardware implementation structure for the iterative NTT module. Compared with the state-of-the-art designs, the pipelined NTT structure in this paper shows better speed performance, although it has slightly more resources than the previous structure and is 11.64 faster than the previous design. Compared with the previous design, the speed is increased by 11.64% and 59.43%. As for the overall implementation of MLWE using iterative NTT, the design in this paper uses the least cycle and the smallest ATP, and its efficiency is about 2 times higher than the latest hardware implementation. |
Key words: post-quantum cryptography lattice-based cryptography number theoretic transform module learning with errors |