【打印本页】      【下载PDF全文】   查看/发表评论  下载PDF阅读器  关闭
←前一篇|后一篇→ 过刊浏览    高级检索
本文已被:浏览 623次   下载 323 本文二维码信息
码上扫一扫!
基于GPU的椭圆曲线运算库及相关算法优化
高钰洋,张健宁,王刚,苏明,刘晓光
分享到: 微信 更多
(南开大学计算机学院 网络空间安全学院 天津 中国 300350;南开大学计算机学院 网络空间安全学院 天津 中国 300350;数据与智能系统安全教育部重点实验室 天津 中国 300350;南开大学计算机学院 网络空间安全学院 天津 中国 300350;贵州大学公共大数据国家重点实验室 贵阳 中国 550025;数据与智能系统安全教育部重点实验室 天津 中国 300350)
摘要:
在区块链场景下,往往需要引入数字签名、零知识证明等密码学算法以保护数据安全性与用户隐私。但由于这些算法依赖于大量的大数与椭圆曲线运算,包括范围证明在内的许多密码学算法已经成为了区块链系统的性能瓶颈。而密码学算法的GPU优化也在近几年获得了广泛的关注与研究。本文充分利用GPU作为众核处理器的优势,设计了基于GPU的椭圆曲线运算库。在运算库中,本文在GPU上实现并优化了常用的椭圆曲线运算与大数运算,同时针对不同的需求设计了不同的实现与接口。本文对寄存器与常量内存等存储空间进行了合理分配,并通过利用预计算等优化手段减少了计算量,从而最大化了运算库的吞吐与性能。为了验证运算库的实用性与有效性,本文利用该运算库实现了代理重加密与Bulletproofs范围证明的验证算法,同时充分利用了算法的内部并行性进行优化。实验表明,本文实现的运算库在各个运算中都取得了远超于OpenSSL等常用CPU端运算库的性能。基于该运算库实现的代理重加密算法相比CPU实现能达到最高145倍左右的加速比,Bulletproofs范围证明验证算法相比于CPU端实现也能达到5.57倍左右的加速效果,平均证明验证时间在1 ms内,可以满足数字货币隐私保护场景下超过每秒2000笔交易的性能需求。可见该运算库能为区块链系统隐私保护等对密码学计算具有高吞吐需求的场景提供坚实支持。
关键词:  椭圆曲线  图形处理单元  统一计算架构  范围证明  代理重加密
DOI:10.19363/J.cnki.cn10-1380/tn.2024.11.01
投稿时间:2023-05-22修订日期:2023-07-15
基金项目:本课题得到国家自然科学基金资助项目(No.62272253,No.62272252,No.62141412);公共大数据国家重点实验室开放课题(No.PBD2022-12);天津市科技计划重点研发计划(No.19YFZCSF00900,No.20JCZDJC00610);中央高校基本科研业务费的资助。
Optimization for GPU-based Elliptic Curve Library and Related Algorithms
GAO Yuyang,ZHANG Jianning,WANG Gang,SU Ming,LIU Xiaoguang
College of Computer Science, College of Cyber Science, Nankai University, Tianjin 300350, China;College of Computer Science, College of Cyber Science, Nankai University, Tianjin 300350, China;Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin 300350, China;College of Computer Science, College of Cyber Science, Nankai University, Tianjin 300350, China;State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China;Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin 300350, China
Abstract:
In blockchain systems with privacy protection, cryptographic algorithms such as digital signatures and zero-knowledge proofs are often used to protect data security and user privacy. However, many cryptographic algorithms, including range proofs, have become performance bottlenecks of blockchain systems due to their heavy reliance on operations of large numbers and elliptic curves. Moreover, the GPU optimization of cryptographic algorithms has gained extensive attention and research in recent years. We make full use of the advantages of GPU as many-core processors and design a GPU-based elliptic curve operation library. In the library, we implement and optimize the common operations of elliptic curves and large numbers on GPU, and design different implementations and interfaces to accommodate various requirements. To maximize the throughput and performance of the library, we carefully allocate storage such as registers and constant memory, and use optimization methods such as precomputation to reduce the amount of calculation. For testing the usability and effectiveness of the library, we use it to implement the proxy re-encryption algorithm and the verification algorithm of Bulletproofs range proofs. We further optimize them by making full use of the intrinsic parallelism of the algorithms. Experiments show that the operation library achieves a performance that far exceeds that of commonly used CPU-side libraries such as OpenSSL in each operation. Compared with the CPU-side implementation, the proxy re-encryption algorithm implemented with the library achieves up to 145 times speedup. The Bulletproofs range proof verification algorithm implemented with the library achieves a speedup of about 5.57 times as well. The average verification time of GPU-based Bulletproofs is within 1 millisecond, which meets the performance requirement of privacy protection for over 2000 digital currency transactions per second. Therefore, the operation library provides a solid foundation for applications requiring high throughput of cryptographic calculations, such as the privacy protection of blockchain systems.
Key words:  elliptic curve  graphics processing unit  compute unified device architecture  range proof  proxy re-encryption