Cite this article:
  • YAO Pan, WANG He, ZHENG Chao, WANG Liming. Research on Application of Homomorphic Encryption in Privacy Preserving Machine Learning[J]. Journal of Cyber Security, accepted.
Research on Application of Homomorphic Encryption in Privacy Preserving Machine Learning
YAO Pan, WANG He, ZHENG Chao, WANG Liming
(Institute of Information Engineering, Chinese Academy of Sciences)
Abstract:
With the development and maturation of cloud computing and big data, the value of data has been widely recognized: data has become a new factor of production permeating all walks of life, and its importance grows by the day. As users' awareness of privacy protection increases and policies and regulations tighten the oversight of sensitive and confidential data, data privacy and security have become one of the key factors restricting the development of big data and artificial intelligence. Machine learning systems were not designed with privacy and security in mind, so training data, model parameters, gradient information, and user data are all at risk of privacy leakage. Compared with traditional encryption, homomorphic encryption keeps data computable after encryption, which functionally matches the requirements of privacy-preserving machine learning. This paper first analyzes the types of private data in machine learning systems and the corresponding privacy attacks, then gives a formal definition of privacy-preserving machine learning along with its security goals, adversary models, and security levels, and compares the advantages and disadvantages of homomorphic encryption with those of other privacy-preserving mechanisms such as secure multi-party computation and differential privacy. Next, we summarize the development history, classification, strengths and weaknesses, application scenarios, and engineering implementations of homomorphic encryption schemes. We then analyze why activation functions must be replaced by approximations and which approximation schemes are available, and review the research progress of homomorphic encryption in encrypted machine learning, machine learning as a service, and federated learning, comparing the surveyed work along dimensions such as the homomorphic encryption schemes used, adversary models and security levels, privacy-protection goals, and hybrid defense mechanisms. Finally, we summarize the practical challenges of applying homomorphic encryption to privacy protection in machine learning and outline future research directions concerning the number of participants, data partitioning patterns, feature sparsity, and decentralization.
Key words:  homomorphic encryption  machine learning  privacy preserving  federated learning
DOI: 10.19363/J.cnki.cn10-1380/tn.2024.04.13
Received: 2022-04-03; Revised: 2022-07-04
Foundation item:
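
The abstract states that, unlike traditional encryption, homomorphic encryption keeps data computable after encryption. The following minimal, self-contained sketch illustrates that property with a toy Paillier cryptosystem; the tiny fixed primes and all parameter choices are illustrative assumptions made for this page (not values from the paper) and are nowhere near secure.

```python
# Toy Paillier demo (illustrative only -- NOT secure: primes are tiny and fixed).
# It shows the property the abstract relies on: ciphertexts can be combined so
# that decryption yields the sum (or a scaling) of the underlying plaintexts.
import math
import random

# Assumed toy key material, chosen purely for illustration
p, q = 1117, 1213
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # private exponent
g = n + 1                      # standard generator choice
mu = pow(lam, -1, n)           # modular inverse; valid because g = n + 1

def encrypt(m: int) -> int:
    """Encrypt m in [0, n) with fresh randomness r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """Decrypt via L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

c1, c2 = encrypt(17), encrypt(25)
# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
assert decrypt((c1 * c2) % n2) == 17 + 25
# Scalar homomorphism: exponentiating a ciphertext scales the plaintext.
assert decrypt(pow(c1, 3, n2)) == 3 * 17
print("homomorphic sum and scaling verified")
```

Paillier supports only additions (and plaintext scalings) on ciphertexts; fully homomorphic schemes such as BGV, BFV, or CKKS extend this idea to both additions and multiplications, which is what the machine-learning applications surveyed in the paper rely on.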
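
The abstract also notes that activation functions must be replaced by approximations before models can be evaluated under homomorphic encryption. The sketch below illustrates the usual reason: most schemes evaluate only additions and multiplications on ciphertexts, so a non-polynomial activation such as the sigmoid is swapped for a low-degree polynomial fit. The interval, degree, and least-squares method here are illustrative assumptions, not the specific approximation schemes surveyed in the paper.

```python
# Minimal sketch: replace the sigmoid with a low-degree polynomial so that it
# can be evaluated with only additions and multiplications, the operations
# available on ciphertexts under most homomorphic encryption schemes.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-8.0, 8.0, 1000)           # assumed range of pre-activation values
coeffs = np.polyfit(x, sigmoid(x), deg=3)  # degree-3 least-squares fit
poly = np.poly1d(coeffs)

max_err = np.max(np.abs(poly(x) - sigmoid(x)))
print("degree-3 approximation, max abs error on [-8, 8]: %.4f" % max_err)
# Under HE, poly(x) is evaluated on ciphertexts using only + and *;
# the exact sigmoid itself is not directly computable.
```

Lower-degree fits reduce the multiplicative depth the encryption scheme must support, at the cost of a larger approximation error over the chosen input range.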