Cite this article
  • LIU Gaoyang, LI Yutong, WAN Borui, WANG Chen, PENG Kai. Membership Inference Attacks in Black-box Machine Learning Models[J]. Journal of Cyber Security, 2021, 6(3): 1-15
Membership Inference Attacks in Black-box Machine Learning Models
LIU Gaoyang1, LI Yutong1, WAN Borui1, WANG Chen1,2, PENG Kai1,2
(1. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; 2. Hubei Key Laboratory of Smart Internet Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
Abstract:
In recent years, machine learning has developed rapidly and has been widely applied in fields such as natural language processing, image recognition, and search and recommendation. However, a large number of openly deployed machine learning models face severe challenges in terms of model security and data privacy. This paper focuses on the membership inference attack against black-box machine learning models: given a data record and the black-box prediction interface of a machine learning model, determine whether the record belongs to the model's training dataset. To this end, we design and implement a data synthesis algorithm based on a variational autoencoder, which generates synthetic data whose distribution is close to that of the target model's original training data. On this basis, we propose a mimic model construction algorithm based on generative adversarial networks, which uses the synthetic data to train a machine learning model whose prediction behavior resembles that of the target model. Compared with existing membership inference attacks, the attack proposed in this paper requires no prior knowledge of the target model or its training data, and achieves more accurate results given only black-box access to the target model's prediction interface. Experiments on local models and the online machine-learning-as-a-service platform BigML show that the proposed data synthesis algorithm produces high-quality synthetic data, and that the mimic model construction algorithm can imitate the target model's predictive behavior under more stringent conditions. Without prior knowledge of the target model or its training data, the proposed membership inference attack achieves an inference accuracy of up to 74% and an inference precision of up to 86% against multiple target models, improving accuracy and precision by 10.7% and 11.2%, respectively, over the best existing attack.
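The membership decision described above can be illustrated with a common confidence-thresholding baseline (a simpler heuristic than this paper's attack): overfitted models tend to return sharper prediction vectors on their training members, so thresholding the top confidence of the black-box output already separates members from non-members. The black-box outputs below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Simulated black-box outputs over 10 classes: an overfitted model
# returns sharper (higher-confidence) vectors on training members
# than on unseen records.
member_logits = rng.normal(0, 1, (1000, 10)) * 4.0     # sharp
nonmember_logits = rng.normal(0, 1, (1000, 10)) * 1.0  # flat

member_conf = softmax(member_logits).max(axis=1)
nonmember_conf = softmax(nonmember_logits).max(axis=1)

def infer_membership(conf, threshold=0.6):
    # Baseline attack: predict "member" when the model's top
    # confidence on the queried record exceeds a threshold.
    return conf > threshold

tpr = infer_membership(member_conf).mean()       # true positive rate
fpr = infer_membership(nonmember_conf).mean()    # false positive rate
accuracy = (tpr + (1 - fpr)) / 2                 # balanced accuracy
print(f"attack balanced accuracy: {accuracy:.2f}")
```

On this simulated data the threshold attack is clearly better than random guessing, which is exactly the gap that stronger attacks, such as the one proposed here, widen further.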
Key words:  machine learning  black-box model  membership inference attack  variational autoencoder  generative adversarial network
DOI:10.19363/J.cnki.cn10-1380/tn.2021.05.01
Received: 2020-07-12    Revised: 2020-09-30
Funding: This work was supported by the National Natural Science Foundation of China (No.61872416, No.62002104, No.52031009, No.62071192), the Fundamental Research Funds for the Central Universities (No.2019kfyXJJS017), the Natural Science Foundation of Hubei Province (No.2019CFB191), and the National Undergraduate Training Program for Innovation (No.2020104870001, No.DX2020041).
Membership Inference Attacks in Black-box Machine Learning Models
LIU Gaoyang1, LI Yutong1, WAN Borui1, WANG Chen1,2, PENG Kai1,2
(1.School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China;2.Internet Technology and Engineering R&D Center (ITEC), Huazhong University of Science and Technology, Wuhan 430074, China)
Abstract:
In recent years, machine learning has developed rapidly and has been widely deployed in the fields of natural language processing, image recognition, and search recommendation. However, a large number of machine learning models in the wild face severe challenges in terms of model security and data privacy. This paper focuses on the membership inference attack against black-box machine learning models: given a data record and the black-box prediction interface of a machine learning model, the aim is to determine whether the data record was used to train the target model or not. To this end, we design and implement a synthetic data generation algorithm based on a variational autoencoder (VAE) to generate a synthetic dataset whose distribution is similar to that of the original training data of the given model. In addition, we propose a mimic model construction algorithm based on generative adversarial networks (GANs), which uses the synthetic data to train a mimic machine learning model that imitates the prediction behavior of the target model. Compared with existing membership inference attacks, the attack proposed in this paper does not require prior knowledge of the target model or its training data, and achieves more accurate results with only black-box access to the target model. Experimental results on local models and the online machine-learning-as-a-service platform BigML show that the proposed data synthesis algorithm produces high-quality synthetic data, and that the mimic model construction algorithm can imitate the predictive behavior of a given model under more stringent conditions. Without prior knowledge of the target model or its training data, the proposed membership inference attack against multiple target models achieves an attack accuracy of up to 74% and a precision of up to 86%, which are 10.7% and 11.2% higher, respectively, than the state-of-the-art attack method.
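The VAE-based synthesis step described in the abstract boils down to sampling latent codes with the reparameterization trick and pushing them through the trained decoder. A schematic numpy sketch of just the generation phase, with a hypothetical fixed linear decoder standing in for a decoder the attacker has already trained:

```python
import numpy as np

rng = np.random.default_rng(1)

def reparameterize(mu, logvar, rng):
    # VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # which keeps sampling differentiable during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Hypothetical decoder: a fixed linear map from latent space to data
# space, standing in for the trained VAE decoder network.
latent_dim, data_dim = 2, 5
W = rng.standard_normal((latent_dim, data_dim))

def decode(z):
    return z @ W

# Synthetic data generation: sample latent codes from the prior
# N(0, I) (i.e. mu = 0, logvar = 0) and decode them into records
# that mimic the target model's training distribution.
mu = np.zeros((100, latent_dim))
logvar = np.zeros((100, latent_dim))
z = reparameterize(mu, logvar, rng)
synthetic = decode(z)
print(synthetic.shape)  # (100, 5)
```

In the full attack these synthetic records are then used as the query set for building the mimic model.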
Key words:  machine learning  black-box model  membership inference attack  variational autoencoder  generative adversarial network
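The mimic-model construction can be approximated, for illustration, by distillation-style training on black-box soft labels; this is a deliberate simplification of the paper's GAN-based algorithm. Here `black_box_predict`, its weights, and the synthetic query set are all hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in black-box target (a binary classifier); in a real attack
# only its prediction API is visible, never w_target.
w_target = np.array([2.0, -1.0])
def black_box_predict(X):
    return sigmoid(X @ w_target)

# Synthetic query set (in the paper this comes from the VAE step).
X_syn = rng.standard_normal((500, 2))
soft_labels = black_box_predict(X_syn)

# Train the mimic model to match the black-box outputs:
# cross-entropy against soft labels, plain gradient descent.
w_mimic = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = sigmoid(X_syn @ w_mimic)
    grad = X_syn.T @ (p - soft_labels) / len(X_syn)
    w_mimic -= lr * grad

# The mimic should now agree closely with the target on new queries,
# so it can stand in for the target when mounting the inference attack.
X_test = rng.standard_normal((200, 2))
gap = np.abs(sigmoid(X_test @ w_mimic) - black_box_predict(X_test)).mean()
print(f"mean prediction gap: {gap:.3f}")
```

Once the mimic model agrees with the target, the attacker can probe it freely, which is what makes the attack feasible with black-box access alone.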