  • Shen Bo, Zhang Rui. High-dimensional Data Publishing with Differential Privacy Protection[J]. Journal of Cyber Security, accepted.


High-dimensional Data Publishing with Differential Privacy Protection
Shen Bo, Zhang Rui
(Institute of Information Engineering, CAS)
The spread of IoT and big data technology has greatly facilitated people's lives and has produced large amounts of high-dimensional data. Analysis of published high-dimensional data reveals implicit value and knowledge that can guide governments, enterprises, and institutions in their decision-making. However, because high-dimensional data often contains sensitive personal information, publishing it directly poses a serious threat to personal privacy. Differential privacy is a privacy protection framework with a strict formal definition that supports data publishing and analysis without revealing sensitive personal information. However, existing differentially private methods for publishing high-dimensional data have two problems: they cannot fully capture the relationships among the data during dimensionality reduction, and they define the data distribution model inaccurately. To solve these problems, this paper proposes a differentially private high-dimensional data publishing method based on a Gaussian generative model. First, we use the maximum information coefficient and Dvoretzky's theorem to preprocess the high-dimensional data, filtering out attributes in the original data that are useless or sparse because of missing values, and reducing the impact on the level of privacy protection of the additional perturbation error introduced by data sparsity. Then the preprocessed data undergoes a projection transformation so that the projection of the high-dimensional data onto a low-dimensional space conforms to a Gaussian distribution. Finally, the projected data is used to train a differentially private Gaussian generative model, and the synthetic data generated by the model is published in place of the original high-dimensional data.
By designing a preprocessing method suited to high-dimensional data, this approach improves differentially private high-dimensional data publishing based on a Gaussian generative model. While retaining multiple functional relationships in the original data, it addresses the low utility of published high-dimensional data caused by an unknown data distribution or an inaccurate model definition. Theoretical analysis and experimental results show that the proposed algorithm achieves better utility than comparable algorithms.
Key words:  Differential privacy  High-dimensional data  Data publishing  Gaussian generative model  Maximum information coefficient
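
The projection and generative-model steps summarized in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption. The function names (`gaussian_sigma`, `dp_gaussian_synthesize`), the row-clipping bound `clip`, the even split of the privacy budget, and the use of the classic Gaussian mechanism under replace-one neighboring datasets are all hypothetical choices. The maximum-information-coefficient filtering step is omitted entirely, and the samples are drawn in the projected space.

```python
import numpy as np


def gaussian_sigma(sensitivity, eps, delta):
    # Classic Gaussian-mechanism calibration; the bound only holds for 0 < eps < 1.
    assert 0.0 < eps < 1.0 and 0.0 < delta < 1.0
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps


def dp_gaussian_synthesize(X, k, eps, delta, clip=1.0, n_samples=None, seed=0):
    # NOTE: a fixed seed is for illustration only; real noise needs secure randomness.
    rng = np.random.default_rng(seed)
    n, d = X.shape  # n is treated as public (replace-one neighboring datasets)
    n_samples = n if n_samples is None else n_samples

    # Data-independent random projection to k dimensions (consumes no privacy budget).
    R = rng.standard_normal((d, k)) / np.sqrt(d)
    Y = X @ R

    # Clip each projected row to L2 norm <= clip so one record has bounded influence.
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    Y = Y * np.minimum(1.0, clip / np.maximum(norms, 1e-12))

    # Noisy mean: replacing one record moves the mean by at most 2*clip/n in L2 norm.
    sigma_mu = gaussian_sigma(2.0 * clip / n, eps / 2.0, delta / 2.0)
    mu = Y.mean(axis=0) + rng.normal(0.0, sigma_mu, size=k)

    # Noisy second moment: Frobenius-norm sensitivity is at most 2*clip**2/n.
    sigma_m = gaussian_sigma(2.0 * clip**2 / n, eps / 2.0, delta / 2.0)
    M = (Y.T @ Y) / n + rng.normal(0.0, sigma_m, size=(k, k))
    M = (M + M.T) / 2.0  # symmetrizing is post-processing

    # Covariance from the noisy moments, pushed back to positive definite
    # by flooring the eigenvalues (also post-processing).
    cov = M - np.outer(mu, mu)
    w, V = np.linalg.eigh(cov)
    cov = (V * np.clip(w, 1e-8, None)) @ V.T

    # Synthetic records sampled from the fitted Gaussian in the projected space.
    return rng.multivariate_normal(mu, cov, size=n_samples)
```

Under these assumptions, the two noisy statistics each use (eps/2, delta/2), so basic composition gives (eps, delta)-differential privacy for the fitted model, and sampling from it is post-processing. A call such as `dp_gaussian_synthesize(X, k=5, eps=1.0, delta=1e-5)` returns an array with one synthetic row per input row and `k` columns.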