Research on Federated Density-Based Clustering Based on Secure Multi-Party Computation

SHEN Xuhong; YU Haining; WANG Xiaoyu

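(School of Cyberspace Science, Harbin Institute of Technology, Harbin 150001, China; Electric Power Research Institute of State Grid Heilongjiang Electric Power Co., Ltd., Harbin 150001, China)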

Abstract:

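Clustering is a popular unsupervised machine learning technique that groups similar data into clusters. It is applied to data analysis in many fields, including financial and medical analysis. Many clustering applications involve sensitive information that, under data privacy protection policies, must not be leaked during clustering. As data analysis techniques develop, data from multiple sources now often need to be clustered together to improve the quality of the analysis, which calls for efficient privacy-preserving clustering algorithms that protect the privacy of every participant. However, existing privacy-preserving clustering methods support joint clustering by only 2 to 3 participants. This paper implements a federated density clustering algorithm, Federated Density-Based Spatial Clustering of Applications with Noise (FDBSCAN). FDBSCAN is an efficient privacy-preserving clustering method, built on secure multi-party computation, that supports an arbitrary number of participants. Following the DBSCAN definitions of point types, FDBSCAN constructs multiple clusters, each consisting of at least one core point and all points reachable from it. In FDBSCAN, each participant encrypts its data with secret sharing and distributes the shares to the other participants for joint clustering, which avoids leaking both the participants' private information and intermediate information. FDBSCAN also implements privacy-preserving conditional control statements on top of a hybrid protocol that combines secret sharing and binary sharing. Experiments show that, compared with existing privacy-preserving clustering algorithms, FDBSCAN supports more participants and performs better in both clustering accuracy and efficiency. In common clustering scenarios, FDBSCAN achieves the same clustering accuracy as plaintext DBSCAN, with computational efficiency at a usable level. In the two-party joint clustering scenario, FDBSCAN is more efficient on real-world datasets than existing privacy-preserving clustering algorithms. For the trajectory clustering application, this paper also implements a federated trajectory clustering algorithm, Federated Trajectory Clustering (FTC). FTC clusters with FDBSCAN and proposes an efficient, privacy-preserving method for computing trajectory distances, which yields good trajectory clustering results. The experimental results show that FTC outperforms existing trajectory clustering algorithms on metrics such as the silhouette coefficient.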
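
The cluster definition in the abstract (each cluster consists of at least one core point and all points reachable from it) can be illustrated with plaintext DBSCAN. The following is a minimal sketch only: it is not the FDBSCAN protocol, it offers no privacy protection, and the names `dbscan`, `eps`, and `min_pts` are illustrative assumptions rather than identifiers from the paper.

```python
from collections import deque
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # eps-neighborhood of every point; a point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its eps-neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue  # only an unassigned core point starts a new cluster
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    queue.append(q)
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In a privacy-preserving variant, the distance comparison and the `if` branches in this loop are the steps that would have to run on shared data, which is presumably where the privacy-preserving conditional control statements mentioned in the abstract come in.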

Keywords: clustering; federated learning; secure multi-party computation; secret sharing

DOI: 10.19363/J.cnki.cn10-1380/tn.2025.05.02

Received: 2023-09-20; Revised: 2023-12-22

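Funding: This work was supported by the National Natural Science Foundation of China (No. 62172123, No. 62302122) and the Outstanding Youth Program of the Natural Science Foundation of Heilongjiang Province (No. YQ2021F007).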

Research on Federated Density-Based Clustering Based on Secure Multi-Party Computation

SHEN Xuhong, YU Haining, WANG Xiaoyu

School of Cyberspace Science, Harbin Institute of Technology, Harbin 150001, China; Electric Power Research Institute of State Grid Heilongjiang Electric Power Co., Ltd., Harbin 150001, China

Abstract:

Clustering is a widely used unsupervised machine learning technique that groups similar data into clusters. It is applied to data analysis in many areas, such as financial and medical analysis. Many of these applications involve sensitive information that, under data privacy policies, must not be leaked during clustering. Moreover, as data analysis techniques develop, data from multiple sources often need to be clustered together to improve the quality of the analysis, which requires efficient privacy-preserving clustering that protects each participant's privacy. However, existing privacy-preserving clustering methods support only 2 to 3 participants clustering jointly, and their clustering efficiency is poor. In this paper, we implement Federated Density-Based Spatial Clustering of Applications with Noise (FDBSCAN), an efficient privacy-preserving clustering method based on secure multi-party computation that supports an arbitrary number of participants. Following the definitions of point types in the plaintext DBSCAN algorithm, FDBSCAN forms multiple clusters, each containing at least one core point and all points reachable from it. In FDBSCAN, each participant encrypts its data with secret sharing and distributes the shares to the other participants for joint clustering, without disclosing private information or intermediate information. FDBSCAN also implements privacy-preserving conditional control statements based on a hybrid protocol of secret sharing and binary sharing. In our experiments, FDBSCAN supports more participants than existing privacy-preserving clustering methods and performs better in both clustering efficiency and clustering quality. In common clustering scenarios, FDBSCAN achieves the same clustering quality as plaintext DBSCAN at a practical level of efficiency. In the two-party joint clustering scenario, FDBSCAN is more efficient on real-world datasets than existing privacy-preserving clustering methods. For trajectory clustering, we implement Federated Trajectory Clustering (FTC). FTC clusters data with FDBSCAN and proposes an efficient privacy-preserving algorithm for computing trajectory distances, which yields good clustering performance. In our experiments, FTC outperforms existing trajectory clustering algorithms such as TRACLUS on the silhouette coefficient.
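
To illustrate how secret sharing lets any number of participants compute on data without revealing it, the following is a minimal sketch of additive secret sharing. It rests on assumptions: the abstract does not specify the sharing scheme, so the additive scheme, the modulus `MODULUS`, and the helper names `share`, `reconstruct`, and `add_shared` are hypothetical and are not taken from FDBSCAN.

```python
import secrets

# Assumed modulus for the share arithmetic; the paper's choice is not stated.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recover the secret; this needs the shares of all parties."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its two local shares; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


x_shares = share(17, n_parties=5)
y_shares = share(25, n_parties=5)
total = reconstruct(add_shared(x_shares, y_shares))
print(total)
```

Any subset of fewer than all shares is uniformly random and reveals nothing about the secret, which is why the number of parties can grow without changing the scheme. Addition is local, whereas multiplications and comparisons (needed for distance tests and conditional branches) require additional interactive protocols that this sketch does not cover.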

Key words: clustering; federated learning; secure multi-party computation; secret sharing