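Abstract:
With the development of technologies such as cloud storage and artificial intelligence, the value of data has grown significantly. However, high communication costs and unacceptable risks of data leakage have forced institutions into "data silos", so large amounts of data cannot deliver their economic value. Using blockchain as a platform for federated learning can alleviate this problem to some extent, but it introduces three important drawbacks: 1) consensus processes such as Proof of Work (PoW) and Proof of Stake (PoS) are unrelated to the federated learning training process, so consensus wastes large amounts of computing power and bandwidth; 2) out of self-interest, nodes may refuse to participate in training or participate only passively, and competing nodes may even interfere with training; 3) in an open environment, the data used during model training is hard to trace, which also lowers the cost of poisoning attacks. We find that, instead of relying on traditional consensus mechanisms such as PoW and PoS, combining federated learning with model watermarking to build a new consensus and incentive mechanism effectively avoids the wasted computing power and unbalanced rewards that arise when federated learning runs on a blockchain platform. A blockchain system designed around this consensus still satisfies immutability, decentralization, and 49% Byzantine fault tolerance, and it also inherently provides resistance to 49% poisoning attacks, adaptation to non-independent and identically distributed (Non-IID) data, and protection of model intellectual property. Both experimental results and analysis show that the proposed scheme is well suited to commercial federated learning among mutually untrusting institutions that hold large amounts of local data, and that it has high practical value.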
Key words: federated learning; blockchain; consensus algorithm; model intellectual property protection; poisoning attack
DOI: 10.19363/J.cnki.cn10-1380/tn.2024.01.02
Received: May 05, 2022; Revised: August 20, 2022
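Foundation items: This work was supported by the National Natural Science Foundation of China (No. 61903053), the Chongqing Science and Education Commission project (No. KJCX2020033), and the Open Project of the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (No. AGK2020006).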
A Novel FL System Based on Consensus Motivated Blockchain |
MI Bo, WENG Yuan, HUANG Darong, LIU Yang
School of Information and Engineering, Chongqing Jiaotong University, Chongqing 400074, China |
Abstract: |
With the advancement of technologies such as cloud storage and AI (artificial intelligence) in recent years, the value of data has grown significantly. However, high communication costs and intolerable risks of data leakage have given rise to "data silos" among institutions, leaving a substantial portion of data unable to realize its economic potential. Although using blockchain as a platform for federated learning can solve this problem to a certain extent, it also brings three primary shortcomings: 1) traditional consensus processes such as PoW (proof of work) and PoS (proof of stake) are unrelated to the federated learning training process, resulting in substantial waste of computational power and bandwidth; 2) out of self-interest, nodes may refuse to participate in training or participate only passively, and competing nodes may even disrupt it; 3) in open environments, the data used during model training is difficult to trace, which lowers the cost of poisoning attacks for malicious actors. Our study shows that, instead of relying on traditional consensus mechanisms such as PoW and PoS, combining federated learning with model watermarking technology makes the consensus algorithm fairer and more reliable. Because consensus is built on federated learning itself, it avoids wasted computing power and unbalanced rewards. The new consensus mechanism not only retains immutability, decentralization, and 49% Byzantine fault tolerance, but also naturally resists 49% poisoning attacks, adapts to Non-IID (not independent and identically distributed) datasets, and protects the intellectual property of the model.
Both experimental results and analysis demonstrate that the proposed solution is well suited to commercial federated learning among mutually untrusting institutions that leverage large volumes of local data, and that it has substantial practical value.
Key words: federated learning; blockchain; consensus algorithm; intellectual property protection; poisoning attack