Secure Named Entity Recognition Inference Method Based on Secret Sharing

Tong Yan; Hua Zhongyun; Liao Qing; Zhang Yushu

Cite this article:

Tong Yan, Hua Zhongyun, Liao Qing, Zhang Yushu. Secure named entity recognition inference method based on secret sharing[J]. Journal of Cyber Security, accepted.

(Harbin Institute of Technology, Shenzhen)

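Abstract:

Named entity recognition (NER) aims to identify specific entities in a piece of text. Because the recognized entities can improve the performance of downstream tasks, NER has become one of the fundamental tasks in natural language processing. The BiLSTM-CRF model, which combines the strengths of deep learning and statistical machine learning, has become a baseline method for this task. However, existing BiLSTM-CRF models support inference only in the plaintext domain, which can leak private information when computation is outsourced because of insufficient local resources. To address privacy leakage during outsourced BiLSTM-CRF NER inference, this paper proposes SecNER, a secure NER inference method based on secret sharing. SecNER uses secret sharing to design secure versions of the operators in the BiLSTM-CRF NER model, including nonlinear activation functions, maximum-value retrieval, and array access, and builds the complete inference system from these secure operators. During outsourced inference, SecNER protects both the data that users upload for prediction and the hosted model parameters. To further improve the performance of secure word-vector extraction, this paper restructures the word embedding layer using bucketing, which reduces the number of words per bucket and in turn the communication overhead of secure word-vector extraction. The security of SecNER is proved with a simulation-based security analysis, and experiments demonstrate the feasibility of the scheme. On three datasets, the F1 score of the secure inference method drops by at most 0.001 compared with plaintext inference, and the inference time overhead remains within an acceptable range.

Keywords: named entity recognition, information security, secret sharing, privacy-preserving computation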

DOI:

Submitted: 2024-04-06; Revised: 2024-06-04

Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program), Grant No. 62071142

Secure named entity recognition inference method based on secret sharing

Tong Yan¹, Hua Zhongyun², Liao Qing², Zhang Yushu³

(1. Harbin Institute of Technology, Shenzhen; 2. Harbin Institute of Technology, Shenzhen; 3. Nanjing University of Aeronautics and Astronautics)

Abstract:

Named entity recognition (NER) aims to identify specific entities within a given text. The identified entities help improve the performance of downstream natural language processing tasks, so NER has become a foundational task in natural language processing. The BiLSTM-CRF model, which combines the advantages of deep learning and statistical machine learning, has emerged as a baseline method for this task. However, existing BiLSTM-CRF models only support inference in the plaintext domain, which can lead to privacy leakage when computation is outsourced because of insufficient resources. To address the privacy issues that arise during outsourced BiLSTM-CRF NER inference, this paper proposes SecNER, a secure NER inference method based on secret sharing. SecNER employs secret sharing techniques to secure the operations involved in the BiLSTM-CRF NER model, such as nonlinear activation functions, retrieval of maximum-value information from an array, and array access. The entire inference system is constructed from these secure operations. SecNER ensures the security of both the data that users upload for prediction and the hosted model parameters during outsourced inference. To further optimize the performance of secure word vector extraction, the structure of the word embedding layer is adjusted: bucketing is employed to reduce the number of words in each bucket, thereby reducing the communication overhead of secure word vector extraction. A simulation-based security analysis is provided, and experiments are designed to demonstrate the feasibility of SecNER. Results indicate that, compared to the plaintext model on three datasets, the F1 score of the proposed secure NER inference method decreases by at most 0.001. Moreover, the time overhead of the inference process remains within an acceptable range.
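As an illustration of the kind of primitive the abstract refers to, the following is a minimal sketch of two-party additive secret sharing with a Beaver-triple multiplication. It is not SecNER's actual protocol: the abstract does not state the sharing scheme, the ring size, or the number of servers, so `MOD`, the two-party setting, and the trusted dealer in `beaver_triple` are all assumptions made for this example.

```python
import secrets

MOD = 2 ** 64  # assumed ring size; the abstract does not state one


def share(x):
    """Split integer x into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    """Recombine two additive shares into the secret value."""
    return (s0 + s1) % MOD


def add_shares(a, b):
    """Each party adds its own shares locally; no communication is needed."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def beaver_triple():
    """Hypothetical trusted dealer: returns shares of (u, v, u*v)."""
    u, v = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(u), share(v), share(u * v % MOD)


def mul_shares(a, b):
    """Multiply two shared values using one Beaver triple."""
    u, v, w = beaver_triple()
    # The parties open the masked values e = a - u and f = b - v.
    # These reveal nothing about a or b because u and v are uniformly random.
    e = reconstruct((a[0] - u[0]) % MOD, (a[1] - u[1]) % MOD)
    f = reconstruct((b[0] - v[0]) % MOD, (b[1] - v[1]) % MOD)
    # a*b = (e + u)(f + v) = e*f + e*v + f*u + u*v; party 0 adds the public e*f.
    z0 = (e * f + e * v[0] + f * u[0] + w[0]) % MOD
    z1 = (e * v[1] + f * u[1] + w[1]) % MOD
    return z0, z1
```

For example, `reconstruct(*mul_shares(share(7), share(6)))` evaluates to 42, while neither share on its own reveals 7 or 6. Nonlinear operators such as activation functions, maximum-value retrieval, and array access need additional protocols on top of these linear building blocks.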

Key words: named entity recognition, information security, secret sharing, privacy-preserving computation
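The bucketing idea mentioned in the abstract can be illustrated with a plaintext sketch. The one-hot selection below stands in for an oblivious array access whose cost grows with the number of rows it touches; restricting it to one bucket shrinks that cost from the vocabulary size to the bucket size. How SecNER assigns words to buckets, and what the servers learn about the bucket index, is not described in the abstract, so the contiguous-range buckets and the function names `one_hot_lookup` and `bucketed_lookup` are hypothetical.

```python
def one_hot_lookup(table, index):
    """Select table[index] via a one-hot dot product that touches every row.

    Plaintext stand-in for an oblivious array access: the work (and, in a
    secret-shared setting, the communication) scales with len(table).
    """
    dim = len(table[0])
    out = [0] * dim
    for row_id, row in enumerate(table):
        bit = 1 if row_id == index else 0  # one-hot selector entry
        for j in range(dim):
            out[j] += bit * row[j]
    return out


def bucketed_lookup(table, index, bucket_size):
    """Restrict the one-hot selection to the bucket that holds `index`.

    Assumes buckets are contiguous id ranges and that the bucket index may be
    known to whoever performs the lookup (an assumption of this sketch).
    Returns the selected vector and the number of rows touched.
    """
    start = (index // bucket_size) * bucket_size
    bucket = table[start:start + bucket_size]
    return one_hot_lookup(bucket, index - start), len(bucket)
```

With a 12-word toy table and `bucket_size=4`, looking up word 7 touches 4 rows instead of 12 and returns the same vector as the full lookup.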