A Survey on Neural Machine Reading Comprehension Model

LUO Dan; ZHANG Peng; MA Lu; WANG Bin; WANG Lihong

Cite this article:

LUO Dan, ZHANG Peng, MA Lu, WANG Bin, WANG Lihong. A Survey on Neural Machine Reading Comprehension Model[J]. Journal of Cyber Security (信息安全学报), 2024, 9(2): 122-139.

LUO Dan^1,2, ZHANG Peng^1,2, MA Lu^1,2, WANG Bin³, WANG Lihong⁴
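(1. Second Research Laboratory, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xiaomi AI Lab, Beijing 100085, China; 4. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China)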

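Abstract:

In recent years, with the rapid development of the Internet, cyber content security has become an increasingly prominent problem and is one of the core tasks of network governance. Text is the most critical research object in cyber content security, yet the inherent ambiguity and flexibility of natural language make online public opinion monitoring and cyber content governance very difficult. How to understand text content accurately is therefore the key problem of cyber content governance. At present, the core supporting technology for text content understanding is natural language processing. As a comprehensive task within natural language processing, machine reading comprehension can analyze online content in depth and understand it comprehensively, and it plays an important role in online public opinion monitoring and cyber content governance. In recent years, deep learning has achieved remarkable results in image recognition, text classification, natural language processing, and many other fields, and machine reading comprehension methods based on deep learning have also been widely studied. In particular, the release of various large-scale datasets in recent years has accelerated the development of neural machine reading comprehension, and machine reading models built on different neural networks have been proposed one after another. This paper surveys neural machine reading models. It first introduces the development history and current research status of machine reading comprehension; it then defines the machine reading comprehension task and lists representative datasets and neural machine reading models; next, it describes current research progress on four new trends; finally, it identifies the open problems of neural machine reading models, analyzes how machine reading comprehension can be applied to cyber content governance, and discusses future development trends.

Keywords: cyber content security; online public opinion monitoring; machine reading comprehension; natural language processing; deep learning; neural network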

DOI: 10.19363/J.cnki.cn10-1380/tn.2024.03.10

Received: 2020-05-21; Revised: 2020-09-08

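Funding: This work was supported by the National Key Research and Development Program of China (No.2016QY03D0503, No.2016YFB081304), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDC02040400), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No.2020163).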

A Survey on Neural Machine Reading Comprehension Model

LUO Dan^1,2, ZHANG Peng^1,2, MA Lu^1,2, WANG Bin³, WANG Lihong⁴

(1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; 3. Xiaomi AI Lab, Beijing 100085, China; 4. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China)

Abstract:

With the rapid development of the Internet over the past decades, cyber content security, one of the core tasks of network governance, has become an increasingly prominent problem. Text is the most pivotal research object in cyber content security. However, the inherent ambiguity and flexibility of natural language make public opinion monitoring and cyber content governance on the Internet very difficult. How to understand text content accurately is therefore the key issue in cyber content governance. At present, the core supporting technology for text content understanding is natural language processing. As a comprehensive task in the field of natural language processing, machine reading comprehension can analyze online content in depth and understand it comprehensively, and it plays an important role in monitoring online public opinion and governing cyber content. In recent years, deep learning has achieved remarkable results in many fields, such as image recognition, text classification, and natural language processing. Likewise, machine reading comprehension methods based on deep learning have been widely studied. In particular, the release of various large-scale datasets in recent years has accelerated the development of neural machine reading comprehension, and machine reading models combining different neural networks have been proposed in succession. This paper reviews neural machine reading models. First, the development history and research status of machine reading comprehension are introduced. Then, the task of machine reading comprehension is defined, and representative datasets and neural machine reading models are presented. Next, the latest research progress on four new trends is introduced. Finally, the open problems of neural machine reading models are identified, the application of machine reading comprehension methods to cyber content governance is analyzed, and future development trends are discussed.

Key words: cyber content security; public opinion monitoring; machine reading comprehension; natural language processing; deep learning; neural network