A Survey of Automatic Summarization of Open Source Software Bug Reports

LIU Cuilan; ZHANG Jiayuan; CAO Xudong; WU Gaofei; ZHU Xiaoyan; REN Jiadong; FENG Tao

Cite this article:

LIU Cuilan, ZHANG Jiayuan, CAO Xudong, WU Gaofei, ZHU Xiaoyan, REN Jiadong, FENG Tao. A Survey of Automatic Summarization of Open Source Software Bug Reports[J]. Journal of Cyber Security, 2022, 7(6): 126-139

LIU Cuilan^1,2, ZHANG Jiayuan^3,2, CAO Xudong^4,2, WU Gaofei^1,5,2, ZHU Xiaoyan^6, REN Jiadong^7, FENG Tao^3
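(1.Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China;2.National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, Beijing 101408, China;3.School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China;4.School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China;5.Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin 541004, China;6.School of Telecommunication Engineering, Xidian University, Xi'an 710071, China;7.School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China)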

Abstract:

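In the maintenance phase of open source software development, bug reports give developers substantial help in resolving defects. However, bug reports are usually written as conversations among users, and a single report may contain dozens of comments and thousands of sentences, which makes it hard for developers to read or understand. To alleviate this problem, automatic summarization of open source software bug reports has been proposed; it reduces the time developers spend reading lengthy reports. This paper systematically surveys the research on automatic summarization of open source software bug reports. First, according to the form in which a summary is presented, bug report summaries are classified into fixed summaries and visual summaries, and research methods for fixed summaries are further classified into supervised learning based and unsupervised learning based methods. The working frameworks for supervised and unsupervised bug report summary generation are then summarized, and the datasets, preprocessing techniques, and summary evaluation metrics commonly used in the field are introduced. Second, taking unsupervised learning as the entry point, the paper categorizes and reviews unsupervised bug report summarization methods as feature scoring based, deep learning based, graph based, and heuristic based methods, and discusses and analyzes each category. Third, from the standpoint of practicality, existing research on visual bug report summaries is summarized, and the practicality of fixed and visual bug report summaries is analyzed. Finally, existing research results and surveys are discussed and analyzed, and the challenges facing the field in three areas (bug report datasets, extractive summarization, and gold standard summaries) are identified, together with prospects for future research.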

Keywords: open source software; bug report; automatic summarization; text summarization

DOI: 10.19363/J.cnki.cn10-1380/tn.2022.11.09

Received: 2022-07-04; Revised: 2022-10-11

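Funding: This work was supported by the National Natural Science Foundation of China (No. U1836210, No. 61941105, No. 61772406), the research project of the Guangxi Key Laboratory of Cryptography and Information Security (No. GCIS202123), the Natural Science Basic Research Program of Shaanxi Province (No. 2021JQ-192), and the project of the Hebei Key Laboratory of Software Engineering (No. 22567637H).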

A Survey of Automatic Summarization of Open Source Software Bug Reports

LIU Cuilan^1,2, ZHANG Jiayuan^3,2, CAO Xudong^4,2, WU Gaofei^1,5,2, ZHU Xiaoyan^6, REN Jiadong^7, FENG Tao^3

(1.Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China;2.National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, Beijing 101408, China;3.School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China;4.School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China;5.Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin 541004, China;6.School of Telecommunication Engineering, Xidian University, Xi'an 710071, China;7.School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China)

Abstract:

In the maintenance phase of open source software development, bug reports provide developers with a great deal of help in fixing bugs. However, open source software bug reports are usually written in the form of a user dialogue. A single report may contain dozens of comments and thousands of sentences, making it difficult for developers to read or understand. To alleviate this problem, automatic summarization of open source software bug reports has been proposed, which can reduce the time developers spend reading lengthy bug reports. This paper systematically surveys the research on automatic summarization of open source software bug reports. First, according to the presentation form of the summary, bug report summaries are classified into fixed summaries and visual summaries, and the research methods for fixed summaries are classified into supervised learning based methods and unsupervised learning based methods. The working frameworks for generating bug report summaries with supervised and unsupervised learning are then summarized, and the common datasets, preprocessing techniques, and summary evaluation metrics in the field are introduced. Second, taking unsupervised learning as the starting point, the paper reviews unsupervised bug report summarization methods by category, classifying them into feature scoring based methods, deep learning based methods, graph based methods, and heuristic based methods, and discusses and analyzes each category. Third, starting from the practicality of bug report summaries, the existing research results on visual bug report summaries are summarized, and the practicality of fixed and visual bug report summaries is analyzed. Finally, the existing research results and surveys are discussed and analyzed, and the challenges faced by the field in three areas (bug report datasets, extractive summarization, and gold standard summaries) are pointed out, together with prospects for future research.

Key words: open source software; bug report; automatic summarization; text summarization
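
To make the "feature scoring based" category of unsupervised extractive summarization more concrete, the following is a minimal illustrative sketch, not a method taken from the surveyed papers. It assumes a bug report is given as a list of comment strings, scores each sentence by the average frequency of its non-stopword terms across the whole report, and keeps the top-scoring sentences in their original order. The names `summarize`, `split_sentences`, `tokenize`, and the small `STOPWORDS` set are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "in", "on",
             "i", "this", "that", "when", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(comments, max_sentences=2):
    """Pick the highest-scoring sentences and return them in report order."""
    sentences = [s for c in comments for s in split_sentences(c)]
    # Term frequency over the whole report is the only feature used here.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    report = [
        "The app crashes when saving a file. Thanks for looking.",
        "I can reproduce the crash when saving a large file.",
        "Sounds good.",
    ]
    for line in summarize(report, max_sentences=2):
        print(line)
```

Because the output consists of sentences copied verbatim from the report, this is an extractive summary; published feature scoring approaches typically combine several features rather than term frequency alone.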