Research Progress of Human Activity Recognition in Videos Based on Deep Learning

Sun Degang; Bai Ruwen; Li Min; Meng Bo; Li Linghan; Yang Yang; Jiang Miao; Ren Junxing; Li Fengfa; Huang Zihao

Cite this article:

Sun Degang, Bai Ruwen, Li Min, Meng Bo, Li Linghan, Yang Yang, Jiang Miao, Ren Junxing, Li Fengfa, Huang Zihao. Research Progress of Human Activity Recognition in Videos Based on Deep Learning[J]. Journal of Cyber Security, accepted.

(Institute of Information Engineering, Chinese Academy of Sciences)

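Abstract:

The maturing of big data, artificial intelligence, mobile Internet, cloud computing, and the Internet of Things continues to drive the automation and intelligence of video surveillance. Intelligent video surveillance has gradually replaced traditional video surveillance and become an important part of security and protection across industries. In an intelligent video surveillance system, recognizing human activity is important for discovering potential hazards, dynamically supervising sites, and giving early warning of abnormal events. Recognizing human activity in real surveillance scenes, however, still faces major challenges. To provide a reference for research on human activity recognition in video surveillance, this paper surveys the progress of deep activity recognition models over the past six years across three data modalities: RGB video, human skeleton, and RGB+D video. It compares deep model architectures built on the different modalities, mainly by recognition accuracy and also by model size, computational efficiency, and inference speed, and analyzes the advantages and limitations of each approach to recognizing human activity from different data. Finally, it discusses the challenges facing human activity recognition in intelligent video surveillance systems and potential directions for future research.

Keywords: intelligent video surveillance; human activity recognition; deep learning; multimodal data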

DOI：10.19363/J.cnki.cn10-1380/tn.2023.08.22

Submitted: 2021-03-16; Revised: 2021-07-20

Funding: Mobile Application Security

Research Progress of Human Activity Recognition in Videos Based on Deep Learning

Sun Degang, Bai Ruwen, Li Min, Meng Bo, Li Linghan, Yang Yang, Jiang Miao, Ren Junxing, Li Fengfa, Huang Zihao

(Institute of Information Engineering, Chinese Academy of Sciences)

Abstract:

The maturing of big data, artificial intelligence, mobile Internet, cloud computing, and the Internet of Things is accelerating the automation and intelligence of video surveillance. Intelligent video surveillance has gradually replaced traditional surveillance and become an important part of security in various industries. In intelligent video surveillance systems, recognizing human activities plays an important role in effectively discovering potential risk factors, dynamically supervising scenes, and giving early warning of abnormal events. However, recognizing human activity in real video surveillance scenarios still faces significant challenges. This paper aims to provide a reference for research on human activity recognition and gives a comprehensive overview of the progress of deep activity recognition models over the past six years across three data modalities: RGB video, human skeleton, and RGB+D video. It compares deep model architectures built on the different modalities, mainly by recognition accuracy while also considering model size, computational efficiency, and inference speed, and analyzes the advantages and limitations of applying each kind of data to recognize human activities. Finally, it discusses the challenges facing human activity recognition in intelligent video surveillance systems and potential directions for future research.

Key words: intelligent video surveillance; human activity recognition; deep learning; multimodal data