Cite this article:
  • Zhu Dali, Zhang Wenli, Yang Xu, Zeng Hualin, Yang Long. Dynamic-Static Feature Separation and Conditional Gating Fusion Network for Liveness Detection[J]. Journal of Cyber Security, accepted.


Received: 2026-01-12; Revised: 2026-04-22
Dynamic-Static Feature Separation and Conditional Gating Fusion Network for Liveness Detection
Zhu Dali, Zhang Wenli, Yang Xu, Zeng Hualin, Yang Long
(Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China)
Abstract:
Face recognition systems face severe threats from diverse presentation attacks, including 3D masks, high-definition replay videos, and printed photographs. Remote Photoplethysmography (rPPG)-based liveness detection methods are inherently discriminative against 3D mask attacks because they extract physiological signals such as the Blood Volume Pulse (BVP). However, their performance is limited by two inherent drawbacks: first, rPPG signals are susceptible to interference from motion and lighting, resulting in insufficient robustness; second, they exhibit weak sensitivity to static attacks such as print and replay, which also lack genuine physiological activity. These limitations stem from existing approaches coupling dynamic physiological and static physical cues within a single feature stream, failing to fully leverage multimodal spoofing evidence. To address this, this paper proposes a novel dynamic-static feature separation and fusion paradigm and constructs a dual-branch conditional gating fusion network named rFaceNet++. The network employs two dedicated branches to achieve robust dynamic feature extraction and fine-grained static artifact capture, respectively: an rPPG branch (rFaceNet) incorporates facial contour priors to focus on vasculature-rich regions, enhancing both physiological signal quality and motion robustness, while a lightweight static texture branch is specifically designed to model texture anomalies in print attacks and screen artifacts in replay attacks. Finally, an innovative conditional gating module dynamically assesses the credibility of both feature types based on the input and performs adaptive weighted fusion.
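The basic rPPG principle the abstract relies on — recovering a blood-volume-pulse trace from subtle color variations of facial skin — can be illustrated with a minimal green-channel sketch. This is a toy illustration under assumed inputs (a stack of RGB frames and a known frame rate), not the paper's rFaceNet branch:

```python
import numpy as np

def estimate_bvp(frames, fps=30.0):
    """Toy rPPG: spatial mean of the green channel per frame, mean-removed,
    then the dominant frequency in the 0.7-4.0 Hz heart-rate band."""
    g = frames[..., 1].mean(axis=(1, 2))      # mean green intensity per frame
    g = g - g.mean()                          # remove the DC component
    spectrum = np.abs(np.fft.rfft(g))
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)    # plausible heart rates: 42-240 bpm
    peak = freqs[band][np.argmax(spectrum[band])]
    return g, peak * 60.0                     # BVP trace and heart rate in bpm

# Synthetic demo: a 1.2 Hz (72 bpm) pulse added to the green channel
# of 10 seconds of 8x8 RGB frames at 30 fps.
fps, secs = 30.0, 10
t = np.arange(int(fps * secs)) / fps
frames = np.full((len(t), 8, 8, 3), 128.0)
frames[..., 1] += 2.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
bvp, hr = estimate_bvp(frames, fps)
print(round(hr, 1))  # → 72.0
```

A genuine face yields a periodic BVP with a clear spectral peak, while a 3D mask suppresses it — which is why, as the abstract notes, static print and replay attacks (equally pulse-free) need the complementary texture branch.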
Experiments on public datasets demonstrate that the proposed method achieves significant performance improvements across 3D mask, replay, and print attacks: on the 3DMAD dataset, it achieves a detection rate exceeding 96% for 1-second short-duration mask attacks; on the Replay-Attack dataset, the Equal Error Rate (EER) is reduced to 0.0%; meanwhile, the Average Classification Error Rate (ACER) across attack types is reduced by 1.4% compared to the state-of-the-art. Ablation studies and visual analyses validate the effectiveness of the dual-branch separation architecture and the gating fusion mechanism. This work not only provides a high-performance solution for multi-type attack detection but also establishes a novel, interpretable liveness detection framework based on evidential fusion.
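The conditional gating fusion described above can be sketched as an input-conditioned convex combination of the two branch scores. All parameter names, sizes, and values here are assumptions for illustration (random weights standing in for learned ones), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                            # assumed per-branch feature width

# Hypothetical learned parameters (random stand-ins for illustration).
W_gate = rng.standard_normal((2, 2 * D)) * 0.1    # gate: concatenated features -> 2 weights
w_dyn = rng.standard_normal(D) * 0.1              # dynamic-branch liveness score head
w_sta = rng.standard_normal(D) * 0.1              # static-branch liveness score head

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(f_dyn, f_sta):
    """Weigh each branch's liveness score by a gate conditioned on the input features."""
    weights = softmax(W_gate @ np.concatenate([f_dyn, f_sta]))
    scores = np.array([w_dyn @ f_dyn, w_sta @ f_sta])
    return float(weights @ scores), weights

score, weights = fuse(rng.standard_normal(D), rng.standard_normal(D))
print(abs(weights.sum() - 1.0) < 1e-9)  # → True: gate outputs a convex combination
```

Because the gate is softmax-normalized, the fused decision is always an interpretable weighting of the two evidence streams, e.g. leaning on the texture branch when the clip is too short for a reliable pulse estimate.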
Key words:  Remote Photoplethysmography, Face Anti-spoofing, Heart Rate Estimation, Facial Contour, Blood Volume Pulse