HiveAttacker: A Two-Stage Security Detection Approach for the Hive Data Warehouse

LI Wenchao; LI Feng; BO Defang; ZHOU Jianhua; HUO Wei

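(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, Beijing 100093, China; Beijing Key Laboratory of Network Security and Protection Technology, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)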

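Abstract:

The enormous value of big data has made it a prime target of network attacks. Yet data warehouses and big data processing engines such as Hive, together with the distributed processing platforms they rely on, have long prioritized high availability and scalability over security, which leaves security risks in how big data is stored and processed. Taking the Hive data warehouse and query engine on the Hadoop platform as its entry point, this paper identifies two main attack surfaces that Hive faces: one in query parsing, and one in its interaction with the Hadoop platform or other third-party components. It then presents a two-stage security detection approach designed around them. The first stage targets the attack surface introduced when Hive receives and parses user queries; it customizes and extends traditional fuzzing to uncover vulnerabilities in Hive's own code that could lead to privilege escalation, authorization bypass, and similar exploitation effects. The second stage targets the attack surface introduced by Hive's interaction with other components; it detects vulnerabilities that could be triggered through inter-component interaction and issues warnings. HiveAttacker, a prototype tool implementing this approach, found 8 vulnerabilities across two historical versions and the latest version of Hive, 2 of which remain unfixed in the latest version, and detected 7 security threats introduced by component interaction in a real Hive deployment built for the evaluation, demonstrating the effectiveness of the approach.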

Keywords: Apache Hive; fuzzing; vulnerability detection

DOI：10.19363/J.cnki.cn10-1380/tn.2026.01.15

Received: December 07, 2020; Revised: February 09, 2021

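Funding: Supported by the National Natural Science Foundation of China (No. U1836209, No. 62032010, No. 61802394) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100).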

HiveAttacker: A Two-stage Security Detecting Approach for Apache Hive

LI Wenchao, LI Feng, BO Defang, ZHOU Jianhua, HUO Wei

Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, Beijing 100093, China; Beijing Key Laboratory of Network Security and Protection Technology, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract:

The enormous value that big data holds has made it one of the prime targets of network attacks today. However, data warehouses and big data processing engines, represented by Hive, as well as the distributed processing platforms on which they rely, have long focused on high availability and scalability of services without paying enough attention to security, leading to security risks during the storage and processing of big data. In this paper, we take the Hive data warehouse and query engine on the Hadoop platform as the starting point, summarize the two main attack surfaces that Hive faces, namely query parsing and interaction with the Hadoop platform or other third-party components, and design a targeted two-stage security detection approach. The first stage targets the attack surface introduced by Hive's reception and parsing of user queries. We customize and extend traditional fuzz testing techniques to find vulnerabilities in Hive's own code that may lead to exploitation effects such as privilege escalation and authorization bypass. The second stage targets the attack surface introduced by Hive's interaction with other components, focusing on detecting vulnerabilities that may be triggered through inter-component interactions and issuing warnings accordingly. Based on this approach, we implemented a prototype tool named HiveAttacker and applied it to two historical versions and the latest version of Hive. The tool detected a total of eight vulnerabilities, including two that have not yet been fixed in the latest version of Hive. It also identified seven security threats introduced by component interactions in a real Hive operating environment, verifying the effectiveness of the proposed approach.

Key words: Apache Hive; fuzz testing; vulnerability detection
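
The abstract does not say how the first stage customizes fuzz testing for Hive queries, so the following is only a generic illustration of mutation-based query fuzzing, not the authors' method. The seed queries, the spliced fragments, and `toy_target` (a stand-in for a query engine with one simulated bug) are all hypothetical; a real harness would submit queries to a running Hive instance and watch for privilege or authorization violations rather than exceptions.

```python
import random

# Hypothetical seed queries; a real campaign would start from a larger corpus.
SEEDS = [
    "SELECT name FROM users WHERE id = 1",
    "SHOW TABLES",
    "DESCRIBE users",
]

# Hypothetical fragments that a mutator might splice into a query.
FRAGMENTS = ["'", ";", "--", "OR 1=1", "${hiveconf:x}", "UNION ALL SELECT 1"]


def mutate(query, rng):
    """Apply one random mutation: insert a fragment, delete a token, or duplicate a token."""
    tokens = query.split(" ")
    choice = rng.randrange(3)
    pos = rng.randrange(len(tokens))
    if choice == 0:
        tokens.insert(pos, rng.choice(FRAGMENTS))
    elif choice == 1 and len(tokens) > 1:
        del tokens[pos]
    else:
        tokens.insert(pos, tokens[pos])
    return " ".join(tokens)


def fuzz(target, rounds, seed=0):
    """Feed mutated queries to `target` and collect those that fail unexpectedly."""
    rng = random.Random(seed)  # fixed seed keeps a run reproducible
    findings = []
    for _ in range(rounds):
        query = mutate(rng.choice(SEEDS), rng)
        try:
            target(query)
        except ValueError:
            continue  # clean rejection (e.g. a parse error) is expected behavior
        except Exception as exc:
            findings.append((query, repr(exc)))  # anything else is worth triage
    return findings


def toy_target(query):
    """Stand-in engine (assumption): rejects ';' cleanly, fails internally on 'UNION'."""
    if ";" in query:
        raise ValueError("parse error")
    if "UNION" in query:
        raise RuntimeError("simulated internal error")


if __name__ == "__main__":
    for query, error in fuzz(toy_target, rounds=200)[:5]:
        print(error, "<-", query)
```

Separating expected rejections from unexpected failures is the key design choice in this sketch: a fuzzer that reports every parse error drowns the few inputs that reach deeper code paths.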