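Abstract:
Although Spark is one of the most popular big data analysis tools, its security has not received sufficient attention. Access control, an important means of sharing data securely, has not yet been deployed on Spark. To enable secure access to private or sensitive data, this paper proposes an access control solution for Spark. Because the Spark architecture is built for hybrid analytics, designing and implementing a fine-grained access control mechanism that can be extended to different data sources is challenging. This paper proposes GuardSpark, a unified, centralized access control method based on declarative programming and the extensible Catalyst optimizer. GuardSpark supports complex access control policies and fine-grained access control enforcement. In the experimental part, the proposed method is implemented as a prototype on Spark, and its effectiveness and performance overhead are verified and evaluated. The experimental results show that GuardSpark provides a fine-grained access control mechanism that supports complex policies. In addition, the performance overhead it introduces is negligible, and the system is scalable.
Keywords: Spark SQL; access control; security optimization; big data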
DOI: 10.19363/j.cnki.cn10-1380/tn.2017.10.006
Received: March 20, 2017; Revised: June 02, 2017
|
GuardSpark: Access Control Enforcement in Spark
NING Fangxiao, WEN Yu, SHI Gang
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract:
Spark is one of the most popular big data analysis tools, yet its security has not received sufficient attention. Access control, an important means of sharing data safely, has not yet been deployed on Spark. To enable safe access to private or sensitive data, this paper proposes an access control solution for Spark. Because Spark's unified framework supports hybrid analytics, it is challenging to design and implement a fine-grained access control scheme that extends to a variety of data sources. We propose GuardSpark, a unified, centralized access control method based on declarative programming and the extensible Catalyst optimizer. GuardSpark supports complex access control policies and fine-grained access control enforcement. We implemented a prototype of the proposed method on Spark, verified that its access control enforcement functions correctly, and evaluated the system overhead that enforcement introduces. The experimental results show that GuardSpark achieves fine-grained access control and supports complex access control policies, that its performance overhead is negligible, and that the system scales well.
Key words: Spark SQL; access control; security optimization; big data