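A Web Proxy Service Discovery Method Based on Multi-Dimensional Feature Analysis
CHEN Zhipeng, ZHANG Peng, HUANG Caiyun, LIU Qingyun, XING Lichao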
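(School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; National Engineering Laboratory for Information Content Security Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China)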
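Abstract:
Web proxies provide a quick relay service. Compared with other kinds of proxy services, such as anonymity networks, VPN services and SOCKS proxies, they are free to use and require no software installation. Web proxies are therefore unmatched in convenience for bypassing access restrictions and hiding identity. However, they also pose serious security threats to everyday network use: they can harvest personal private information, push spam advertisements and conceal users' tracks. Quickly and effectively distinguishing them from the large volume of normal web pages has thus become an important challenge for cyberspace security. To address this problem, this paper proposes ProxyMiner, a web proxy discovery method based on multi-dimensional feature analysis. For active discovery, it introduces structural and content features specific to web proxies and predicts proxies with machine learning. For passive discovery, it exploits the access patterns characteristic of users visiting web proxies: it builds a bipartite graph, applies spectral clustering to proxy users, and obtains the top-level domain names visited by the proxy user group, thereby discovering web proxies. The method relies only on client IP addresses and destination URLs and needs no information from HTTP headers (which are often maliciously modified) or packets (which are usually encrypted or unavailable). Experimental results show that, on the same dataset, ProxyMiner significantly improves web proxy detection performance and reduces average detection time compared with traditional detection methods.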
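The active side can be illustrated with a small feature extractor. The abstract does not list the structural and content features ProxyMiner uses, so the five features below (form count, text-input count, link count, script count, and hits for a few hypothetical keywords) are assumptions chosen only to show the shape of such a pipeline; the classifier that would consume the vector is left out.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; the paper's actual content features are not given.
KEYWORDS = ("proxy", "unblock", "anonymous")


class DomFeatureParser(HTMLParser):
    """Counts a few structural tags and keyword occurrences in page text."""

    def __init__(self):
        super().__init__()
        self.forms = 0
        self.text_inputs = 0
        self.links = 0
        self.scripts = 0
        self.keyword_hits = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms += 1
        elif tag == "input":
            # An <input> without a type attribute defaults to a text field.
            if (attrs.get("type") or "text").lower() in ("text", "url"):
                self.text_inputs += 1
        elif tag == "a":
            self.links += 1
        elif tag == "script":
            self.scripts += 1

    def handle_data(self, data):
        text = data.lower()
        self.keyword_hits += sum(text.count(k) for k in KEYWORDS)


def extract_features(html):
    """Return [forms, text_inputs, links, scripts, keyword_hits] for one page."""
    parser = DomFeatureParser()
    parser.feed(html)
    parser.close()
    return [parser.forms, parser.text_inputs, parser.links,
            parser.scripts, parser.keyword_hits]


demo_page = (
    "<html><body><h1>Free Web Proxy</h1>"
    '<form action="/browse.php"><input type="text" name="u">'
    '<input type="submit" value="Go"></form>'
    '<a href="/about">About</a></body></html>'
)
demo_features = extract_features(demo_page)
```

Vectors like `demo_features` would be computed for labeled proxy and non-proxy pages and passed to any standard classifier; which model the authors trained is not stated in the abstract.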
Keywords: web proxy; service discovery; combined active and passive discovery; spectral clustering analysis
DOI:10.19363/J.cnki.cn10-1380/tn.2018.07.04
Received: March 30, 2018; Revised: May 30, 2018
Funding: This work was supported by the National Key Research and Development Program of China (No. 2016YFB0801304).
A Web Proxy Detection Method Based on Multiple Feature Analysis
CHEN Zhipeng, ZHANG Peng, HUANG Caiyun, LIU Qingyun, XING Lichao
School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
Abstract:
Web proxies offer a quick and convenient way to relay web traffic to a destination. Unlike more elaborate relaying systems such as anonymity networks, VPN services, or SOCKS proxies, they are free to use and require no special software. Web proxies are therefore an attractive option for bypassing access restrictions and hiding identity. However, they also pose a serious threat to personal privacy and property safety and serve as a channel for malicious advertisements, a threat made worse by their dynamic and evasive nature. Quickly and effectively detecting web proxies among a large number of web pages is thus an important challenge. To address this problem, this paper presents ProxyMiner, a combined active and passive web proxy detection method based on multiple feature analysis. On the active side, DOM features specific to web proxies are introduced, and machine learning is used to predict whether a page is a proxy. On the passive side, based on the access pattern specific to proxy service users, a bipartite graph is constructed and spectral clustering is applied to proxy users; the top-level domain names accessed by the proxy user group are then collected to discover proxy services. The method relies solely on the client IP address and the destination address, and requires no information from HTTP headers (which are often maliciously modified) or data packets (which are usually encrypted or unavailable). Experimental results show that ProxyMiner significantly improves detection performance and reduces average detection time compared with traditional detection methods.
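The passive side can be sketched as a two-way spectral split of a client-to-domain bipartite graph. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard degree-normalized bipartite formulation with power iteration for the second singular vector, splits into only two groups by sign, approximates the top-level domain by the last two host labels, and identifies the proxy-user group from a hypothetical known proxy domain (`seed_domain`). All function names and the `demo_log` data are invented for the example.

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Crude stand-in for top-level-domain extraction: keep the last two host labels."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def build_bipartite(log):
    """log: iterable of (client_ip, url). Returns (ips, domains, A), A[i][j] = visit count."""
    counts = defaultdict(int)
    for ip, url in log:
        counts[(ip, domain_of(url))] += 1
    ips = sorted({ip for ip, _ in counts})
    domains = sorted({d for _, d in counts})
    row = {ip: i for i, ip in enumerate(ips)}
    col = {d: j for j, d in enumerate(domains)}
    A = [[0.0] * len(domains) for _ in ips]
    for (ip, d), c in counts.items():
        A[row[ip]][col[d]] = float(c)
    return ips, domains, A


def spectral_bipartition(A, iters=200):
    """Split rows and columns by the sign of the second singular vectors of
    D1^(-1/2) A D2^(-1/2). Rescaling by the degrees would not change any sign,
    so it is skipped here."""
    m, n = len(A), len(A[0])
    r = [sum(row) for row in A]
    c = [sum(A[i][j] for i in range(m)) for j in range(n)]
    An = [[A[i][j] / math.sqrt(r[i] * c[j]) for j in range(n)] for i in range(m)]
    total = sum(r)
    u1 = [math.sqrt(ri / total) for ri in r]  # known leading left singular vector
    x = [float(i + 1) for i in range(m)]      # deterministic start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(x, u1))
        x = [a - dot * b for a, b in zip(x, u1)]                       # deflate u1
        y = [sum(An[i][j] * x[i] for i in range(m)) for j in range(n)]  # An^T x
        x = [sum(An[i][j] * y[j] for j in range(n)) for i in range(m)]  # An y
        norm = math.sqrt(sum(a * a for a in x)) or 1.0
        x = [a / norm for a in x]
    y = [sum(An[i][j] * x[i] for i in range(m)) for j in range(n)]
    return [xi > 0 for xi in x], [yj > 0 for yj in y]


def candidate_proxy_domains(log, seed_domain):
    """Return (users, domains) on the same side of the split as a known proxy domain."""
    ips, domains, A = build_bipartite(log)
    row_side, col_side = spectral_bipartition(A)
    side = col_side[domains.index(seed_domain)]
    users = [ip for ip, s in zip(ips, row_side) if s == side]
    found = [d for d, s in zip(domains, col_side) if s == side]
    return users, found


# Invented log: three clients mostly use two proxies, two clients browse normally.
demo_log = [
    ("10.0.0.1", "http://proxya.test/browse.php?u=abc"),
    ("10.0.0.1", "http://proxyb.test/index.php"),
    ("10.0.0.1", "http://news.test/today"),
    ("10.0.0.2", "http://proxya.test/"),
    ("10.0.0.2", "http://www.proxyb.test/"),
    ("10.0.0.3", "http://proxya.test/"),
    ("10.0.0.3", "http://proxyb.test/"),
    ("10.0.1.1", "http://news.test/"),
    ("10.0.1.1", "http://shop.test/cart"),
    ("10.0.1.2", "http://news.test/"),
    ("10.0.1.2", "http://shop.test/"),
]
proxy_users, proxy_domains = candidate_proxy_domains(demo_log, "proxya.test")
```

The sketch uses only client IPs and destination URLs as input, in line with the abstract. A real deployment would likely need more than two clusters and a public-suffix-aware domain parser.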
Key words: web proxy; service discovery; active and passive; spectral clustering analysis