Research on Several Issues in Network User Behavior Analysis
Abstract
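The Internet occupies an increasingly important place in today's social life. With advances in information technology and economic growth, the Internet has expanded rapidly, and its components, such as network traffic and the user population, are growing fast. As emerging components such as mobile devices, embedded systems, and sensor networks develop, the Internet will keep growing for a long time. Internet services have also evolved from simple traditional services to real-time multimedia services, and then to services characterized by resource sharing and collaborative work. An important reason the Internet has reached its present scale is the flourishing diversity and personalization of Internet services.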
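     However, the rapid development of network services also poses a series of challenges for telecom operators. They need to understand anew how users prefer to use services and how these preferences change over time, so that they can design targeted tariff plans, tailor marketing strategies, and supervise the network. Without a model of users' online/offline behavior, they cannot balance server load sensibly and so cannot obtain optimal server performance.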
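     The main subject of this paper is network user behavior as it varies with service and with time. By analyzing and mining real backbone traffic data, the paper obtains users' service-usage preference patterns, the rules by which these patterns change over time, and a model of users' online/offline behavior. These models give telecom operators a research basis for targeted marketing of telecom products according to customer characteristics, design of tariff plans, identification of high-value customers, and server load balancing.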
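     1) Based on the characteristics of real provincial backbone network data and on the research goals, this paper selects hierarchical clustering. In practice, however, the time complexity of the classical hierarchical clustering algorithm and of its existing improvements proved too high. To address this, the paper proposes a fast hierarchical clustering algorithm that groups the data by entropy and, exploiting characteristics of the data, merges multiple samples in a single step. Comparative experiments show that the improved algorithm runs roughly 7 to 8 times faster than classical hierarchical clustering, and about 3 times faster than an improved hierarchical clustering algorithm based on the minimum spanning tree.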
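The abstract does not spell out the fast algorithm, so the following is only a minimal sketch of the two ideas it names, under assumptions of ours: each user is a vector of per-service usage counts; "grouping by entropy" is taken to mean bucketing users by the Shannon entropy of their normalized usage vector (`bin_width` is a hypothetical parameter); and "merging several samples at once" is illustrated by collapsing identical normalized vectors in one pass before ordinary centroid-linkage agglomerative clustering runs inside each bucket with a hypothetical distance `threshold`. The names `entropy_groups`, `agglomerate` and `fast_cluster` are hypothetical; this is not the author's algorithm.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Turn per-service usage counts into proportions (all zeros if unused)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def entropy(p):
    """Shannon entropy in bits; zero proportions contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_groups(profiles, bin_width):
    """Assumed pre-grouping: bucket users by the nearest multiple of bin_width
    to the entropy of their normalized usage vector."""
    groups = defaultdict(list)
    for uid, counts in profiles.items():
        groups[round(entropy(normalize(counts)) / bin_width)].append(uid)
    return groups


def agglomerate(vectors, threshold):
    """Centroid-linkage agglomerative clustering of {id: vector}.
    Stops when the closest pair of centroids is farther apart than threshold."""
    # Multi-merge shortcut: users with identical vectors form one cluster in one pass.
    by_value = defaultdict(list)
    for uid, v in vectors.items():
        by_value[tuple(v)].append(uid)
    clusters = [(ids, list(v)) for v, ids in by_value.items()]

    while len(clusters) > 1:
        # Naive search for the closest pair of cluster centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        ids_j, c_j = clusters.pop(j)  # j > i, so index i stays valid
        ids_i, c_i = clusters[i]
        n_i, n_j = len(ids_i), len(ids_j)
        centroid = [(a * n_i + b * n_j) / (n_i + n_j) for a, b in zip(c_i, c_j)]
        clusters[i] = (ids_i + ids_j, centroid)
    return [ids for ids, _ in clusters]


def fast_cluster(profiles, bin_width=0.5, threshold=0.3):
    """Cluster each entropy bucket separately; returns sorted lists of user ids."""
    result = []
    for members in entropy_groups(profiles, bin_width).values():
        vectors = {uid: normalize(profiles[uid]) for uid in members}
        result.extend(sorted(ids) for ids in agglomerate(vectors, threshold))
    return sorted(result)
```

In this sketch the saving comes from running the quadratic pairwise search only inside each entropy bucket; the trade-off is that users in different buckets can never end up in the same cluster.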
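     2) From the results of the fast hierarchical clustering, this paper reveals the composition of users' service-usage preference patterns and the number of users in each pattern. It further analyzes how usage frequency differs across preference patterns, and how each pattern relates to users' daily online duration, daily traffic volume, and ratio of upstream to downstream traffic.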
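The per-pattern statistics listed above (users per pattern, daily online duration, daily traffic, up/down ratio) amount to a group-by over daily user records. A minimal sketch, assuming a hypothetical record layout `(user_id, mode, online_hours, up_bytes, down_bytes)` with one record per user per day and the mode already assigned by clustering; usage frequency is left out because the abstract does not define it, and `mode_summary` is a hypothetical name.

```python
from collections import defaultdict


def mode_summary(records):
    """Aggregate hypothetical daily records (user_id, mode, online_hours,
    up_bytes, down_bytes) into per-mode statistics."""
    acc = defaultdict(lambda: {"users": 0, "hours": 0.0, "up": 0, "down": 0})
    for _uid, mode, hours, up, down in records:
        a = acc[mode]
        a["users"] += 1
        a["hours"] += hours
        a["up"] += up
        a["down"] += down

    total_users = sum(a["users"] for a in acc.values())
    summary = {}
    for mode, a in acc.items():
        n = a["users"]
        summary[mode] = {
            "users": n,
            "share": n / total_users,  # fraction of all users in this mode
            "mean_online_hours": a["hours"] / n,
            "mean_daily_bytes": (a["up"] + a["down"]) / n,
            # Ratio of total upstream to total downstream bytes for the mode.
            "up_down_ratio": a["up"] / a["down"] if a["down"] else None,
        }
    return summary
```

Using the ratio of totals rather than the mean of per-user ratios avoids division by zero for users with no downstream traffic.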
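     3) This paper studies how user service preferences change over time. By defining a set of analysis metrics, it reveals how the rate of change of user service preferences varies with the time scale and with preference change points: the change rate does not simply decrease as the time scale grows; this holds only under certain conditions. The paper explains and analyzes this pattern. Finally, it summarizes and presents the most frequent sequences of preference-pattern changes over a one-month time series.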
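The abstract names its metrics but does not define them. As a hypothetical illustration only: label each user with a preference mode per time window (day, week, ...), define the change rate between adjacent windows as the share of users present in both windows whose mode changed, and obtain a user's change sequence by collapsing consecutive repeats of the same mode. Running `change_rates` on windows of different lengths is one way to compare time scales. The function names and metric definitions are ours, not the author's.

```python
from collections import Counter
from itertools import groupby


def change_rates(windows):
    """windows: list of {user: mode} dicts for consecutive time windows.
    For each adjacent pair, return the fraction of users seen in both windows
    whose mode differs (None when no user appears in both)."""
    rates = []
    for prev, curr in zip(windows, windows[1:]):
        common = prev.keys() & curr.keys()
        if not common:
            rates.append(None)
            continue
        changed = sum(1 for u in common if prev[u] != curr[u])
        rates.append(changed / len(common))
    return rates


def change_sequence(modes):
    """Collapse runs of the same mode: ['web', 'web', 'p2p'] -> ('web', 'p2p')."""
    return tuple(mode for mode, _ in groupby(modes))


def top_sequences(user_modes, n=3):
    """Most frequent change sequences across users, e.g. over one month of days."""
    return Counter(change_sequence(m) for m in user_modes.values()).most_common(n)
```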
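     4) This paper is the first to model network users' online/offline behavior with a non-homogeneous Poisson process (NHPP). Hypothesis tests on real data confirm that users' online/offline events do follow an NHPP. The paper then models online/offline behavior with an NHPP and, under the relevant assumptions, derives formulas for the probability of a user going online or offline. Finally, these formulas are verified both theoretically and against the data. The paper also gives login-probability distribution plots for different user groups as a basis for further research.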
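The abstract does not reproduce the derived probability formulas. For reference, these are the standard NHPP identities such a derivation starts from, where λ(t) is the (assumed) login intensity and N(t) counts logins up to time t; the paper's own formulas add assumptions that are not stated here.

```latex
\Lambda(t_1, t_2) = \int_{t_1}^{t_2} \lambda(s)\,ds, \qquad 0 \le t_1 < t_2

P\{N(t_2) - N(t_1) = k\} = \frac{\Lambda(t_1, t_2)^{k}}{k!}\, e^{-\Lambda(t_1, t_2)}, \qquad k = 0, 1, 2, \dots

P\{N(t_2) - N(t_1) \ge 1\} = 1 - e^{-\Lambda(t_1, t_2)}
```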
Today, with advances in information technology and economic development, the Internet has taken an increasingly important place in our lives. The Internet is expanding rapidly: network traffic, the number of network users, and the number of host computers are growing exponentially. With the development of mobile devices, embedded systems, and sensor networks as new Internet elements, the Internet will continue to grow for a long time. Internet applications have developed from simple traditional applications to real-time multimedia applications, and today's applications are characterized by resource sharing and collaboration. An important reason the Internet has reached its present position is the rapid development of Internet services and applications, with increasingly diverse and personalized types of business.
     However, the booming Internet business also brings problems for telecommunication companies. Without knowing users' service-usage patterns and how they change over time, operators cannot set better tariffs, design targeted marketing strategies, or monitor the network; without a model of users' online and offline behavior, they cannot balance server load reasonably to give servers optimal performance.
     This paper studies Internet user behavior. It uses actual backbone traffic data to identify users' service preference modes, analyze how these modes change over time, and model users' online and offline behavior. With these results, carriers can target product marketing and set tariffs according to customer characteristics; the results also provide a valuable reference for identifying valuable clients and for server load balancing.
     1) This paper selects hierarchical clustering, based on the characteristics of real provincial backbone network data and on our aim of analyzing users' service preference modes. Because classical hierarchical clustering has high time complexity, this paper introduces an improved clustering algorithm to reduce it. Experimental results show that our improved algorithm is about 10 times faster than classical hierarchical clustering, and about 3 times faster than the improved hierarchical clustering algorithm based on the minimum spanning tree.
     2) Based on the results of the fast hierarchical clustering, this paper reveals the composition of users' service preference modes at different time scales and the size distribution of each mode. It analyzes in depth the differences in usage frequency between modes, and the relationship between service modes and users' daily online duration, daily traffic, and ratio of upstream to downstream traffic. It also analyzes and explains the reasons behind these distributions and relationships.
     3) Besides analyzing service modes with the improved hierarchical clustering algorithm, this paper is the first to analyze how service modes change over time, studying how they vary with the time scale. By defining a series of indicators and processing actual data, it reveals how the rate of change of service modes relates to the time scale and to change points, and analyzes and explains the characteristics of these relationships. It then summarizes and presents the service-mode change sequences that occur most often in a one-month time series.
     4) This paper is the first to use the non-homogeneous Poisson process to model and analyze network users' online and offline behavior. Using hypothesis tests on actual data, it verifies that users' online and offline events follow a non-homogeneous Poisson process. It then models online and offline behavior with this process and, under the relevant assumptions, derives formulas for the probability of a user going online and offline. Finally, the formulas are verified both theoretically and against the data, and the results confirm that the conclusions are reasonable. In addition, we give login-probability distribution plots for groups of users with different service modes, as a basis for further research.
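A minimal numerical sketch of the NHPP probabilities, relying on the standard result that the number of events in an interval is Poisson with mean equal to the integrated intensity. It assumes (our assumption) that the login intensity repeats daily and is constant within each hour, estimated as the average number of logins observed in that hour per day. Times are hours of the day in [0, 24); `estimate_hourly_rates`, `expected_logins`, `prob_logins` and `prob_any_login` are hypothetical names. This is neither the paper's derived formula nor its hypothesis test.

```python
import math


def estimate_hourly_rates(login_hours, days):
    """Assumed estimator: mean logins per day in each hour-of-day bin.
    login_hours are event times in hours of the day, e.g. 20.5 = 20:30."""
    counts = [0] * 24
    for t in login_hours:
        counts[int(t) % 24] += 1
    return [c / days for c in counts]


def expected_logins(rates, t1, t2):
    """Integrated intensity over [t1, t2) for a piecewise-constant hourly rate,
    with 0 <= t1 <= t2 <= 24."""
    total = 0.0
    for hour, lam in enumerate(rates):
        lo, hi = max(t1, hour), min(t2, hour + 1)
        if hi > lo:
            total += lam * (hi - lo)
    return total


def prob_logins(rates, t1, t2, k):
    """P(exactly k logins in [t1, t2)): Poisson with mean equal to the integral."""
    lam = expected_logins(rates, t1, t2)
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_any_login(rates, t1, t2):
    """P(at least one login in [t1, t2)) = 1 - exp(-integrated intensity)."""
    return 1.0 - math.exp(-expected_logins(rates, t1, t2))
```

For example, with 4 logins observed at 20.1, 20.7, 21.3 and 20.2 over 2 days, the estimated rates are 1.5 per hour for 20:00-21:00 and 0.5 for 21:00-22:00, so the expected count for 20:30-21:30 is 1.0.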