Research on Techniques for Analyzing the Evolution and Diffusion of Internet Topics
Abstract
With the rapid development of Internet technology, the importance of monitoring online public opinion has gradually gained recognition. Online public opinion analysis has become an active research area in China and abroad, and it has produced results in topic detection, topic tracking, automatic summarization, trend analysis, and public opinion early warning. Building on this existing work, this thesis addresses the evolution and diffusion of Internet topics, aiming to deepen research on public opinion analysis and to provide solid technical support for the monitoring of online public opinion. The main contributions fall into two areas.
     First, we propose a topic evolution analysis technique based on multiple centers and vector decomposition. To address topic drift and to present how a topic evolves, a multi-center topic model describes the different facets of a topic, and vector decomposition identifies the novel features of follow-up documents, which drives the creation and updating of topic centers. Combined with an incremental clustering algorithm, this yields a complete scheme for the topic evolution problem. Experiments show that the scheme improves topic detection performance and presents the topic evolution process clearly.
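The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a multi-center topic model, vector decomposition, and incremental clustering might fit together. It assumes that "vector decomposition" means splitting a document vector into a part parallel to the nearest topic center and an orthogonal residual that carries the novel features. The names `Topic`, `decompose`, and `cluster`, both thresholds, and the averaging update are illustrative assumptions, not the thesis's actual method.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    na, nb = norm(a), norm(b)
    return dot(a, b) / (na * nb) if na and nb else 0.0

def decompose(doc, center):
    """Split doc into a part parallel to center and an orthogonal residual."""
    denom = dot(center, center)
    scale = dot(doc, center) / denom if denom else 0.0
    parallel = [scale * c for c in center]
    residual = [d - p for d, p in zip(doc, parallel)]
    return parallel, residual

class Topic:
    """A topic described by several centers, one per facet (assumed design)."""

    def __init__(self, first_doc):
        self.centers = [list(first_doc)]

    def similarity(self, doc):
        # A document matches a topic if it is close to any of its centers.
        return max(cosine(doc, c) for c in self.centers)

    def absorb(self, doc, novelty_threshold):
        best = max(self.centers, key=lambda c: cosine(doc, c))
        _, residual = decompose(doc, best)
        doc_norm = norm(doc)
        novelty = norm(residual) / doc_norm if doc_norm else 0.0
        if novelty > novelty_threshold:
            # Enough novel content: record the residual as a new facet.
            self.centers.append(residual)
        else:
            # Otherwise move the matched center toward the document.
            for i, d in enumerate(doc):
                best[i] = (best[i] + d) / 2.0

def cluster(docs, topic_threshold=0.5, novelty_threshold=0.6):
    """Single-pass incremental clustering over document vectors."""
    topics = []
    for doc in docs:
        best = max(topics, key=lambda t: t.similarity(doc), default=None)
        if best is not None and best.similarity(doc) >= topic_threshold:
            best.absorb(doc, novelty_threshold)
        else:
            topics.append(Topic(doc))
    return topics
```

In this sketch the order in which centers are appended to a topic gives a rough timeline of its facets, which is one way the evolution of a topic could be presented.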
     Second, we propose a topic diffusion analysis technique based on a diffusion graph and multivariate linear regression. Across bulletin board systems (BBS), we discover repost relations through similarity comparison and keyword matching, identify the forum where a topic first appeared, compute the diffusion period, and from these build a topic diffusion graph among forums. Within a single BBS, we derive a set of indicators that influence diffusion behavior and apply multivariate linear regression to forecast the diffusion trend. Experiments show that the scheme discovers topic diffusion paths and predicts diffusion trends accurately.
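The regression step can be illustrated with a small ordinary least squares fit. This is a sketch under stated assumptions: the abstract does not list the indicators, so the two columns below are hypothetical placeholders, the numbers are synthetic and generated from an exact linear relation purely for illustration, and `fit_linear` and `predict` are names introduced here.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, ..., bk] for k indicators per row.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("indicators are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coeffs, row):
    """Forecast the target for one row of indicator values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Synthetic example: two hypothetical indicators per observation period,
# with targets generated as 2 + 3 * x1 + 0.5 * x2.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
targets = [6.0, 8.5, 13.5, 15.5, 21.0]
coeffs = fit_linear(rows, targets)
forecast = predict(coeffs, (6, 4))
```

A real indicator set would need checks for collinearity and significance before the fitted coefficients are used for forecasting; the sketch only raises an error when the normal equations are singular.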
     In summary, building on a review of existing public opinion analysis techniques, this thesis focuses on the content and behavioral characteristics of topics and studies techniques for analyzing topic evolution and diffusion. Experiments validate the feasibility and practical effectiveness of the proposed schemes, which contribute to the monitoring of online public opinion. Finally, the thesis discusses development trends in the field.
