Research on Personalized Web Search Based on Reference Document Model


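English title: Research on Personalized Web Search Based on Reference Document Model
Author: Li Daren (李大任)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: query log analysis ; personalization ; query recommendation ; reference document model
Degree year: 2011
Advisors: Li Sheng (李生) ; Yang Muyun (杨沐昀)
Discipline code: 081202
Degree-granting institution: Harbin Institute of Technology
Thesis submission date: 2011-06-01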

Abstract

With the rapid spread of computers and the Internet, information resources have grown explosively, and helping users find exactly the information they want has become a central task of information retrieval. Most traditional retrieval techniques, however, are based on string matching and can hardly satisfy users' increasingly individualized information needs. To address this problem, this thesis first confirms the motivation for personalization through query log analysis and then explores several techniques for building a personalized search engine. The work falls into three parts:
     1. Potential for personalization in web search. We first show, by counting, that in a web search engine's query log the clicks that differ from other users' clicks far outnumber the clicks that are repeated across users. We then use the Kappa statistic to measure how consistently different users click under the same query. The distribution of Kappa values, considered together with query submissions, shows that this level of click consistency is hard for a one-size-fits-all web search engine to satisfy. Finally, we introduce a "potential for personalization" measure to indicate roughly which kinds of queries benefit more from personalization.
     2. Personalized web search based on the reference document model (RDM). We introduce the RDM to model a user's preferences from the documents the user clicked in the past, and use it as feedback to personalize the results that different users receive for the same query. We evaluate the RDM in both the vector space and the probabilistic space. The experimental results indicate that in either space the RDM models user preferences well from historically clicked documents and incorporates them effectively into the retrieval process.
     3. Query recommendation based on multi-information fusion. We study how to use the history of user groups recorded in query logs to provide personalized query recommendation. Specifically, through an analysis of the AOL query log we first verify that it is feasible to recommend to a user the queries issued by other users with similar query histories. We then investigate several measures of similarity between users' query history sequences, fuse them with a machine learning algorithm (RankingSVM), and recommend the queries of the users whose fused similarity to the target user is highest. Experiments on the Sogou query log confirm that this method effectively ranks the queries of similar users near the top and can predict a user's subsequent query. In addition, we carry out a preliminary exploration of click recommendation based on user groups.
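
The click-consistency measurement in part 1 can be illustrated with a small sketch. The abstract does not say which Kappa variant or which rating setup the thesis uses, so the example below rests on assumptions: it uses Fleiss' kappa, treats each candidate URL of one query as an item, each user as a rater, and "clicked" / "not clicked" as the two categories. The function names `click_table` and `fleiss_kappa` and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence, Set


def click_table(clicks_by_user: Dict[str, Set[str]],
                urls: Sequence[str]) -> List[List[int]]:
    """Build one [clicked, not_clicked] row per URL for a single query."""
    n_users = len(clicks_by_user)
    table = []
    for url in urls:
        clicked = sum(1 for clicks in clicks_by_user.values() if url in clicks)
        table.append([clicked, n_users - clicked])
    return table


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of raters who put item i in category j."""
    if not counts:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of raters")
    n_items = len(counts)
    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement, from the overall share of ratings in each category.
    total = n_items * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(counts[0]))
    )
    if p_e == 1.0:
        # Assumption: all ratings in a single category is treated as full agreement.
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical clicks of three users on the results of one query.
    toy_clicks = {"u1": {"a"}, "u2": {"a"}, "u3": {"b"}}
    table = click_table(toy_clicks, ["a", "b", "c"])
    print(table, fleiss_kappa(table))
```

Under this setup, a value near 1 means users click the same results for the query, while a value near 0 means their agreement is no better than chance, which is the situation in which a single shared ranking is unlikely to suit everyone.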

