Research on WordNet-Based Semantic Similarity Measurement and Its Application to Query Suggestion
Abstract
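Semantic similarity measurement has long been a research focus in artificial intelligence, psychology, cognitive science, and related fields, and it has a very wide range of applications. As an important part of natural language processing, it relies on a representation of linguistic knowledge whose most important starting point is the semantic dictionary. A semantic dictionary that can express relations between concepts is an indispensable foundational resource for natural language processing. WordNet, developed at Princeton University in the United States, is an excellent example of such a dictionary: its basic idea is simple and clear, and it is thoroughly formalized. WordNet has become a de facto international standard, and the soundness of its framework is recognized by the lexical semantics and computational lexicography communities.
     Meanwhile, with the explosive growth of data, people rely more and more on search systems to obtain information, and query suggestion has become a research focus in search in recent years. It can compensate for the limited expressive power of current Web search interfaces and help users express their query intent more effectively. As applied research on query suggestion has deepened, the sparsity of query-term information and the severe lack of internal information have posed many challenges for the technique and seriously constrained its further adoption. Bringing results from semantic similarity research into query suggestion can effectively address key problems such as query-term sparsity, and it is an important direction for future work.
     Against this background, the dissertation first surveys research on semantic similarity measurement and query suggestion in China and abroad. Representing data at the semantic level and centering on semantic similarity measurement, it builds an information content (IC) model based on the topology of WordNet concepts, proposes a semantic similarity algorithm that combines a concept's own IC with path information, and then applies the algorithm to determining similar queries. The main innovations and contributions are as follows: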
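     1. On similarity parameters in WordNet, an IC model based on concept topology is proposed. A concept's information content is a parameter of concept similarity algorithms and is decisive for their performance. The new model requires no corpus: the information content of a concept node depends on the topology of the node and its descendants. The IC value is a function of how the node and its descendants are arranged, including the depth of the concept node, the number of its descendants, and the depth of each descendant. Experimental results show that the model clearly outperforms other IC models, distinguishes different concepts effectively, and yields more precise IC values.
     2. On similarity measures in WordNet, a semantic similarity algorithm that combines a concept's own IC with path information is proposed. The algorithm reflects both the path information of concept nodes in the semantic taxonomy tree and semantic density information; that is, it takes into account both the IC of the concepts and their path information in the taxonomy tree. Experimental results confirm that the algorithm agrees more closely with human judgment than existing algorithms in the literature and performs better.
     3. On measuring similar queries, a semantics-based similar-query measure is proposed. Measuring query similarity is the core problem for subsequent query suggestion. The method represents data at the semantic level and considers both the similarity of the users' search terms and the similarity of the content of the documents users click. On the basis of this method, similar queries are clustered experimentally to form a similar-query expansion dictionary. Experimental results show that the algorithm captures similar queries more precisely and lays a good foundation for subsequent query suggestion.
     4. On query suggestion methods, a topic-based query suggestion method is proposed. The method takes full account of factors such as the relatedness between the user's query topic and the queries in the session, and the semantic containment relation and degree of similarity between a suggested query and the initial query. Experimental results confirm that the proposed method captures user query intent more accurately and substantially improves search precision.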
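As an illustration of contributions 1 and 2, the following is a minimal sketch and not the dissertation's actual model, whose formulas the abstract does not give. It assumes a tiny hand-made taxonomy (`PARENT`) in place of WordNet. As a stand-in it uses the corpus-free intrinsic IC of Seco et al., which depends only on descendant counts and not on node depths. The blend of a path score with a Lin-style IC score, and the weight `alpha`, are hypothetical.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "puppy": "dog",
    "tree": "plant",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def descendant_count(concept):
    """Count the concepts strictly below the given concept."""
    return sum(1 for n in PARENT if n != concept and concept in ancestors(n))


def intrinsic_ic(concept):
    """Seco-style intrinsic IC: 1 - log(descendants + 1) / log(total concepts).

    Leaves get IC 1.0 and the root gets IC 0.0; no corpus is needed.
    """
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(len(PARENT))


def lcs(a, b):
    """Lowest common subsumer: the deepest concept that is above both a and b."""
    above_a = ancestors(a)
    for node in ancestors(b):
        if node in above_a:
            return node
    return None


def path_length(a, b):
    """Number of is-a edges on the path from a to b through their LCS."""
    shared = lcs(a, b)
    return ancestors(a).index(shared) + ancestors(b).index(shared)


def hybrid_similarity(a, b, alpha=0.5):
    """Assumed blend: alpha * path score + (1 - alpha) * Lin-style IC score."""
    path_score = 1.0 / (1.0 + path_length(a, b))
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    # Both concepts being the root gives denom == 0; treat that as identical.
    ic_score = 2.0 * intrinsic_ic(lcs(a, b)) / denom if denom > 0 else 1.0
    return alpha * path_score + (1.0 - alpha) * ic_score
```

On this toy taxonomy, `hybrid_similarity("dog", "cat")` is higher than `hybrid_similarity("dog", "tree")`: the first pair is closer in the tree and shares a more informative subsumer (`animal` rather than the root).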
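     The results of this dissertation have academic and theoretical value and have had initial successful application in information retrieval. In the future they can be extended to web page classification, question answering systems, advertisement delivery, e-commerce, and other information fields, and they have considerable commercial value and broad application prospects.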
Semantic similarity measurement has been a hot topic for many years in artificial intelligence, psychology, and cognitive science, and it has been applied successfully in many fields. It is a key issue in natural language processing, and the most important resource it depends on is the semantic dictionary. A semantic dictionary that can express the relations between concepts is an indispensable resource. WordNet, developed at Princeton University, is an excellent example. Its basic idea is simple and clear. WordNet has become a de facto international standard, and the soundness of its framework has been recognized in the fields of lexical semantics and computational lexicography.
     At the same time, with the explosive growth of data, more and more people rely on search engines to obtain information. Query suggestion, which helps users articulate their query intent, has become a hot topic. As query suggestion grows in importance, the sparsity of query information poses many challenges and seriously restricts its wider application. Applying semantic similarity measures to query suggestion is an effective solution and an important direction for further research.
     Based on the discussion above, the dissertation represents data at the semantic level and focuses on the semantic similarity of concepts. The semantic similarity measure is then applied to measuring query similarity. The main contributions of this dissertation are as follows.
     1. The dissertation proposes an information content (IC) model for WordNet based on concept topology. Unlike previous work, the new model is corpus-independent. The information content of a concept is a function of the topology of the concept and its descendants. Experiments show that the new model provides more accurate similarity evaluation and performs significantly better than related work.
     2. The dissertation proposes an effective algorithm for measuring the semantic similarity of word pairs in WordNet. Unlike previous work, the new algorithm takes into account not only path length but also IC values, which lets it distinguish different concept pairs effectively. The algorithm is evaluated on the Rubenstein and Goodenough data set, a traditional and widely used benchmark. Correlation coefficients between human similarity ratings and the scores of seven algorithms are calculated. The proposed algorithm reaches a correlation of 0.8820 with human judgment, which demonstrates that it significantly outperforms the others.
     3. The dissertation proposes a query similarity algorithm based on semantic analysis. Unlike previous work, the new algorithm represents data at the semantic level. It takes full account of keyword and user clickthrough information to mine the relations between queries. Experiments show that clustering queries with the new algorithm captures similar queries more accurately than related work.
     4. The dissertation presents a topic-oriented query suggestion algorithm. Unlike previous work, the new algorithm takes full account of the semantic relations between queries, their similarity values, the query context, and so on, and then suggests similar queries to the user. Experiments show that the new algorithm effectively improves the precision of Web search.
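The query similarity measure in contribution 3 combines keyword information with clickthrough information. A rough sketch of that idea follows; it is not the dissertation's algorithm. It assumes plain Jaccard overlap in place of the semantic-level representation, a hypothetical mixing weight `alpha`, a hypothetical `clicks` mapping from each query to the set of clicked document IDs, and a simple greedy clustering pass with an arbitrary threshold.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def query_similarity(q1, q2, clicks, alpha=0.5):
    """Assumed blend of keyword overlap and clicked-document overlap."""
    term_sim = jaccard(q1.lower().split(), q2.lower().split())
    click_sim = jaccard(clicks.get(q1, ()), clicks.get(q2, ()))
    return alpha * term_sim + (1.0 - alpha) * click_sim


def cluster_queries(queries, clicks, threshold=0.3):
    """Greedy one-pass clustering.

    Each query joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster.
    """
    clusters = []
    for query in queries:
        for cluster in clusters:
            if query_similarity(query, cluster[0], clicks) >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical clickthrough log: query -> IDs of clicked documents.
clicks = {
    "cheap flights": {"d1", "d2"},
    "low cost airfare": {"d1", "d2", "d3"},
    "python tutorial": {"d9"},
}
groups = cluster_queries(list(clicks), clicks)
```

Here "cheap flights" and "low cost airfare" share no keywords but are grouped together because users clicked mostly the same documents, which is the kind of relation keyword matching alone would miss.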
     The results of this dissertation have high academic value and have been applied successfully in information retrieval. They can be extended to web page classification, question answering systems, advertisement delivery, e-commerce, and so on, which indicates considerable commercial value and broad application prospects.
