Research on Semantic Similarity Measurement for Text (文本语义相似度计算方法研究)

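English title: Research on Semantic Similarity Measurement for Text
Author: Liu Hongzhe (刘宏哲)
Degree level: Doctoral
Discipline: Computer Application Technology
Keywords (translated from Chinese): concept similarity; sentence similarity; document similarity; semantic similarity computation
Keywords (English, as given): Concept Similarity; Sentence Similarity; Text Similarity; Semantic Similarity Computing
Degree year: 2012
Advisor: Xu De (须德)
Discipline code: 081203
Degree-granting institution: Beijing Jiaotong University (北京交通大学)
Thesis submission date: 2012-06-01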

Abstract

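With the development of computer and Internet technology, the volume of text data has grown dramatically, yet such data is difficult for computers to understand and use. One way to address this problem is semantic similarity computation. Unfortunately, most existing semantic similarity methods require substantial additional information when applied, such as large-scale corpora and complete ontologies. This information is usually hard to obtain in real application domains, which limits the applicability of those methods. In addition, research to date has been carried out in isolation, at different times and under different premises, and no unified theoretical framework has emerged for the semantic similarity of concepts, sentences, and documents. To address these problems, this thesis studies semantic similarity and relatedness computation for text data at three levels (concepts, sentences, and documents) under the premise of incomplete additional information. The computation consists of three main stages: semantic extraction, semantic description, and semantic similarity computation. The semantic relations between each object of study and the ontology are extracted from the ontology structure; the object's semantic "fingerprint" in the ontology is used to describe the object itself, and an ontology-structure-based semantic vector is built from it, on which the semantic similarity is then computed.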
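As an illustration of the three stages named above (extraction, description, similarity computation), here is a minimal sketch. It is not the thesis's actual formulation: the toy taxonomy `PARENT`, the binary weights, and the choice of cosine similarity are all assumptions, since the abstract does not state the weighting or the similarity function, and the concept node density used in the thesis is omitted. Every function name is hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not data from the thesis.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def ancestors(node):
    """Semantic extraction, part 1: walk up the tree and collect ancestors."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def descendants(node):
    """Semantic extraction, part 2: every node that has `node` as an ancestor."""
    return {other for other in PARENT if node in ancestors(other)}


def concept_vector(node):
    """Semantic description: the node's 'fingerprint' in the tree.

    Assumption: each related concept gets weight 1.0 (binary weighting).
    """
    related = {node} | ancestors(node) | descendants(node)
    return {concept: 1.0 for concept in related}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_similarity(a, b):
    """Similarity computation: compare the two concept fingerprints."""
    return cosine(concept_vector(a), concept_vector(b))
```

With this toy tree, `concept_similarity("dog", "cat")` is higher than `concept_similarity("dog", "oak")`, because dog and cat share two ancestors while dog and oak share only the root.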
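     The main results fall into three areas:
     1. A method for computing similarity and relatedness based on tree structures and on graph structures whose main body is a tree. Analysis of an ontology's tree structure shows that the ancestor and descendant concept nodes of a concept node are semantically related to it, which makes it possible to identify a concept node's related concept nodes in the ontology tree. Concept node density is computed from structural information about the node's position in the ontology, enabling concept semantic extraction, semantic description, and semantic similarity computation on tree-structured ontologies. Building on this tree-based similarity method, a concept relatedness method is proposed for tree-dominated graph structures: as a particular relatedness computation requires, the tree-dominated graph ontology is converted into a tree-structured ontology, and the semantic relatedness between concept nodes is then computed. The method has been applied successfully to domain data, and experiments on the standard WordNet data also show that, compared with classical methods and under incomplete additional information, it attains a good Pearson linear correlation coefficient.
     2. A sentence similarity method based on tree-structured ontologies. A semantic index between ontology concepts and the keywords in a sentence is used to build direct and indirect semantic links between the sentence and the ontology. A semantic vector describing the sentence is extracted from these links and used to compute the semantic similarity between sentences. The method was validated on the Microsoft Research Paraphrase Corpus (MSRP); the results show that, compared with related methods, it achieves good accuracy and recall under incomplete additional information.
     3. A document similarity method based on tree-structured ontologies. Besides using the semantic index between ontology concepts and document keywords to build direct and indirect semantic links between the document and the ontology, the method uses the ontology's hierarchical structure to estimate keyword weights, and from these builds an ontology-based document semantic vector for computing the semantic similarity between documents. Validation on Michael D. Lee's LEE50 standard document similarity test set shows that, compared with related methods, the method achieves a good Pearson linear correlation coefficient under incomplete additional information.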
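The sentence and document methods in items 2 and 3 can be sketched roughly as follows. This is an illustrative reading under stated assumptions, not the thesis's algorithm: the keyword index `KEYWORD_TO_CONCEPT`, the toy taxonomy `PARENT`, whitespace tokenization, and the depth-based weighting rule are all hypothetical stand-ins for details the abstract does not give.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not data from the thesis.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

# Hypothetical semantic index from keywords to ontology concepts.
KEYWORD_TO_CONCEPT = {
    "dog": "dog", "puppy": "dog",
    "cat": "cat", "kitten": "cat",
    "oak": "oak", "tree": "oak",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def text_vector(text):
    """Build an ontology-based semantic vector for a sentence or document.

    The matched concept is the direct link; its ancestors are indirect links.
    Assumed weighting: weights shrink linearly on the way up to the root, so
    specific concepts count more than general ones.
    """
    vector = Counter()
    for token in text.lower().split():  # naive whitespace tokenization
        concept = KEYWORD_TO_CONCEPT.get(token)
        if concept is None:
            continue  # token is not indexed by the ontology
        path = path_to_root(concept)
        for steps_up, node in enumerate(path):
            vector[node] += (len(path) - steps_up) / len(path)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(a, b):
    """Compare two texts through their ontology-based vectors."""
    return cosine(text_vector(a), text_vector(b))
```

In this sketch, two texts that share no surface words can still score as similar when their keywords map to the same or neighbouring concepts, which is the practical point of indexing text against an ontology.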
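     In short, compared with existing methods, the three semantic similarity methods proposed in this thesis require little auxiliary information, are computationally simple and efficient, and show good accuracy on the relevant test data sets, so they adapt well to different domains.
     The thesis contains 39 figures, 20 tables, and 120 references.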
With the development of computer and Internet technology, the quantity of information has grown substantially. This kind of information is difficult for computers to understand and use, and semantic computing is the basis for solving this problem. Existing semantic computing research mostly relies on large-scale corpora and complete ontologies, and this kind of information is difficult to obtain in practical applications. In addition, past research was conducted in different periods and under different research premises, and has not formed a unified theory. In view of these problems, this thesis studies semantic similarity measurement for text at the concept, sentence, and document levels under incomplete information. The similarity computing process includes three stages: semantic extraction, semantic description, and semantic computation. The relations between semantic objects and the ontology are extracted based on the ontology structure; the semantic "fingerprint" of each object in the ontology is used to describe the object itself and to form its semantic vector, on which semantic computation is then carried out.
     The research covers the following three aspects:
     1. Concept similarity measurement based on the tree structure and the tree-based graph structure. By observing the tree structure, we found that the ancestor and descendant concept nodes of a concept node are semantically related to that node in the ontology, and that structural information about the node's position in the ontology can be used to compute concept node density. Methods for concept semantic extraction, semantic description, and semantic computation are proposed on this basis. Building on the tree-based similarity measurement, we propose a semantic relatedness measurement based on the tree-based graph ontology structure, in which the tree-based graph structure is transformed into a tree structure as the computation requires. Besides being applied successfully to domain data, the method is also applied to WordNet. Experiments show that, compared with related methods, it obtains a very good Pearson linear correlation coefficient under incomplete information.
     2. Sentence similarity computing based on the ontology. The relations between ontology concepts and the keywords in a sentence are used to establish a semantic index, from which direct and indirect semantic relations are extracted. An ontology-based semantic vector is then built to calculate the semantic similarity between sentences. The method is evaluated on the Microsoft Research Paraphrase Corpus (MSRP). Experiments show that, compared with related similarity computing methods, it obtains good accuracy and recall under incomplete additional information.
     3. Text similarity computing based on the ontology. In addition to using the relations between ontology concepts and the keywords in a document to establish a semantic index and extract direct and indirect semantic relations, the method uses ontology hierarchy information to estimate keyword weights. An ontology-based semantic vector is then built to calculate the semantic similarity between texts. The method is evaluated on the Michael D. Lee LEE50 standard document similarity test data. Experiments show that, compared with related methods, it achieves a good Pearson linear correlation coefficient under incomplete information.
     In summary, the common advantage of the three methods above is that they require little additional information and are simple and effective, so they have good domain adaptability.
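The evaluation measures named in the abstract (Pearson linear correlation against reference ratings, and accuracy and recall on paraphrase decisions) can be computed as in the sketch below. The function names and the 0.5 decision threshold are assumptions for illustration; the abstract does not state how the thesis turns similarity scores into paraphrase decisions, and no scores from the thesis are reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def accuracy_and_recall(scores, labels, threshold=0.5):
    """Score binary paraphrase decisions.

    A pair is predicted to be a paraphrase when its similarity score reaches
    `threshold` (an assumed value). `labels` holds the gold True/False answers.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("need non-empty lists of equal length")
    predictions = [score >= threshold for score in scores]
    correct = sum(pred == bool(gold) for pred, gold in zip(predictions, labels))
    true_positives = sum(pred and bool(gold) for pred, gold in zip(predictions, labels))
    positives = sum(bool(gold) for gold in labels)
    accuracy = correct / len(labels)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall
```

For concept and document similarity, `pearson` would be applied to the method's scores and the reference ratings for the same pairs; for sentence similarity, `accuracy_and_recall` would be applied to the scores and the gold paraphrase labels.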

