Research on Multi-Document Automatic Summarization Based on Citation Clustering
Abstract
The spread of the Internet has brought a sharp increase in electronic journal literature, which makes it hard for researchers, especially junior researchers, to find the information they need efficiently and accurately in a mass of material. Automatically summarizing domain knowledge so that researchers can obtain information more efficiently has therefore become increasingly important. Multi-document summarization is an important research topic in natural language processing. It aggregates and compresses multiple documents on the same topic and, by providing a concise and comprehensive summary, reduces the time researchers spend reading and helps avoid information overload.
     To give an overview of the literature in a researcher's field of interest, this work builds on existing multi-document summarization techniques and studies multi-document automatic summarization based on citation clustering, focusing on two parts: citation clustering and summary generation.
     In the citation clustering part, working within the Vector Space Model (VSM), we combine different text representations and text similarity measures to obtain six clustering indicators: publication abstract similarity (PAS), publication query-sensitive abstract similarity (PQAS), publication citation context similarity (PCCS), publication query-sensitive citation context similarity (PQCCS), publication co-cite mutual information (PCMI), and publication co-cite proximity score (PCPS). Then, based on the correlation between the positions at which publications are cited and their topics, we propose a clustering evaluation method based on citation position distance and use it to compare the clustering results of the six indicators.
     The purpose of citation clustering is to group the documents relevant to the researcher's information need (the user query) by topical similarity, in preparation for summary generation.
     In the summary generation part, to condense the main content of the documents in each topic cluster, we apply several multi-document summarization techniques (LexRank, Query Sensitive LexRank, MMR, and LexRankMMR). Sentences are ranked by importance, and the most important ones are extracted from each cluster's candidate sentence set to form paragraphs of different lengths that describe the documents in that cluster. Finally, experiments evaluate the quality of each generated paragraph and of the summary composed of these paragraphs.
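The abstract names MMR (maximal marginal relevance) as one of the sentence extraction techniques and the Vector Space Model as the underlying text representation. The following is only a minimal sketch of that general idea, not the method implemented in this work: the function names `tf_vector`, `cosine`, and `mmr_select`, the raw term-frequency weighting, the English-only tokenizer, and the trade-off parameter `lam` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (an assumed, very simple VSM representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedy MMR: pick up to k sentences, trading query relevance against redundancy.

    lam is a hypothetical trade-off weight: 1.0 ranks by relevance only,
    lower values penalize similarity to sentences already selected.
    """
    vectors = [tf_vector(s) for s in sentences]
    query_vec = tf_vector(query)
    selected = []                      # indices of chosen sentences, in pick order
    remaining = list(range(len(sentences)))

    def score(i):
        relevance = cosine(vectors[i], query_vec)
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]


if __name__ == "__main__":
    candidates = [
        "citation clustering groups documents",
        "citation clustering groups documents",
        "summaries reduce reading time",
    ]
    for sentence in mmr_select(candidates, "citation clustering summaries", k=2):
        print(sentence)
```

With `lam` below 1.0, the second pick skips the exact duplicate of the first sentence in favor of a less relevant but non-redundant one, which is the behavior that makes MMR attractive for summarizing a cluster of topically similar documents.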
