Research on Content-Based and Collaborative Filtering Methods for Scientific and Technical Literature
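Abstract
As the volume of online readable text on the Internet keeps growing, text filtering aims to help users obtain the texts they are interested in and to personalize information services; it therefore has broad application prospects and considerable practical value.
     Text filtering takes roughly two forms: content-based filtering and collaborative filtering. Content-based filtering mainly uses natural language processing, artificial intelligence, probability and statistics, and related techniques to analyze text content, computes the similarity between each text and a user model, and proactively sends highly similar texts to the registered user of that model. Collaborative filtering mainly uses the ratings of users with similar interests to make predictions and recommendations. It has been applied successfully in personalized recommender systems, but as a system grows its effectiveness gradually declines, exposing problems such as matrix sparsity, scalability, and the early-rater problem.
     This thesis first describes the two forms of text filtering and then examines collaborative filtering in greater depth. To address some shortcomings of collaborative filtering, it proposes an improved algorithm, item-based collaborative filtering, which effectively addresses the sparsity and scalability problems. It also proposes a text filtering method that combines content-based and collaborative filtering; the method draws on the strengths of both techniques, effectively addresses the early-rater problem, and improves the performance of the filtering system. Finally, the thesis presents methods for constructing user interest models, namely explicit feedback learning and implicit feedback learning, together with the three bases for refreshing user interest models in the experimental system (registration RG, query QY, and feedback FB).
     To evaluate the improved collaborative filtering algorithm and the combined filtering method, we developed a prototype system for automatically filtering Chinese computer science and technology literature. Experimental results show that the improved collaborative filtering algorithm outperforms user-based collaborative filtering, and that the system performs better when the two filtering techniques are combined.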
With the growing volume of online readable text, text filtering is valuable and widely applicable because it helps users obtain the information they are interested in and enables personalized information services.
    There are two kinds of text filtering: content-based filtering and collaborative filtering. Content-based filtering mainly uses techniques such as natural language processing, artificial intelligence, and probability and statistics to analyze text content, then computes the similarity between the content vector and the user profile vector and delivers highly similar texts to registered users. Collaborative filtering mainly uses the opinions of users with similar interests to predict and recommend. It has already been used in personalized recommender systems, but as a system grows its effectiveness gradually declines, and problems such as sparsity, scalability, and the early-rater problem appear.
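The content-based step described above (comparing a content vector with a user profile vector) can be sketched as follows. This is a minimal illustration, not the system built in the thesis: it assumes raw term counts as document weights, cosine similarity as the similarity measure, and a hypothetical `threshold` of 0.3, none of which are specified in the abstract. The names `cosine_similarity`, `filter_documents`, `profile`, and `docs` are invented for the example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_documents(docs, profile, threshold=0.3):
    """Return (doc_id, score) pairs at or above the threshold, best first.

    The threshold value is an assumption made for this sketch.
    """
    scored = []
    for doc_id, tokens in docs.items():
        # Assumption: each document is represented by raw term counts.
        score = cosine_similarity(Counter(tokens), profile)
        if score >= threshold:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user profile and pre-tokenized documents.
profile = {"filtering": 2.0, "collaborative": 1.0, "recommendation": 1.0}
docs = {
    "d1": ["collaborative", "filtering", "recommendation", "system"],
    "d2": ["geology", "mineral", "survey"],
}
selected = filter_documents(docs, profile)
print(selected)  # only "d1" passes, with a score of about 0.82
```

A real system for Chinese text would need word segmentation before this step and would probably use a weighting scheme such as TF-IDF rather than raw counts; both are left out here to keep the sketch short.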
    First, the two text filtering approaches are described, and collaborative filtering techniques are then studied in depth. To address some problems of collaborative filtering, we explore an item-based collaborative filtering algorithm, which effectively addresses the sparsity and scalability problems. Second, we propose a new text filtering approach that combines content-based filtering with collaborative filtering; it makes full use of the advantages of both, effectively addresses the early-rater problem, and improves system performance. Lastly, the construction of user profiles is described, covering explicit feedback learning and implicit feedback learning, along with the three bases the experimental system uses for updating user profiles.
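To make the item-based idea concrete, here is a small sketch of a generic item-based predictor with a content-based fallback. It is not the thesis's algorithm, whose details the abstract does not give. It assumes cosine similarity over co-rating users, a weighted-sum prediction, and a simple fallback rule for unrated items; the fallback is only one plausible way to combine the two techniques. The names `item_similarity`, `predict`, `combined_score`, and the sample `ratings` are invented for the example.

```python
import math


def item_similarity(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict(ratings, user, target):
    """Weighted-sum prediction of the user's rating for the target item.

    Returns None when no rated item is similar to the target, which is
    what happens for an item nobody has rated yet.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        sim = item_similarity(ratings, target, item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


def combined_score(cf_prediction, content_score, max_rating=5.0):
    """Assumed fallback rule: use the collaborative prediction when it
    exists, otherwise scale a content similarity in [0, 1] to the rating
    range."""
    if cf_prediction is not None:
        return cf_prediction
    return content_score * max_rating


# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"p1": 5, "p2": 4, "p3": 1},
    "u2": {"p1": 4, "p2": 5, "p3": 2},
    "u3": {"p1": 5, "p3": 1},
}
print(predict(ratings, "u3", "p2"))  # about 3.0
print(combined_score(predict(ratings, "u3", "p9"), content_score=0.8))
```

Because similarities are computed between items rather than between users, they can be precomputed offline, which is the usual argument for better scalability. The fallback in `combined_score` shows why content analysis helps with the early-rater problem: a new paper has no ratings, but it still has text to compare with the user profile.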
    To evaluate the new collaborative filtering algorithm and the combined approach, we developed a prototype system for automatic filtering of Chinese computer science literature. The experimental results show that the improved filtering algorithm outperforms user-based filtering algorithms and that the combined filtering approach yields better system performance.
