Research on the Application of Data Mining Technology in Libraries
Abstract
With the development of information technology, libraries must store and disseminate ever larger volumes of information, in increasingly varied types and forms. Because users' needs for information and for types of documents are becoming broader and more diverse, personalized information services have become a new development trend. Personalized services depend on knowledge of users' interests and of the associations between books, and this knowledge can be obtained by analyzing and mining a library's day-to-day operational data.
     This paper takes library borrowing data as its object of study. Based on an analysis of the characteristics of these data, it selects mining algorithms suited to library data: the FP-growth algorithm for association rules and a clustering algorithm, with the FP-growth algorithm simplified. Using the Beijing Forestry University Library as a case study, it mines the library's borrowing data. The association rule algorithm yields useful information for optimizing the management of shelves and stacks, discovering hidden associations between disciplines, guiding readers' borrowing, and providing personalized services. The clustering algorithm yields useful information for analyzing readers' borrowing patterns and for judging the quality of the collection. These results also demonstrate the effectiveness and feasibility of the algorithms used. Finally, a library data mining information system is designed and implemented; the system is intended to improve the quality of the library's personalized services and to better meet readers' borrowing needs.
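To make the association-rule step concrete, the following is a minimal sketch of textbook FP-growth applied to borrowing records. It is not the simplified variant developed in the paper, whose details the abstract does not give. The function names (`mine_frequent_itemsets`, `pair_rules`), the subject labels, and the thresholds are all hypothetical, and each basket is assumed to hold the subjects one reader borrowed.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def _build_tree(weighted, min_support):
    """Build an FP-tree from (items, weight) pairs; return header table and supports."""
    counts = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node carrying that item
    for items, weight in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def _grow(weighted, min_support, suffix):
    """Recursively mine itemsets that extend `suffix`."""
    header, frequent = _build_tree(weighted, min_support)
    found = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        found[itemset] = support
        # Conditional pattern base: the prefix path above each node for `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        found.update(_grow(conditional, min_support, itemset))
    return found


def mine_frequent_itemsets(baskets, min_support):
    """Return {frozenset(items): support count} for all frequent itemsets."""
    return _grow([(basket, 1) for basket in baskets], min_support, frozenset())


def pair_rules(itemsets, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs, with their confidence."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / itemsets[frozenset([lhs])]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, confidence))
    return rules


# Hypothetical borrowing records: the subjects each reader borrowed.
baskets = [
    ["python", "statistics", "data mining"],
    ["python", "data mining"],
    ["statistics", "data mining"],
    ["python", "statistics"],
    ["forestry"],
]
itemsets = mine_frequent_itemsets(baskets, min_support=2)
rules = pair_rules(itemsets, min_confidence=0.6)
```

A rule such as `python -> data mining` with high confidence is the kind of result that could inform shelf placement or borrowing recommendations. A production system would more likely use an established library than this hand-written version.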
