用户名: 密码: 验证码:
加权频繁模式挖掘算法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着信息技术尤其是网络技术的快速发展,人们收集、存储和传输数据的能力不断提高,各类应用领域的数据量大量产生,数据挖掘日益成为数据分析和决策支持的一种重要工具。频繁模式挖掘是数据挖掘领域内的一个基本问题,它被广泛应用到多种领域和挖掘任务中,例如Web挖掘、零售业数据挖掘、科学研究、关联规则挖掘、序列模式挖掘、路径分析等。
     在传统的频繁模式挖掘方法中,事务数据库(Transaction Database,TDB)中的每个项都被看成是同等重要的,而在实际应用中,每个项通常具有不同的重要性。为解决这个问题,本文研究了加权频繁模式挖掘(Weighted Frequent Pattern Mining,WFPM)问题,其主要特征就是在挖掘过程中通过给TDB中各个项目赋予不同的权值来体现它们的重要性不同。经过十几年的研究,已经奠定了较成熟的频繁模式挖掘算法理论基础,但对于加权频繁模式挖掘算法及其应用的研究工作尚处在初始阶段,值得深入研究和探讨。在加权事例中,加权支持度度量不再满足向下闭合特性(亦称反单调性(Antimonotonicity)),传统的频繁模式挖掘算法已经不再适用,因此在加权频繁模式算法中,研究的主要关注点是如何使加权支持度或兴趣度度量满足向下闭合特性(Downward Closure Property),以便高效地剪枝搜索空间。本文主要以基于Web频繁遍历路径模式挖掘和零售业数据挖掘为实例,详细阐述了基于加权有向图(Weighted Directed Graph,WDG)的加权频繁遍历模式挖掘和加权强相关频繁模式挖掘的有关思想。
     图遍历模式挖掘一直是数据挖掘研究的热点之一,图及其遍历可广泛地模拟现实世界的多种数据访问形式,比较有代表性的实例就是Web路径访问,Web结构可被抽象为一个加权有向图:顶点代表网页或站点,有向边代表网页间的网页或站点间的超级链接,权值代表Web结构元素的不同重要性,然而传统的遍历模式挖掘算法仅仅考虑了非加权的遍历模式挖掘。为解决加权遍历模式挖掘问题,本文主要从以下几个方面做了深入研究:
     归纳了加权频繁遍历模式挖掘(Weighted Frequent Traversai Pattern Mining,WFTPM)的种类,将Web加权频繁遍历模式挖掘,从用户类型角度分为个体加权频繁遍历模式挖掘(IndividualWFTPM,IWFTPM)和整体加权频繁遍历模式挖掘(Holistic WFTPM,HWFTPM);从采取的挖掘方法角度分为普通遍历模式挖掘和序列遍历模式挖掘。
     提出了加权有向图模型,总结了加权有向图的种类,提出了两类加权有向图——顶点加权有向图(Vertex-Weighted Directed Graph,VWDG)和边加权有向图(Edge-Weighted Directed Graph,EWDG),并详细阐述了两类WDG间的转换方法,简化了基于加权有向图的频繁遍历模式挖掘问题,完善了基于图遍历的模式挖掘问题的理论基础。
     基于加权有向图模型,本文做了以下几方面工作。
     针对挖掘IWFTPM问题,提出了加权遍历模式间的一种新颖特性——可拓展特性,把对候选模式的剪枝问题转化为判断候选模式的可扩展性问题。基于加权模式间的可拓展特性,提出了一种基于加权有向图结构的频繁遍历模式挖掘算法——WFTPMiner(Weighted Frequent Traversal PatternMiner)算法,并提出了实现该算法的两种策略——EGTG(Evaluated by Global Topology of Graph)策略和ELTG(Evaluated by Local Topology of Graph)策略。
     当最小支持度阈值较小时,用算法WFTPMiner挖掘加权频繁模式将导致效率低下,为克服以上不足,提出了一种修订的加权支持度表示法,然后利用加权FP-tree模式增长方法,提出了两种高效的基于加权有向图的频繁遍历模式挖掘算法:CWFTPMiner(Closed Weighted Frequent TraversalPattern Miner)和WTMaxMiner(Weighted Traversal-based Maximal frequent pattern Miner),前者用于挖掘闭合加权频繁遍历模式,后者用于挖掘最大加权频繁遍历模式。此外,在实现这两种算法的过程中,详细分析了闭合约束和加权约束的不同结合顺序可能造成的信息丢失现象,提出了两种约束的正确结合顺序。
     对加权FP-tree模式增长算法的弊端和遍历路径中相关项目的连续性特点,把遍历模式看作是序列模式,提出了一种基于图遍历的加权序列模式挖掘算法WTSPMiner(Weighted Traversal-basedSequential Pattern Miner),该算法采用一种改进的加权前缀投影模式增长方法,运用分而治之策略,把挖掘原来加权序列数据库的任务分解成一组挖掘局部加权投影数据库的小任务,通过将加权约束添加到挖掘过程中,实现加权频繁序列遍历模式的挖掘。
     为解决HWFTP挖掘问题,提出了一种挖掘基于统计理论的加权频繁遍历模式挖掘算法SWFTPMiner(Statistical theory-based Weighted Frequent Traversal Pattern Miner),该算法利用统计理论中的置信度概念,首先清除掉原始遍历数据库T中带有噪声权值的“异常点”(Outlier),从而得到能反映整体绝大多数用户遍历行为的修订加权有向图G_r及遍历数据库T_r,然后具体提出了两种从修订的T_r中挖掘WFTPs的策略方法——逐层挖掘策略和分而治之挖掘策略。
     此外,本文以零售业为例,提出了一种加权强相关频繁模式挖掘算法WHFPMiner(WeightedHighly-correlated Frequent Pattern Miner),在算法中,提出了一种新的目标兴趣度度量标准——加权h-置信度,通过采用修订的加权支持度,证明了加权h-置信度具有反单调性和交叉加权支持性,最终基于加权FP-tree模式增长方法,利用加权h-置信度的两种特性快速地挖掘出包含有相似加权支持度水平的项目的加权频繁模式。
     广泛的实验性能分析表明,本文提出的算法是高效的和可伸缩的,能够有效的完成各自的加权频繁挖掘任务。更重要的是,本文中提到的很多算法具有广阔的应用领域,例如,基于加权有向图的频繁遍历模式挖掘算法可以很方便地应用到很多能够被建模为加权有向图的实际应用中。
With the development of information technology, especially emerging of the network technology, our, abilities to collect, store and transfer data have been improved dramatically, and the great amount of data from diverse fields has been generated. Data mining technology has increasingly become an essential tool for data analysis and decision support. Frequent pattern mining is an important problem in the data mining area with a wide range of applications and mining tasks such as Web mining, retailing data mining, association rules mining, sequential pattern mining, path analysis, and so on.
     Traditional methods for mining frequent patterns treat each item in TDB (Transaction Database) uniformly, while each item has usually different importance in the real applications. To solve this problem, this paper studies the problem of WFPM (Weighted Frequent Pattern Mining) whose principal characteristic is to assign each item of TDB a weight to reflect its importance in the mining process. In the last decade, the FPM algorithms have achieved a more mature theoretical foundation, however the WFPM and its applications are at an early stage of development, and it is worthy of in-depth study and exploration. The downward closure property (i.e., antimonotonicity) of the unweighted support measure in the weighted case no longer exists and previous algorithms cannot be applied directly. So the main focus in the WFPM algorithms is how to let weighted support or interesting measure satisfy the downward closure property so as to prune the search space efficiently. Taking Web-based frequent traversal path mining and the data mining in retailing for examples, this dissertation sets forth the related methods about mining weighted frequent traversal pattern based on WDG (Weighted Directed Graph) and weighted highly-correlated frequent pattern thoroughly.
     Data mining on graph traversals has been an active research during recent years. Graph and traversals on it are widely used to model several classes of accessing data in the real world, and the most representative example is Web path accessing. The structure of Web can be simulated by a WDG, in which the vertices represent the related web pages or websites, and every directed edge between two wertices stands for the hyperlink from one vertex to another one, and the weights of vertices or edges reflect their own different importance. However, traditional traversal patterns mining algorithms hardly considered weighted traversals. To solve the problem of weighted traversal pattern mining, the dissertation makes an intensive study as follows.
     The dissertation generalizes the categories of WFTPM (Weighted Frequent Traversal Pattern Mining). From the user-type perspective, Web weighted frequent traversal pattern mining can be divided into two classes: one is the IWFTPM (Individual WFTPM), the other is HWFTPM (Holistic WFTPM), and from the mining methods, into common TPM and sequential TPM.
     In addition, this dissertation proposes the WDG model and divides the categories of WDG into two types, i.e., VWDG (Vertex-Weighted Directed Graph) and EWDG (Edge-Weighted Directed Graph). Moreover, the transformational methods between VWDG and EWDG are presented in detail, which simplifies the problem of WFTPM based on WDG traversals and improves the theoretical basis of patterns mining based on graph traversals.
     Based on the WDG model, the thesis accomplishes the several following contributions.
     For the problem of IWFTPM, the dissertation devises a novel property among the weighted traversal patterns—scalable property, and converts candidate patterns' pruning problem into corresponding patterns' scalability-checking problem. Based on the scalable property among the weighed patterns, the paper puts forward a WDG structure-based frequent traversal pattern mining algorithm named WFTPMiner (Weighted Frequent Traversal Pattern Miner), and presents two strategies to implement the algorithm—strategy EGTG (Evaluated by Global Topology of Graph ) and strategy ELTG (Evaluated by Local Topology of Graph).
     When the minimum support threshold is lower, algorithm WFTPMiner is inefficient to mine weighted frequent patterns. To overcome this deficiency of WFTPMiner, a revised expression of weighted support is presented. Then, based on the weighted FP-tree pattern growth method, two efficient algorithms—CWFTPMiner (Closed Weighted Frequent Traversal Pattern Miner) and WTMaxMiner (Weighted Traversal-based Maximal frequent pattern Miner) is devised, among which the former is used to mine closed WFTPs and the latter is used to mine maximal WFTPs. In addition, in the process of implementing above two algorithms, the thesis analyzes the possible information loss case resulting from the wrong order of incorporating the closure constraint with weight constraint, and brings forward the right sequence of combination between two above-mentioned constraints.
     By analyzing the shortcoming of weighed FP-tree pattern growth method and the continuity property of the items in the traversal path, the paper regards a traversal pattern as a sequence pattern. Based on the above fact, a WDG-based weighted sequential pattern mining algorithm named WTSPMiner (Weighted Traversal-based Sequential Pattern Miner) is presented. This algorithm adopts an improved weighted prefix-projected pattern growth approach to decompose the task of mining original sequence database with weight constraint into a series of smaller tasks of mining locally weighted projected database with a 'divide-and-conquer' strategy. The algorithm pushes the weight constraint into the mining process to accomplish the mining of weighted frequent sequence traversal patterns.
     To solve the problem of HWFTPM, an algorithm called SWFTPMiner (Statistical theory-based Weighted Frequent Traversal Pattern Miner) is devised. Adopting the notion of confidence level in statistical theory, the algorithm first eliminates the outliers with noisy weights from the original traversal database T to get the revised WDG G_r and traversal database T_r which reflect the most users' traversal behaviors of the holistic. Then, based on the revised G_r and T_r, two mining strategies, respectively called level-wise mining strategy and divide-and-conquer mining strategy, are presented to mine the WFTPs from T_r in detail.
     In addition, taking the retailing as a example, the thesis brings forward an algorithm called WHFPMiner (Weighted Highly-correlated Frequent Pattern Miner), in which a new objective interesting measure called weighted h-confidence is developed to mine weighted frequent patterns with strong correlation. Adopting the revised weighted support, we prove that the weighted h-confidence has two properties, i.e., anti-monotone property and cross weighted support property. Based on the weighted FP-tree pattern growth method, the algorithm applies two properties of weighted h-confidence to mine the weighted frequent patterns involving items with similar level of weighted support quickly and effectively.
     The extensive performance analysis show that suggested approaches in this dissertation are efficient and scalable, and are competent for respective task of mining weighted frequent patterns. More importantly, most algorithms presented by the paper have a broad application fields, for example, WDG-based frequent pattern mining algorithms can be applied to various real applications whose characteristic can be simulated by the WDG model expediently.
引文
1.Han J,Kamber M.数据挖掘:概念与技术[M],第二版,范明,孟小峰译.北京:机械工业出版,2006.1-2
    2.Codd E F.A Relational model of data for large shared data banks[J].Commun.ACM,1970,13(6):377-387
    3.陈安,陈宁,周龙骧,等.数据挖掘技术及应用[M].北京:科学出版社,2006.7-10,289-291
    4.Piatesky-Shapiro G,ed.Proceedings of the AAAI-91 Workshop on Knowledge Discovery in Databases (KDD'91).Menlo Park,CA:AAAI Press,1991
    5.Fayyad U M,Uthurusamy R,Piatetsky-Shapiro G,eds.Proceedings of the AAAI-93 Workshop on Knowledge Discovery in Databases(KDD'93).Menlo Park,CA:AAAI Press,1993.
    6.Fayyad U M,Uthurusamy R,eds.Proceedings of the AAAI-91 Workshop on Knowledge Discovery in Databases(KDD'94).Menlo Park,CA:AAAI Press,1994
    7.Fayyad U M,Uthurusamy R,eds.Proceedings of the First International Conference on Knowledge Discovery and Data Mining(KDD'95).Menlo Park,CA:AAAI Press,1995
    8.薛惠锋,张文宇,寇晓东,等.智能数据挖掘技术[M].西安:西北工业大学出版社,2005.12.
    9.Chen M,Han J,Yu P.Data mining:an overview from a database perspective[J].IEEE Trans.on knowledge and Data Engineering,1996,8(6):866-883
    10.周皓峰.关联规则挖掘的拓展性研究[D]:[博士学位论文].上海:复旦大学信息科学与工程学院,2003
    11.毛国君,段立娟,王实,等.数据挖掘原理与算法[M],北京:清华大学出版社,2005.37-42.
    12.Brachman R J,Anand T.The process of knowledge discovery in databases:a human-centered approach [C].In:Fayyad U M,Piatesky-Shapiro G,Smyth P,et al,eds.Advances in Knowledge Discovery and Data Mining.Menlo Park,CA:AAAI/MIT Press,1996.37-57
    13.朱明.数据挖掘[M].合肥:中国科学技术大学出版,2002.9-13,18-20
    14.Agrawal R,Imielinski T,Swami A.Mining association rules between sets of items in large databases [C].In:Buneman P,Jajodia S,eds.Proc.of 1993 ACM-SIGMOD Int.Conf.on Management of Data(SIGMOD'93).New York:ACM Press,1993.707-216.
    15.Agrawal R,Srikant R.Fast algorithms for mining association rules[C].In:Jorge B B,Jarke M,Zaniolo C,eds.Proc.of the 20~(th) Int.Conf.Very Large Data Bases(VLDB'94).San Francisco:Morgan Kaufmann,1994.487-499
    16.Park J S,Chen M-S,Yu P.An effective hash based algorithm for mining association rules[C].In:Michael J C,Donovan A S,eds.Proc.of the 1995 ACM SIGMOD Conference on Management of Data(SIGMOD'95).New York:ACM Press,1995.175-186
    17.Srikant R,Agrawal R.Mining generalized association rules[C].In:Dayal U,Peter M D,Nishio S,eds.Proc.of the 21~(st) Int.Conf.Very Large Data Flases(VLDB'95).San Francisco:Morgan Kaufmann,1995.407-419
    18. Savasere A. An efficient algorithm for mining association rules in large databases[C]. In: Dayal U, Peter M D, Nishio S, eds. Proc. of the 21~(st) Int. Conf. Very Large Data Bases (VLDB'95). San Fran- cisco: Morgan Kaufmann ,1995.432—444
    
    19. Zaki M J. Generating non-redundant association rules[C]. In: Raymond N, eds. Proc. of the 6~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'OO), New York: ACM Press.2000.34—43
    
    20. Liu B, Hsu W, Ma Y. Mining association rules with multiple minimum supports. In: Zaki M J, Ho C, eds. Proc. of the 5~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD'OO), New York: ACM Press, 1999.337—341
    
    21. Han J, Fu Y. Mining multiple-level association rules in large databases [J]. IEEE Transactions on Knowledge and Data Engineering, 1999,11(5):798—805
    
    22. Das A, Ng W, Woon Y. Rapid association rule mining[C]. In: Proc. of the 10~(th) ACM International Conference on Information and Knowledge Management (CIKM'01). New York: ACM Press,2001.474—481
    
    23. Agrawal R, Srikant R. Mining sequential patterns[C]. In: Yu P S , Chen A L P,eds. Proc. of the 11~(th) International Conference on Data Engineering (ICDE'95). Washington, DC, USA: IEEE Computer Society, 1995.3—14
    
    24. Srikant R, Agrawal R. Mining sequential patterns: generalizations and performance improvements. In: Peter M G A, Mokrane B, Georges G, eds. Proc. 5~(th) Int. Conf. Extending Database Technology (EDBT'96). Heidelberg: Springer, 1996.3—17
    
    25. Zaki M J. SPADE: an efficient algorithm for mining frequent sequences [J]. Machine Learning. 2001,42(1-2), 2001.31—60
    
    26. Ayres J, Gehrke J, Yiu T, et al. Sequential pattern mining using a bitmap representation[C]. In: Zaki M J , Wang J T, Toivonen H,et al, eds. Proc. of the 8~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02). New York: ACM Press, 2002. 429—435
    
    27. Cheng H, Yan X, Han J. IncSpan: incremental mining of sequential patterns in large databases[C]. In: Kim W, Kohavi R, Gehrke J et al,eds. Proceedings of the 10~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). New York: ACM Press, 2004.527—532
    
    28. Han J, Pei J, Asi B M, et al. FreeSpan: frequent pattern-projected sequential pattern mining[C]. In: Proc. of the 6~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'OO). New York: ACM Press, 2000.355—359
    
    29. Han J, Pei J, Asi B M, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth[C]. In: Young DC, ed. Proc. of the 17~(th) International Conference on Data Engineering (ICDE'01). Washington, DC, USA: IEEE Computer Society, 2001.215—224
    
    30. Han J, Pei J, Asi B M, et al. Mining sequential patterns by pattern-growth: the prefixspan approach [J]. IEEE Transactions on Knowledge and Data Engineering,2004,16(10), 2004: 1424—1440
    
    31. Chiu D, Wu Y, Chen A L. An efficient algorithm for mining frequent sequences by a new strategy without support counting[C]. In: Proc. of the 20~(th) International Conference on Data Engineering (ICDE'04). Washington, DC, USA: IEEE Computer Society, 2004.375—386
    32.Pei J,Han J,Wang W.Mining sequential patterns with constraints in large databases[C].In:Proc.of the 11~(th) ACM International Conference on Information and Knowledge Management(CIKM'02).New York:ACM Press,2002.18-25
    33.Kim C,Lim J H,Ng R,et al.SQUIRE:Sequential Pattern Mining with Quantities[C].In:Proc.of the 20~(th) International Conference on Data Engineering(ICDE'04).Washington,DC,USA:IEEE Computer Society,2004.827-827
    34.Yang J,Yu P S,Wang W et al.Mining long sequential patterns in a noisy environment.In:Proc.of the 2002 ACM SIGMOD International Conference on Management of Data(SIGMOD'02).New York:ACM Press,2002.406-417
    35.Wang K,Xu Y,Yu J X.Scalable sequential pattern mining for biological sequences[C].In:Grossman D,Gravano L,Zhai C,et al,eds.Proc.of the 2004 ACM CIKM International Conference on Information and Knowledge Management(CIKM'04).New York:ACM Press,2004.178-187
    36.Seno M,Karypis G.SLPMiner:an algorithm for finding frequent sequential patterns using length-decreasing support constraints[C].In:Perner P,ed.Proc.of the 2~(nd) IEEE International Conference on Data Mining(ICDM'02).Heidelberg:Springer,2002.418-425
    37.Chou P A.Optimal partitioning for classification and regression trees[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1991,13(4):340-354
    38.Murthy S.Automatic construction of decision trees from data:a multi-disciplinary survey[J].Data Mining and Knowledge Discovery,1998,2(4):345-389
    39.周傲英,钱卫宁.从多角度分析现有聚类算法.软件学报,2002,13(8):1382-1394.
    40.Knorr E,Ng R.Algorithms for mining distance-based outliers in large datasets[C].In:Gupta A,Shmueli O,Widom J,eds.Proc.of the 24~(th) Int.Conf.of Very Large Data Bases(VLDB'98).San Francisco,CA:Morgan Kaufmann,1998.392-403
    41.Mannila H,Toivonen H,Verkamo A.Discovery of frequent episodes in event sequences[J].Data Mining and Knowledge Discovery,1997,1(3):259-289
    42.Yang J,Wang W,Yu P S.Mining asynchronous periodic patterns in time series data.IEEE Trans.Knowledge Data Engineering,2003,15(3):613-628
    43.Yang J.,Wang W,Yu P S.Mining Surprising Periodic Patterns,Data Mining and Knowledge Discovery.9(2).2004:189-216.
    44.Assfalg J,Kriegel H-P,Kr(o|¨)ger P,et al.Similarity search on time series based on threshold queries.In:IoannidisY E,Scholl M H,Schmidt J W,et al,eds.Proc.of 10~(th) Int.Conf.on Extending Database Technology(EDBT'06).Heidelberg:Springer,2006.276-294
    45.孙茂艳,谢康林.基于客户关系属性的市场营销数据挖掘.计算机工程与应用,2005,41(18):215-218
    46.李莉平.数据挖掘技术在现代市场营销中的应用[J],云南则经大学学报,2006,22(5):40-45
    47.W Lin,Alvarez S A,Ruiz C.Efficient adaptive-support association rule mining for recommender systems[J].Data Mining and Knowledge Discovery,2002,6(1):63-105
    48.邢东山,沈钧毅,宋擒豹.从Web日志中挖掘用户浏览偏爱路径.计算机学报,2003,26(11):1515-152
    49.Perkowitz M,Etzioni O.Adaptive web sites:conceptual cluster mining[C].In:Proc.of the 6~(th) Joint Int.Conf.on Artificial Intelligence(IJCAI'99).San Francisco:Morgan Kaufrnann,1999.264-269
    50.Tauscher L,Greenberg S.How people revisit web pages:empirical findings and implications for the design of history systems[J].International Journal of Human Computer Studies,Special issue on World Wide Web Usability,1997,47(1),1997:97-137+41
    51.Srivastava J,Cooley R,Deshpande M,et al.Web usage mining:discovery and applications of usage patterns from web data.SIGKDD Explorations,20001(2):12-23
    52.郭维.Web日志挖掘中GITC算法的改进[J].计算机工程,2008,34(4):60-62
    53.房伟,逢玉俊.入侵检测中孤立点挖掘的方法研究[J].沈阳大学学报,2006,18(2):40-43
    54.Hsu C,Chen C,Liu B,et al.Identification of hot regions in protein-protein interactions by sequential pattern mining[J/OL],http://www.biomedcentral.com/1471-2105/8/S5/S8:1-15
    55.Tan P,Vipin Kumar M S.数据挖掘导论[M],范明,范宏建译.北京:人民邮电出版,2006.2-3
    56.Han J,Cheng H,Xin D.Frequent pattern mining:current status and future directions[J].Data Mining and Knowledge Discovery,2007,15(1):55-86.
    57.Goethals B,Zaki M J.Workshop on frequent itemset mining implementations.In:Goethals B,Zaki M J,eds.Proc.of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations(FIMI'03).Europe:CEUR-WS.org,2003.1-13
    58.Grahne G,Zhu J.Efficiently using prefix-trees in mining frequent itemsets[C].In:Goethals B,Zaki M J,eds.Proc.of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations(FIMI'03).Europe:CEUR-WS.org,2003.123-132
    59.Han J,Pei J,Yin Y.Mining frequent patterns without candidate generation[C].In:Chen W,Naughton J F,Bernstein P A,eds.Proc.of the 2000 ACM SIGMOD International Conference on Management of Data(SIGMOD'00).New York:ACM Press,2001.1-12
    60.Han J,Pei J.Mining frequent patterns by pattern-growth:methodology and implications[J].SIGKDD Explorations,2000,2(2):14-20
    61.Han J,Pei J,Yin Y,et al.Mining frequent patterns without candidate generation:a frequent-pattern tree approach[J].Data Mining and Knowledge Discovery,2004,8(1):53-87
    62.Pei J,Han J.H-Mine:hyper-structure mining of frequent pattems in large databases[C].In:Perner P,ed.Proc.of the 1~(st) IEEE International Conference on Data Mining(ICDM'01).Leipzig,Germany:IBal Publishing Report,2001.441-448
    63.Grahne G,Zhu J.Fast algorithms for frequent itemset mining using FP-trees[J].IEEE Transactions on Knowledge and Data Engineering,2005,17(10):1347-1362
    64.石巍,傅彦.基于FP-参考树/表的频繁模式挖掘算法,计算机科学,33(6)2006:206-209
    65.Liu G,Lu H,Xu Y,et al.Ascending frequency ordered prefix-tree:efficient mining of frequent patterns [C].In:Lee Y,Li J,Whang K,et al,eds.Proc.of the 8~(th) International Conference on Database Systems for Advanced Applications(DASFAA '03).Heidelberg:Springer,2003.65-72
    66.Agarwal R C,Aggarwal C C,Prasad V.A tree projection algorithm for finding frequent itemsets[J].Journal on Parallel and Distributed Computing,2001,61(3):350-371
    67.Seno M,Karypis G.LPMiner:an algorithm for finding frequent itemsets using length-decreasing support constraint[C].In:Perner P,ed.Proc.of the 1~(st) IEEE International Conference on Data Mining (ICDM'01).Leipzig,Germany:IBal Publish ing Report,2001.505-512
    68.Liu J,Pan Y,Wang K,et al.Mining frequent item sets by opportunistic projection[C].In:Zaki M J,Wang J T,Toivonen H,et al,eds.Proc.of the 8~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD'02).New York:ACM Press,2002.229-238
    69.Yen S,Chen A L,A graph-based approach for discovering various types of association rules[J].IEEE Transactions on Knowledge and Data Engineer,2001,13(5):839-845
    70.郭秀娟.基于关联规则数据挖掘算法的研究[D]:[博士学位论文].吉林:吉林大学建设工程学院,2004
    71.梁吉业,王俊红.基于概念格的规则产生集挖掘算法[J].计算机研究与发展,2004,41(8):1340-1346
    72.Kim W Y,Lee Y K,Han J.CCMine:efficient mining of confidence-closed correlated patterns[C].In:Dai H,Srikant R,Zhang C,eds.Proc.of the 8~(th) Pacific-Asia Conference on Knowledge Discovery and Data Mining(PAKDD'04).Heidelberg:Springer,2004.569-579
    73.Brin S,Motwani R,Silverstein C.Beyond market baskets:generalizing association rules to correlations [C].In:Peckham J,ed.Proc.of the 1997 ACM SIGMOD International Conference on Management of Data(SIGMOD'97).New York:ACM Press,1997.255-264
    74.Xiong H,Tan P N,Kumar V.Mining strong affinity association patterns in data sets with skewed support distribution[C].In:Proceedings of the 3~(rd) IEEE International Conference on Data Mining (ICDM'03).Washington,DC,USA:IEEE Computer Society,2003.387-394
    75.Xiong H,Tan P N,Kumar V.Mining hyperclique patterns with confidence pruning[R/OL].http://www.communitylens.org/tech_reports_upload/tr2003/03-006.pdf
    76.Huang Y,Xiong H,Wu W.A hybrid approach for mining maximal hyperclique patterns[C].In:Proc.of 16~(th) IEEE International Conference on Tools with Artificial Intelligence(ICTA'04).Washington,DC,USA:IEEE Computer Society,2004.354-361
    77.Kim W Y,Lee Y K,Cai Y D,et al.CoMine:efficient mining of correlated patterns[C].In:Proceedings of the 3~(rd) IEEE International Conference on Data Mining(ICDM 2003).Washington,DC,USA:IEEE Computer Society,2003.581-584
    78.Xiong H,Tan P N,Kumar V.Hyperclique pattern discovery[J].Data Mining and Knowledge Discovery,2006,13(2):219-242
    79.Omiecinski E R.Alternative interest measures for mining associations in databases[J].IEEE Transactions on Knowledge and Data Engineering,2003,15(1):57-69
    80.吕橙,易艳红.一种挖掘Web用户访问模式的新办法MFP[J].小型微型计算机系统,2006,27(5):919-294
    81.徐承胜,陆玉昌.Web使用挖掘技术研究[J].小型微型计算机系统,2004,6(7):1177-1184
    82.陈晴光.基于Web访问信息挖掘的商业智能发现研究,计算机工程与设计,29(6),2008:1413-1416
    83.Cheung D W-L,Ng V,Fu A W-C,et al.Efficient mining of association rules in distributed databases [J].IEEE Transactions On Knowledge and Data Engineering,1996,8(6):911-922
    84.Srivastava J,Cooley R,Deshpande M,et al.Web usage mining:discovery and applications of usage patterns from web data[J],SIGKDD Explorations,2000,1(2):12-23
    85. Pei J, Han J, Mortazavi-Asl B, et al. Mining access patterns efficiently from Web Logs[C]. In: Terano T, Liu H, Chen A L P, eds. Proc. of the 4~(th) Pacific Asia Conference on Knowledge Discovery and Data Mining, (PAKDD'OO). Heidelberg: Springer, 2000.396-407
    
    86. Eirinaki M, Vazirgiannis M. Web mining for web personalization [J]. ACM Transactions on Internet Technology[J], 2003,3(1):1—27
    
    87. Kleinberg J M, Tomkins A. Application of linear algebra in information retrieval and hypertext analysis[C]. In: Proc. of the 18~(th) ACM Symp. On Principles of Database Systems (PODS'99). New York: ACM Press, 1999:185—193
    
    88. Jin X, Zhou Y, Mobasher B. Web usage mining based on probabilistic latent semantic analysis[C]. In: Kim W, Kohavi R, Gehrke J, et al, eds. Proc. of the 10th ACM SIGKDD International Conference On Knowledge Discovery And Data Mining (KDD'04), New York:ACM Press,2004.197—205
    
    89. David Gibson and Jon Kleinberg and Prabhakar Raghavan, Inferring Web communities from link to- pology[C].In: Akscyn R, ed. Proc. of the 9th ACM Conference on Hypertext and Hypermedia: Links, Objects, Time and Space-Structure in Hypermedia Systems (HYPERTEXT'98).New York:ACM Press, 1998.225—234
    
    90. Kleinberg J M. Authoritative sources in a hyperlinked environment [J]. Journal of the ACM, 1999,46(5):604—632
    
    91. Chakrabarti S, Dom B E, Kumar S R, et al. Mining the Web's link structure [J]. Com- puter,1999,32(8):60—67
    
    92. Madria S, Bhowmick S, Ng W, et al. Research issues in web data mining[C]. In: Mohania M K, Tjoa A Min, eds.Proc. of the 1~(st) International Conference on Data Warehousing and Knowledge Discovery ( DaWaK'99). Heidelberg: Springer, 1999.303—312
    
    93. Joshi K, Joshi A, Yesha Y. Warehousing and mining Web logs[C]. In: Shahabi C, ed. Proc. of the 2~(nd) International Workshop on Web Information and Data Management (WIDM'99). New York:ACM Press, 1999.63—68
    
    94. Zaiane O, Xin M, Han J. Discovering web access patterns and trends by applying OLAP and data mining technology on web logs[C]. In: Proc. of the IEEE Forum on Research and Technology Advances in Digital Libraries (IEEE ADL'98). Washington, DC, USA: IEEE Computer Society, 1998.19—29
    
    95. Schechter S, Krishnan M, Smith M D. Using path profiles to predict HTTP requests [J]. Computer Networks and ISDN Systems, 1998,30(1):457—467
    
    96. Chen M S, Park J S, Yu P S. Efficient data mining for path traversal patterns [J]. IEEE Transactions on Knowledge and Data Engineering. 1998,10(2): 209—221
    
    97. Chen M S, Park J S, Yu P S. Data mining for path traversal patterns in a web environment [J]. In: Proceedings of the 16~(th) International Conference on Distributed Computing Systems (ICDCS'96). Washington, DC, USA: IEEE Computer Society, 1996. 385—392
    
    98. Xing D, Shen J. Efficient data mining for web navigation patterns [J]. Information and Software Technology ,2004,46( 1 ):55—63
    
    99. Nanopoulos A, Manolopoulos Y. Mining patterns from graph traversals [J]. Data and Knowledge Engineering, 2001, 37(3): 243—266
    100.Nanopoulos A,Manolopouios Y.Finding generalized path patterns for web log data mining[C].In:Stuller J,Pokorny,J,Thalheim B,et al,eds.Proc.of Current Issues in Databases and Information Systems,East-European Conference on Advances in Databases and Information Systems Held Jointly with International Conference on Database Systems for Advanced Applications (ADBIS-DASFAA'00).Heidelberg:Springer,2000.215-228
    101.颜跃进.最大频繁项集挖掘算法的研究[D]:[博士学位论文].长沙:国防科技大学计算机学院,2005
    102.Burdick D,Calimlim M,Gehrke J.MAFIA:a maximal frequent itemset algorithm for transactional databases[C].In:Young DC,ed.Proc.of the 17~(th) international conference on data engineering (ICDE'01).Washington,DC,USA:IEEE Computer Society,2001.443-452
    103.Roberto J,Bayardo R J.Efficiently mining long patterns from databases[C].In:Haas L M,Tiwary A,eds.Proc.of the 1998 ACM-SIGMOD International Conference On Management Of Data (SIGMOD'98).New York:ACM Press,1998.85-93
    104.Gouda K,Zaki M J.Efficiently mining maximal frequent itemsets[C].In:Perner P,ed.Proc.of the 1~(st)IEEE International Conference on Data Mining(ICDM'01).Leipzig,Germany:IBaI Publishing Report,2001.163-170
    105.Yan Y,Li Z,Wang T,et al.Mining maximal frequent itemsets using combined FP-Tree[C].In:Webb G I,Yu X,eds.Proc.of the 17~(th) Australian Conference on Artificial Intelligence(AI'2004).Heidelberg:Springer,2004.475-487
    106.Grahne G,Zhu J.High performance mining of maximal frequent itemsets[C].In:Proc.of the 6~(th)SIAM International Workshop on High Performance Data Mining:Pervasive and Data Stream Mining,Philadelphia,USA:SIAM Press,2003.135-143
    107.Yang G.The complexity of mining maximal frequent itemsets and maximal frequent patterns[C].In:Kim W,Kohavi R,Gehrke J,et al,eds.Proc of the 10~(th) ACM SIGKDD International Conference On Knowledge Discovery In Databases(KDD'04).New York:ACM Press,2004.344-353
    108.Ramesh G,Maniatty WA,Zaki M J.Feasible itemset distributions in data mining:theory and application [C].In:Proc.of the 22~(nd) ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems(PODS'03).New York:ACM Press,2003.284-295
    109.宋余庆,朱玉全,孙志辉,等.基于FP-Tree的最大频繁项集挖掘及其更新算法[J].软件学报,2003,14(9):1586-1592
    110.Cong G,Tan K L,Anthony K H,et al.Mining frequent closed patterns in microarray data[C].In:Proc.of the 4~(th) IEEE International Conference on Data Mining(ICDM 2004).Washington,DC,USA:IEEE Computer Society,2004.363-366
    111.Pasquier N,Bastide Y,Taouil R,et al.Discovering frequent closed itemsets for association rules[C].In:Beeri C,Buneman P,eds.Proc.of the 7~(th) International Conference on Database Theory(ICDT'99).Heidelberg:Springer,1999.398-416
    112.Pei J,Han J.CLOSET:an efficient algorithm for mining frequent closed itemsets[C].In:Gunopulos D,Rastogi R,eds.Proc.of the 5~(th) ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery(DMKD'00).New York:ACM Press,2000.21-30
    113.Wang J,Han J,Pei J.CLOSET+:searching for the best strategies for mining frequent closed itemsets [C].In:Getoor L,Senator T E,Domingos P,et al,eds.Proc.of the 9~(th) ACM SIGKDD Interna- tional Conference on Knowledge Discovery and Data Mining (KDD'03). New York: ACM Press, 2003.236—245
    
    114. Pudi V, Haritsa J R. Generalized closed itemsets for association rule mining [C]. In: Dayal U,Ramamritham K, Vijayaraman T M, eds. Proc. of the 19~(th) International Conference on Data Engineering (ICDE'03). Washington, DC, USA: IEEE Computer Society, 2003.714—716
    
    115. Zaki M J. CHARM: an efficient algorithm for closed itemset miningfC]. In: Grossman R L, Han J,Kumar V,et al, eds. Proc. of the 2~(nd) SIAM International Conference on Data Mining (SDM'02). Philadelphia ,USA: SIAM Press, 2002.457—473
    
    116. Han J, Wang J, Lu Y, et al. Mining top-k frequent closed patterns without minimum support[C]. In: Perner P, ed. Proc. of the 2~(nd) IEEE International Conference on Data Mining (ICDM'02). Heidelberg:Springer, 2002.211—218
    
    117. Wang J, Han J, Lu Y, et al.TFP: an efficient algorithm for mining top-k frequent closed itemsets [J]. IEEE Transactions on Knowledge and Data Engineering, 2005,17(5):652—664
    
    118. Tzvetkov P, Yan X, Han J.TSP: mining top-k closed sequential patterns[C]. In: Proc. of the 3~(rd) IEEE International Conference on Data Mining (ICDM'03). Washington, DC, USA: IEEE Computer Society, 2003. 347—354
    
    119. Bonchi F, Lunnhese C. On closed constrained frequent pattern mining[C]. In: Proc. of the 4~(th) IEEE International Conference on Data Mining (ICDM 2004). Washington, DC, USA: IEEE Computer Society, 2004.35—42
    
    120. Gade K, Wang J, Karypis G. Efficient closed pattern mining in the presence of tough block con-straints[C]. In: Kim W, Kohavi R, Gehrke J et al,eds. Proceedings of the 10~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). New York: ACM Press, 2004.138—147
    
    121. Jia L, Pei R, Pei D.Tough constraint-based frequent closed itemsets mining[C]. In: Proc. of the 2003 ACM Symposium on Applied Computing (SAC'03). New York: ACM Press, 2003.416—420
    
    122. Tan P N, V Kumar, Srivastava J. Selecting the right interestingness measure for association patterns[C]. In: Zaki M J , Wang J T, Toivonen H,et al, eds. Proc. of the 8~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02). New York: ACM Press, 2002.32—41
    
    123. Steinbach M, Tan P N, Xiong H, et al.Generalizing the notion of support [C]. In: Kim W, Kohavi R, Gehrke J et al,eds. Proc. of the 10~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). New York: ACM Press, 2004.689—694
    
    124. Xiong H, Shekhar S, Tan P N, et al. Exploiting a support-based upper bound of pearson's correlation coefficient for efficiently identifying strongly correlated pairs[C]. Proc. of the 10~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'04). New York: ACM Press, 2004.334—343
    
    125. Wang W, Yang J, Yu P S. Efficient mining of weighted association rules (WAR)[C]. In: Raymond N,eds. Proc. of the 6~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00), New York: ACM Press, 2000. 270—274
    126.Tao F,Murtagh F,Farid M.Weighted association rule mining using weighted support and significance framework[C].In:Getoor L,Senator T E,Domingos P,et al,eds.Proc.of the 9~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD'03).New York:ACM Press,2003.661-666
    127.Cai C H,Fu AWC,Cheng W C,et al.Mining association rules with weighted items[C].In:Eaglestone B,Desai B C,Shao J,eds.Proc.of the 2~(nd) International Database Engineering and Applications Symposium(IDEAS'98).Washington,DC,USA:IEEE Computer Society,1998.68-77
    128.段军,戴居丰.基于多支持度的挖掘加权关联规则算法[J].天津大学学报,2006,39(1):114-118
    129.陆建江.加权关联规则挖掘算法的研究[J].计算机研究与发展,2002,39(10):1281-1286
    130.Yun U,Leggett J J.WLPMiner:weighted frequent pattern mining with length-decreasing support constraints[C].In:Ho T B,Cheung D W,Liu H,eds.Proc.of the 9~(th) Pacific-Asia Conference of Advances in Knowledge Discovery and Data Mining(PAKDD'05).Heidelberg:Springer,2005.555-567
    131.Seno M,Karypis G.Finding frequent patterns using length-decreasing support constraints[J].Data Mining and Knowledge Discovery,2005,10(3):197-228
    132.Zaki M J.Efficiently mining frequent trees in a forest[C].In:Zaki M J,Wang J T,Toivonen H,et al,eds.Proc.of the 8~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD'02).New York:ACM Press,2002.71-80
    133.Inokuchi A,Washia T,Motoda H.An apriori-based algorithm for mining frequent substructures from graph data[C].In:Zighed D A,Komorowski H J,Zytkow J M,eds.Proc.of the 4~(th) European Conference on Principles of Data Mining and Knowledge Discovery(PKDD'04).Heidelberg:Springer,2000.13-23
    134.Washio T,Motoda H.State of the art of graph-based data mining[J].SIGKDD Explorations,2003,5(1):59-68
    135.Agarwal RC,Aggarwal CC,Prasad VVV.Depth first generation of long patterns.In:Raymond N,eds.Proc.of the 6~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00),New York:ACM Press,2000.108-118
    136.Liu G,Lu H,Yu J X.AFOPT:an efficient implementation of pattern growth approach[C].In:Goethals B,Zaki M J,eds.Proc.the ICDM 2003 Workshop on Frequent Itemset Mining Implementations (FIMI'03).Europe:CEUR-WS.org,2003
    137.Zaki M J,Gouda K.Fast vertical mining using diffsets[C].In:Getoor L,Senator T E,Domingos P,et al,eds.Proc.of the 9~(th) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD'03).New York:ACM Press,2003.326-335
    138.Zaki M J,Parthasarathy S,Ogihara M,et al.New algorithms for fast discovery of association rules[C].In:Heckerman D,Mannila H,Pregibon D,eds.Proc.of the 3~(rd) ACM SIGKDD Conference on Knowledge Discovery and Data Mining(KDD'97).New York:ACM Press,1997.283-286
    139.Leung C,Ng R,Mannila H.Ossm:A segmentation approach to optimize frequency counting.In:Proc.of the 18th International Conference on Data Engineering(ICDE'02).Washington,DC,USA:IEEE Computer Society,2002.583-592
    140.Bastide Y,Taouil R,Pasquier N,et al.Mining frequent patterns with counting inference[J].SIGKDD Exploration,2000,2(2):66-75
    141.Yun U,Leggett J J,Ong T.Mining weighted sequential patterns based on length-decreasing support constraints[C].In:Mehrotra S,Zeng DD,Chen H,et al,eds.Proc.of IEEE International Conference on Intelligence and Security Informatics(ISI'06).Heidelberg:Springer,2006.650-651
    142.Yun U,Leggett J J.WFIM:weighted frequent itemset mining with a weight range and a minimum weight[C].In:Proc.of the 5~(th) SIAM International Conference on Data Mining(SDM'05).April Philadelphia,USA:SIAM Press,2005.636-640
    143.Yun U,Leggett J J.WSpan:Weighted Sequential pattern mining in large sequence databases[C].In:Proc.Of the 3~(rd) International IEEE Conference on Intelligent Systems(IS'06).Washington,DC,USA:IEEE Computer Society,2006.512-517
    144.欧阳继红,王仲佳,刘大有.具有动态加权特性的关联规则算法[J].吉林大学学报(理学版)[J].2005,43(3):314-319
    145.王运鹏,胡修林,阮幼林.一种最大频繁模式的快速挖掘算法[J].计算机应用研究,2006,23(10):86-88
    146.吴瑞,张秀玲.基于FLAAT的加权偏爱模式的挖掘算法[J].计算机工程与应用,2005,41(19):82-184
    147.伊卫国,卫金茂,王名扬.基于项目集加权的增量关联规则算法研究[J].计算机工程与应用,2004,40(34):192-194
    148.刘缵武.应用图论[M].长沙:国防科技大学出版社,2006.130-160
    149.殷剑红,吴开亚.图论及其算法[M].合肥:中国科学技术大学出版社,2004.207-240
    150.Garofalakis M,Rastogi R,Shim K.SPIRIT:Sequential pattern mining with regular expression constraints [C].In:Atkinson M P,Orlowska M E,Valduriez P,et al.,eds.Proc.the 25th International Conference on Very Large Data Bases(VLDB'99).San Francisco:Morgan Kaufmann,1999.223-234
    151.Albert-Lorincz H,Boulicaut J F.Mining frequent sequential patterns under regular expressions:a highly adaptive strategy for pushing constraints[C].In:Barbara D,Kamath C,eds.Proc.of the 3~(rd)SIAM International Conference on Data Mining(SDM'03).Philadelphia,USA:SIAM Press,2003.316-320
    152.沈恒范.概率论与数理统计教程[M].北京:高等教育出版,1995.179-210
    153.Geng R,Dong X,Liu G,et al.Efficiently mining maximal frequent patterns from traversals on weighted directed graph using statistical theory[C].In:Proc.of the 5~(th) International Conference on Fuzzy Systems and Knowledge Discovery(FSKD'08),Washington,DC,USA:IEEE Computer Society (In press)
    154.Geng R,Xu W and Dong X.SWSPMiner:Efficient Mining of Weighted Sequential Patterns from Traversals on Weighted Directed Graph Using Statistical Theory[C].In:Proc.of the 7~(th) International Conference on Distributed Computing and Applications for Business,Engineering and Sciences (DCABES'08),Publishing House of Electronics Industry,2008.48-491
    155.Cheung YL,Fu AW.Mining frequent itemsets without support threshold:with and without item con- straints[J].IEEE Transactions on Knowledge and Data Engineering,2004,16(9):1052-1069
    156.Wang K,He Y,Cheung D,Chin Y,Mining confident rules without support requirement[C].In:Proc.of the 10~(th) ACM International Conference on Information and Knowledge Management(CIKM'01).New York:ACM Press,2001.89-96
    157.Cohen E,Datar M,Fujiwara S,Gionis A,et al.Finding interesting associations without support pruning [C].In:Proc.of the 16~(th) International Conference on Data Engineering(ICDE'00).Washington,DC,USA:IEEE Computer Society,2000.489-499
    158.Bayardo R,Agrawal R,Gunopulous D.Constraint-based rule mining in large,dense databases[C].In:Proc.of the 15~(th) International Conference on Data Engineering(ICDE'99).Washington,DC,USA:IEEE Computer Society,1999.188-197
    159.Grahne G,Lakshmanan LVS,Wang X,Efficient mining of constrained correlated sets[C].In:Proc.of the 16~(th) International Conference on Data Engineering(ICDE'00).Washington,DC,USA:IEEE Computer Society,2000.512-521
    160.Wang K,He Y,Han J.Mining frequent itemsets using support constraints[C].In:Abbadi A E,Brodie M L,Chakravarthy S,et al,eds.Proc.of the 26~(th) International Conference on Very Large Data Bases(VLDB'00).San Francisco:Morgan Kaufmann,2000.43-52
    161.孙惠泉.图论及其应用[M].北京:科学出版社,2006.150-170

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700