Research on Pattern Mining Algorithms for Data Streams and Their Applications
Abstract
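As industries place increasing value on data and information technology develops rapidly, the data being generated is becoming more comprehensive and its volume is growing quickly. Industries also need to mine and analyze this data promptly after it is produced, which makes data stream mining increasingly important. Because data streams are massive, real-time, and dynamically changing, mining algorithms over data streams must be highly efficient in both time and space. Although data mining over data streams has made some progress, the time and space efficiency of mining algorithms remains one of the focal research issues in data mining.
     This dissertation studies pattern mining algorithms for data streams, covering frequent pattern mining over traditional datasets, frequent pattern mining over big data, frequent pattern mining over uncertain data streams, and high utility pattern mining. It first reviews existing frequent pattern and high utility pattern mining algorithms, describing Apriori, FP-Growth, and related algorithms in detail. Then, building on an analysis of classical mining algorithms and recent research results, it investigates in depth frequent pattern mining over traditional data, frequent pattern mining over uncertain data, and high utility pattern mining over data with utility values. The dissertation makes the following contributions:
     (1) For frequent pattern mining over traditional data, it proposes a new tail-node data structure and a parallel mining algorithm that needs at most two rounds of MapReduce. For frequent pattern mining in data streams, tail nodes and a tail-node table improve the time efficiency of updating the data in the window and the space efficiency of maintaining it; improving the time efficiency of the within-window mining algorithm in turn improves the overall time efficiency of pattern mining over the stream. For frequent pattern mining over data streams at big data scale, a first round of MapReduce finds local frequent patterns as candidate itemsets, a pruning strategy then prunes the candidates, and a second round of MapReduce counts the support of the remaining candidates. In most cases the algorithm finds all frequent patterns without needing the second round.
     (2) For frequent pattern mining over uncertain transaction data, it proposes tree structures with a higher compression ratio to improve frequent pattern mining over uncertain datasets and data streams. The probabilities of each transaction itemset are first stored in an array; the transaction itemset, together with the index of its probabilities in the array, is then mapped onto a tree, which effectively reduces the number of tree nodes needed to maintain an uncertain dataset. On this basis, combined with the sliding window technique, two new tree structures are given, one to maintain the data in the window and one to maintain the sub-datasets generated during mining, so that information about the transaction itemsets in the window is not lost from the tree during mining. This considerably improves the time and space efficiency of the frequent pattern mining algorithm. The dissertation also proposes a new weighted frequent pattern mining model and algorithm, which brings item weights into the mining process so that patterns with large weights are included in the mining result.
     (3) For high utility pattern mining, it proposes mining algorithms that avoid over-estimated utility values and do not generate candidate itemsets. A new tree structure maintains the transaction itemsets and their utility information; the exact utility of an itemset, rather than an over-estimated utility, can be obtained from this tree, so all high utility patterns can be mined without going through candidate itemsets, improving the algorithm's time and space efficiency. On this basis, combined with the sliding window technique, a new tree structure maintains the data in the window, allowing the algorithm to mine high utility patterns from a data stream with a single scan of the data and without generating candidate itemsets. Compared with the UP-Growth algorithm from the most recently published papers at the KDD conference and in the TKDE journal, the proposed algorithms improve time efficiency by one to two orders of magnitude.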
With the rapid development of information science and the explosive accumulation of industrial data, data mining and analysis are attracting growing attention across all areas of industry. The demand for timely processing and analysis of newly generated data makes mining techniques for data streams an important research topic. Because data streams are massive, real-time, and volatile, mining algorithms over data streams must be efficient in both space and time. A number of valuable academic works have addressed data mining over data streams, yet improving mining efficiency remains a research hotspot that needs further study.
     This dissertation focuses on pattern mining algorithms over data streams, including frequent pattern mining over traditional datasets, frequent pattern mining over big data, frequent pattern mining over uncertain data streams, and high utility pattern mining. It first reviews existing algorithms for frequent pattern mining and high utility pattern mining, with a detailed introduction to the Apriori and FP-Growth algorithms. Then, based on a study of classical mining algorithms and state-of-the-art research, it examines in depth frequent pattern mining over traditional datasets, frequent pattern mining over uncertain datasets, and high utility pattern mining. The innovative research achievements of this thesis are as follows.
     (1) For frequent pattern mining over traditional datasets, this thesis proposes a tail-node data structure and a parallel algorithm that uses no more than two rounds of MapReduce. For frequent pattern mining over data streams, tail nodes and a tail-node table are used in a sliding window approach to improve the time efficiency of updating data within the window, as well as the space efficiency of maintaining the data structure. Together with strategies that improve the time efficiency of the within-window frequent pattern mining process, this improves the overall performance of the mining algorithm.
     For mining frequent patterns over streams of big data: first, one round of MapReduce is performed to find local frequent patterns as candidate itemsets; second, a pruning strategy is applied to prune the candidates; finally, a second round of MapReduce is performed to count the support of the remaining itemsets. In most cases, only one round of MapReduce is needed for the algorithm to discover all frequent patterns.
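The two-round scheme in (1) can be illustrated with a small single-process Python sketch in which each chunk plays the role of one mapper's input. The abstract does not state the pruning strategy, so the bound used here is an assumption: a chunk that did not report an itemset as locally frequent can contribute at most its local threshold minus one. The names `local_counts` and `mine_two_round` are hypothetical, and the brute-force local miner is only suitable for toy data.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_counts(chunk):
    """Count every non-empty itemset in one chunk (brute force, toy data only)."""
    counts = Counter()
    for transaction in chunk:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return counts


def mine_two_round(chunks, min_ratio):
    """Return (frequent itemsets with exact support, number of rounds used)."""
    n_total = sum(len(c) for c in chunks)
    global_min = ceil(min_ratio * n_total)
    thresholds = [max(1, ceil(min_ratio * len(c))) for c in chunks]

    # Round 1: each "mapper" reports its locally frequent itemsets and counts.
    reported = []
    for chunk, thr in zip(chunks, thresholds):
        counts = local_counts(chunk)
        reported.append({s: c for s, c in counts.items() if c >= thr})

    frequent, undecided = {}, []
    for itemset in set().union(*reported):
        known = sum(r.get(itemset, 0) for r in reported)
        # Assumed pruning bound: an unreported chunk holds at most thr - 1 copies.
        slack = sum(thr - 1 for r, thr in zip(reported, thresholds)
                    if itemset not in r)
        if slack == 0:
            # The known count is already exact; no second round needed for it.
            if known >= global_min:
                frequent[itemset] = known
        elif known + slack >= global_min:
            undecided.append(itemset)
        # Otherwise the itemset cannot reach global_min and is pruned.

    # Round 2 (only if needed): count the surviving candidates over all data.
    rounds = 1
    if undecided:
        rounds = 2
        for itemset in undecided:
            needed = set(itemset)
            total = sum(1 for c in chunks for t in c if needed <= set(t))
            if total >= global_min:
                frequent[itemset] = total
    return frequent, rounds
```

Under this assumed bound, the second round is skipped whenever every surviving candidate was reported by all chunks, which is one way the "usually one round" behavior described above could arise.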
     (2) For frequent pattern mining over uncertain transaction datasets, a tree structure with a higher compression ratio is proposed to improve the efficiency of the mining algorithms.
     The probability values of transaction itemsets are stored in arrays and then mapped to a tree, which reduces the number of nodes needed to maintain the transaction dataset. Combined with the sliding window approach, two new tree data structures are proposed to maintain the within-window data and the sub-datasets created during mining. This ensures that no information is lost during the mining process and significantly improves the time and space efficiency of the mining algorithm. Furthermore, this thesis proposes a new weight-based frequent pattern mining model, which takes the weights of the items in a transaction itemset into account during mining and includes high-weight patterns in the mining result.
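The idea of keeping probabilities in arrays and storing only an index in the tree can be sketched as follows. This is a minimal illustration, not the thesis's actual structure: the class and attribute names (`UncertainTree`, `prob_rows`, `tails`) are hypothetical, items are simply ordered alphabetically, and support is computed under the standard expected-support model, where the expected support of an itemset is the sum over transactions of the product of its items' probabilities.

```python
from math import prod


class Node:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}
        self.tails = []  # indices into prob_rows for transactions ending here


class UncertainTree:
    """Prefix tree keyed by item only; probabilities live in a side array."""

    def __init__(self):
        self.root = Node()
        self.prob_rows = []  # one probability list per inserted transaction
        self.node_count = 0

    def insert(self, transaction):
        """transaction: dict mapping item -> existence probability."""
        items = sorted(transaction)
        node = self.root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
                self.node_count += 1
            node = node.children[item]
        # Paths are shared even when probabilities differ, because the
        # probabilities are stored outside the tree.
        self.prob_rows.append([transaction[i] for i in items])
        node.tails.append(len(self.prob_rows) - 1)

    def expected_support(self, itemset):
        wanted = set(itemset)
        total = 0.0
        stack = [self.root]
        while stack:
            node = stack.pop()
            stack.extend(node.children.values())
            if not node.tails:
                continue
            # Rebuild the item path from this tail node up to the root.
            path, cur = [], node
            while cur.item is not None:
                path.append(cur.item)
                cur = cur.parent
            path.reverse()
            positions = [i for i, item in enumerate(path) if item in wanted]
            if len(positions) != len(wanted):
                continue  # this path does not contain the whole itemset
            for row_index in node.tails:
                row = self.prob_rows[row_index]
                total += prod(row[p] for p in positions)
        return total
```

For example, inserting `{"a": 0.9, "b": 0.5}` and `{"a": 0.6, "b": 0.5}` creates only two nodes, whereas a tree that shares nodes only when both item and probability match would need separate nodes for the two different probabilities of `a`.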
     (3) For high utility pattern mining, new mining algorithms are proposed that avoid over-estimated utility values and do not generate candidate itemsets.
     A new tree structure is proposed for maintaining transaction itemsets and their utility information. Exact utility values, rather than over-estimated values, can be retrieved from this tree, so high utility patterns can be discovered without generating candidates, which improves the efficiency of the algorithm.
     Building on this algorithm and the sliding window method, and using a new tree structure to maintain the within-window data, the proposed algorithm mines high utility patterns from a data stream in a single scan of the data, without generating candidate itemsets. Compared with the state-of-the-art UP-Growth algorithm published at KDD and in TKDE, the proposed algorithms are one to two orders of magnitude faster in running time.
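The difference between an exact utility and an over-estimated one can be made concrete with a short Python sketch. It uses the standard definitions from the high utility mining literature, not the thesis's tree: the utility of an item in a transaction is quantity times unit profit, and the transaction-weighted utility (TWU) of an itemset sums the full utility of every transaction containing it. The function names and the sample data are illustrative assumptions.

```python
def transaction_utility(transaction, profit):
    """Total utility of one transaction (dict item -> purchased quantity)."""
    return sum(qty * profit[item] for item, qty in transaction.items())


def exact_utility(itemset, database, profit):
    """Sum, over transactions containing the itemset, of its own items' utility."""
    wanted = set(itemset)
    return sum(
        sum(t[item] * profit[item] for item in wanted)
        for t in database
        if wanted.issubset(t)
    )


def twu(itemset, database, profit):
    """Transaction-weighted utility: an upper bound on exact_utility."""
    wanted = set(itemset)
    return sum(
        transaction_utility(t, profit)
        for t in database
        if wanted.issubset(t)
    )


profit = {"a": 5, "b": 2, "c": 1}
database = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]
print(exact_utility({"a"}, database, profit))  # 15
print(twu({"a"}, database, profit))            # 22
```

With a minimum utility threshold of 20, a TWU-based method would keep `{a}` as a candidate and need a further check to reject it, while a method that has the exact value 15 available can discard it immediately.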
