Research on Similarity Query and Pattern Mining on Data Streams


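English title: Similarity Query and Pattern Mining on Data Streams
Author: Guo Jiankui (郭建奎)
Thesis level: Doctoral
Discipline: Computer Software and Theory
Keywords (translated from Chinese): data stream; data mining; similarity; similarity query; Lp-norm; DTW; stream pattern; frequent pattern; Web access pattern
Keywords (English): Data Mining; Data Stream; Similarity Search; Web Access Pattern; Lp-norm; DTW; Frequent Pattern
Degree year: 2008
Advisor: Zhu Yangyong (朱扬勇)
Discipline code: 081202
Degree-granting institution: Fudan University (复旦大学)
Thesis submission date: 2008-04-10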

Abstract

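As data mining research continues to broaden and deepen, it has become clear that more and more application data is produced as streams, for example network streams, Web click streams, traffic flows, and sensor network data. Analyzing and mining such data is an increasingly active research topic, and similarity analysis between streams and pattern discovery are important parts of it. Research on similarity queries over data streams has practical value for improving data stream query processing and data stream systems, and it also informs classification and clustering over streams. Existing work on similarity queries and pattern discovery in the data stream setting does not fully account for the characteristics of stream data: it often assumes unbounded memory or does not support incremental updates. Moreover, to the best of our knowledge, no existing work addresses the similarity query problem systematically.
     Motivated by this, the thesis studies similarity queries and pattern discovery over data streams, covering the following three key aspects:
     (1) A systematic technique, based on the Lp distance, for similarity queries over data streams.
     Based on the Lp distance function, the thesis proposes a systematic framework for similarity queries in the data stream setting. After analyzing the characteristics of stream data, it introduces a novel data structure, the SDS-Tree (Same-Directed Slope Tree), which represents a data stream object hierarchically and serves as the representation of the raw stream. Under the Lp distance, the thesis proves the effectiveness of the SDS-Tree and further derives a more effective granularity for similarity judgments. Building on the SDS-Tree, it presents ASQFSW (Algorithm for Similarity Queries in Fixed Sliding Window) for similarity queries over a single fixed window, and IASQSW (Incremental Algorithm for Similarity Queries in Sliding Windows) for incremental similarity queries over sliding windows. In particular, IASQSW establishes an upper bound on how much the stream data changes when the window slides; using this bound, the algorithm needs to update only a limited number of SDS-Tree nodes to answer the similarity query after each slide. Detailed theoretical analysis and extensive experiments indicate that the proposed techniques significantly outperform existing approaches.
     (2) A similarity query technique based on the DTW distance, which improves the accuracy of similarity queries over data streams in cases where the Lp distance cannot handle time warping.
     In the data stream setting, the Lp distance cannot handle time warping. To improve matching accuracy in applications where this matters, the thesis proposes ESDS (Estimating Similarity on Data Streams), a similarity query algorithm based on the DTW distance. The algorithm segments the stream according to how the data changes and represents each segment with only three values (the maximum, the minimum, and the difference value). To keep the feature extraction effective, the thesis introduces the notion of surge data, defined according to the pattern of change in the data, and gives an algorithm, judgeSurge, that decides whether a stream contains surge data. So that the handling of surge data does not distort the characteristics of the stream, it further introduces the notions of valid surge and maximum valid surge scope, with the algorithm judgeValidSurge for finding valid surge data and the algorithm calMaxScope for computing the maximum valid surge scope. Based on the extracted features, a new DTW distance function is defined, and ESDS uses dynamic programming to judge similarity over data streams. Detailed theoretical analysis and extensive experiments indicate that the proposed techniques achieve high accuracy and efficiency.
     (3) Two pattern mining algorithms for Web stream data.
     Similarity queries between data streams have important applications to Web data streams. The thesis first analyzes the characteristics of Web stream data and then focuses on pattern discovery in it. After analyzing the shortcomings of the classic WAP-mine algorithm, it first designs TD-WAP-mine, a top-down mining algorithm over the WAP-tree structure. The algorithm avoids constructing large amounts of intermediate data at each step of frequent pattern mining and instead mines the original WAP-tree directly, saving the cost of generating intermediate data; TD-WAP-mine performs better when the support threshold is low or the original Web stream data is very large. Second, to address data redundancy in the WAP-tree, the thesis introduces the notion of a compressed WAP-tree and designs a WAP-tree compression algorithm that does not affect the mining results. It then applies projection directly to the WAP-tree to obtain TAM-WAP, a top-down mining algorithm. Experiments on large-scale data sets show that TAM-WAP achieves better performance and scalability.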
As data mining broadens in scope and deepens in content, it has become clear that much data is produced as streams, such as network streams, Web click streams, traffic streams, and sensor data. The study of data streams has become an active research area in the data mining community, with many important applications. Estimating similarity on data streams and finding patterns hidden in streams are two important tasks. Much work has been done on both, yet much remains to be done. We focus on the characteristics of data in the stream context and propose a framework for similarity evaluation. We further address the problem by analyzing Web data and mining the patterns hidden in it, using two algorithms for finding Web access patterns.
     The main contributions of this thesis are as follows.
     We propose an efficient technique, ETSEDS (an Efficient Technology for Similarity Evaluation on Data Streams), to process similarity queries on data streams under the Lp-norm. ETSEDS captures the main characteristics of stream data and uses a novel tree structure, the SDS-Tree (Same-Directed Slope Tree), to represent stream data on sliding windows hierarchically. Based on the SDS-Tree, we propose two efficient algorithms, ASQFSW (Algorithm for Similarity Queries in Fixed Sliding Window) and IASQSW (Incremental Algorithm for Similarity Queries in Sliding Windows), to process similarity queries on a single fixed sliding window and on general sliding windows, respectively. In particular, IASQSW answers similarity queries by updating only a small number of SDS-Tree nodes. We present detailed theoretical analyses and extensive experiments demonstrating that our algorithms are both efficient and effective.
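For orientation, the query that the SDS-Tree is designed to accelerate can be stated as a brute-force baseline: compare every sliding window of the stream against a query sequence under the Lp-norm. The sketch below is only that baseline, not ASQFSW or IASQSW, and it does not build an SDS-Tree; the names `lp_distance`, `similar_windows`, and the threshold `eps` are hypothetical and chosen for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence, Tuple


def lp_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Lp-norm distance between two equal-length sequences (p >= 1 or infinity)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def similar_windows(
    stream: Iterable[float], query: Sequence[float], eps: float, p: float = 2.0
) -> Iterator[Tuple[int, float]]:
    """Yield (end_index, distance) for each sliding window within eps of the query.

    Brute force: every window is recomputed from scratch, which is exactly the
    cost an incremental index is meant to avoid.
    """
    w = len(query)
    window: deque = deque(maxlen=w)  # keeps only the most recent w values
    for i, value in enumerate(stream):
        window.append(value)
        if len(window) == w:
            d = lp_distance(window, query, p)
            if d <= eps:
                yield i, d


if __name__ == "__main__":
    hits = list(similar_windows([0, 1, 2, 3, 2, 1, 0], [1, 2, 3], eps=0.5))
    print(hits)
```

Each slide costs O(w) here; the abstract's claim for IASQSW is that only a bounded number of tree nodes need updating per slide instead.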
     We propose a new algorithm, ESDS (Estimating Similarity on Data Streams), which estimates similarity efficiently on data streams under the time warping distance. To make the algorithm efficient, we present a simple but efficient method, Segment, to represent the original stream data; Segment needs only three values to describe the stream: the maximum, the minimum, and the difference between them. To improve the effectiveness of Segment, we study a new type of stream, which we call a Surge Stream, and propose a method to judge whether a stream is a Surge Stream. However, surges sometimes occur very regularly in the original stream, so we propose the concepts of Valid Surge Stream and Max Valid Surge Scope and design algorithms to find each of them. With the help of these concepts, Segment can capture the features of the original streams efficiently. For computing the DTW distance between data streams by dynamic programming, we also design a new DTW distance that computes similarity on data streams efficiently. Experiments on many real and synthetic data sets show that our algorithm evaluates similarity on data streams efficiently, a problem not studied in previous research.
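As background for the two ingredients named above, the following sketch shows the textbook DTW distance computed by dynamic programming, plus a (maximum, minimum, difference) summary per segment. It is not the ESDS algorithm or its new DTW distance: the fixed-length segmentation is an assumption made for illustration (the thesis segments according to how the data changes), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with |x - y| as the local cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [inf] * (m + 1)  # DP row for i - 1
    prev[0] = 0.0           # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a, step in b, or step in both
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def summarize_segments(
    values: Sequence[float], seg_len: int
) -> List[Tuple[float, float, float]]:
    """Summarize each fixed-length segment as (max, min, max - min).

    Fixed-length segments are an assumption of this sketch only.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    summary = []
    for start in range(0, len(values), seg_len):
        chunk = values[start:start + seg_len]
        hi, lo = max(chunk), min(chunk)
        summary.append((hi, lo, hi - lo))
    return summary


if __name__ == "__main__":
    # The second sequence repeats a value; DTW absorbs the time shift.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(summarize_segments([1, 5, 2, 8, 3, 3], seg_len=2))
```

The first call illustrates why DTW is preferred when time warping is present: the two sequences differ in length, so an Lp distance is not even defined for them, while DTW aligns the repeated value at no cost.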
     Discovering interesting Web access patterns from Web logs is a Web usage mining problem with many practical applications. Conventional algorithms such as GSP, HPrefix, and WAP-mine can be used to mine Web access patterns (WAP) from Web logs. WAP-mine stores the original Web access sequence database in a WAP-tree and then generates the frequent patterns by recursively mining intermediate Web access pattern trees. Although WAP-mine scans the database only twice, it must recursively build a large amount of intermediate data, and this cost makes it very inefficient at low support thresholds. To address this issue, we propose two new top-down algorithms for mining Web access patterns. First, based on the WAP-tree structure introduced by WAP-mine, we propose TD-WAP-mine, which finds Web access patterns in a top-down manner. TD-WAP-mine finds frequent patterns efficiently at low support because it does not need to generate many intermediate WAP-trees. Second, we give a compressed WAP-tree structure based on the features of the WAP-tree and design a method for constructing it. We also give a new strategy, the Projection method, for finding frequent patterns. Instead of building intermediate data at every step of the mining process, our algorithm builds intermediate data selectively, according to the features of the area currently being mined. Experimental results on various real-world and artificial data sets show that our two algorithms greatly reduce the effort of building intermediate data and in general offer better performance than WAP-mine.
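The WAP-tree that both proposed algorithms start from can be pictured as a counted prefix tree built in the two database scans mentioned above. The sketch below is a simplified illustration of that construction only: it omits the header-table links used during mining, and it does not implement WAP-mine, TD-WAP-mine, or TAM-WAP. The names `WapNode` and `build_wap_tree` are hypothetical, and `min_support` is taken as an absolute count of sequences.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


class WapNode:
    """One node of a simplified WAP-tree: an event label, a count, and children."""

    def __init__(self, event: Optional[Hashable]) -> None:
        self.event = event
        self.count = 0
        self.children: Dict[Hashable, "WapNode"] = {}


def build_wap_tree(sequences: Iterable[Sequence[Hashable]], min_support: int) -> WapNode:
    """Build a counted prefix tree over the frequent events of each access sequence."""
    sequences = list(sequences)

    # Scan 1: an event's support is the number of sequences that contain it.
    support: Counter = Counter()
    for seq in sequences:
        support.update(set(seq))
    frequent = {e for e, c in support.items() if c >= min_support}

    # Scan 2: drop infrequent events and insert what remains as a path.
    root = WapNode(None)
    for seq in sequences:
        node = root
        for event in seq:
            if event not in frequent:
                continue
            child = node.children.get(event)
            if child is None:
                child = WapNode(event)
                node.children[event] = child
            child.count += 1  # one more sequence shares this prefix
            node = child
    return root


if __name__ == "__main__":
    logs = ["abdac", "eaebcac", "babfaec", "afbacfc"]
    tree = build_wap_tree(logs, min_support=3)
    print(sorted(tree.children))
    print(tree.children["a"].count)
```

With these four sequences and a support threshold of 3, only `a`, `b`, and `c` are frequent, so shared prefixes such as `a`, `b` collapse into a single counted path. Mining then proceeds over this tree, and the cost of repeatedly building intermediate trees at that stage is what the abstract's top-down algorithms aim to avoid.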

