用户名: 密码: 验证码:
一种基于Shapelets的懒惰式时间序列分类算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A Novel Lazy Time Series Classification Algorithm Based on the Shapelets
  • 作者:王志海 ; 张伟 ; 原继东 ; 刘海洋
  • 英文作者:WANG Zhi-Hai;ZHANG Wei;YUAN Ji-Dong;LIU Hai-Yang;School of Computer and Information Technology,Beijing Jiaotong University;
  • 关键词:时间序列 ; 懒惰式学习 ; 分类 ; shapelets ; 可解释性
  • 英文关键词:time series;;lazy learning;;classification;;shapelets;;interpretability
  • 中文刊名:JSJX
  • 英文刊名:Chinese Journal of Computers
  • 机构:北京交通大学计算机与信息技术学院;
  • 出版日期:2018-07-19 11:06
  • 出版单位:计算机学报
  • 年:2019
  • 期:v.42;No.433
  • 基金:国家自然科学基金(61672086,61702030,61771058);; 中国博士后科学基金(2018M631328);; 中央高校基本科研业务费专项资金(2017YJS036);; 北京市自然科学基金(4182052)资助~~
  • 语种:中文;
  • 页:JSJX201901003
  • 页数:15
  • CN:01
  • ISSN:11-1826/TP
  • 分类号:31-45
摘要
近些年,时间序列分类问题研究受到了越来越多的关注.基于shapelets的时间序列分类技术是一种有效的方法.然而,其在提取最优shapelet的过程中要建立包含大量冗余元素的候选shapelets集合,一般所获得的shapelets只在平均意义上具有某种鉴别性;与此同时,普通模型往往忽略了待分类实例所具有的局部特征.为此,我们提出了一种依据待分类实例显著局部特征的懒惰式分类模型.这种模型为每个待分类实例构建各自的数据驱动的懒惰式shapelets分类模型,从而逐步缩小了与其分类相关的时间序列搜索空间,使得所获得的shapelets能够直接反映待分类实例的显著局部特征.实验结果表明该文提出的模型具有较高的准确率和更强的可解释性.
        In order to discover the characteristics of data and explain the prediction process of classification model,the study of interpretable model has become increasingly prevalent in recent years.In reality,we can get massive time series data in many fields,such as weather forecast,medical monitoring,and anomaly detection.Time series classification is an important research field of time series data mining.Time series is different from the traditional attribute vector data,and it has no explicit attributes.Even with the sophisticated feature selection techniques,the dimensionality of potential feature space is still beyond the acceptable range.This poses a challenge to learn an accurate classification model with strong interpretability.Since shapelet is a new primitive that can be used to construct interpretable model,time series classification based on shapelet has recently attracted considerable interest.Shapelet-based classification algorithm is a typical shapebased algorithm.Shapelet can help us give a high sight on the local discriminative features of time series.According to the usage of shapelet,the shapelet-based models can be divided into two categories.One type method establishes a much smaller yet more discriminative feature set through the top-kshapelets to transform the origin dataset.Furthermore,traditional classification algorithms can be applied on the converted low-dimensional dataset.The other employs selected shapelets to build the classification model directly.However,these global shapelet-based models have some obvious shortcomings.First,the global model always needs to create a candidate shapelet set which contains a large number of redundant elements in the process of extracting thebest shapelet.Due to the impact of redundant instances and intra-class variation,the extracted shapelets are merely good for the training instances in the average sense.The established shapeletbased model may not be suitable and efficient for the test cases.Second,the shapelets obtained may be from different instances or approximate solutions,which cannot indicate the local characteristics of the test case exactly.Third,since the class value of the local features from the test case is unknown,the characteristics of test cases are always ignored.In order to solve the above problems,a data driven local model based on shapelets for each test case is proposed.In our model,instead of global similarity,local similarity is considered as the basis for classification.The local features of the test case are evaluated directly to find the most discriminative shapelet.And then the shapelet is used to reduce the searching space of class attribute value progressively.Since the shapelets are from the test example,they directly reflect the salient local features of the test case and can answer the question why the model assigns a certain class value to the instance.Meanwhile,in the shapelet evaluation progress,instances are selected to reduce the impact of redundant instances and intra-class variation.The lazy classification model presented in this paper is compared with two shapelet decision tree models,1NN models based on different distance functions,and C4.5 models based on different top-k shapelets transformation algorithms.Experimental results show that the proposed model has higher accuracy and stronger interpretability.
引文
[1]McGovern A,Rosendahl D H,Brown R A,et al.Identifying predictive multi-dimensional time series motifs:An application to severe weather prediction.Data Mining and Knowledge Discovery,2011,22(1-2):232-258
    [2]Patri O,Wojnowicz M,Wolff M.Discovering malware with time series shapelets//Proceedings of the 50th Hawaii International Conference on System Sciences.Hawaii,USA,2017:6079-6088
    [3]Zhu L,Lu C,Sun Y.Time series shapelet classification based online short-term voltage stability assessment.IEEETransactions on Power Systems,2016,31(2):1430-1439
    [4]Burkom H S,Murphy S P,Shmueli G.Automated time series forecasting for biosurveillance.Statistics in Medicine,2007,26(22):4202-4218
    [5]Zhong S,Khoshgoftaar T M,Seliya N.Clustering-based network intrusion detection.International Journal of Reliability,Quality and Safety Engineering,2007,14(2):169-187
    [6]Xing Z,Pei J,Keogh E.A brief survey on sequence classification.ACM SIGKDD Explorations Newsletter,2010,12(1):40-48
    [7]Ding H,Trajcevski G,Scheuermann P,et al.Querying and mining of time series data:Experimental comparison of representations and distance measures.Proceedings of the VLDB Endowment,2008,1(2):1542-1552
    [8]Keogh E,Kasetty S.On the need for time series data mining benchmarks:A survey and empirical demonstration.Data Mining and Knowledge Discovery,2003,7(4):349-371
    [9]Antunes C M,Oliveira A L.Temporal data mining:An overview//Proceedings of the KDD Workshop on Temporal Data Mining.San Francisco,USA,2001:1-13
    [10]Bagnall A,Lines J,Bostrom A,et al.The great time series classification bake off:A review and experimental evaluation of recent algorithmic advances.Data Mining and Knowledge Discovery,2017,31(3):606-660
    [11]Ye L,Keogh E.Time series shapelets:A new primitive for data mining//Proceedings of the 15th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining.Paris,France,2009:947-956
    [12]Lines J,Davis L M,Hills J,et al.A shapelet transform for time series classification//Proceedings of the 18th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining.Beijing,China,2012:289-297
    [13]Hills J,Lines J,Baranauskas E,et al.Classification of time series by shapelet transformation.Data Mining and Knowledge Discovery,2014,28(4):851-881
    [14]Yuan J D,Wang Z H,Han M.A discriminative shapelets transformation for time series classification.International Journal of Pattern Recognition and Artificial Intelligence,2014,28(6):1-28
    [15]Yuan Ji-Dong,Wang Zhi-Hai,Han Meng.Shapelet pruning and shapelet coverage for time series classification.Journal of Software,2015,26(9):2311-2325(in Chinese)(原继东,王志海,韩萌.基于Shapelet剪枝和覆盖的时间序列分类算法.软件学报,2015,26(9):2311-2325)
    [16]Yuan Ji-Dong,Wang Zhi-Hai,Han Meng,et al.A logical Shapelets transformation for time series classification.Chinese Journal of Computers,2015,38(7):1448-1459(in Chinese)(原继东,王志海,韩萌等.基于逻辑Shapelets转换的时间序列分类算法.计算机学报,2015,38(7):1448-1459)
    [17]Zalewski W,Silva F,Maletzke A G,et al.Exploring shapelet transformation for time series classification in decision trees.Knowledge-Based Systems,2016,112:80-91
    [18]Mueen A,Keogh E,Young N.Logical-shapelets:An expressive primitive for time series classification//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.San Diego,USA,2011:1154-1162
    [19]Rakthanmanon T,Keogh E.Fast shapelets:A scalable algorithm for discovering time series shapelets//Proceedings of the 13th SIAM International Conference on Data Mining.Austin,USA,2013:668-676
    [20]Grabocka J,Schilling N,Wistuba M,et al.Learning timeseries shapelets//Proceedings of the 20th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining.New York,USA,2014:392-401
    [21]Karlsson I,Papapetrou P,Bostr9m H.Generalized random shapelet forests.Data Mining and Knowledge Discovery,2016,30(5):1053-1085
    [22]Rakthanmanon T,Campana B,Mueen A,et al.Searching and mining trillions of time series subsequences under dynamic time warping//Proceedings of the 18th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining.Beijing,China,2012:262-270
    [23]Berndt D J,Clifford J.Using dynamic time warping to find patterns in time series//Proceedings of the KDD Workshop.Seattle,USA,1994:359-370
    [24]Ratanamahatana C A,Keogh E.Everything you know about dynamic time warping is wrong//Proceedings of the 3rd Workshop on Mining Temporal and Sequential Data.Seattle,USA,2004:1-11
    [25]Keogh E,Ratanamahatana C A.Exact indexing of dynamic time warping.Knowledge and Information Systems,2005,7(3):358-386
    [26]Cuturi M,Blondel M.Soft-DTW:A differentiable loss function for time-series//Proceedings of the 34th International Conference on Machine Learning.Sydney,Australia,2017:894-903
    [27]Dem2ar J.Statistical comparisons of classifiers over multiple data sets.Journal of Machine Learning Research,2006,7(1):1-30
    (1)Chen Y P,Keogh E,et al.The UCR time series classification archive.http://www.cs.ucr.edu/~eamonn/time_series_data/,2015,10,8

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700