用户名: 密码: 验证码:
挖掘概率频繁模式恢复不确定RFID数据流
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
射频识别(RFID)技术是一种非接触自动识别技术,该技术凭借标签体积小、成本低、非接触识别、自动识别等特点,已广泛应用于多个领域。但是,由于易受外部环境的干扰和射频信号的不稳定,阅读器所产生的数据通常是不可靠、不完全和有噪声,其主要表现为漏读和交错读现象。如何正确恢复RFID数据流中的数据已成为RFID中间件急需解决的重要课题。
     该文首先对RFID数据流的特征进行了详细分析,指出RFID数据具有简单性、关联性、时态性和空间性、不确定性(包括漏读和交错读),并对相关技术包括数据流窗口模型、数据流挖掘技术、RFID数据流恢复技术等方面的研究现状进行了详细介绍与分析。
     为了更准确地恢复RFID不确定数据,该文提出了一种综合考虑多种因素的RFID不确定数据流恢复方法。该方法主要包括三个步骤:
     第一,运用基于PFP-tree树的不确定数据流挖掘方法挖掘RFID数据流中的概率频繁模式。该方法通过设置数据项索引表和事务索引表,能够快速有效地进行频繁模式挖掘。该方法还运用了衰减窗口模型,通过区分“老”事务和“新”事务的贡献度,保证了该方法具有较高的模式召回率。
     第二,基于所挖掘的概率频繁模式,通过计算标签对象之间的关联度,确定最大关联标签。而且,通过计算概率相似运动轨迹,确定最相似运动轨迹标签。基于这些信息,采用基于多元统计分析的不确定RFID数据流恢复方法——RR方法进行恢复。该方法综合考虑了标签的当前窗口信息、标签前一窗口信息、最大关联标签信息及最相似轨迹标签信息等四个因素,根据标签矢量的欧式距离判定标签的真实位置。
     第三,由于RR方法可能存在误判现象,我们通过统计标签与阅读器的拒真率和交错读率,运用贝叶斯修正方法校正标签位于阅读器范围的概率,为后续窗口的准确恢复提供更为准确的后验概率信息。
     最后,通过大量的实验分析和比较,证明基于PFP-tree树的概率模式挖掘方法比基于SUF-growth的方法具有更好的性能,以及所提出的RR恢复方法比SIS方法具有更高的恢复准确率。
RFID is a non-contact and automatic identification technology and has been widely used in many fields, owing to the properties of small size, low-cost, non-contact, automatic identification and so on. Because of the susceptible to the external environment and the instability of RF signals, the original data detected by RFID readers is always unreliable, incomplete and noise such as missing-reads and cross-reads. Therefore, how to recovery these data from RFID data steams has become an urgent and important topic in RFID middleware area.
     Firstly, the paper analyzes the characteristics of RFID data streams in details, and indicates that RFID data is simple, relevance, temporal, spatial and uncertain. Then, it introduces and analyses related technologies including window models of data streams, mining technologies on data streams, recovery methods on RFID data streams, and so on.
     In order to recover the uncertain RFID data more accurately, the paper proposes a new RFID data streams recovery method which takes several various factors into consideration comprehensively. The method mainly includes three steps:
     The first step is to make use of an uncertain data streams mining method based on PFP-tree to mine probability frequent patterns. By setting a transaction-index table and an item-index table, the method can find the frequent patterns quickly and efficiently. The method uses a damping window model to distinguish the contributions of the data generated by new transactions and the data by historic transactions, to guarantees high recall rate of frequent patterns.
     The second step is to determine the maximum relevance item using the mining results from PFP-tree, and the maximal similar trajectories item through computing similar trajectories for a specific item. Based on these information, a discriminate multivariate statistical analysis method—RR method is used to recovery RFID data, using a vector including the item's location information in current window and the information in previous window, the location information of the maximal relevance item, and the information of the maximal similar trajectories item. On the basis of these factors proposed, the true place of the tagged item can be found by computing the Euclidean distance of the vector.
     The third step is to use a Bayesian-based correction method, using the ratios of missing-reads and cross-reads, to correct the probability of items in the reading ranges of readers, to provide more accurate posterior probability for next windows.
     Finally, by experiment analysis and comparison, it has been proved that the performance of PFP-tree is better than that of SUF-growth, and RR recovery method can get higher recovery accurate rate than SIS method.
引文
[1]射频工作组:http://www.rfidinfo.com.cn/.
    [2]http://finance.sing.com.cn.RFID:小芯片中的大机会.2005.05.01.
    [3]孙剑,陈琪明.RFID中间件在世界及中国的发展现状.中国物流产品网.2007,2:98-101
    [4]B. Babcock, S. Babu, M. Datar, et al. Models and Issues in Data Streams. in:Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Madison, Wisconsin.2002. New York, NY, USA:ACM,2002.1-16.
    [5]C. Floerkemeier, M. Lampe. Issues with RFID Usage in Ubiquitous Computing Applications. Lecture Notes in Computer Science,2004,3001/2004:188-193.
    [6]S. Jeffery, M. Garofalakis, M. Franklin. Adaptive cleaning for RFID data streams. in:Proceedings of the 32nd International Conference on Very Large Data Bases. Seoul, Korea.2006. USA:VLDB Endowment,2006,163-174.
    [7]Kostas Patroumpas and Timos Sellis, Window Update Patterns in Stream Operators. http://www.springerlink.com/content/51 u71 hg4g60133u6/
    [8]Metwually, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. in:Proceedings of 10th International Conference on Database Theory. DATABASE THEORY-ICDT 2005,398-412.
    [9]Gurmeet Singh Manku, Rajeev Motwani. Approximate Frequency Counts over Data Streams. Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), Hong Kong SAR, China,2002,346-357.
    [10]E. D. Demaine, A. L'opez-Ortiz, and J. I Munro. Frequency estimation of Internet packet LNCS streams with limited space. In Proc.10th European Sympos. Algorithms,2461,2002,348-360.
    [11]GS.Manku, R.Motwani. Approximate Frequency Counts over Data Streams. In:Proceedings of the 28th International Conference on Tlery Large Data Bases. Hong Kong, Morgan Kaufmann,2002, 346-357.
    [12]JH.Chang, WS.Lee. estWin:Adaptively Monitoring the Recent Change of Frequent Itemsets over Online Data Streams. In:Proceedings of the twelfth International Conference on Information and Knowledge Management, New Orleans, USA:ACM,2003,536-539.
    [13]H.Li, S.Lee, M.Shan. An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams. In:Proceedings of the first International Workshop on Knowledge Discovery in Data Streams. Pisa, Italy,2004,20-24.
    [14]H.Li, S.Lee, M.Shan. Online Mining (Recently) Maximal Frequent Itemsets over Data Streams. In: Proceedings of the fifteenth International Workshops on Research Issues in Data Engineering: Stream Data Mining and Applications, Tokyo, Japan:IEEE,2005,11-18.
    [15]C.Jin, W.Qian, C.Sha, JX.Yu, A.Zhou. Dynamically Maintaining Frequent Items over a Data Stream. In:Proceedings of the 2003 ACM CIKM International'Conference on Information and Knowledge Management. New Orleans:ACM,2003,287-294.
    [16]JH.Chang, WS.Lee. Finding Recent Frequent Itemsets Adaptively over Online Data Streams. In: Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, USA:ACM.2003,487-492.
    [17]C.Giannella, J.Han, J.Pei, X.Yan, PS.Yu.Mining Frequent Patterns in Data Streams at Multiple Time Granularities.Data Mining:Next Generation Challenges and Future Directions,2003, 191-212.
    [18]JH.Chang, WS.Lee.estWin. Adaptively Monitoring the Recent Change of Frequent Itemsets over Online Data Streams. In:Proceedings of the twelfth International Conference on Information and Knowledge Management, New Orleans, USA:ACM,2003,536-539.
    [19]Y.Chi, H.Wang, PS.Yu, RR.Muntz. Moment:Maintaining Closed Frequent Itemsets over a Stream Sliding Window. In:Proceedings of the fourth International Conference on Data Mining, Brighton, UK:IEEE,2004,59-66.
    [20]H.Chang, WS.Lee. A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams. Journal of Information Science and Engineering,2004,20(4):753-762.
    [21]CH.Lin, DY.Chiu, YH.Wu, ALP.Chen, T. Hsinchu. Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window. In:Proceedings of the fifth SIAM International on Data Mining, Newport Beach, USA,2005,486-491.
    [22]A.Arasu, GS.Manku. Approximate Counts and Quantiles over Sliding Windows.In:Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Paris, France:ACM,2004,286-296.
    [23]S.Jeffery, M.Garofalakis, and M.Franklin. Adaptive cleaning for RFID data streams. Proceedings of the 32nd international conference on Very large data bases,2006,163-174.
    [24]Shawn R. Jeffery, Michael J. Franklin, Minos N. Garofalakis:An adaptive RFID middleware for supporting metaphysical data independence. VLDB Journal.2008,17(2):265-289.
    [25]R. Agrawal, T. Imielinski, and A. Swami, Mining association rulesbetween sets of items in large databases, in Proc. ACM SIGMOD 1993,207-216.
    [26]J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation. in Proc. ACM SIGMOD 2000,1-12.
    [27]M.Garofalakis, J.Gehrke and R.Rastogi. Querying and Mining Data Streams:You Only Get One Look[C]. In the tutorial notes of VLDB,2002,635.
    [28]C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, Mining frequent patterns in data streams at multiple time granularities, in Data Mining:Next Generation Challenges and Future Directions, ch. 6, AAAI/MIT,2004,457-461.
    [29]Arasu A, Manku G S. Approximate counts and quantiles over sliding windows. In:Proceedings of the23rd ACM SIGMOD—SIGACT-SIGART Symposium on Principles ofdatabase systems. Paris, France:ACM,2004.286-296.
    [30]Charikar M, Chen K, Farach-Colton M. Finding Frequent Items in Data Streams. Theoretical Computer Science,2004,312(1):58-75.
    [31]C.Estan and G.Varghese. New Directions in Traffic Measurement and Accounting. In proc. of SIGCOMM,2002,323-336.
    [32]刘学军,徐宏炳,董逸生等.挖掘数据流中的频繁模式.计算机研究与发展,2005,42(12):2192-2198.
    [33]刘学军,徐宏炳,策逸生等.基于滑动窗内的数据流闭合频繁模式的挖掘.计算机研究与发展,2006,43(10):1738-1743.
    [34]张昕,李晓光,土大玲等.数据流中一种快速启发式频繁模式挖掘方法.软件学报,2005,16(12):2099-2105.
    [35]朱琼,施荣华.一种数据流中的频繁模式挖掘算法.计算机应用,2008,28(6):1463-1469.
    [36]Teng W.G. et al. A Regression-Based Temporal Pattern Mining Scheme for Data Streams. In: Proceedings of the International Conference on Very Large Data Bases. Berlin. Germany.2003. 93-104.
    [37]任家东,李可,冯佳音,杨楠.在分布式数据流中查找近期频繁项方法的研究.计算机科学,2008,35(3):206-208.
    [38]李国徽,陈辉.挖掘数据流任意滑动时间窗口内频繁模式.软件学报,2008,19(10):585-2596.
    [39]李国徽,程远国.传感器网络中频繁移动模式挖掘算法研究.小型微型计算机系统,2008,29(6):1015-1019.
    [40]C.-K. Chui, B. Kao, and E. Hung, Mining frequent itemsets from uncertain data, in:Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery Data Mining. Osaka, Japan:IEEE ICDM Workshops,2007,47-58.
    [41]X. Dai, M. L. Yiu, N. Mamouils, Y. Tao, and M. Vaitis, Probabilistic spatial queries on existentially uncertain data, in:Proceedings of the 9th International Symposium on Spatial and Temporal Databases, Angra dos Reis, Brazil,2005,400-417.
    [42]C. K.-S. Leung, C. L. Carmichael, and B. Hao, Efficient mining of frequent patterns from uncertain data, in:Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery Data Mining. Osaka, Japan:IEEE ICDM Workshops,2007,489-494.
    [43]C. K.-S. Leung, M. A. F. Mateo, and D. A. Brajczuk, A tree-based approach for frequent pattern mining from uncertain data, in:Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining.2008. Osaka, Japan:IEEE Computer Society,2008,653-661.
    [44]Carson Kai-Sang Leung, Boyu Hao. Mining of Frequent Itemsets from Streams of Uncertain Data, in:Proceedings of the 25th International Conference On Data Engineering, shanghai, China:IEEE International Conference on Data Engineering,2009,1663-1670.
    [45]C.-K. Chui, B. Kao, and E. Hung, Mining frequent itemsets from uncertain data, in:Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery Data Mining. Omaha, Nebraska, USA: IEEE ICDM Workshops,2007,47-58.
    [46]C.K-S. Chui and B. Kao, A decremental approach for mining frequent itemsets from uncertain data,in Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining.2008. Osaka, Japan:IEEE Computer Society,2008,64-75.
    [47]C.K.-S. Leung, C. L. Carmichael, and B. Hao, Efficient mining of frequent patterns from uncertain data. in:Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery Data Mining. Omaha, Nebraska, USA:IEEE ICDM Workshops,2007,489-494.
    [48]李建中,于戈,周傲英,不确定性数据管理的要求与挑战.中国计算机学会通讯,2009,5(4):6-14.
    [49]S. Jeffery, M. Garofalakis, M. Franklin. Adaptive cleaning for RFID data streams. in:Proceedings of the 32nd International Conference on Very Large Data Bases. Seoul, Korea.2006. USA:VLDB Endowment,2006,163-174.
    [50]Shawn R. Jeffery, Michael J. Franklin, Minos N. Garofalakis:An adaptive RFID middleware for supporting metaphysical data independence. VLDB Journal.2008,17(2):265-289.
    [51]J. Xie, J. Yang, Y. Chen, et al. A Sampling-Based Approach to Information Recovery. in: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. Cancun, Mexico. 2008. Washington, DC, USA:IEEE Computer Society,2008,476-485.
    [52]R. Jeffrey, G. Alonso, M. Franklin, et al. A Pipelined Framework for Online Cleaning of Sensor Data Streams. in:Proceedings of the 22nd International Conference on Data Engineering. Atlanta, Georgia, USA.2006. Washington, DC, USA:IEEE Computer Society,2006,773-778.
    [53]Hector Gonzalez, Jiawei Han, Xuehua Shen. Cost-conscious Cleaning of Massive Data Sets, in Proc.2007 Int. Conf. on Data Engineering, Istanbul, Turkey,2007,1268-1272.
    [54]Y. Bai, F. S. Wang, P. Liu. Efficiently Filtering RFID Data Streams. in:Proceedings of the First International VLDB Workshop on Clean Databases. Seoul, Korea.2006. USA:VLDB Endowment, 2006,50-57.
    [55]谷峪,于戈,胡小龙,王义.基于监控对象动态聚簇的高效RFID数据清洗模型.软件学报,2010,21(4):632-643.
    [56]谷峪,于戈,李晓静,王义.基于动态概率路径时间模型的RFID数据填补算法.软件学报,2010,21(3):438-451.
    [57]陈远,李战怀,陈群.不可靠RFID数据上的复杂事件处理研究.计算机应用研究,2009,26(7):2537-2540.
    [58]谷峪,郭娜,于戈.基于移动阅读器的RFID概率空间范围查询技术的研究.计算机学报,2009,32(10):2052-2066.
    [59]徐东彬,黄磊,刘昌平.自适应核密度估计运动检测方法.自动化学报,2009,35(4):379-376.
    [60]马岩,张延园,尹方鸣.基于滑动窗口的RFID数据流多标签清洗算法.科学技术与工程,2009,9(05):1165-1172.
    [61]廖国琼,李晶,基于距离的分布式RFID数据流孤立点检测.计算机研究与发展,2010,47(5):930-939.
    [62]http://news.rfidworld.com.cn/2007_7/2007710131466020.html
    [63]王力宾.多元统计分析:模型、案例及SPSS应用.经济科学出版社,北京.2010.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700