A Sensor Missing Value Prediction Algorithm Based on K-Nearest Neighbor and Multiple Regression
  • English title: A KNN and Multiple Regression Based Sensor Missing Value Estimation Algorithm
  • Authors: GUAN Wei; LI Xian-tong
  • Keywords: road engineering; missing value prediction; K-nearest neighbor (KNN); big data; multiple regression
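  • Journal: 公路交通科技 (Journal of Highway and Transportation Research and Development), journal code GLJK
  • Institutions: Research Institute of Highway, Ministry of Transport; School of Transportation Science and Engineering, Harbin Institute of Technology
  • Publication date: 2019-03-15
  • Year: 2019
  • Volume: 36 (serial No. 291)
  • Issue: 03
  • Pages: 18-25 (8 pages)
  • Article ID: GLJK201903003
  • CN: 11-2279/U
  • Funding: National Key R&D Program of China under the 13th Five-Year Plan (2017YFC0840200)
  • Language: Chinese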
Abstract
Sensors are one of the main means of data collection and monitoring in highway engineering, but the missing values that frequently occur during operation seriously degrade both the monitoring itself and subsequent data analysis. Most existing sensor missing-value prediction algorithms exploit either the spatial correlation between sensors or the temporal correlation of a single sensor's own readings, and achieve a certain degree of accuracy. The K-Nearest-Neighbor on Multiple Regression Algorithm (KMRA) instead combines spatial and temporal correlation in one prediction, which substantially improves both prediction accuracy and efficiency and gives the algorithm greater practical value.

When sensor v has a missing value at time t, KMRA proceeds in three steps. First, it computes the correlation coefficients between v and its neighbors, selects the K neighbor nodes most highly correlated with v, and uses their correlation coefficients to produce a spatial-correlation prediction for time t. Second, from the time series that v itself has produced during monitoring, it selects the q/2 values adjacent to t on each side, assigns each value its own partial correlation coefficient, and applies multiple regression over these coefficients and the q values to produce a temporal-correlation prediction. Finally, it integrates the spatial and temporal predictions through their sample coefficients of determination to form the final prediction.

Experiments were carried out on real datasets: actual readings were treated as missing values, predicted, and compared with the original data to measure accuracy. KMRA was also compared with other related algorithms, and the results show that it produces accurate predictions while achieving considerably higher efficiency.
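The spatial step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `spatial_estimate`, the sensors-by-time array layout with NaN for gaps, and in particular the way correlation coefficients are turned into a prediction (a least-squares line per neighbor, then a correlation-weighted average) are all assumptions, since the abstract does not give the exact formula.

```python
import numpy as np


def spatial_estimate(data, v, t, k):
    """Hypothetical sketch of a KNN-style spatial estimate for sensor v at time t.

    data: 2-D float array, one row per sensor, one column per time step;
    NaN marks a missing reading. The per-neighbour linear mapping and the
    correlation weighting are illustrative assumptions.
    """
    target = data[v]
    candidates = []
    for u in range(data.shape[0]):
        # A neighbour is only useful if it has a reading at time t.
        if u == v or np.isnan(data[u, t]):
            continue
        # Compare the two sensors only where both have readings.
        both = ~np.isnan(target) & ~np.isnan(data[u])
        if both.sum() < 3:
            continue
        x, y = data[u, both], target[both]
        if np.std(x) == 0 or np.std(y) == 0:
            continue  # correlation is undefined for a constant series
        r = np.corrcoef(x, y)[0, 1]
        candidates.append((r, u, x, y))

    # Keep the K neighbours most positively correlated with v.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = [c for c in candidates[:k] if c[0] > 0]
    if not top:
        raise ValueError("no positively correlated neighbour available")

    estimates, weights = [], []
    for r, u, x, y in top:
        # Map the neighbour's reading at t onto v's scale with a
        # least-squares line fitted on their shared history.
        slope, intercept = np.polyfit(x, y, 1)
        estimates.append(intercept + slope * data[u, t])
        weights.append(r)
    # Correlation-weighted average of the per-neighbour estimates.
    return float(np.average(estimates, weights=weights))


readings = np.array([
    [1.0, 2.0, 3.0, np.nan, 5.0, 6.0],  # sensor 0, missing at t = 3
    [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],   # 2 * sensor 0 + 1
    [6.0, 7.0, 8.0, 9.0, 10.0, 11.0],   # sensor 0 + 5
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # negatively correlated, never chosen
])
print(spatial_estimate(readings, v=0, t=3, k=2))
```

With this toy data the two chosen neighbors are exact linear functions of sensor 0, so the printed value should be very close to the hidden reading 4.0. Restricting the choice to positively correlated neighbors is one design choice among several; a variant could rank by the absolute value of the correlation instead.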
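The temporal step can be sketched as a multiple linear regression of each reading on its q/2 predecessors and q/2 successors. This is a sketch under stated assumptions: the hypothetical function `temporal_estimate` fits the model by ordinary least squares (`np.linalg.lstsq`) on complete windows elsewhere in the same series, and the fitted regression coefficients stand in for the per-value partial coefficients that the abstract mentions. The function also returns the sample coefficient of determination (R²) of the fit, which the final combination step needs.

```python
import numpy as np


def temporal_estimate(series, t, q):
    """Hypothetical sketch of a multiple-regression estimate for series[t].

    Regresses each reading on the q/2 readings before it and the q/2
    readings after it, fitted by ordinary least squares on complete
    windows elsewhere in the series. Returns (estimate, R^2 of the fit).
    """
    half = q // 2
    offsets = [o for o in range(-half, half + 1) if o != 0]
    if t - half < 0 or t + half >= len(series):
        raise ValueError("t is too close to the ends of the series")

    rows, targets = [], []
    for s in range(half, len(series) - half):
        window = [series[s + o] for o in offsets]
        # Skip the gap itself and any window touching a missing value.
        if s == t or np.isnan(series[s]) or np.isnan(window).any():
            continue
        rows.append(window)
        targets.append(series[s])
    if len(rows) <= len(offsets) + 1:
        raise ValueError("not enough complete windows to fit")

    # Design matrix: intercept column plus one column per neighbour offset.
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = np.array(targets)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Sample coefficient of determination of the fitted regression.
    residual = y - X @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    x_t = np.array([1.0] + [series[t + o] for o in offsets])
    return float(x_t @ coef), r2


series = np.sin(0.3 * np.arange(30))
series[10] = np.nan  # pretend the reading at t = 10 was lost
estimate, r2 = temporal_estimate(series, t=10, q=2)
print(estimate, np.sin(3.0), r2)
```

A sampled sinusoid satisfies x[s] = (x[s-1] + x[s+1]) / (2 cos 0.3) exactly, so with q = 2 the regression recovers sin(3.0) almost exactly and R² is essentially 1. Real sensor series will fit less well, and the resulting lower R² is what the combination step uses to down-weight the temporal estimate.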
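The final step, integrating the two predictions "through the sample coefficient of determination", can be sketched as an R²-weighted average. The abstract does not state the exact rule, so the normalized weighting below is an assumption, and so is the hypothetical function name `combine_estimates`. The temporal R² could come from the regression fit in the temporal step; where the spatial R² comes from (for example, the goodness of fit of the neighbor regressions) is likewise assumed here.

```python
def combine_estimates(spatial, r2_spatial, temporal, r2_temporal):
    """Hypothetical sketch: blend two estimates, weighting each by its R^2.

    The normalised-R^2 weighting is an assumed reading of "integrated
    through the sample coefficient of determination".
    """
    # A negative R^2 means a fit worse than the mean, so it gets no weight.
    w_s = max(r2_spatial, 0.0)
    w_t = max(r2_temporal, 0.0)
    if w_s + w_t == 0.0:
        # Neither fit is informative: fall back to a plain average.
        return (spatial + temporal) / 2.0
    return (w_s * spatial + w_t * temporal) / (w_s + w_t)


print(combine_estimates(10.0, 0.75, 14.0, 0.25))  # prints 11.0
```

Here the spatial estimate's better fit (R² = 0.75 against 0.25) pulls the result three quarters of the way toward it. Normalizing the weights keeps the output between the two inputs, which seems a reasonable property for an imputed sensor reading.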