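Comparison of estimation methods for missing values in point-source time series data: a case study of meteorological and hydrological data from a small watershed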
  • English title: Performance Comparison of Different Interpolation Methods on Missing Values for Time Series Data: A Case Study of Meteorological and Hydrological Data in Subtropical Small Watershed
  • Authors: 甘蕾; 周脚根; 石锦; 李希; 沈健林; 吕殿青; 李裕元; 吴金水
  • Authors (English): GAN Lei; ZHOU Jiao-gen; SHI Jin; LI Xi; SHEN Jian-lin; LV Dian-qing; LI Yu-yuan; WU Jin-shui
  • Keywords: missing values; estimation methods; coefficient of variation; time series
  • English keywords: Missing values; Interpolation methods; Coefficient of variance; Time series
  • Journal title (Chinese): ZGNY
  • Journal title (English): Chinese Journal of Agrometeorology
  • Institutions: College of Resources and Environmental Sciences, Hunan Normal University; Key Laboratory of Agroecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences; College of Engineering, Hunan Agricultural University
  • Publication date: 2018-03-20
  • Publisher: 中国农业气象 (Chinese Journal of Agrometeorology)
  • Year: 2018
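  • Volume: v.39
  • Funding: National Key Technology R&D Program of China (2014BAD14B02); Special Fund for Public Welfare Industry Research of the Ministry of Water Resources (201501055); Key Discipline Construction Project of Geography of Hunan Province (20110101)
  • Language: Chinese
  • Article ID: ZGNY201803007
  • Number of pages: 10
  • Issue: 03
  • CN: 11-1999/S
  • Pages: 59-68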
Abstract
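Effective estimation of missing values in point-source time series data can improve data quality. To examine how well different estimation methods recover missing values in such data, and which factors affect their performance, this study used daily meteorological and hydrological data (maximum air temperature, minimum air temperature, solar radiation, rainfall and surface runoff) from long-term fixed-site observation in a typical subtropical small watershed. Root mean square error (RMSE), mean absolute error (MAE) and the Pearson correlation coefficient (r) served as validation metrics for comparing five estimation methods, namely linear interpolation (LIM), K-nearest neighbor interpolation (KNNM), spline interpolation (SIM), polynomial interpolation (PIM) and kernel density estimation (KDEM), and for identifying the main factors that influence their performance. The results showed that: (1) LIM, SIM and KDEM generally outperformed the other two methods; (2) for the meteorological data (maximum air temperature, minimum air temperature and solar radiation), the five methods gave RMSE of 1.81-6.35, MAE of 1.30-4.20 and r of 0.70-0.98 (P<0.05), whereas for the hydrological data (rainfall and surface runoff) they gave RMSE of 12.54-26.28, MAE of 3.60-14.21 and r of 0.07-0.72, so every method estimated the meteorological data better than the hydrological data; (3) the coefficient of variation (CV) of these data sets was linearly correlated with the evaluation metrics (RMSE, MAE and r) (P<0.05) and is an important factor affecting estimation performance.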
        The effective estimation of missing values in point-scale time series data can improve data quality. Meteorological and hydrological data sets (daily maximum air temperature, daily minimum air temperature, daily solar radiation, daily rainfall and daily stream flow) were collected through long-term field experiments in a typical small subtropical watershed. The performance differences among five interpolation methods, namely the linear interpolation method (LIM), K-nearest neighbor interpolation method (KNNM), spline interpolation method (SIM), polynomial interpolation method (PIM) and kernel density estimation method (KDEM), were analyzed on these five data sets. The root mean square error (RMSE), mean absolute error (MAE) and Pearson correlation coefficient (r) were selected to evaluate the five methods. The results showed that: (1) The estimation performance of LIM, SIM and KDEM was generally superior to that of the other two methods. (2) For the meteorological data (maximum temperature, minimum temperature and solar radiation), the estimation of missing values produced RMSE values of 1.81-6.35, MAE values of 1.30-4.20 and r values of 0.70-0.98 (P<0.05). In contrast, the estimation of missing values of the hydrological data (rainfall and stream flow) had relatively high RMSE and MAE values, 12.51-26.28 and 3.60-14.21 respectively, and low r values (0.07-0.72). The interpolation methods therefore generally estimated missing values better for the meteorological data sets than for the hydrological data sets. (3) Additionally, the coefficient of variation (CV) of the data sets was linearly correlated with the evaluation indices (RMSE, MAE and r) (P<0.05), and played an important role in the estimation performance of the interpolation methods.
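The abstract names the evaluation metrics (RMSE, MAE, Pearson r) and the coefficient of variation but gives no implementation. The following is a minimal Python sketch of how such a comparison might be set up for the simplest of the five methods, linear interpolation. It assumes a hold-out design in which known values are artificially hidden, filled, and compared with the truth; that design, the function names, and the sample temperatures are all hypothetical and are not taken from the paper.

```python
import math

def linear_fill(series):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours. Leading or trailing gaps stay None (one neighbour is missing)."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled

def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(var) / mean

# Hypothetical daily maximum temperatures (deg C); NOT data from the paper.
truth = [20.0, 22.0, 23.0, 21.0, 19.0, 18.0, 20.0, 24.0]
hidden = [2, 4, 6]  # positions artificially removed for validation (assumption)
gappy = [None if i in hidden else v for i, v in enumerate(truth)]

filled = linear_fill(gappy)
obs = [truth[i] for i in hidden]
est = [filled[i] for i in hidden]

rmse_value = rmse(obs, est)
mae_value = mae(obs, est)
r_value = pearson_r(obs, est)
cv_value = coefficient_of_variation(truth)

print(f"RMSE={rmse_value:.3f} MAE={mae_value:.3f} r={r_value:.3f} CV={cv_value:.3f}")
```

Repeating the same hide, fill and score loop for each method and each data set would give one set of metrics per pair, which could then be related to each data set's CV. The other four methods would need their own fill functions, which are not sketched here.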
