Risk Assessment of Dengue Fever Based on a Random Forest Regression Model
  • English title: Risk assessment of dengue fever based on random forest model
  • Authors: HUANG Yu-lin; ZHAO Yong-qian; CAO Zheng; LIU Tao; DENG Ai-ping; XIAO Jian-peng; ZHANG Bing; ZHU Guang-hu; PENG Zhi-qiang; MA Wen-jun
  • Keywords: Dengue; Random forest regression; Risk assessment
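  • Journal: South China Journal of Preventive Medicine (华南预防医学; CNKI journal code GDWF)
  • Affiliations: Jinan University Faculty of Medical Science; School of Geographical Sciences, Guangzhou University; Guangdong Provincial Institute of Public Health, Guangdong Provincial Center for Disease Control and Prevention; Guangdong Provincial Center for Disease Control and Prevention
  • Publication date: 2019-02-20
  • Year, volume, issue: 2019, Vol. 45, No. 1
  • Funding: National Key Research and Development Program of China (2018YFB0505500, 2018YFB0505503); Guangdong Provincial Science and Technology Program (2014A040401041); National Natural Science Foundation of China (81773497)
  • Language: Chinese
  • CNKI file name: GDWF201901006
  • Pages: 32-37 (6 pages)
  • CN: 44-1550/R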
Abstract
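Objective: To build a small-spatial-scale dengue fever risk assessment tool based on random forest regression models, so as to provide a basis for dengue prevention and control. Methods: Data on dengue cases and related factors from January 2012 to September 2014 were used as the training set to construct separate random forest regression models for three risk indicators of dengue epidemics: frequency, duration, and intensity. Data on dengue cases and related factors from October 2014 to December 2015 were used as the validation set to evaluate the constructed models. Results: The correlation coefficients between each of the frequency, duration, and intensity indicators and the case-count indicator were all greater than 0.7. The random forest regression models for frequency, duration, and intensity built on the training set explained 96.72%, 91.98%, and 90.1% of the variance, respectively, indicating a good fit. In cross-validation, the mean squared errors of the three models were 0.0019, 1.4246, and 1.8811, respectively, all at a low level. Comparing the accuracy of random forest regression, support vector regression, the generalized linear model, and the generalized additive model showed that the mean squared errors of the machine learning models (random forest regression and support vector machines) were far lower than those of the generalized linear model and the generalized additive model. Conclusion: Random forest regression models that use the frequency, duration, and intensity of dengue fever as outcome variables and meteorological, environmental, and socioeconomic characteristics as predictors are fairly accurate and can serve as a dengue risk assessment tool for prevention and control work.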
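The abstract does not spell out how the three outcome indicators are computed. As a minimal sketch, assume weekly case counts for one spatial unit (for example a township) and the following operational definitions, which are illustrative assumptions rather than the paper's stated formulas: frequency is the share of weeks with at least one case, duration is the mean length in weeks of runs of consecutive case weeks, and intensity is the mean number of cases per case week (some studies instead normalize by population). The hypothetical `pearson` helper illustrates the kind of check reported in the Results, where each indicator was correlated with case counts across units; the unit names and counts are made up.

```python
from statistics import mean


def risk_indicators(weekly_cases):
    """Return (frequency, duration, intensity) for one spatial unit.

    Illustrative definitions (assumptions, not taken from the paper):
      frequency = share of weeks with at least one case
      duration  = mean length, in weeks, of runs of consecutive case weeks
      intensity = mean number of cases per week that had cases
    """
    case_weeks = [c for c in weekly_cases if c > 0]
    if not case_weeks:
        return 0.0, 0.0, 0.0
    # A run starts in a week with cases that follows a case-free week.
    runs, previous = 0, 0
    for c in weekly_cases:
        if c > 0 and previous == 0:
            runs += 1
        previous = c
    frequency = len(case_weeks) / len(weekly_cases)
    duration = len(case_weeks) / runs
    intensity = sum(case_weeks) / len(case_weeks)
    return frequency, duration, intensity


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical weekly case counts for three spatial units.
units = {
    "A": [0, 2, 3, 0, 0, 1, 0, 0],
    "B": [0, 0, 0, 1, 0, 0, 0, 0],
    "C": [1, 4, 6, 5, 2, 0, 1, 0],
}
indicators = {name: risk_indicators(series) for name, series in units.items()}
totals = [sum(series) for series in units.values()]
frequencies = [indicators[name][0] for name in units]
print(indicators["A"])  # (0.375, 1.5, 2.0)
print(round(pearson(totals, frequencies), 3))
```

In a design like the one described, these three values would be computed for every spatial unit and then serve as the outcome variables of the regression models.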
Objective: To construct a small-spatial-scale dengue risk assessment tool based on the random forest model, so as to provide a scientific basis for the prevention and control of dengue fever. Methods: Data on dengue cases and related factors from February 2012 to September 2014 were used as the training set, and random forest regression (RFR) models were constructed separately for the frequency, duration, and intensity of dengue fever. Data on dengue cases and related factors from October 2014 to March 2015 were used as the testing set to verify the accuracy of the models. Results: The correlation coefficients between incidence and the frequency, duration, and intensity of dengue fever were all higher than 0.7. Based on the training set, the pseudo R-squared values of the frequency, duration, and intensity models were 96.72%, 91.98%, and 90.1%, and the cross-validated mean squared errors (MSEs) of the models were 0.0019, 1.4246, and 1.8811, respectively. Comparing the accuracy of RFR, support vector regression (SVR), the generalized linear model (GLM), and the generalized additive model (GAM), the MSEs of RFR and SVR were much lower than those of GLM and GAM. Conclusion: RFR models using the frequency, duration, and intensity of dengue fever as outcome variables and meteorological, environmental, and socioeconomic characteristics as predictors showed good accuracy and can serve as a risk assessment tool for the prevention and control of dengue outbreaks.
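The abstract reports a variance-explained figure (pseudo R-squared) and a cross-validated MSE for each random forest regression model but gives no implementation details. The following from-scratch Python sketch only makes those quantities concrete: trees are grown on bootstrap samples, each split tries a random subset of `mtry` predictors, the forest averages the trees, the pseudo R-squared is computed from out-of-bag predictions as 1 - MSE/Var(y) (analogous to the "% Var explained" printed by R's randomForest package), and `kfold_mse` gives a k-fold cross-validated MSE. Tree depth, number of trees, the floor(p/3) default for `mtry`, and the synthetic two-predictor data are assumptions, not the study's settings; a real analysis would use an established library such as R's randomForest or scikit-learn's `RandomForestRegressor`.

```python
import random
from statistics import mean, pvariance


def _subset(X, y, ids):
    """Rows of X and y selected by a list of indices."""
    return [X[i] for i in ids], [y[i] for i in ids]


def _sse(values):
    """Sum of squared deviations from the mean: the regression node impurity."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, min_leaf, mtry, rng):
    """Grow a regression tree: leaves are mean values, splits are (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    # Random-forest step: only mtry randomly chosen predictors are tried at this split.
    for j in rng.sample(range(len(X[0])), mtry):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, thr, left, right)
    if best is None:
        return mean(y)
    _, j, thr, left, right = best
    return (j, thr,
            build_tree(*_subset(X, y, left), depth - 1, min_leaf, mtry, rng),
            build_tree(*_subset(X, y, right), depth - 1, min_leaf, mtry, rng))


def predict_tree(node, x):
    """Follow splits until a leaf (a mean value) is reached."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, mtry=None, depth=6, min_leaf=2, seed=0):
    """Bag regression trees on bootstrap samples; also return out-of-bag index sets."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mtry = mtry or max(1, p // 3)  # floor(p/3), the usual default for regression
    trees, oob = [], []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree(*_subset(X, y, boot), depth, min_leaf, mtry, rng))
        oob.append(set(range(n)) - set(boot))
    return trees, oob


def predict_forest(trees, x):
    """Forest prediction: the average of the individual tree predictions."""
    return mean([predict_tree(t, x) for t in trees])


def oob_scores(trees, oob, X, y):
    """Out-of-bag MSE and pseudo R^2 = 1 - MSE / Var(y)."""
    errors, seen = [], []
    for i, (x, target) in enumerate(zip(X, y)):
        preds = [predict_tree(t, x) for t, out in zip(trees, oob) if i in out]
        if preds:  # a row that landed in every bootstrap sample has no OOB prediction
            errors.append((mean(preds) - target) ** 2)
            seen.append(target)
    mse = mean(errors)
    return mse, 1 - mse / pvariance(seen)


def kfold_mse(X, y, k=5, seed=0, **forest_args):
    """k-fold cross-validated MSE: train on k-1 folds, score the held-out fold."""
    ids = list(range(len(y)))
    random.Random(seed).shuffle(ids)
    errors = []
    for f in range(k):
        held_out = set(ids[f::k])
        train = [i for i in ids if i not in held_out]
        trees, _ = fit_forest(*_subset(X, y, train), **forest_args)
        errors += [(predict_forest(trees, X[i]) - y[i]) ** 2 for i in held_out]
    return mean(errors)


# Hypothetical data: the outcome depends on x0 only; x1 is pure noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(80)]
y = [3 * a + rng.gauss(0, 0.1) for a, _ in X]
trees, oob = fit_forest(X, y)
mse, rsq = oob_scores(trees, oob, X, y)
print(f"OOB MSE = {mse:.4f}, variance explained = {100 * rsq:.1f}%")
print(f"5-fold CV MSE = {kfold_mse(X, y, n_trees=25):.4f}")
```

Swapping `fit_forest` and `predict_forest` inside `kfold_mse` for another model's fit and predict steps, while keeping the same seed, would score competing models such as SVR, a GLM or a GAM on identical held-out folds, which is the kind of comparison the Results describe.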
