Water Quality Evaluation of Chaohu Lake Based on the Random Forest Classification Algorithm
Abstract
Real-time evaluation of lake water quality based on monitoring data and machine-learning algorithms is of great significance for the management, maintenance, and protection of lake water resources. To classify the water quality of Chaohu Lake, this paper uses the random forest (RF) classification algorithm to determine the water-quality category of the region. Compared with other typical machine-learning methods, random forest offers higher classification accuracy and stronger tolerance of noise. Test results show that with ntree = 300 decision trees and mtry = 2 attributes in the candidate split-attribute set, the classification accuracy reaches 96.15% at the Hefei Hubin monitoring section and 100% at the Chaohu Yuxikou monitoring section. The method is robust, practical, and generalizes well, so it can be used effectively for water-quality assessment.
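The abstract names two random-forest hyperparameters: ntree, the number of trees, and mtry, the number of attributes considered at each split (the same names used by the R `randomForest` package). The sketch below illustrates how they enter the algorithm in Breiman's formulation: each tree is grown on a bootstrap sample of the training data, each split searches only mtry randomly drawn attributes, and the forest predicts by majority vote. It is a pure-Python illustration under stated assumptions, not the authors' implementation. Gini impurity as the split criterion is an assumption (the usual CART-style choice), and `make_synthetic` produces hypothetical data with four made-up indicators and three made-up quality classes, because the abstract does not list the monitoring indicators or the class scheme used.

```python
import random
from collections import Counter

# Minimal random-forest classifier in pure Python (illustrative sketch only).
# ntree = number of bootstrap trees; mtry = attributes sampled at each split.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, features):
    """Return (score, feature, threshold) minimising weighted Gini over `features` only."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, mtry, rng):
    """Grow an unpruned tree; internal nodes are (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    features = rng.sample(range(len(X[0])), mtry)  # random attribute subset for this node
    split = best_split(X, y, features)
    if split is None:  # sampled attributes are constant at this node
        return majority(y)
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], mtry, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], mtry, rng))

def predict_tree(node, row):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def random_forest(X, y, ntree=300, mtry=2, seed=0):
    """Train ntree trees, each on a bootstrap sample (bagging)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return trees

def predict_forest(trees, row):
    return majority([predict_tree(t, row) for t in trees])  # majority vote

def accuracy(trees, X, y):
    return sum(predict_forest(trees, r) == lab for r, lab in zip(X, y)) / len(y)

def make_synthetic(n, rng):
    """Hypothetical data: 4 indicators in [0, 1]; class depends on the first two only."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(4)]
        label = 1 if row[0] < 0.4 else (2 if row[1] < 0.5 else 3)
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    data_rng = random.Random(42)
    X_train, y_train = make_synthetic(80, data_rng)
    X_test, y_test = make_synthetic(40, data_rng)
    forest = random_forest(X_train, y_train, ntree=300, mtry=2)
    print(f"test accuracy: {accuracy(forest, X_test, y_test):.2%}")
```

Drawing a fresh attribute subset at every node, rather than once per tree, is what distinguishes random forests from plain bagging. A small mtry decorrelates the trees, while a large ntree stabilises the vote. In scikit-learn, the corresponding parameters of `RandomForestClassifier` are `n_estimators` and `max_features`.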