Intelligent Classification of Railway Safety Hazards Based on Evolutionary Ensemble Classifier
  • Original title (Chinese): 基于进化集成分类器的铁路安全隐患智能分类
  • Authors: LI Xinqin (李新琴); SHI Tianyun (史天运); LI Ping (李平); WANG Zhe (王喆); YANG Lianbao (杨连报)
  • Affiliations: Postgraduate Department, China Academy of Railway Sciences; China Academy of Railway Sciences; Institute of Computing Technology, China Academy of Railway Sciences
  • Keywords: traffic information engineering; railway big data; intelligent text classification; evolutionary ensemble classifier; railway safety hazard
  • Journal (Chinese abbreviation): JTJS
  • Journal (English): Journal of Transport Information and Safety
  • Publication date: 2019-04-28
  • Publisher: 交通信息与安全
  • Year: 2019
  • Issue: v.37; No.217
  • Funding: National Major Research and Development Program project (2018YFB1201403); China Railway Corporation project (K2018S007)
  • Language: Chinese
  • Page: JTJS201902005
  • Page count: 7
  • CN: 02
  • ISSN: 42-1781/U
  • Classification number: 39-45
Abstract
This paper proposes an evolutionary ensemble classifier model for classifying text records of railway safety hazards. The characteristics of the hazard data are analyzed first; because each hazard category has its own characteristic keywords, TF-IDF is used to extract text features and convert them into vectors. The implementation process of the model is then designed. A Bagging ensemble randomly samples the TF-IDF vectors to train a number of decision tree base classifiers, and the key steps of the genetic algorithm are designed: the encoding mechanism, the sensitivity setting, the fitness function, and the choice of objective function. The genetic algorithm then optimizes the combination of base classifiers, and the base classifiers corresponding to the best evolved individual take part in Bagging voting to verify the classification effect. In experiments on power-supply catenary safety hazard text data from a railway bureau, the model improves classification accuracy by 17.42% over a single decision tree classifier and by 4.63% over the Bagging ensemble classifier. The results indicate that the model classifies well and can be applied to railway safety hazard classification.
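The selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes that each individual is a 0/1 mask over the trained base classifiers and that fitness is the majority-vote accuracy of the selected subset on a held-out validation set. This record does not give the paper's actual encoding, sensitivity setting, fitness function, or objective function, so these choices, the function names (`majority_vote`, `fitness`, `evolve_mask`), the hyperparameters, and the toy prediction matrix are all hypothetical. TF-IDF extraction and decision tree training are left out; the sketch starts from base-classifier predictions that are already available.

```python
import random
from collections import Counter


def majority_vote(preds, mask):
    """Majority vote over the base classifiers selected by the 0/1 mask.

    Ties fall to the label encountered first among the selected classifiers.
    """
    chosen = [p for p, m in zip(preds, mask) if m]
    return [Counter(column).most_common(1)[0][0] for column in zip(*chosen)]


def fitness(preds, mask, y_true):
    """Assumed fitness: validation accuracy of the sub-ensemble (0 if empty)."""
    if not any(mask):
        return 0.0
    voted = majority_vote(preds, mask)
    return sum(v == t for v, t in zip(voted, y_true)) / len(y_true)


def evolve_mask(preds, y_true, pop_size=20, generations=30,
                crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a subset of base classifiers with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(preds)

    def score(mask):
        return fitness(preds, mask, y_true)

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals is a parent.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Seed the population with the full ensemble (plain Bagging) plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=score)

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation on each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best


# Hypothetical validation labels and predictions of five base decision trees.
y_val = [0, 1, 0, 2, 1, 2, 0, 1]
preds = [
    [0, 1, 0, 2, 1, 0, 0, 1],
    [0, 1, 1, 2, 1, 2, 0, 0],
    [1, 1, 0, 2, 2, 2, 0, 1],
    [2, 0, 1, 1, 0, 0, 1, 2],
    [2, 0, 1, 2, 1, 0, 1, 1],
]

best_mask = evolve_mask(preds, y_val)
print("selected base classifiers:", best_mask)
print("sub-ensemble accuracy:", fitness(preds, best_mask, y_val))
print("full Bagging accuracy:", fitness(preds, [1] * len(preds), y_val))
```

Because the initial population contains the all-ones mask and the best individual is carried over unchanged each generation, the selected subset never scores below plain Bagging on the validation data used for fitness. Accuracy on unseen data is a separate question and would need its own test split.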
