Research and Application of Decision Tree Classification Algorithms Based on Track Geometry Car Inspection Data
Abstract
Data mining is the body of methods and techniques for discovering latent patterns in large volumes of data and extracting useful knowledge from them. In recent years it has attracted wide attention and has become one of the most active research areas in information systems and computer science.
     Data mining has been application-oriented from the start. The term is now fashionable in many fields, particularly banking, telecommunications, insurance, transportation, and retail. Its application to the analysis of track geometry car inspection data, however, has so far received little study. Railway track inspection produces large volumes of track geometry car inspection data, and mining that data is expected to reveal latent patterns that can be used to analyze and predict future data. Using real track geometry car inspection data as an example, this thesis therefore describes the significance and current state of inspection data analysis, identifies its shortcomings, and proposes using data mining classification algorithms based on decision trees to analyze and predict over this large body of data.
     The best-known classification method is the decision tree, a tree structure used for classification. Each internal node represents a test on some attribute, each edge represents one outcome of that test, and each leaf represents a class or a class distribution. The topmost node is the root node. Because decision tree classifiers are efficient, fast, easy to interpret, and concise, they are among the most widely used methods in massive-data environments.
     This thesis surveys the current state of research and the active topics in decision tree classification, with a detailed analysis of the ID3 and C4.5 algorithms. On that basis it proposes an improved algorithm, QC4.5. Starting from an analysis of the time and space complexity of the C4.5 implementation, QC4.5 offers two improved schemes for handling continuous attributes; during the recursive construction of the tree, it selects the better scheme according to the characteristics of the attribute values when computing an attribute's information gain. Experiments on data from the UCI Knowledge Discovery in Databases Archive and the UCI Machine Learning Archive compare the execution efficiency of C4.5 and QC4.5 and show that QC4.5 is more efficient, which supports the feasibility of the algorithm.
     In addition, building on this study of decision tree classification algorithms and on the need to classify track geometry car inspection data, a track geometry car inspection data classification system was developed. As a general-purpose data mining platform, it could also be applied in other fields.
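The abstract describes two mechanisms without showing them: how a decision tree routes a record from the root to a leaf, and how an information gain is computed for a continuous attribute. The sketch below illustrates both in Python under stated assumptions. It uses the textbook approach of trying the midpoint between each pair of adjacent distinct values as a binary split threshold, which is a simplification of what C4.5 does (C4.5 also uses gain ratio when choosing among attributes). It is not QC4.5, whose two schemes the abstract does not spell out. The attribute name `deviation_mm`, the class labels, and the sample values are made up for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, information gain) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no split boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain


@dataclass
class Leaf:
    label: str  # class assigned to records that reach this leaf


@dataclass
class Node:
    attribute: str             # attribute tested at this internal node
    threshold: float           # the test is: record[attribute] <= threshold
    low: Union["Node", Leaf]   # edge followed when the test is true
    high: Union["Node", Leaf]  # edge followed when the test is false


def classify(tree: Union[Node, Leaf], record: dict) -> str:
    """Walk from the root to a leaf, following one edge per test outcome."""
    while isinstance(tree, Node):
        tree = tree.low if record[tree.attribute] <= tree.threshold else tree.high
    return tree.label


# Made-up readings of one hypothetical attribute, with made-up class labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["ok", "ok", "repair", "repair"]
t, gain = best_threshold(values, labels)
root = Node("deviation_mm", t, Leaf("ok"), Leaf("repair"))
print(t, gain, classify(root, {"deviation_mm": 3.4}))
```

This version rebuilds the left and right label lists for every candidate threshold, so a single attribute costs roughly quadratic time in the number of records after sorting. That kind of cost in continuous-attribute handling is what the abstract says QC4.5 targets, although how it does so is not described here.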
