Research on Detecting Abnormal Information in Social Security Medical Consumption
Abstract
Social medical insurance is the foundation of social medical security. It follows the principle of "low level, wide coverage", emphasizes "fairness first, with due regard to efficiency", and is responsible for guaranteeing basic medical care. China's investment in social medical insurance has grown steadily, and informatization in this field is developing rapidly. Social security medical information management systems have made medical funds easier to manage, but as the number of insured people and benefit items keeps growing, the medical insurance information systems of different regions have accumulated large volumes of historical data, and analyzing these data effectively has become a difficulty in current data management. As data mining technology has matured, its data processing capabilities have been accepted and widely applied in industries that need data analysis. Introducing data mining into medical insurance information management makes it possible to process the accumulated historical data, derive previously undiscovered information and rules, and support risk prevention and control and the safe operation of the medical insurance fund.
     This thesis first summarizes the problems currently encountered in managing social medical insurance. It then introduces common data mining methods and examines cluster analysis in depth. It studies the EM algorithm and improves it in the selection of initial parameters and in the maximization step. The improved EM algorithm is implemented on the SQL Server 2008 data mining platform and registered as a plug-in in the algorithm library of the analysis server; a comparison with the EM algorithm demonstrates the effectiveness of the improved version. Next, a data warehouse for the subject of detecting abnormal medical consumption expenses is created on SQL Server 2008, a multidimensional dataset (cube) is generated, and a data mining model is built with the improved EM algorithm. On this model, the predict function yields predicted values for the various medical consumption expenses, and a series of data analysis rules then identifies the abnormal expense records. In addition, real abnormal consumption records are labeled and tracked in order to measure the precision of the improved EM algorithm in finding abnormal records, which confirms its effectiveness in practical use. As an exploratory attempt, this work yields an analysis method for obtaining records of abnormal medical expense consumption.
     The main results of this thesis are: (1) an improved EM algorithm, together with a validation of its effectiveness; (2) a data warehouse for the subject of abnormal medical consumption expense analysis and a multidimensional dataset generated from it, on which the data analysis workflow is completed and the abnormal expense records are obtained, giving an effective method for analyzing abnormal consumption expenses.
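The workflow described in the abstract (fit a clustering model with EM, predict an expected expense for each record, apply rules to flag abnormal records, then measure precision against labeled records) can be illustrated with a minimal Python sketch. This is not the thesis's improved EM algorithm or its SQL Server 2008 plug-in: it uses plain EM on a one-dimensional Gaussian mixture, and the quantile-based initialization, the three-standard-deviation rule, the function names, and the sample costs are all assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_em(values, k=2, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: starting means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = (ordered[-1] - ordered[0]) / (2.0 * k) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
    return weights, means, stds


def predict_cost(x, model):
    """Return (mean, std) of the component most likely to have produced x."""
    weights, means, stds = model

    def log_score(j):
        # Log density avoids underflow for costs far from every component.
        z = (x - means[j]) / stds[j]
        return math.log(weights[j]) - math.log(stds[j]) - 0.5 * z * z

    best = max(range(len(means)), key=log_score)
    return means[best], stds[best]


def flag_anomalies(records, model, threshold=3.0):
    """Hypothetical rule: flag a record whose cost deviates from the
    predicted cost by more than `threshold` standard deviations."""
    flagged = []
    for record_id, cost in records:
        expected, std = predict_cost(cost, model)
        if abs(cost - expected) > threshold * std:
            flagged.append(record_id)
    return flagged


def precision(flagged, labeled_anomalies):
    """Share of flagged records that are labeled as true anomalies."""
    if not flagged:
        return 0.0
    return len(set(flagged) & set(labeled_anomalies)) / len(flagged)


# Made-up historical costs: an outpatient-like and an inpatient-like group.
history = [80 + 2 * i for i in range(21)] + [900 + 10 * i for i in range(21)]
model = fit_em(history, k=2)

# Made-up new records and manually labeled true anomalies.
new_records = [("r1", 95), ("r2", 1050), ("r3", 140), ("r4", 400), ("r5", 5000)]
labeled = {"r4", "r5"}

flagged = flag_anomalies(new_records, model)
print("flagged:", flagged)
print("precision:", precision(flagged, labeled))
```

Fitting the model on historical data and scoring new records separately keeps an extreme cost from pulling a mixture component toward itself during training. A real deployment would need a multivariate model and domain-specific rules, which the abstract does not specify.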
