     Social health insurance carries the responsibility of guaranteeing basic medical care. With continued investment, its management systems have developed rapidly, and social security information systems have made administration more convenient. As the number of insured people has grown, the various parts of the social security system have accumulated large volumes of historical data. Data mining technology has meanwhile matured in theory, and its data processing capabilities are now widely accepted and applied to data analysis. Introducing data mining into social security management makes it possible to process this accumulated historical data, uncover hidden information and rules in large data sets, guard against risks to the social security fund, and keep its operation under control.
     This thesis first summarizes the problems currently encountered in managing social medical insurance. It then introduces common data mining methods and examines cluster analysis in depth. It studies the EM algorithm and improves it in two places: the selection of initial parameters and the maximization step. The improved EM algorithm is implemented on the SQL Server 2008 data mining platform and registered as a plug-in in the Analysis Services algorithm library; a comparison with the original EM algorithm demonstrates the effectiveness of the improved version. Next, a data warehouse for the subject of medical expense anomaly detection is built on SQL Server 2008 and a multidimensional cube is generated. A data mining model is created with the improved EM algorithm, the predict function is applied to the model to obtain predicted values for each kind of medical expense, and a series of data analysis rules then identifies the abnormal expense records. In addition, genuinely abnormal expense records are labeled and tracked so that the precision of the improved EM algorithm in finding abnormal records can be measured, which confirms the algorithm's effectiveness in practical use. As an exploratory attempt, the thesis arrives at an analysis method for obtaining abnormal medical expense records.
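For readers unfamiliar with EM clustering, the following is a minimal Python sketch of the standard EM loop for a one-dimensional Gaussian mixture. It is not the thesis's improved algorithm, whose details the abstract does not give. The quantile-based initialization in `quantile_init`, the function names, and the synthetic cost data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def quantile_init(data, k):
    """Hypothetical initialization (an assumption, not the thesis's method):
    place the k initial means at evenly spaced quantiles of the data."""
    ordered = sorted(data)
    n = len(ordered)
    means = [ordered[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = sum((x - overall_mean) ** 2 for x in ordered) / n
    return [1.0 / k] * k, means, [overall_var] * k


def em_fit(data, k=2, max_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    weights, means, variances = quantile_init(data, k)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite
        # Stop once the log-likelihood no longer changes noticeably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


# Synthetic "expense" data: many ordinary claims plus a smaller group of costly ones.
random.seed(0)
costs = ([random.gauss(100.0, 10.0) for _ in range(300)]
         + [random.gauss(500.0, 30.0) for _ in range(100)])
weights, means, variances = em_fit(costs, k=2)
```

Starting the means at data quantiles makes the run deterministic for a given data set, which is one simple way to reduce EM's sensitivity to its starting point.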
     The main work of this thesis is as follows. First, the EM algorithm is improved and the effectiveness of the improvement is validated. Second, a data warehouse and a cube are built around the subject of detecting abnormal medical consumption in social security, and the data mining analysis process is completed on that basis. Finally, the results are shown to be useful, which indicates that this is an effective method for detecting abnormal medical consumption.
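The detection and evaluation step can be illustrated with a short Python sketch. It assumes each record carries an actual cost and a model-predicted cost, flags records whose actual cost exceeds the prediction by a relative margin, and computes precision against manually labeled anomalies. The rule, the threshold `rel_threshold`, and the sample records are hypothetical; the abstract does not state the thesis's actual analysis rules.

```python
def flag_anomalies(records, rel_threshold=0.5):
    """Flag records whose actual cost exceeds the predicted cost by more than
    rel_threshold (relative to the prediction). Hypothetical rule for illustration.

    records: iterable of (record_id, actual_cost, predicted_cost)
    """
    flagged = set()
    for record_id, actual, predicted in records:
        if predicted > 0 and (actual - predicted) / predicted > rel_threshold:
            flagged.add(record_id)
    return flagged


def precision(flagged, labeled_anomalies):
    """Share of flagged records that are truly abnormal."""
    if not flagged:
        return 0.0
    return len(flagged & labeled_anomalies) / len(flagged)


# Made-up sample records: (id, actual cost, predicted cost).
records = [
    ("r1", 120.0, 110.0),
    ("r2", 900.0, 300.0),
    ("r3", 80.0, 95.0),
    ("r4", 450.0, 200.0),
    ("r5", 310.0, 150.0),
]
labeled = {"r2", "r4"}  # records marked as genuinely abnormal

flagged = flag_anomalies(records)
score = precision(flagged, labeled)
```

Here three records are flagged and two of them are labeled abnormal, so the precision is 2/3.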
