Machine-Learning-Based Prediction of Auto Insurance Claim Frequency
  • English title: Claim Frequency Modeling and Prediction via Machine Learning
  • Authors: 曾宇哲; 吴嫒博; 郑宏远; 罗来娟
  • Authors (English): ZENG Yu-zhe; WU Ai-bo; ZHENG Hong-yuan; LUO Lai-juan (School of Statistics, Renmin University of China)
  • Keywords: auto insurance; claim frequency; machine learning; gradient boosting; deep learning; neural network
  • English keywords: auto insurance; claim frequency; machine learning; gradient boosting; neural network; deep neural network
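  • Chinese journal title: TJLT
  • English journal title: Statistics & Information Forum
  • Institution: School of Statistics, Renmin University of China
  • Publication date: 2019-05-10
  • Publisher: 统计与信息论坛 (Statistics & Information Forum)
  • Year: 2019
  • Issue: v.34; No.224
  • Funding: Major Project of the Key Research Base of Humanities and Social Sciences, Ministry of Education, "Research on Big-Data-Based Actuarial Statistical Models and Risk Management" (16JJD910001); Renmin University of China 2018 special funds for central universities building world-class universities (disciplines) and guiding distinctive development
  • Language: Chinese
  • Pages: TJLT201905010
  • Page count: 10
  • CN: 05
  • ISSN: 61-1421/C
  • Classification code: 70-79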
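Abstract
In recent years, generalized linear models (GLMs) have been widely used in auto insurance pricing. Some studies report that machine learning outperforms GLMs in certain respects, but each of those results rests on a single dataset. To compare GLMs and machine learning methods more comprehensively on auto insurance claim frequency prediction, comparative tests were run on 7 auto insurance datasets, covering machine learning methods including deep learning, random forests, support vector machines, and XGBoost. On the same training sets, several GLMs were fitted to predict claim frequency and the best one was selected by the Akaike information criterion (AIC); the best hyperparameters and model for each machine learning method were obtained by cross-validation tuning. The results show that XGBoost consistently predicts better than the GLM on all datasets; on some datasets with many predictors and strong correlation among variables, neural networks, deep learning, and random forests also predict better than the GLM.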
Generalized linear models (GLMs) are traditionally the most widely used models in auto claim modelling. In recent years some scholars have applied machine learning algorithms to auto claim data, and their studies have shown that machine learning methods outperform traditional GLMs in some respects, but these results are mainly based on individual datasets. We compare GLMs and machine learning algorithms over 7 datasets, focusing on claim frequency prediction. On the same training sets, we develop several GLMs and choose the best according to the AIC. We also choose the best-fitting machine learning method and its parameters through cross-validation. Our study shows that the XGBoost model performs better than the GLM on all datasets. When the number of predictors is larger and the dependence between variables is stronger, neural network (NN) and deep neural network (DNN) models are better than GLMs.
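The AIC-based choice among candidate GLMs described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy portfolio, the two candidate models, and all names (`policies`, `fit_rates`, `aic`, `candidates`) are assumptions made for illustration. It relies on the fact that a Poisson GLM with log link, a log-exposure offset, and a single categorical factor has a closed-form maximum likelihood fit (total claims divided by total exposure within each level), so no fitting library is needed.

```python
import math
from collections import defaultdict

# Hypothetical toy portfolio: (rating group, exposure in policy-years, claim count).
policies = [
    ("young", 1.0, 1), ("young", 0.5, 0), ("young", 1.0, 2), ("young", 1.0, 1),
    ("senior", 1.0, 0), ("senior", 1.0, 0), ("senior", 0.5, 0), ("senior", 1.0, 1),
]

def fit_rates(rows, key):
    """Closed-form MLE of claim frequency per level of the factor key(group)."""
    claims = defaultdict(float)
    exposure = defaultdict(float)
    for group, expo, n in rows:
        claims[key(group)] += n
        exposure[key(group)] += expo
    return {level: claims[level] / exposure[level] for level in claims}

def poisson_loglik(rows, rates, key):
    """Poisson log-likelihood with mean = exposure * fitted rate."""
    total = 0.0
    for group, expo, n in rows:
        mu = expo * rates[key(group)]
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total

def aic(rows, key):
    """AIC = 2 * (number of fitted parameters) - 2 * log-likelihood."""
    rates = fit_rates(rows, key)
    return 2 * len(rates) - 2 * poisson_loglik(rows, rates, key)

# Two candidate models (assumed): one common rate vs. one rate per rating group.
candidates = {
    "intercept_only": lambda group: "all",
    "by_group": lambda group: group,
}

aic_by_model = {name: aic(policies, key) for name, key in candidates.items()}
best_model = min(aic_by_model, key=aic_by_model.get)  # lowest AIC is preferred
print(aic_by_model, best_model)
```

With real data and continuous predictors the fit would need an iterative GLM routine, and the machine learning models in the comparison would be tuned by cross-validation rather than by AIC.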
