Abstract
[Objective] This study exploits the regularities contained in HS code data to provide effective knowledge services for tax risk assessment by China Customs. [Methods] We propose two machine-learning-based automatic classification schemes for HS code risk identification: one uses the HS code itself as the classification target, and the other uses the correctness of the declared HS code as the target. For each scheme, we build a corresponding SVM prediction model that accounts for the structure of the code target, the nature of the features, and the length of the text, and run experiments on each. [Results] Comparing the two schemes for predicting customs declaration risk, we find that the second requires less training data, predicts faster, and identifies risk more effectively. [Limitations] Only four months of data were available, so the sample may not be fully representative. [Conclusions] Testing yielded a classifier with a relatively high risk prediction rate, which provides a sound knowledge base for building a practical classification model and risk identification system.