English Automated Essay Scoring Methods Based on Discourse Structure

Title: English Automated Essay Scoring Methods Based on Discourse Structure (基于篇章结构的英文作文自动评分方法)
Authors: ZHOU Ming (周明); JIA Yan-ming (贾艳明); ZHOU Cai-lan (周彩兰); XU Ning (徐宁)
Affiliations: School of Computer Science and Technology, Wuhan University of Technology; Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology; Research Center for Artificial Intelligence and Big Data, Global Wisdom Inc (北京博智天下信息技术有限公司)
Keywords: automated essay scoring; discourse element; discourse structure analysis; natural language processing; random forest; linear regression
Journal: Computer Science (计算机科学)
Journal code: JSJA
Publication date: 2019-03-15
Publisher: Computer Science (计算机科学)
Year: 2019
Volume: 46
Issue: 03
Language: Chinese
Pages: 240-247
Page count: 8
CN: 50-1075/TP
Record ID: JSJA201903035

Abstract

Automated essay scoring (AES) refers to systems that evaluate and score essays using techniques from statistics, natural language processing, linguistics, and related fields. Discourse structure analysis is an important research direction in natural language processing and a key component of AES systems. AES systems developed outside China are widely used, but their treatment of discourse structure scoring remains insufficient, and they are not well targeted at English essays written by Chinese students. Research on English AES within China is still at an early stage and has overlooked the importance of discourse structure in essay scoring. To address these problems, this paper proposes an AES method for English essays based on discourse structure. The method extracts lexical, syntactic, and structural features at the word, sentence, and paragraph levels; classifies discourse elements with support vector machines, random forests, and extreme gradient boosting; and finally builds a linear regression model to score the discourse structure of an essay. Experimental results show that the random-forest-based discourse element identification model (Discourse Element Identification based Random Forest, DEI-RF) reaches an accuracy of 94.13%, and that the linear-regression-based discourse structure scoring model (Discourse Structures Scoring based Linear Regression, DSS-LR) achieves mean squared errors of 0.02, 0.11, and 0.08 on introduction, argumentation, and concession paragraphs, respectively.
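The abstract reports two kinds of figures: classification accuracy for discourse element identification and mean squared error for discourse structure scoring. The following Python sketch illustrates how such figures could be computed on toy data. It is not the authors' implementation: the labels, the single structural feature, the human scores, and the helper names `accuracy`, `mean_squared_error`, and `fit_line` are all assumptions made for illustration, and a one-feature least-squares fit stands in for the paper's regression model, whose actual feature set is not given in this record.

```python
# Toy data only: every label, feature value, and score below is hypothetical
# and is not taken from the paper's corpus.

def accuracy(gold, pred):
    """Fraction of paragraphs whose predicted discourse element matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mean_squared_error(gold, pred):
    """Average squared difference between human scores and predicted scores."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def fit_line(xs, ys):
    """Ordinary least squares with a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Stage 1 (classification): compare predicted discourse elements with gold labels.
gold_labels = ["introduction", "argumentation", "argumentation", "concession", "argumentation"]
pred_labels = ["introduction", "argumentation", "concession", "concession", "argumentation"]
print("accuracy:", accuracy(gold_labels, pred_labels))

# Stage 2 (scoring): fit a line from one hypothetical structural feature to human scores.
feature = [1, 2, 3, 4]
human_scores = [1.0, 2.0, 2.0, 3.0]
slope, intercept = fit_line(feature, human_scores)
predicted = [slope * x + intercept for x in feature]

# MSE is measured here on the same essays used for fitting, purely for illustration.
mse = mean_squared_error(human_scores, predicted)
print("MSE:", round(mse, 4))
```

In a real evaluation the error would be measured on held-out essays rather than on the fitting data, and a separate regression would presumably be fitted per paragraph type, since the abstract reports one error value each for introduction, argumentation, and concession paragraphs.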
