Text classification method based on recurrent neural network variants and convolutional neural network
  • English title: Text classification method based on recurrent neural network variants and convolutional neural network
  • Authors: 李云红; 梁思程; 任劼; 李敏奇; 张博; 李禹萱
  • English authors: LI Yunhong; LIANG Sicheng; REN Jie; LI Minqi; ZHANG Bo; LI Yuxuan (School of Electronics and Information, Xi'an Polytechnic University; State Grid Xi'an Power Supply Company)
  • Keywords: text classification; sentence vector; recurrent neural network; convolutional neural network
  • Journal code: XBDZ
  • English journal title: Journal of Northwest University (Natural Science Edition)
  • Institutions: School of Electronics and Information, Xi'an Polytechnic University; State Grid Xi'an Power Supply Company
  • Publication date: 2019-06-11 10:23
  • Publisher: Journal of Northwest University (Natural Science Edition)
  • Year: 2019
  • Volume/Issue: Vol. 49, No. 241
  • Funding: National Natural Science Foundation of China (61471161); Shaanxi Provincial Science and Technology Department Natural Science Basic Research Key Project (2016JZ026); Xi'an Polytechnic University Student Innovation and Entrepreneurship Project (chx201824)
  • Language: Chinese
  • Article ID: XBDZ201904009
  • Page count: 7
  • Pages: 93-99
  • CN: 61-1072/N
Abstract
In view of the difficulty of extracting key semantic features from long texts and the resulting poor classification performance, a hybrid model based on a recurrent neural network variant and a convolutional neural network (BGRU-CNN) is established to classify Chinese long texts accurately. First, each text is represented as a sentence vector by the PV-DM model, which serves as the input to the neural network. Then, the BGRU-CNN model is built: a bidirectional gated recurrent unit (BGRU) encodes the sequence information of the text, a convolutional neural network (CNN) extracts its key features, and a Softmax classifier produces the final category. Finally, on the SogouC and THUCNews Chinese corpora, the model reaches classification accuracies of 89.87% and 94.65%, respectively. The results show that the text sequence features extracted by the recurrent layer are further refined by the convolutional layer, improving the classification performance.
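The pipeline described in the abstract (sentence vectors → bidirectional GRU → convolution with max-over-time pooling → Softmax) can be sketched as a plain NumPy forward pass. This is a minimal, untrained sketch with toy dimensions: random vectors stand in for the PV-DM sentence vectors, and all weights and helper names (`gru_step`, `bgru`, `conv_maxpool`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    # Standard GRU cell: update gate z, reset gate r, candidate state.
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)
    r = sig(Wr @ x + Ur @ h)
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
    return (1 - z) * h + z * h_tilde

def bgru(X, params_f, params_b):
    # Bidirectional GRU over a sequence of sentence vectors X of shape (T, d_in);
    # forward and backward hidden states are concatenated per time step.
    T = X.shape[0]
    H = params_f[0].shape[0]
    hf, hb = np.zeros(H), np.zeros(H)
    fwd, bwd = [], [None] * T
    for t in range(T):
        hf = gru_step(X[t], hf, *params_f)
        fwd.append(hf)
    for t in reversed(range(T)):
        hb = gru_step(X[t], hb, *params_b)
        bwd[t] = hb
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=1)  # (T, 2H)

def conv_maxpool(S, filters, width=3):
    # 1D convolution over time followed by max-over-time pooling:
    # each filter yields one "key feature" for the whole text.
    T, _ = S.shape
    feats = []
    for W, b in filters:  # W has shape (width * 2H,)
        c = [np.tanh(W @ S[t:t + width].ravel() + b) for t in range(T - width + 1)]
        feats.append(max(c))
    return np.array(feats)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions: 6 sentence vectors of size 8 (stand-ins for PV-DM outputs).
d_in, H, T, n_class, n_filt, width = 8, 5, 6, 4, 7, 3
X = rng.standard_normal((T, d_in))
mk = lambda *shape: 0.1 * rng.standard_normal(shape)
params_f = (mk(H, d_in), mk(H, H), mk(H, d_in), mk(H, H), mk(H, d_in), mk(H, H))
params_b = (mk(H, d_in), mk(H, H), mk(H, d_in), mk(H, H), mk(H, d_in), mk(H, H))
filters = [(mk(width * 2 * H), 0.0) for _ in range(n_filt)]
Wc = mk(n_class, n_filt)

S = bgru(X, params_f, params_b)      # (T, 2H) sequence features from the recurrent layer
f = conv_maxpool(S, filters, width)  # (n_filt,) key features from the convolutional layer
p = softmax(Wc @ f)                  # class probabilities from the Softmax classifier
print(S.shape, f.shape)
```

The division of labor mirrors the abstract: the BGRU output keeps one feature vector per sentence position, and the convolution-plus-pooling stage compresses that sequence into a fixed-length feature vector regardless of text length, which is what makes the approach suitable for long texts.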
