A New Automatic Text Summarization Method Based on Paragraph Vectors
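  • English title: A new automatic summarization method based on paragraph vector
  • Authors: 申强强; 熊泽宇; 熊岳山
  • Authors (English): SHEN Qiang-qiang; XIONG Ze-yu; XIONG Yue-shan; School of Computer, National University of Defense Technology
  • Keywords: automatic text summarization; word vector; paragraph vector; topic sentence
  • Journal code: JSJK
  • Journal title (English): Computer Engineering & Science
  • Institution: School of Computer, National University of Defense Technology (国防科技大学计算机学院)
  • Publication date: 2019-06-15
  • Publisher: Computer Engineering & Science (计算机工程与科学)
  • Year: 2019
  • Volume and cumulative issue: v.41; No.294
  • Funding: National Natural Science Foundation of China (61379103)
  • Language: Chinese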
  • File name: JSJK201906015
  • Page count: 7
  • Issue: 06
  • CN: 43-1258/TP
  • Pages: 114-120
Abstract
Automatic text summarization has broad application prospects in areas such as web search and web content recommendation. Classic summarization algorithms use statistical methods to extract an article's keywords and, from them, its topic sentences. This approach ignores, to some extent, the semantic and syntactic information in the text. In recent years, distributed word-vector embedding has been applied to text retrieval. Building on that technique, this paper proposes an automatic text summarization method based on word vectorization. The method has four steps: word vector generation, paragraph vector generation from the word vectors, keyword extraction, and topic sentence extraction, which together produce an automatic summary of a text paragraph. Experimental results show that the improved method extracts topic sentences effectively.
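The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how each step is computed, so everything below is an assumption made for illustration only: co-occurrence counts stand in for trained word embeddings, the paragraph vector is a plain average of word vectors, keywords are the words closest to the paragraph vector by cosine similarity, and the topic sentence is the sentence closest to the centroid of the keyword vectors. The function names (`word_vectors`, `mean_vector`, `summarize`) are hypothetical and are not taken from the paper.

```python
import math


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [t for t in stripped if t]


def word_vectors(sentences, window=2):
    """Step 1 (stand-in): co-occurrence count vectors, not trained embeddings."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        tokens = tokenize(s)
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[tokens[j]]] += 1.0
    return vectors


def mean_vector(words, vectors):
    """Average the vectors of the given words (assumed aggregation rule)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for w in words:
        for k, x in enumerate(vectors[w]):
            total[k] += x
    return [x / max(len(words), 1) for x in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, n_keywords=3):
    vectors = word_vectors(sentences)                       # step 1
    words = [w for s in sentences for w in tokenize(s)]
    paragraph = mean_vector(words, vectors)                 # step 2
    keywords = sorted(                                      # step 3
        set(words), key=lambda w: (-cosine(vectors[w], paragraph), w)
    )[:n_keywords]
    centroid = mean_vector(keywords, vectors)
    topic = max(                                            # step 4
        sentences,
        key=lambda s: cosine(mean_vector(tokenize(s), vectors), centroid),
    )
    return keywords, topic


sentences = [
    "Word vectors capture the meaning of words.",
    "Paragraph vectors are built from word vectors.",
    "The weather was pleasant yesterday.",
]
keywords, topic = summarize(sentences)
print(keywords)
print(topic)
```

In practice the co-occurrence stand-in would be replaced by embeddings trained on a large corpus; the sketch only shows how the output of each step feeds the next.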
