Research on Structuring Web Text Big Data for Public Security Events (面向公共安全事件的网络文本大数据结构化研究)
  • English title: Public Security Event Themed Web Text Structuring
  • Authors: 裴韬 (PEI Tao); 郭思慧 (GUO Sihui); 袁烨城 (YUAN Yecheng); 张雪英 (ZHANG Xueying); 袁文 (YUAN Wen); 高昂 (GAO Ang); 赵志远 (ZHAO Zhiyuan); 薛存金 (XUE Cunjin)
  • Affiliations (English): State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application; Key Laboratory of Virtual Geographic Environment, Nanjing Normal University, Ministry of Education; China National Institute of Standardization; State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing of Wuhan University; Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences
  • Keywords: semantic framework (语义框架); text parsing (文本解析); social concern about events (事件关注度); seismic events (地震事件); spatial search engine (空间搜索引擎)
  • Journal (Chinese): DQXX
  • Journal (English): Journal of Geo-Information Science
  • Institutions: 中国科学院地理科学与资源研究所资源与环境信息系统国家重点实验室; 中国科学院大学; 江苏省地理信息资源开发与利用协同创新中心; 南京师范大学虚拟地理环境教育部重点实验室; 中国标准化研究院; 武汉大学测绘遥感信息工程国家重点实验室; 中国科学院遥感与数字地球研究所
  • Publication date: 2019-01-29 18:11
  • Publisher: 地球信息科学学报 (Journal of Geo-Information Science)
  • Year: 2019
  • Issue: v.21; No.137
  • Funding: National Natural Science Foundation of China (41525004, 41421001)
  • Language: Chinese
  • Page: DQXX201901003
  • Page count: 12
  • CN: 01
  • ISSN: 11-5809/P
  • Classification number: 6-17
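Abstract
Relevant information contained in web text has become an important source for emergency rescue and impact assessment in public security events. Existing methods can extract individual event elements from text in a targeted way, but because they lack an event-oriented framework for overall modeling and parsing, they struggle to obtain systematic, structured information on event elements from web text: the extracted elements are either incomplete or do not match the target event, and the resulting omissions and errors cannot support systematic analysis of public security event information. To address this problem, this paper proposes a theoretical framework for structuring web text big data for public security events. First, a semantic framework for public security events is established, and a corresponding structured table schema is built using earthquake events as an example. Second, linked annotation of the training corpus is applied to resolve the difficulty of matching event elements to their events. Finally, a text parsing algorithm that can incorporate the linked information is used to systematically extract event type, event name, event time, event location, and other attributes, largely achieving the structuring of information on different events in web text. Taking the Ludian earthquake in Zhaotong, Yunnan, as an example, the paper demonstrates the process and results of structuring web text about an earthquake event, providing an important reference for analyzing the level of attention the earthquake received and the state of rescue efforts. Building on this work, a web text information mining system for public security events was developed, which demonstrates the structured parsing of earthquake event text and the event attention analysis based on it.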
Information about public security events contained in web text can serve as a data source for impact evaluation and relief, provided it can be structured into a relational database. Although previous research can extract event information into different attributes, determining which specific event each piece of attribute information belongs to remains unsolved. To solve this problem, this paper proposes a theoretical framework for public security event themed web text structuring, which is composed of three parts. First, an event semantic model is used to construct the seismic event semantic framework, which defines the abstract elements of an event and their semantic relationships. Taking seismicity as an example, the spatial element, time element, attribute element, and source element are defined as basic elements. The spatial element includes earthquake latitude, longitude, depth, and location. The attribute element is further subdivided into four sub-elements: cause, result, behavior, and influence. Next, an annotation system is applied to typical event materials to label semantic elements, e.g., the name of the place where an earthquake took place; that is, the abstract elements are instantiated. The key to this step is labeling the relations between elements and the specific event. Finally, the event text is structured into event type, event name, event time, event location, and other attributes by using a text information extraction algorithm. The algorithm, which can incorporate linked information, uses the materials labeled in the previous step as training data to optimize its parameters. The extracted event text (e.g., words and phrases) is finally normalized into structured information for further analysis. An event information mining platform following the whole framework has been developed, which includes modules for webpage searching, text cleaning, event information extraction, visualization, and analysis. The platform processed the full set of Chinese webpages from 2014 and found 85 506 seismicity reports. Taking the Ludian earthquake in Yunnan as an example, we display the structuring process and results for the related web text, which can serve as an important reference for disaster relief and the analysis of public concern. With the platform, we can demonstrate the seismic text structuring results and the associated social concern across China, offering a new tool for event information mining and analysis.
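The element hierarchy named in the abstract (spatial, time, attribute, and source elements, with the attribute element split into cause, result, behavior, and influence) can be pictured as a nested record that flattens into one relational row. The following is only a minimal sketch under stated assumptions: the class names, field names, the `to_row` helper, and the sample values are all hypothetical illustrations and are not taken from the paper's actual table schema or platform.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional


@dataclass
class SpatialElement:
    # Spatial element: location name plus optional coordinates and depth.
    location: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    depth_km: Optional[float] = None


@dataclass
class AttributeElement:
    # The four attribute sub-elements listed in the abstract.
    cause: List[str] = field(default_factory=list)
    result: List[str] = field(default_factory=list)
    behavior: List[str] = field(default_factory=list)
    influence: List[str] = field(default_factory=list)


@dataclass
class EventRecord:
    # One structured event; every element is tied to this single event.
    event_type: str
    event_name: str
    event_time: str
    spatial: SpatialElement
    attributes: AttributeElement = field(default_factory=AttributeElement)
    source: Optional[str] = None


def to_row(record: EventRecord) -> Dict[str, Any]:
    """Flatten a nested record into a single relational-style row."""
    row: Dict[str, Any] = {}
    for key, value in asdict(record).items():
        if isinstance(value, dict):
            # Nested element: prefix each field with the element name.
            for sub_key, sub_value in value.items():
                if isinstance(sub_value, list):
                    sub_value = "; ".join(sub_value)
                row[f"{key}_{sub_key}"] = sub_value
        else:
            row[key] = value
    return row


# Hypothetical placeholder values, not data from the paper.
example = EventRecord(
    event_type="earthquake",
    event_name="example earthquake",
    event_time="2014-01-01T00:00",
    spatial=SpatialElement(location="Example County"),
    attributes=AttributeElement(
        result=["houses damaged"],
        behavior=["rescue teams dispatched"],
    ),
    source="hypothetical news page",
)
row = to_row(example)
```

Keeping every element inside one `EventRecord` mirrors the abstract's point that extracted elements must be attributed to a specific event; a flat row per event is one plausible way to load such records into a relational table.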
