Research on LDA- and LSA-Based Models for Medical Text and Image Analysis and Their Applications
Abstract
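Semantic analysis techniques can be used to model medical text and image data and to compute statistics over them, revealing the mathematical relationships among different kinds of data. Because this kind of analysis is free of human subjectivity, it can give physicians an objective basis for diagnosis and supporting information for treatment.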
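     Semantic modeling of the data is the foundation of semantic analysis. Latent variable models and tree models are currently the two mainstream research directions in semantic modeling, both in China and internationally. Given the characteristics of medical information, each approach has its strengths and weaknesses: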
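     (1) Latent variable models are good at extracting latent associations among "concepts, rules, and patterns" from a collection of medical information. However, because these models are all built on the bag-of-words idea, the modeling process ignores shallow semantic features such as the structure, position, and hierarchy of semantic elements, and these features are needed to varying degrees at every stage of medical information applications (for example, retrieval and text generation).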
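     (2) Tree models, such as parse trees and context trees, use topological structure to capture relationships among semantic elements, including semantic relatedness, relative position, and spatial distribution. What they model, however, is usually a simple probabilistic relationship or literal semantics; they do not analyze information from the perspective of latent semantics, so they cannot process and exploit medical information at a deeper level (for example, for computer-aided diagnosis).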
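The bag-of-words limitation described in (1) can be illustrated with a minimal Python sketch. The two sentences and the whitespace tokenizer are assumptions made for illustration and are not taken from the thesis: the sentences differ in clinical meaning but receive identical bag-of-words representations.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts, discarding word order and position."""
    return Counter(text.lower().split())

# Hypothetical sentences: same words, different order and meaning.
a = "mass in left lung not in right lung"
b = "mass in right lung not in left lung"

bow_a = bag_of_words(a)
bow_b = bag_of_words(b)

# The two representations cannot be told apart.
print(bow_a == bow_b)  # prints True
```

A tree model that records which side "not" attaches to would keep the two sentences distinct, which is the kind of positional information the bag-of-words view drops.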
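     Building on a study of these models and addressing practical problems in current semantic analysis of medical information, this thesis investigates three topics: semantic retrieval of medical text, semantic annotation of medical images, and generation of diagnostic text based on semantic analysis. It proposes corresponding methods for semantic modeling and semantic information processing. The main research content and contributions are as follows: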
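     (1) For semantic processing of medical text, an LSA-tree model is proposed that fuses latent semantic analysis (LSA) with a tree model. The model extracts both literal and latent semantics from semi-structured free-text medical records. The method first segments the text with a semantic window, then divides the words in each window into several subtrees, then computes literal-semantic parameters between the core word and the related words of each subtree, and finally extracts the associations among core words through the LSA mapping into the latent semantic space. Experiments show that, because the LSA-tree model represents the semantics of free-text records more accurately and completely, a semantic retrieval system built on it both reduces the complexity of the original LSA model and resolves synonymy among medical terms (several words with one meaning), thereby improving retrieval precision.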
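     (2) For semantic processing of medical images, a semantic annotation method for X-ray coherent scattering images based on an LDA-tree model is proposed. X-ray coherent scattering images have few recognizable features, an abstract image ontology, and mutually interfering image features. To address this, a tree-structured image decomposition method is first proposed, which decomposes an image into regions and fragments (sub-images) that carry its semantic features. Morphological, photometric, and topological features are then extracted from these sub-images, and the image's energy distribution curve and topological structure are quantized and encoded. Further, to bridge the semantic gap and annotate image semantics with text, the thesis introduces the parameter estimation and variational inference procedures of the LDA model and uses a bag of visual words to combine the image tree model with LDA. Experiments show that the LDA-tree method achieves semantic annotation of X-ray coherent scattering images, that its matching accuracy is higher than that of a PLSA-based annotation method, and that it is also more robust to imaging differences, noise, and mutual interference among image features.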
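     (3) For medical text generation and computer-aided diagnosis, an LDA-LSA-tree model is proposed for generating diagnostic opinions on medical images. Based on an analysis of the characteristics of medical imaging reports, and to address the possibly incomplete extraction of semantic information when the LSA-tree model is applied to such reports, the literal-semantic layer is extended: a corrected average distance is used to capture the contextual position of words, and semantic statistics are also collected for stop words. To support content-level reasoning about disease conditions, an LSA-based K-medoids content clustering method is proposed to cluster imaging report texts and preset their weights, and the resulting content clusters serve as an intermediate semantic layer of the LSA-tree model. Building on a study of natural language generation (NLG) techniques, and following the semantic information that an NLG system needs during construction and text generation, the LDA-LSA-tree model for NLG is proposed. Its mapping from topic content to words compensates for the LSA-tree model's weakness in semantic inference, so that the model meets the dual requirements of "structure construction" and "content determination" in content planning for NLG systems. The inference component adopts an "association-weighting" scheme: term frequency-inverse document frequency (TF-IDF) weighting is introduced so that composite semantic weighting is applied during Gibbs sampling for the smoothed LDA model. Experiments show that the common keyword-matching approach to text generation, although simple to implement, produces text with low semantic matching and poor readability and offers physicians little additional useful information. In contrast, the NLG method based on the LDA-LSA-tree model takes the semantic details of diagnostic reports fully into account, its output resembles manually written annotations, and, because the model performs well as a topic model, the diagnostic information it infers is also more accurate than that of other semantic text generation models.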
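The LSA mapping that contribution (1) relies on to resolve synonymy can be sketched with plain LSA (truncated SVD). This is only an illustration under stated assumptions: the terms, documents, counts, and the choice k = 2 are invented, and the semantic-window and subtree steps of the LSA-tree model are not reproduced here.

```python
import numpy as np

# Toy term-document count matrix (rows: terms, columns: documents).
# Terms, documents, and counts are invented for illustration.
terms = ["tumor", "neoplasm", "lung", "fracture", "bone"]
A = np.array([
    [1, 0, 0, 0],  # tumor    appears only in doc 0
    [0, 1, 0, 0],  # neoplasm appears only in doc 1
    [1, 1, 0, 0],  # lung     appears in docs 0 and 1
    [0, 0, 1, 1],  # fracture appears in docs 2 and 3
    [0, 0, 1, 1],  # bone     appears in docs 2 and 3
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Plain LSA: truncated SVD keeping the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]  # term coordinates in the latent space

i = terms.index("tumor")
j = terms.index("neoplasm")
f = terms.index("fracture")
raw_sim = cosine(A[i], A[j])                         # similarity on raw counts
latent_sim = cosine(term_vecs[i], term_vecs[j])      # similarity after LSA
unrelated_sim = cosine(term_vecs[i], term_vecs[f])   # unrelated term pair
print(raw_sim, latent_sim, unrelated_sim)
```

In this toy matrix "tumor" and "neoplasm" never occur in the same document, so their raw similarity is zero, but both co-occur with "lung" and therefore fall on the same latent direction, while "fracture" stays orthogonal to them.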
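     The free-text medical records, diagnostic reports, and related data used in this thesis come from XX Tumor Hospital, and the X-ray coherent scattering imaging data come from the Third Hospital of XX University; every data set was reviewed by a panel of medical experts before use. The experiments compare the proposed methods with several major and recent medical information processing methods in current clinical use, and the results are analyzed using both expert evaluation and general standard metrics, which verifies the effectiveness of the proposed methods and models.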
Semantic analysis technology can be applied to medical text and image data to model them and compute statistics, revealing the mathematical relationships among different kinds of data. Because this data analysis technique is free of subjective judgment, it can give doctors an objective basis for diagnosis and supporting clinical information.
     Semantic modeling of data is the basis of semantic analysis. At present, latent variable models and tree models are the two major research directions in semantic modeling, both domestically and internationally. Given the characteristics of medical information, each of the two methods has its own advantages and disadvantages:
     (1) Latent variable models are good at extracting latent associations among "concepts, rules, and patterns" from a medical information set. Because they are based on the "bag of words" idea, however, the modeling process ignores the structure, location, and level of semantic elements. These shallow semantic features are needed, to varying degrees, in every aspect of medical informatics applications (such as retrieval and text generation).
     (2) Tree models, such as parse trees and context trees, use topology to reflect the semantic relatedness, relative location, or spatial distribution of semantic elements. The object of a tree model is generally a simple probabilistic relationship or literal semantics, so it lacks an analysis of the information from the perspective of latent semantics. As a result, medical information cannot be processed and used at a deeper level, for example in computer-aided diagnosis.
     Based on a study of the above models, and aiming at practical problems in the semantic analysis of medical information, this thesis studies three areas: semantic retrieval of medical text, semantic annotation of medical images, and natural language generation (NLG) of diagnostic text. Corresponding semantic modeling and semantic information processing methods are proposed. The research content and innovative results of the thesis are as follows:
     (1) For semantic information processing of medical text, an LSA-tree model is formed by fusing the LSA model with a tree model. The LSA-tree model extracts both literal and latent semantics from semi-structured free-text medical records. First, the text is segmented with a semantic window, and the words in each window are divided into several sub-trees. Next, literal semantic parameters are calculated between the core words and the related words in the sub-trees. Finally, the relations between core words are extracted through the LSA mapping into the latent semantic space. Experiments show that a semantic retrieval system for free-text medical records based on the LSA-tree model not only reduces the complexity of the original LSA model but also resolves synonymy among medical terms, which improves retrieval precision.
     (2) For semantic information processing of medical images, an LDA-tree method is proposed for the semantic annotation of X-ray coherent scattering images. These images have few identifiable features, an abstract image ontology, and mutually interfering image features. To address this, a tree-structure-based decomposition method is first proposed, in which the image is decomposed into regions and fragments (sub-graphs) that contain its semantic features. Morphological, photometric, and topological features are then extracted from the sub-graphs, and the energy distribution curves and topological information of the image are quantized and coded. Furthermore, to cross the semantic gap and annotate image semantics with text, the thesis introduces the parameter estimation and variational inference process of the LDA model and uses a bag of visual words to combine the image tree model with the LDA model. Experiments show that the LDA-tree method achieves semantic annotation of X-ray coherent scattering images, that it is superior to PLSA-based semantic annotation in matching accuracy, and that it better suppresses the influence of imaging differences, noise, and mutual interference among image features.
     (3) For medical text generation and computer-aided diagnosis, an LDA-LSA-tree model is built for generating diagnostic opinions on medical images. Based on an analysis of the textual characteristics of medical imaging reports, and to deal with the incomplete semantic information that the LSA-tree model may extract from them, we correct the average distance to obtain the contextual location of words and collect semantic statistics for stop words in the literal semantic layer. We propose a K-medoids content clustering method based on the LSA model to cluster and pre-weight medical imaging report texts, and we take the content clusters as the middle semantic layer of the LSA-tree model. Building on research into natural language generation technology, we propose an LDA-LSA-tree model for NLG that follows the structure of an NLG system and its need for semantic information during text generation. Its mapping from topic content to words makes up for the LSA-tree model's weakness in semantic inference, so it meets the dual requirements of "structure construction" and "content determination" in the content planning of an NLG system. In the inference part we use an "association-weighting" scheme: term frequency-inverse document frequency (TF-IDF) weighting is introduced to apply composite semantic weighting during Gibbs sampling for the smoothed LDA model. Experiments show that, although the common keyword-matching method of text generation is simple and easy to use, the semantic matching degree and readability of its output are very low, and it cannot provide more meaningful information for a doctor's diagnosis. The NLG method based on the LDA-LSA-tree model, by contrast, takes full account of the semantic details of medical diagnosis reports, so its output resembles manually annotated text. Because the proposed model performs well as a topic model, the diagnostic information it infers is also more accurate than that of other semantic text generation models.
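The term frequency-inverse document frequency weighting mentioned in (3) can be sketched in its standard textbook form. The report fragments, the whitespace tokenizer, and the function name `tf_idf` are assumptions made for illustration; the abstract does not specify how these weights are combined with Gibbs sampling, so that step is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (standard TF-IDF)."""
    n_docs = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            # term frequency times log inverse document frequency
            term: (c / len(toks)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights

# Hypothetical report fragments, invented for illustration.
reports = [
    "nodule in right lung",
    "fracture in left femur",
    "effusion in left lung",
]
w = tf_idf(reports)
print(w[0])
```

Here "in" occurs in every fragment and so receives zero weight, while "nodule", which occurs in only one fragment, is weighted more heavily than "lung", which occurs in two.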
     The free-text records, diagnosis reports, and other data used in this thesis were obtained from XX Tumor Hospital, and the X-ray coherent scattering imaging data were obtained from Sino-Japanese XX Hospital; each data set was reviewed by medical specialists before use. In the experiments, the proposed methods are compared with several major and recent medical information processing methods used in clinical practice, and the results are analyzed using both the evaluation of medical specialists and general standard metrics, which verifies the effectiveness of the proposed methods and models.
