Research on Ontology-Supported Methods and Techniques for Video Intelligence Analysis


English title: Research on Video Intelligence Analysis Using Ontology
Author: Bai Liang (白亮)
Thesis level: Doctoral
Discipline: Military Command Science (军队指挥学)
Keywords: Video Intelligence ; Intelligence Analysis ; Semantic Content Analysis ; Ontology ; Concept Detection in Video Intelligence
Degree year: 2008
Advisor: Lao Songyang (老松杨)
Discipline code: 110504
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2008-10-01

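Abstract

In the age of informatization and globalization, video intelligence is emerging in large volumes and plays an important role in strategic decision-making. Studying how to obtain valuable information from large amounts of video intelligence has become a necessity, and the core of the problem is analyzing and extracting the semantic content that video intelligence contains. The semantic gap makes semantic content analysis of video intelligence extremely difficult and severely restricts its application. This thesis aims to solve these problems.
     The thesis first establishes an architecture for video intelligence analysis and identifies the core problem that video intelligence analysis must solve, namely semantic content analysis of video intelligence. It then proposes an ontology-supported framework for that analysis. It focuses on key problems within the framework, including low-level semantic content extraction and high-level semantic content analysis, and designs and implements an ontology-supported video intelligence analysis platform (VIAPO, Video Intelligence Analysis Platform using Ontology). The main contributions are as follows:
     1. A concept architecture and a technique architecture for video intelligence analysis are proposed. The concept architecture clarifies the concept, tasks, and hierarchy of video intelligence analysis; the technique architecture sets out the structure of video intelligence analysis techniques and the key techniques involved.
     2. An ontology-supported framework for semantic content analysis of video intelligence is proposed. Perception concepts, meta concepts, and high-level concepts are defined (meta concepts and high-level concepts are collectively called video intelligence concepts), and the content of video intelligence is abstracted as the set of these concepts and the relations among them. The essential connection between ontology theory and the abstraction of perception concepts, video intelligence concepts, and their relations is identified, and an ontology-supported method for building the video intelligence knowledge base is proposed. A video intelligence analysis method that is tightly coupled with domain knowledge and crosses the semantic gap layer by layer is also proposed.
     3. A shot detection method based on a color-space mutual information measure and a PetriNet model is proposed, which improves detection accuracy for gradual transitions. A machine-learning-based method for detecting perception concepts is also proposed. Detecting perception concepts requires automatically analyzing large numbers of high-dimensional low-level perceptual feature samples and discovering meaningful patterns in them, and machine learning is an effective way to solve this class of problem. The thesis uses support vector machines, conditional random fields, and Gaussian mixture models to classify important audio concepts, visual object concepts, and motion type concepts, respectively, which improves the detection accuracy of perception concepts.
     4. An ontology-supported method for high-level semantic analysis of video intelligence is proposed. High-level semantic analysis covers two aspects: video intelligence concept detection and video intelligence retrieval. To address the shortcomings of earlier content-based methods, an ontology-supported meta concept detection method is proposed that, building on perception concept detection, fuses low-level perceptual features with contextual semantic information to detect meta concepts. In contrast to earlier content-based methods and simple linearly weighted fusion models, the thesis proposes a high-level concept detection method based on a Bayesian network model, which models the associations between high-level and low-level concepts in order to detect high-level concepts and improves concept detection performance. To meet the need for personalized video intelligence retrieval, a query description model based on concept-composition PetriNets is proposed. The PetriNet model describes the temporal relations among concepts so that users can define the semantics of their own queries, which satisfies the need for personalized retrieval.
     5. The ontology-supported video intelligence analysis platform VIAPO is designed and implemented. It validates the effectiveness of the ontology-supported semantic content analysis framework and the related methods, as well as the platform's usefulness in intelligence analysis, and offers a feasible approach to video intelligence analysis.
     In summary, the thesis proposes an architecture for video intelligence analysis and an ontology-supported framework for semantic content analysis of video intelligence, studies semantic content analysis techniques in depth, implements the complete process from low-level semantic extraction to high-level semantic concept detection, and effectively addresses the semantic gap problem facing video intelligence analysis. The work establishes a theoretical and practical basis for video intelligence analysis and should also benefit video semantic content analysis technology.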
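     The color-space mutual information measure named in the third contribution can be illustrated with a minimal sketch. It is not the thesis's method: it only flags abrupt cuts with a fixed threshold and omits the PetriNet model used for gradual transitions. The function names, the 8-bucket quantization, the per-channel summation, and the threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(channel_a, channel_b, bins=8, levels=256):
    """Mutual information in bits between two equally sized channels.

    Each channel is a flat sequence of integer intensities in [0, levels).
    Intensities are quantized into `bins` buckets before counting.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) == 0:
        raise ValueError("channels must be non-empty and equally sized")
    step = levels // bins
    qa = [min(p // step, bins - 1) for p in channel_a]
    qb = [min(p // step, bins - 1) for p in channel_b]
    n = len(qa)
    joint = Counter(zip(qa, qb))  # co-occurrence counts of bucket pairs
    count_a = Counter(qa)
    count_b = Counter(qb)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def color_mutual_information(frame_a, frame_b):
    """Sum of per-channel mutual information; a frame is an (R, G, B) tuple."""
    return sum(mutual_information(ca, cb) for ca, cb in zip(frame_a, frame_b))


def detect_cuts(frames, threshold):
    """Indices i where frame i shares little information with frame i - 1."""
    return [
        i for i in range(1, len(frames))
        if color_mutual_information(frames[i - 1], frames[i]) < threshold
    ]


# Toy 4-pixel frames (assumed data): scene_b is independent of scene_a.
scene_a = ([0, 0, 255, 255],) * 3
scene_b = ([0, 255, 0, 255],) * 3
cuts = detect_cuts([scene_a, scene_a, scene_b, scene_b], threshold=0.5)
```

     One limitation of this sketch is that a frame of uniform color has zero entropy, so its mutual information with any frame is zero and it would be reported as a cut; a practical detector would need to handle that case separately.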
Public video intelligence is emerging as an important resource for analyzing international relations and making strategic decisions. The rapid growth in the amount of available video data is creating demand for efficient methods of understanding and managing it at the semantic level. One of the major challenges facing video semantic content analysis and its applications is the so-called "semantic gap" between the rich high-level semantics that users want and the shallow low-level features that automatic algorithms can extract from the media. In this thesis, we systematically explore the problem of modeling and managing the semantics of public video intelligence.
     First, an architecture for video intelligence analysis is proposed, and video semantic content analysis is shown to be its core. Second, a general ontology-based framework for video semantic content analysis is presented. Within this framework, methods for low-level semantic extraction and high-level semantic analysis are developed. Finally, the framework and methods are validated by designing and implementing a Video Intelligence Analysis Platform using Ontology (VIAPO). The main contributions of the thesis are as follows:
     We propose an architecture for video intelligence analysis consisting of a concept architecture and a technique architecture. The concept architecture defines the concepts and hierarchy of video analysis, and the technique architecture lays out the key techniques that implement it.
     We suggest a novel unified framework for video semantic content analysis using ontology. Perception Concepts, Meta Concepts, and High-level Concepts are defined, and video semantic content is modeled with these concepts and the relationships between them. Moreover, we propose constructing the video intelligence knowledge base using ontology, together with a hierarchical approach that draws on domain knowledge to bridge the semantic gap.
     We address the detection of Perception Concepts using machine learning techniques. Detecting Perception Concepts requires automatically processing high-dimensional low-level features and discovering meaningful patterns in large amounts of video data. Three methods are proposed: Audio Concept detection based on Support Vector Machines, Visual Object Concept detection based on Conditional Random Fields, and Motion_Type Concept detection based on Gaussian Mixture Models.
     We develop an ontology-based approach to high-level semantic analysis, which consists of concept detection in video intelligence and video intelligence retrieval. Meta Concept detection using ontology is proposed to overcome the drawbacks of traditional content-based methods: building on Perception Concept detection, Meta Concepts are detected by combining low-level features with context information. Using the Meta Concept detection results, a novel method for High-level Concept detection is proposed based on a Bayesian network, which models the relations between low-level and high-level concepts. To support customized video intelligence retrieval, we propose a query description model based on a PetriNet composed of Perception Concepts and video concepts. The PetriNet models the temporal relationships between the concepts that interest the user, which supports customized retrieval.
     We design and implement a Video Intelligence Analysis Platform using Ontology (VIAPO), which provides sound support for the above framework and methods of video semantic content analysis.
     In conclusion, this thesis provides an in-depth investigation of the architecture of video intelligence analysis, the framework for video semantic content analysis, and methods for bridging the semantic gap. The research lays a theoretical and practical foundation for video intelligence analysis and also advances the technology of video semantic content analysis.
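     The idea of inferring a high-level concept from detected lower-level concepts with a Bayesian network can be sketched in its simplest form: a two-layer network in which one binary high-level concept is the parent of several binary child concepts that are conditionally independent given the parent. This structure, the function name, the concept names, and every probability below are assumptions chosen for illustration; the abstract does not describe the network topology or parameters actually used in the thesis.

```python
def posterior_high_level(prior, likelihoods, evidence):
    """P(high-level concept present | detector outputs) in a two-layer network.

    prior:       P(H = 1) for the high-level concept H.
    likelihoods: {child: (P(child = 1 | H = 1), P(child = 1 | H = 0))}.
    evidence:    {child: bool} for the children that were observed;
                 unobserved children are simply left out (marginalized).
    """
    p_true = prior
    p_false = 1.0 - prior
    for concept, detected in evidence.items():
        p_given_true, p_given_false = likelihoods[concept]
        p_true *= p_given_true if detected else 1.0 - p_given_true
        p_false *= p_given_false if detected else 1.0 - p_given_false
    total = p_true + p_false
    if total == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return p_true / total


# Hypothetical parameters for a hypothetical "press conference" concept.
likelihoods = {
    "podium": (0.8, 0.2),  # (P(detected | present), P(detected | absent))
    "speech": (0.9, 0.3),
}
score = posterior_high_level(0.5, likelihoods, {"podium": True, "speech": False})
```

     Unlike a linearly weighted sum of detector scores, the posterior here depends on how likely each observation is both when the high-level concept is present and when it is absent, so a missing but strongly expected child concept lowers the score sharply.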

引文

[1]李耐国.军事情报研究.北京:军事科学出版社,2001年12月.
    [2]方明.未来情报工作的发展方向.情报杂志. 2003, 22(9):98~100.
    [3]彭光谦,姚有志.军事战略学教程.北京:军事科学出版社,2001年11月第1版.
    [4]李明.信息化战争下的美海军战略变化.舰船电子工程. 2003(3):23, 64~67.
    [5]周柏林,华留虎等译.战略评估.北京:国防大学出版社,1999.
    [6]王保存,刘玉建编著.外军信息战研究概览.北京:军事科学出版社,1999年1月第1版.
    [7]夏大永,罗景青,龚亮亮.情报处理中数据挖掘的应用.舰船电子工程. 2004, 24(6):22~25.
    [8]符静.数据挖掘:情报学的发展.大学图书情报学刊. 2005, 23(4):44~45.
    [9]赵刚.建立国家竞争情报体系:目标与原则.情报学报. 2004, 23(3):367~371.
    [10]包昌火,赵刚,黄英.略论竞争情报的发展走向.情报学报. 2004, 23(3): 352~366
    [11] Jan Herring. The Future of Competitive Intelligence: Driven by Knowledge Based Competition. Competitive Intelligence Magazine. 2003, 6(2):6~13.
    [12] Doctrine Division. MCDP 2 Intelligence. US Marine Corps, MCCDC, 1997. Full text at http://www. doctrine.usmc.mil.
    [13]梁全翔.论面向决策的C2系统设计.火力与指挥控制.1998, 23(3): 173~183.
    [14] Kevin F, McCrohan.Competitive Intelligence: Preparing for the Information War. Long Range Planning.1998, 31(4):586~583.
    [15]佘诗武.军事情报及其研究范围.情报杂志. 2000, 19(6):69~70.
    [16] Doctrine Division. MCWP 2-14 Counterintelligence. US Marine Corps, 2000. Full text at http://www. doctrine.usmc.mil.
    [17]于厚海.信息战争形态下军事情报活动的研究与展望.情报科学. 2003, 21(4):349~350.
    [18]周军.试论军事情报的概念.情报杂志. 2004, 23(1):33~34,37.
    [19]王小梅,吴清强,韩涛.情报分析平台的集成化实践.情报分析与研究. 2007, No.7: 54~58.
    [20]龙鳕.我国智能情报检索系统分析.图书馆学研究. 2006: 51~53.
    [21]王磊,张新宇.情报检索进化之路—从情报检索的易用性谈起.情报科学. 2003,21(6): 667~669.
    [22]罗式胜.文献计量学.广州:中山大学出版社,1994.
    [23]邱均平.我国内容分析法的研究进展.图书馆杂志. 2003 (4): 5~8.
    [24] The Pine Ridge Group. T. W. Powell Company. CI Analytical Tools: How Effective Are They?. http://www.scip.orgPciPanal2ysis.asp ,2003.06.13.
    [25]我国企业信息化和竞争情报实态调查.竞争情报解决方案—企业竞争情报系统和竞争情报技能.北京:兵器工业出版社. 2002(11): 88～199
    [26] Paul.G. G, Gerard J.R.Intelligent Fusion and Asset Management Processor. IEEE Information Technology Conference, 1998:15~18.
    [27] Hayes.C.C, Schlabach.J.L, Fiebig.C.B. FOX-GA: An Intelligent Planning and Decision Support Tool. IEEE International Conference on Systems, Man and Cybernetics.1999, 3:2454~2459.
    [28] Gonsalves.P, Cunningham.R, Ton.N. Intelligent Threat Assessment Processor Using Genetic Algorithms and Fuzzy Logic. Proceedings of the Third International Conference on Information Fusion. 2000, 2:18~24.
    [29]李颖敏.通信对抗情报处理技术研究.中国电子学会电子对抗分会第十二届学术年会论文集. 2002:445~452.
    [30]姚奕,刘晓明,黄松.一个基于数据仓库的通用情报处理系统模型.情报科学. 2006, 24(4): 607~611.
    [31]李永波.基于数据挖掘的军事情报分析系统研究[硕士论文].重庆:重庆大学,2005.
    [32] Shih-Fu Chang. The Holy Grail of Content-based Media Analysis. IEEE Multimedia, 2002, 9(2): 6~10.
    [33] Marc Davis, Chitra Dorai, Frank Nack. Understanding Media Semantics. The 11th Tutorial Program of the 11th ACM International Conference on Multimedia. Berkeley, CA, USA, Nov 2003.
    [34] Trecvid2001, http://www-nlpir.nist.gov/projects/trecvid/revised.html.
    [35] Trecvid2002, http://www-nlpir.nist.gov/projects/t2002v/t2002v.html.
    [36] Trecvid2003, http://www-nlpir.nist.gov/projects/tv2003/tv2003.html.
    [37] Trecvid2004, http://www-nlpir.nist.gov/projects/tv2003/tv2004.html.
    [38] Trecvid2005, http://www-nlpir.nist.gov/projects/tv2003/tv2007.html.
    [39] Trecvid2006, http://www-nlpir.nist.gov/projects/tv2006/tv2006.html.
    [40] Trecvid2007, http://www-nlpir.nist.gov/projects/tv2007/tv2007.html.
    [41]曹莉华.视频媒体的基于内容处理和检索的研究与实现[博士论文].长沙:国防科学技术大学, 1998.
    [42]熊华.视频内容结构化技术的研究与实现[博士论文].长沙:国防科学技术大学, 2001.
    [43]王辰.多媒体融合分析技术的研究与实现[博士论文].长沙:国防科学技术大学, 2002.
    [44]谢毓湘.辅助情报分析的新闻视频挖掘技术研究[博士论文].长沙:国防科学技术大学, 2004.
    [45]陈剑赟.体育视频语义内容分析技术研究[博士论文].长沙:国防科学技术大学, 2005.
    [46] Micheal S.Lew, NicuSebe, Ramesh Jain,“Content-Based Multimedia InformationRetrieval: State of the Art and Challenges”, ACM Transactions on Multimedia Computing, Communications and Applications, February 2006, 2(1):1~19.
    [47] Lawrence A. Rowe, Ramesh Jain,“ACM SIGMM Retreat Report on Future Directions in Multimedia Research”, ACM Transactions on Multimedia Computing, Communications and Applications, February 2005, 1(1): 3~13.
    [48] Shih-Fu Chang, Wei-Ying Ma, Arnold Smeulders. Recent Advances and Chanllenges of Semantic Image/Video Search. In Proceeding of IEEE ICASSP 2007: 1205~1208.
    [49] Chabane Djeraba, Moncef Gabbouj, Patrick Bouthemy. Multimedia indexing and retrieval: ever great challenges. Multimedia Tools Application, 2006: 221~228.
    [50] Myron Flickner, Harpreet Sawhney, Wayne Niblack, et al. Query by Image and Video Content: The QBIC System. IEEE Computer, 1995, 28(9): 23~32.
    [51] J. R. Bach, C. Fuller, A.Gupta, et al. The Virage Image Search Engine: an Open Framework for Image Management. In Proceedings of SPIE: Storage and Retrieval for Image and Video Databases IV, San Diego, CA, USA, 1996:76~87.
    [52] Ramesh Jain. InfoScopes: Multimedia Information Systems. In Multimedia Systems and Techniques, Edited by B. Furht, Kluwer Academic Publishers, Boston, 1996: 217~253.
    [53] Yong Rui, Thomas S.Huang, Michael Ortega, Sharad Mehrotra. Relevance Feedback: A Power Tool for Interactive Content-based Image Retrieval. IEEE Transactions on Circuits and System for Video Technology, 1998, 8(5): 644~655.
    [54] John R.Smith and Shih-Fu Chang. VisualSEEK: A Fully Automated Content-Based Image Query System. In Proceedings of the fourth ACM International Conference on Multimedia, Boston, MA, USA, Nov 1996: 87~98.
    [55] Shih-Fu Chang, William Chen, H. J. Meng, et al. VideoQ: An Automated Content based Video Search System Using Visual Cues. In Proceedings of the fifth ACM International Conference on Multimedia, Seattle, USA, Nov 1997: 313-324.
    [56] A. B. Benitez, J. R. Smith, and S.-F. Chang,“MediaNet: A multimedia information network for knowledge representation,”Proc. SPIE, Vol.4211, 2000.
    [57] C.Jorgensen, A.Jaimes, A.B.Benitez, and S.-F.Chang,“A conceptual framework and research for classifying visual descriptors,”J. Amer. Soc. Information Science (JASIS), 2001, 52(11): 938~947.
    [58] A.B.Benitez, S.-F.Chang, and J.R.Smith,“IMKA: A multimedia organization system combining perceptual and semantic knowledge,”in ACM Multimedia, 2001.
    [59] S. Satoh and T.Kanada,“Name-It: Association of face and name in video,”in Proc. CVPR, 1997.
    [60] H.Wactlar, M.Christel, Y.Gong, and A.Hauptmann,“Lessons learned from the creation and deployment of a terabyte digital video library,”IEEE Computer,1999, 32: 66~73.
    [61] A. G. Hauptmann,“Towards a large scale concept ontology for broadcast video,”in Proc. CIVR, 2004: 674~675.
    [62] M.Christel and A.Hauptmann,“The use and utility of high-level semantic features in video retrieval,”in Proc. CIVR, 2005.
    [63] http://www.cdvp.dcu.ie/aboutfishclar.html.
    [64] A.F. Smeaton, et al.., The Físchlár-News-Stories System: Personalised Access to an Archive of TV News. RIAO 2004, Avignon, France, April, 2004: 26~28.
    [65] Hangzai Luo, Jianping Fan.“Building concept ontology for medical video annotation”, in Proc. ACM MM’06, October, 2006: 23~27.
    [66] Jianping Fan, Hangzai Luo, Yuli Gao, Ramesh Jain.“Incorporating concept ontology for hierarchical video classification, annotation and visualization”, IEEE Transaction on Multimedia, 2007, 9(5): 939~957.
    [67] C. G. M. Snoek, M. Worring, and A. G. Hauptmann,“Learning rich semantics from news video archives by style analysis,”ACM Trans. Multimedia Comput., Commun., Applicat., 2006, 2(2): 91~108.
    [68] C. G. M. Snoek, M.Worring, J. Geusebroek, D. C. Koelma, F. J. Seinstra, and A. W. M. Smeulders,“The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing,”IEEE Trans. Pattern Anal. Machine Intell., Oct. 2006, 28: 1678~1689.
    [69] Ballard, D. H., Brown, C. M. 1982. Computer Vision. Prentice Hall, New Jersey, USA.
    [70] Haralick, R. M. and Shapiro, L. G. 1993. Computer and Robot Vision. Addison-Wesley, New York, NY.
    [71]吴玲达,老松扬,王辰等.多媒体信息系统.北京:电子工业出版社,2002.
    [72]李国辉.信息组织与检索.北京:科学出版社,2001.
    [73] Shih-Fu Chang, William Chen, Horace J.Meng, et al. A Fully Automated Conten-based Video Search Engine Supporting Spatiotemporal Queries. IEEE Transactions on Circuit and System for Video Technology, 1998, 8(5): 602~615.
    [74] D. Zhong and Shih-Fu Chang. An Integrated Approach for Content-based Segmentation and Retrieval. IEEE Transactions on Circuit and System for Video Technology, 1999, 9(8): 1259 ~1268.
    [75] Ahmet Ekin, A. Murat Tekalp and Rajiv Mehrotra. Integrated Semantic Syntactic Video Event Modeling for Search and Browsing. Accepted for publication in IEEE Multimedia. http://www.ece.rochester.edu/users/ekin/publications.html
    [76] Milind R. Naphade and Thomas S. Huang. Probabilistic Multimedia Objects Multijects: A novel Approach to Indexing and Retrieval in Multimedia Systems. In Proceedings of IEEE International Conference on Image Processing (ICIP’98), Chicago, IL, USA, Oct 1998, 3: 536~540.
    [77] Shih-Fu Chang. Optimal Video Adaptation and Skimming Using a Utility-based Framework. In Proceedings of Tyrrhenian International Workshop on Digital Communications (IWDC’02), Capri Island, Italy, Sept. 2002.
    [78] LookSmart [Online]. Available: http://www.looksmart.com/.
    [79] Open Project [Online]. Available: http://dmoz.org/.
    [80] Ontology Alignment [Online]. Available: http://oaei.ontologymatching.org/.
    [81] A.D.Maedche, Ontology Learning for the SemanticWeb. New York: Springer- Verlag, 2002.
    [82] P. Buitelaar, P. Cimiano, and B. Magnini, Ontology Learning from Text: Methods, Evaluation, and Applications. New York: IOS, 2005.
    [83] M. Sanderson and W. B. Croft,“Deriving concept hierarchies from text,”in Proc. ACM SIGIR, 1999: 206~213.
    [84] K. Punera, S. Rajan, and J. Ghosh,“Automatically learning document taxonomies for hierarchical classification,”WWW, 2005: 1010~1011.
    [85] A. McCallum, R. Rosenfeld, T. Mitchell, and A. Ng,“Improving text classification by shrinkage in a hierarchy of classes,”in Proc. ICML, 1998: 359~367.
    [86] S. T. Dumais and H. Chen,“Hierarchical classification of Web content,”in Proc. ACM SIGIR, 2000: 256~263.
    [87] M. Ciaramita, T. Hofmann, and M. Johnson,“Hierarchical semantic classification: Word sense disambiguation with world knowledge,”in Proc. IJCAI, 2003.
    [88] Latifur Khan and Dennis Mcleod. Effective Retrieval of Audio Information from Annotated Text Using Ontologies. In Proceedings of the 1st International Workshop on Multimedia Data Mining (MDM/KDD’2000), in conjunction with ACM SIGKDD conference, Boston, MA, USA, Aug 2000: 37~45.
    [89] Latifur Khan and Dennis McLeod. Audio Structuring and Personalized Retrieval Using Ontologies. IEEE Advances in Digital Libraries, Library of Congress, Washington, DC, May 2000.
    [90] Latifur Khan, Dennis McLeod, Eduard Hovy. Retrieval effectiveness of an ontology-based model information selection. The VLDB Journal, 2004, 13: 71~85.
    [91] H. Luo, J. Fan, and G. Xu,“Multi-modal salient objects: General building blocks of semantic video concepts,”in Proc. ACM CIVR, 2004: 374~383.
    [92] M.Bertini, R.Cucchiara, A. Del Bimbo C. Torniai,“Video Annotation with Pictorially Enriched Ontologies”. In Proc. of International Conference on Multimedia and Expo (ICME), 2005.
    [93] M.Bertini, A.Del Bimbo C.Torniai, "Automatic Video Annotation using Ontologies Extended with Visual Information", Proc. of ACM Multimedia, November 2005
    [94] M.Bertini, et al. Dynamic pictorial ontology for video digital libraries annotation. In Proc.of ACM MS’07, Augsburg, Bavaria, Germany, 2007.
    [95] Stamatia Dasiopoulou, et al. Knowledge-Assisted semantic video object detection. IEEE Transactions on Circuits and Systems for Video Technology, 2005, 15(10): 1210~1224.
    [96] Yakup Yildirim, Turgay Yilmaz, Adnan Yazici. Ontology-supported object and event extraction with a genetic approach for object classification. In Proc. ACM CIVR, Amsterdam, 2007: 202~209.
    [97] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
    [98] A. Hoogs, J. Rittscher, G. Stein, and J. Schmiederer,“Video content annotation using visual analysis and a large semantic knowledgebase,”in IEEE Int. Conf. Computer Vision and Pattern Recognition,Madison, WI, 2003, 2:327~334.
    [99] L. Hollink, M. Worring, and A. T. Schreiber,“Building a visual ontology for video retrieval,”In Proc. ACM Multimedia, Singapore, 2005: 479~482.
    [100] M. Worring and G. Schreiber,“Semantic image and video indexing in broad domains,”IEEE Trans. Multimedia, 2007, 9(5): 909 ~910.
    [101] M. Naphade, J. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis,“Large-scale concept ontology for multimedia,”IEEE Multimedia, 2006, 13(3): 86 ~91.
    [102] C.G. M. Snoek, M.Worring, J.C.van Gemert, J.-M. Geusebroek, and A.W.M. Smeulders,“The challenge problem for automated detection of 101 semantic concepts in multimedia,”in Proc. ACM Multimedia, Santa Barbara, CA, 2006: 421~430.
    [103] M.R.Naphade. Statistical techniques in video data management. In IEEE Workshop on Multimedia Signal Processing, 2002.
    [104] T. Gevers and A. W. M. Smeulders,“PicToSeek: Combining color and shape invariant features for image retrieval,”IEEE Trans. Image Processing, 2000, 9(1): 102 ~119.
    [105] W.-Y. Ma and B. Manjunath,“NeTra: A toolbox for navigating large image databases,”Multimedia Syst., 1999, 7(3): 184~198.
    [106] A. Del Bimbo and P. Pala,“Visual image retrieval by elastic matching of user sketches,”IEEE Trans. Pattern Anal. Machine Intell., 1997, 19(2): 121 ~132.
    [107] T.Westerveld, A. de Vries, A. van Ballegooij, F. de Jong, and D. Hiemstra,“A probabilistic multimedia retrieval model and its evaluation,”J.Appl. Signal Processing, 2003, 3(2): 186~197.
    [108] T.-S. Chua, S.-Y. Neo, K.-Y. Li, G. Wang, R. Shi, M. Zhao, H. Xu, Q. Tian, S. Gao, and T. L. Nwe,“TRECVID 2004 search and feature extraction task by NUS PRIS,”in Proc. TRECVIDWorkshop, Gaithersburg, MD, 2004.
    [109] R. Yan, J. Yang, and A. Hauptmann,“Learning query-class dependent weights for automatic video retrieval,”in Proc. ACM Multimedia, New York, 2004: 548~555.
    [110] A. Natsev, M. R. Naphade, and J. Tesic,“Learning the semantics of multimedia queries and concepts from a small number of examples,”in Proc. ACM Multimedia, Singapore, 2005: 598~607.
    [111] L.S.Kennedy, A.Natsev, S.-F. Chang,“Automatic discovery of query-class- dependent models for multimodal search,”in Proc. ACM Multimedia, Singapore, 2005: 882~891.
    [112] G. Iyengar, P. Duygulu, S. Feng, P. Ircing, S. Khudanpur, D. Klakow, M. Krause, R. Manmatha, H. Nock, D. Petkova, B. Pytlik, and P. Virga,“Joint visual-text modeling for automatic retrieval of multimedia documents,”in Proc. ACM Multimedia, Singapore, 2005: 21~30.
    [113] Q.Zhu, M-C.Yeh, K.-T. Cheng. Multimodal Fusion using Learned Text Concepts for Image Categorization . In Proc. ACM Multimedia, CA, USA, 2006: 211~220.
    [114] R. Lienhart, C. Kuhmünch, and W. Effelsberg,“On the detection and recognition of television commercials,”in IEEE Conf. Multimedia Computing and Systems, Ottawa, ON, Canada, 1997: 509~516.
    [115] J.Smith and S.-F.Chang,“Visually searching the web for content,”IEEE Multimedia, 1997, 4(3): 12~20.
    [116] Y. Rui, A. Gupta, and A. Acero,“Automatically extracting highlights for TV baseball programs,”in Proc. ACM Multimedia, Los Angeles, CA, 2000: 105~115.
    [117] Cabasson R, Divakaran A. Automatic Extraction of Soccer Video Highlights Using a Combination of Motion and Audio Features. In Proceeding of SPIE: Storage and Retrieval for Multimedia Databases, SPIE Volume 5021, USA, 2003: 272~276.
    [118] Vasanth Tovinkere, Richard J. Qian. Detecting Semantic Events in Soccer Games: Towards a Complete Solution, In Proceedings of IEEE International Conference of Multimedia and Expo (ICME’01), Tokyo, Japan, 2001: 1040~1043.
    [119] David A. Sadlier, Noel O’Connor, Sean Marlow, Noel Murphy. A Combined Audio-Visual Contribution to Event Detection in Field Sports Broadcast Video. Case Study: Gaelic Football. In Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT’03), Darmstadt, Germany, Dec 2003.
    [120] Surya Nepal, Uma Srinivasan, Graham Reynolds. Automatic Detection of‘Goal’Segments in Basketball Videos. In Proceedings of the 9th ACM International Conference on Multimedia, Ottawa, Canada, Sep 2001: 261~269.
    [121] Noboru Babaguchi, Yoshihiko Kawai, Tadahiro Kitashi. Event-based Indexing of Broadcasted Sports Video by Intermodal Collaboration. IEEE Transactions onMultimedia, 2002, 4(1): 68~75.
    [122] Lew.M.S, Huijsmans.N. Information theory and face detection. In Proceedings of the International Conference on Pattern Recogntion. Vienna, Austria, 1996: 601~605.
    [123] Chua.T.S, Zhao.Y, Kankanhalli.M.S. Detection of human faces in a compressed domain for video stratification. The Visual Computer, 2002, 18(2): 121~133.
    [124] Yang M.H, Kriegman D.J, Ahuja N. Detecting faces in images: A survey. IEEE Trans. Patt. Analy. Machine Intell. 2002, 24(1): 34~58.
    [125] Lew M.S. Next generation Web searches for visual content. IEEE Comput. 2000: 46~53.
    [126] Fan J., Gao Y., Luo, H. Multi-level annotation of natural scenes using dominant image components and semantic concepts. In Proc. of the ACM International Conference on Multimedia. ACM, New York, NY, 2004: 540~547.
    [127] Rautianen M., Seppanen T., Penttila J., Peltola J. Detecting semantic concepts from video using temporal gradients and audio classification. In Proceedings of the 3rd International Conference on Image and Video Retrieval. Springer-Verlag, London, UK, 2003: 260~270.
    [128] M. Naphade and T. Huang, A probabilistic framework for semantic video indexing, filtering, and retrieval. IEEE Trans. Multimedia. 2001, 3(1): 141~151.
    [129] A. Amir, M. Berg, S.-F. Chang, W. Hsu, G. Iyengar, C.-Y. Lin, M. Naphade, A. Natsev, C. Neti, H. Nock, J. Smith, B. Tseng, Y.Wu, and D. Zhang,“IBM research TRECVID-2003 video retrieval system,”in Proc. TRECVID Workshop, Gaithersburg, MD, 2003.
    [130] X. Shen, M. Boutell, J. Luo, and C. Brown. Multi-label machine learning and its application to semantic scene classification. In International Symposium on Electronic Imaging, 2004.
    [131] M. Campbell and et al. IBM research trecvid-2006 video retrieval system. In TREC Video Retrieval Evaluation (TRECVID) Proceedings, 2006.
    [132] S.-F. Chang and et al. Columbia university trecvid-2006 video search and high-level feature extraction. In TREC Video Retrieval Evaluation (TRECVID) Proceedings, 2006.
    [133] R. Kohavi and G. H. John,“Wrappers for feature subset selection,”Artif. Intell., 1997, 97: 273~324.
    [134] T. Ho,“The random subspace method for constructing decision forests,”IEEE Trans. Pattern Anal. Machine Intell., 1998, 20(8): 832~844.
    [135] A.Kojima, T. Tamura, and K. Fukunaga,“Natural language description of human activities from video images based on concept hierarchy of actions,”Int. J. Comput. Vis., 2002, 50(2): 171~184.
    [136] A.Jaimes and J.R.Smith, Semi-automatic, data-driven construction of multimediaontologies. in Proc. IEEE ICME, 2003.
    [137] C.A.Lindley, A multiple-interpretation framework for modeling video semantics. in ER-97 Workshop on Conceptual. Modeling in Multimedia Information Seeking, 1997.
    [138] J. Hunter,“Enhancing the semantic interoperability of multimedia through a core ontology,”IEEE Trans. Circuits, Syst., Video Technol. 2003, 13: 49~58.
    [139] C. G. M. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, and M. Worring, Adding semantics to detectors for video retrieval. IEEE Trans. Multimedia, 2007, 9(5): 975~986.
    [140] M. Koskela, A. F. Smeaton, and J. Laaksonen. Measuring concept similarities in multimedia ontologies: Analysis and evaluations. IEEE Trans. Multimedia, to be published.
    [141] W. Jiang, S.-F. Chang, and A. Loui. Active concept-based concept fusion with partial user labels. In Proceedings of IEEE International Conference on Image Processing, 2006.
    [142] Guo-Jun Qi, et al. Correlative multi-label video annotation. In Proc. of ACM MM’07, Germany, 2007: 17~26.
    [143] Zhiwei Gu, et al. Multi-layer multi-instance kernel for video concept detection. In Proc. of ACM MM’07, Germany, 2007: 349~352.
    [144] Alejandro Jaimes, et al. Multimedia Information Retrieval: What is it, and why isn’t anyone using it?. In Proc. of ACM MIR’05, Singapore, 2005: 3~8.
    [145] Bliujute R., et al. Developing a DataBlade for a new index. In Proceedings of IEEE International Conference on Data Engineering. (March) Sydney, Australia 1999: 314~323.
    [146] Egas R., et al. Adapting k-d trees to visual retrieval. In Proceedings of the International Conference on Visual Information Systems. (June) Amsterdam, A. Smeulders and R. Jain, Eds., 1999: 533~540.
    [147] Shih-Fu Chang. The Holy Grail of Content-based Media Analysis. IEEE Multimedia, Vol.9, No.2, April/June, 2002: 6~10.
    [148] Chong-Wah Ngo, Ting-Chuen Pong, Hong-Jiang Zhang. Recent Advances in Content Based Video Analysis. International Journal of Image and Graphics, 2001, 1(3): 445~468.
    [149]徐培德等.军事运筹学基础.长沙:国防科学技术大学出版社,2007.
    [150] Alejandro Jaimes and Shih-Fu Chang. Concepts and Techniques for Indexing Visual Semantics. In Image Databases: Search and Retrieval of Digital Imagery, Edited by Vittorio Castelli and Lawrence D. Bergman, John Wiley & Sons Inc. 2002: 497~565.
    [151] Ji Zhang, Wynne Hsu, Mong Li Lee. An Information-driven Framework for Image Mining. In Proceedings of the 12th International Conference on Databaseand Expert Systems Applications (DEXA’01), Munich, Germany, Sep 2001: 232~242.
    [152] Ji Zhang, Wynne Hsu, Mong Li Lee. Image Mining: Issues, Frameworks and Techniques. In Proceedings of the Second International Workshop on Multimedia Data Mining, San Francisco, CA, USA, Aug 2001: 13~20.
    [153] D.Reidsma, J.Kuper, T.Declerck, H.Saggion, and H.Cunningham. Cross document ontology based information extraction for multimedia retrieval. In Supplementary proceedings of the ICCS03, Dresden, July 2003.
    [154] V.Mezaris, I.Kompatsiaris, N.Boulgouris, and M.Strintzis. Real-time compressed-domain spatiotemporal segmentation and ontologies for video indexing and retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 2004, 14(5): 606~621.
    [155] A.Jaimes, B.Tseng, and J.Smith. Modal keywords, ontologies, and reasoning for video understanding. In International Conference on Image and Video Retrieval (CIVR 2003), July 2003.
    [156] S.Dasiopoulou, V.K.Papastathis, V.Mezaris, I.Kompatsiaris and M.G.Strintzis. An Ontology Framework for Knowledge-Assisted Semantic Video Analysis and Annotation. Proc. 4th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2004) at the 3rd International Semantic Web Conference (ISWC 2004), November 2004.
    [157] J.Strintzis, S.Bloehdorn, S.Handschuh, S.Staab, N.Simou, V.Tzouvaras, K.Petridis, I.Kompatsiaris, and Y.Avrithis. Knowledge representation for semantic multimedia content analysis and reasoning. In European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, Nov.2004.
    [158] I.Kompatsiaris, V.Mezaris, and M.G.Strintzis. Multimedia content indexing and retrieval using an object ontology. Multimedia Content and Semantic Web Methods, Standards and Tools, Editor G.Stamou, Wiley, New York, NY, 2004.
    [159] Christopher Welty, Ontology Research, AI Magazine, 2003:11~12.
    [160] R.F.Neches, R.Finin, T.Gruber, T.Patil, R.Senator, T.Swartout, Enabling Technology for Knowledge Sharing. AI Magazine, 1991: 36~56.
    [161] T.R.Gruber. A translation approach to portable ontologies. Knowledge Acquisition, Vol.5, No.2, 1993: 199~220.
    [162] W. N. Borst. Construction of Engineering Ontologies for Knowledge Sharing and Reuse. PhD thesis, University of Twente, Enschede, 1997.
    [163] R. P. B. Swartout, K. Knight, T. Russ.Toward Distributed Use of Large-Scale Ontologies. Ontological Engineering, 1997: 138~148.
    [164] Fensel D. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer. 2001.
    [165] Noy F.N., McGuinness D.L. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March, 2001.
    [166] Fonseca, F. Egenhofer M., Agouris, P.,Camara G. Using Ontologies for Intergrated Geographic Information Systems. Transactions in GIS, 2002, 6(3): 114~126.
    [167] Starlab, Systems Technology and Application Research Laboratory home page. Faculty of Sciences, Department of Computer cience, Vrije Universiteit Brussel. Available at: http://www.starlab.vub.ac.be/research/indexbody.htm.
    [168] M.Uschold. Ontologies Principles, Methods and Applications. Knowledge Engineering Review, 1996, 11(2): 46~56.
    [169] Grigoris Antoniou, Frank van Harmelen. Web Ontology Language: OWL.
    [170]邓志鸿、唐世渭等. Ontology研究综述.北京大学学报(自然科学版), 2002, 38(5).
    [171] J. S. Hare, P. H. Lewis, P. G. B. Enser, and C. J. Sandom,“Mind the gap: Another look at the problem of the semantic gap in image retrieval,”Proc. SPIE, Vol. 6073, 2006.
    [172] Vapnik V. The Nature of Statistical Learning Theory. New York:Springer-Verlag, 1995.
    [173] Vladimir N.Vapnik著.统计学习理论.北京:电子工业出版社,2004.
    [174]施光燕,董加礼.最优化方法.北京:高等教育出版社,1999.
    [175]胡适耕,施保昌.最优化原理.武汉:华中理工大学出版社,2000.
    [176] Cortes C, Vapnik V. Support Vector Networks. Machine Learning, 1995:273~297.
    [177] Osuna E, Freund R, Girosi F. An Improved Training Algorithm for Support Vector Machines. In: Principe J, Giles L, Morgan N, Wilson E eds, Proceedings of the 1997 IEEE Workshop on Neural Networks for Signal Processing, New York: IEEE, 1997: 276~285.
    [178] Chang C C, Hsu C W, Lin C J. The analysis of decomposition methods for support vector machines. In Workshop on Support Vector Machines, IJCAI, 1999.
    [179] Joachims T. Making Large-Scale SVM Learning Practical. In: Scholkopf B, Burges C J C, Smola A eds, Advances in Kernel Methods-Support Vector Learning, Cambridge, MA: MIT Press, 1998: 169~184.
    [180] Platt J C. Fast Training of SVMs Using Sequential Minimal Optimization. In: Scholkopf B, Burges C J C, Smola A J eds, Advances in Kernel Methods-Support Vector Learning, Cambridge, MA: MIT Press, 1998: 185~208.
    [181] Ahmed.S.N. Incremental Learning with Support Vector Machines.(IJCAI99), Workshop on Support Vector Machines, Stockholm,Sweden,August 2, 1999.
    [182] N.V.Patel and I.K.Sethi,“video shot detection and characterization for video database”, Pattern Recognition, 1997, 30(4): 583~592.
    [183] S.Tsekeridou and I.Pitas,“Content-based video parsing and indexing based on audio-visual interaction”, IEEE Trans. Circuits and Systems for Video Technology, 2001, 11(4): 522~535.
    [184] A.Hanjalic,“Shot-boundary detection: Unraveled and resolved?”IEEE Trans. Circuits and Systems for Video Technology, 2002, 12(2): 90~105.
    [185] C.L.Huang and B.Y.Liao,“A robust scene-change detection method for video segmentation”, IEEE Trans. Circuits and Systems for Video Technology, 2001, 11(12):1281~1288.
    [186]佟子键等.一种基于有限自动机的渐变镜头检测算法.计算机科学. 2006,33(1): 252~255.
    [187] R.Lienhart,“Reliable dissolve detection”, in Proc.SPIE Storage and Retrieval for Media Databases 2001, January 2001, 4315: 362~374.
    [188] M.S.Drew, Z.N.Li, and X.Zhong,“Video dissolve and wipe detection via spatio-temporal images of chromatic histogram differences”, in Proc. 2000 IEEE Int.Conf.Image Processing, 2000, 3: 148~158.
    [189] A.D.Bimbo,“Visual Information Retrieval”, Morgan Kaufmann Publishers, Inc, San Francisco, California, 1999.
    [190]张婵,高新波,姬红兵.视频关键帧提取的可能性C2模式聚类算法.计算机辅助设计与图形学学报. 2005, 17(9): 2040~2046.
    [191]王方,石须,吴伟鑫.基于自适应阈值的自动提取关键帧的聚类算法.计算机研究与发展. 2005, 42(10): 1752~1757.
    [192]林通,张宏江,封举富,石青云.镜头内容分析及其在视频检索中的应用.软件学报. 2002, 13(8): 1577~1585.
    [193]李远宁,刘汀,蒋树强,黄庆明.基于“bag of words”的视频匹配方法.通信学报. 2007, 28(12): 147~151.
    [194]卢坚,陈毅松,孙正兴,张福炎.语音/音乐自动分类中的特征分析.计算机辅助设计与图形学学报. 2003, 14(3): 233~237.
    [195] L. Lu, Stan Li, H. J. Zhang. Content-based Audio Segmentation Using Support Vector Machines. Proc. of ICME’01, 2001: 956~959.
    [196] L. Lu, H. Jiang, H. J. Zhang. Content Analysis for Audio Classification and Segmentation. IEEE Transactions on Speech and Audio Processing. 2002, 10(7): 165~176.
    [197] Ivan Gonzalez and Alan F.Smeaton.“Spatio-temporal Region Segmentation in Video Sequences”, Technical Report in Center for Digital Video Processing of Dublin City University, http://www.cdvp.dcu.ie. 2007.
    [198] R. Vidal and R. Hartley.“Motion segmentation with missing data using power factorization and GPCA.”Proc. Conf. Computer Vision and Pattern Recognition, 2004, 2: 310~316.
    [199] C.R.del-Blanco, F.Jaureguizar, L.Salgado and N.García.“Target Detection through robust motion segmentation and tracking restrictions in aerial FLIR images”. In Proceedings of the 14th IEEE International Conference on Image Processing, San Antonio, Texas, USA, 16~19 September 2007.
    [200] X. He, R. S. Zemel, and M. A. Carreira-Perpinan.“Multiscale conditional random fields for image labeling.”Proc. Conf. Computer Vision and Pattern Recognition, 2004, 2: 695~702.
    [201] S. Khan and M. Shah.“Object based segmentation of video using color, motion, and spatial information.”Proc. Conf. Computer Vision and Pattern Recognition, 2001, 2: 746~751.
    [202] T. Adamek and N. O'Connor. Interactive Object Contour Extraction for Shape Modeling. 1st International Workshop on Shapes and Semantics, Matsushima, Japan, 2006: 31~39.
    [203] T. Adamek and N. O'Connor.“Using Dempster-Shafer Theory to Fuse Multiple Information Sources in Region-Based Segmentation”. In Proceedings of the 14th IEEE International Conference on Image Processing, San Antonio, Texas, USA, 16-19 September 2007.
    [204] H. Greenspan, J. Goldberger, and A. Mayer.“Probabilistic space-time video modeling via piecewise GMM.”IEEE Trans. Patt. Anal. Mach. Intel., 2004, 26: 384~396.
    [205] D.DeMenthon and D.Doermann.“Video retrieval using spatio-temporal descriptors.”Proc. ACM Int’l Conf. Multimedia, 2003: 508~517.
    [206] K. McGuinness, G. Keenan, T. Adamek and N. O'Connor.“Image Segmentation Evaluation Using an Integrated Framework”. VIE 2007 - Proceedings of the IET 4th International Conference on Visual Information Engineering 2007, London, U.K., 2007.
    [207] J. Lafferty, A. McCallum, and F. Pereira.“Conditional random fields: Probabilistic models for segmenting and labeling sequence data.”Proc. Int’l Conf. Machine Learning, 2001: 282~289.
    [208] D. C He, L Wang. Texture Features based on Texture Spectrum. Pattern Recognition, 1991, 24(5):391~399.
    [209] Ojala T, Pietikainen M, Harwood D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 1996, 29(1): 51~59.
    [210]钟玉琢,王琪,贺玉文.基于对象的多媒体数据压缩编码国际标准——MPEG-4及其校验模型.北京:科学出版社,2000.
    [211] J C Huang, W S Hsieh. Automatic feature-based global motion estimation in video sequences. IEEE Trans. Consumer Electronics, 2004, 50(3): 911~915.
    [212]陈韩锋,戚飞虎.基于运动矢量场的双迭代全局运动估计方法.通信学报, 2004, 25(6): 126~131.
    [213]耿玉亮,须德.一种鲁棒的摄像机运动分类算法.电子学报. 2006, 34(7): 1342~1346.
    [214] K.Nigam, J.Lafferty, and A.McCallum. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, 1999: 61~67.
    [215] http://www.nuance.com
    [216]郭金林.辅助视频情报分析的字幕探测技术[硕士论文].长沙:国防科学技术大学, 2008.
    [217] Http://www.nlp.org.cn/project/project.php?proj_id=6.
    [218] Salton G. Development in automatic text retrieval. Science, 1991, 253(5023): 974~979.
    [219] P. Resnik,“Using information content to evaluate semantic similarity in a taxonomy,”in Int. Joint Conf. Artificial Intelligence, Montréal, QC, Canada, 1995: 448~453.
    [220] J. Platt,“Probabilities for SV machines,”in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, Eds. Cambridge, MA: MIT Press, 2000: 61~74.
    [221]史忠植.第七章:贝叶斯网络.知识发现.北京:清华大学出版社, 2002: 169~202.
    [222] Pearl. J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Inc. San Francisco, CA, USA, 1988.
    [223] C. Huang and A. Darwiche. Inference in Belief Networks: A Procedural Guide. International Journal of Approximate Reasoning, 1996, 15(3): 225~263.
    [224] Peterson, James Lyle. Petri Net Theory and the Modeling of Systems. Englewood Cliffs, N.J: Prentice-Hall, 1981.
    [225] Wasfi Al-Khatib, Arif Ghafoor, An approach for video meta-data modeling and query processing. In Proceedings of the seventh ACM international conference on Multimedia, Orlando, Florida, USA, October, 1999: 215~224.
    [226] T. D. C. Little, G. Ahanger, R. J. Folz, et al, A digital on-demand video service supporting content-based queries. In Proceedings of the first ACM international conference on Multimedia, Anaheim, California, USA, September, 1993: 427~436.
    [227] Smeaton, A.F.; Gregan, A., Distributed Multimedia QOS Parameters from Presentation Modeling by Coloured Petri Nets. Lecture Notes in Computer Science; Multimedia, Hypermedia, and Virtual Reality: Models, Systems, and Applications; 1st International Conference, MHVR'94, Moscow, Russia, September, 1994: 47~60.
    [228] T.Danisman, A.Alpkocak,“Dokuz Eylul University Video Shot Boundary Detection at TRECVID 2006”, in Proc. TRECVID 2006 Workshop, November 13-14, 2006: 125~130.
    [229] G.Camara Chavez, F.Precioso, M.Cord, S.Philipp-Foliguet, A.de A.Araujo,“Shot Boundary Detection at TRECVID 2006”, in Proc. TRECVID 2006 Workshop, November 13-14, 2006: 261~268.
    [230] S. Srinivasan, D. Petkovic, D. Ponceleon. Towards Robust Features for Classifying Audio in the CueVideo System. Proc. of the seventh ACM international conference on Multimedia99, 1999: 393~400.
    [231] L. Lu, H. Jiang, H. J. Zhang. Content Analysis for Audio Classification and Segmentation. IEEE Transactions on Speech and Audio Processing, 2002, 10(7): 253~265.
    [232]卢坚,陈毅松,孙正兴,张福炎.基于隐马尔可夫模型的音频自动分类.软件学报. 2002, 13(8):1593~1597.
    [233] Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification.北京:机械工业出版社,2003.
    [234]白亮,老松杨,胡艳丽.支持向量机训练算法比较研究.计算机工程与应用, 2005, 41(17): 79~84.
