Research and Implementation of Audio Techniques for Assisting Video Content Analysis (辅助视频内容分析的音频技术研究与实现)

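Author: Cheng Jie (程捷)
Degree level: Master's
Discipline: Electronics and Information Engineering
Keywords: Multimedia ; Content-based ; Audio-assisted ; Video ; Text classification
Degree year: 2003
Advisor: Wu Lingda (吴玲达)
Discipline code: 081002
Degree-granting institution: National University of Defense Technology (中国人民解放军国防科学技术大学)
Thesis submission date: 2003-03-01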

Abstract

Across multimedia research, automatic content processing is becoming increasingly important. Automatic video content segmentation, video genre recognition, and self-organizing video libraries are among the directions researchers favor most.
     At present, however, extracting video content from the video data alone is very difficult. Existing video processing techniques cannot, on their own, accurately segment video scenes or classify video content. Using the corresponding audio and text information to assist the analysis of the video stream handles both scene segmentation and content classification considerably better.
     Moreover, audio processing has developed rapidly in recent years, and speech recognition has matured to the point of achieving high accuracy on large-vocabulary continuous speech. Applying speech recognition and natural language processing to the speech segments of an audio stream makes it possible to extract and classify the audio content. This in turn supports retrieval and allows the corresponding video segments to be classified by content. These developments create the conditions for our research.
     This thesis proposes a practical and efficient content-based technique for audio-assisted video content analysis, and implements fairly complete audio-assisted scene segmentation and content classification for news video.
     1. By studying the properties of audio, we analyze its physical and biological characteristics in detail and, building on classic short-time analysis, propose a set of methods for extracting audio features.
     2. By studying the characteristics of audio data, and guided by the needs of assisting video analysis, we propose a set of content-based analysis methods for audio data, covering classification and segmentation of long audio recordings, speaker change detection, and extraction of the textual content of speech.
     3. Building on these audio analysis methods, we propose a set of audio-assisted video analysis methods that combine video and audio information, yielding more effective video scene detection and story segmentation.
     4. Using text classification, we implement content-based classification of video data, which makes content-based browsing and retrieval of video more effective.
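To make the "classic short-time analysis" mentioned in item 1 concrete, here is a minimal sketch of two textbook short-time features, frame energy and zero-crossing rate. This record does not include the thesis body, so the choice of features, the frame length, the hop size, and every function name below are illustrative assumptions and should not be read as the author's method.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_features(signal, frame_len=256, hop=128):
    """Return one (energy, zero-crossing rate) pair per frame.

    frame_len and hop are assumed values chosen for illustration only.
    """
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frames(signal, frame_len, hop)]


# Synthetic input (hypothetical): silence followed by a 440 Hz tone at 8 kHz.
sample_rate = 8000
silence = [0.0] * 1024
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(1024)]
features = short_time_features(silence + tone)
```

On this synthetic input the silent frames have zero energy, while the tone frames show clearly higher energy and a steady zero-crossing rate. Per-frame contrasts of this kind are what frame-based audio classification and segmentation typically build on.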

