Research on Four-Dimensional Matrix Video Coding and Audio-Video Synchronization
Abstract
With the rapid development of computer networks and digital communication, multimedia applications have penetrated every area of daily life. The ever-growing volume of media data makes its storage and transmission a serious problem. At the same time, a multimedia system combines several media types that are bound by temporal constraints: the temporal relations among media objects must be maintained throughout data processing, or users may miss or misinterpret the information the multimedia data is meant to convey. Video coding and audio-video synchronization have therefore become focal points of multimedia technology.
     H.264, the latest coding standard developed jointly by ITU-T and ISO/IEC, inherits the strengths of H.263 and the MPEG-1/2/4 video standards and raises coding efficiency by adopting advanced techniques in each of its main functional modules. Building on H.264, this thesis proposes a four-dimensional (4D) matrix synchronized coding model comprising 4D-matrix prediction-mode coding, sub-matrix partitioning, the 4D DCT, quantization, audio-video synchronization control, DCT coefficient reordering, and entropy coding. For entropy coding, a 4D-matrix context-based variable-length coding method is proposed that removes the correlation among pixels, among color components, and between consecutive frames of color video, achieving high compression ratios under high signal-to-noise conditions.
     For audio-video synchronization control, an embedded synchronized coding and transmission algorithm is proposed: the compressed audio bitstream is embedded as hidden data into the DCT coefficients of the video frames before video compression. Compared with the common timestamp-based synchronization model, the embedded algorithm needs no system clock and achieves synchronized coding and transmission of audio and video with little degradation of video quality while saving transmission resources.
With the rapid development of multimedia, remote sensing, and image processing technology, the volume of video and audio data keeps growing. Many digital video applications, such as video conferencing, video on demand, distance learning, and telemedicine, must transmit large amounts of video and audio data, and storing these data requires enormous capacity. Without compression coding, storage and transmission become impractical, so video compression coding is a key issue in these fields. On the other hand, unlike traditional media, a multimedia system combines media of different types that are linked by temporal relations. To ensure that users neither miss nor misinterpret the media information, the temporal relations among the media objects must be preserved; that is, the different media must be synchronized. Video compression coding and audio-video synchronization have therefore become key technologies of multimedia applications. Audio-video synchronization chiefly covers synchronized sampling, compression, synchronized transmission, reception, and synchronized playback.
     From the first to the second generation of coding techniques, video coding has developed rapidly. New coding techniques and standards have been proposed in recent years, the best known being the H.26x and MPEG series. Most of these standards combine inter-frame motion compensation with the two-dimensional discrete cosine transform (2D-DCT) and describe color video in YCbCr format, exploiting the human visual system (HVS) to save bits by reducing the resolution of the two chrominance components; each channel is then compressed independently. In fact, the color components (R, G, B) can be strongly correlated, and even after the transform, luminance and chrominance may remain correlated. The three frames of a color image are a unified reflection of the same physical scene: they share texture, edges, and gray-level gradients, and each frame carries almost all of the image's information except color. Human vision characteristics show that the relation among the components is nonlinear. Clearly, if the Y, U, and V data are compressed separately, the inherent correlation between color components cannot be exploited efficiently, limiting both the compression ratio and the PSNR. A four-dimensional matrix is therefore adopted to represent color video in a unified mathematical model, and the 4D-matrix DCT (4DM-DCT) is used to remove the correlation between neighboring pixels within a frame and between adjacent frames.
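Because a multidimensional DCT is separable, the 4DM-DCT mentioned above can be computed by applying a 1-D DCT along each of the four axes in turn. The following is a minimal illustrative sketch of that idea (the flat-array layout and block size are my own choices, not the thesis's implementation):

```python
import math

def dct1d(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct4d(block, dims):
    """Separable 4-D DCT: apply dct1d along each axis of a 4-D block.
    `block` is a flat list indexed as block[((x*Y + y)*Z + z)*T + t]."""
    X, Y, Z, T = dims
    data = list(block)
    sizes = (X, Y, Z, T)
    strides = (Y * Z * T, Z * T, T, 1)
    for axis in range(4):
        n, stride = sizes[axis], strides[axis]
        for base in range(len(data)):
            # `base` starts a line along `axis` iff its coordinate there is 0
            if (base // stride) % n != 0:
                continue
            line = [data[base + i * stride] for i in range(n)]
            for i, c in enumerate(dct1d(line)):
                data[base + i * stride] = c
    return data
```

For a constant 2×2×2×2 block, all energy collapses into the single DC coefficient, which is exactly the decorrelating behavior the unified 4D model relies on.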
     Based on the H.264 video coding standard, a four-dimensional audio-video synchronized coding model is first established. It comprises four-dimensional sub-matrix motion-compensated predictive coding, the four-dimensional DCT, sub-matrix quantization, an audio-video synchronization control algorithm, DCT coefficient reordering, and entropy coding.
     For entropy coding, four-dimensional matrix context-based variable-length coding is presented. After a four-dimensional zigzag scan, the DCT coefficients are in descending order. Coding proceeds in inverse order: based on the previously coded coefficient, an appropriate code table is chosen for the current one. The method has two parts, descriptor coding and coefficient coding; coefficient coding covers the ±1 coefficients, the nonzero coefficients other than ±1, and the zeros interleaved among the nonzero coefficients.
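The context-adaptive idea described above, coding in inverse magnitude order and switching to a coarser table once large levels appear, can be sketched as follows. This is a simplified Golomb-style illustration with hypothetical escalation thresholds, not the thesis's actual code tables:

```python
def encode_levels(levels):
    """Context-adaptive level coding sketch: code levels in reverse
    (smallest-magnitude first), growing the Golomb suffix length as
    coded magnitudes grow, so each level's table suits its expected size."""
    THRESH = [0, 3, 6, 12, 24, 48]  # hypothetical escalation thresholds
    suffix_len, bits = 0, []
    for lv in reversed(levels):                      # inverse-order coding
        mag = abs(lv)
        code = 2 * (mag - 1) + (1 if lv < 0 else 0)  # fold sign into code
        if suffix_len == 0:
            bits.append("1" * code + "0")            # pure unary table
        else:                                        # Golomb table, 2**suffix_len
            q, r = divmod(code, 1 << suffix_len)
            bits.append("1" * q + "0" + format(r, f"0{suffix_len}b"))
        # context update: switch to a coarser table after a big level
        if suffix_len < len(THRESH) - 1 and mag > THRESH[suffix_len + 1]:
            suffix_len += 1
    return "".join(bits)
```

Small trailing levels get short unary codes, while the earlier, larger levels near the start of the scan are coded with progressively wider tables.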
     The parameters that affect encoder and decoder performance are optimized through experiments. Finally, the proposed method is compared with other video coding methods, including 2D-DCT with motion compensation, vector quantization (VQ), and Huffman coding. Experimental results show that the PSNR and compression ratio (CR) of the proposed algorithm are much better than those of the traditional method, though the CR is lower than that of VQ. For relatively still video, the CR of context-based 4D-matrix video coding is lower than that of Huffman coding at the same parameters and the same PSNR, but for relatively complex moving video the result is reversed. The experiments confirm that the proposed 4D-matrix method compresses video effectively.
     For audio-video synchronization control, an embedded synchronized coding scheme for audio and video is proposed. The compressed audio bitstream is the hidden data, embedded into the mid-frequency DCT coefficients of the video frames; the hybrid signal is then encoded and transmitted. At the decoder, the audio bits are extracted from the DCT coefficients, and the audio and video are reconstructed separately for playback. Three embedding schemes are introduced.
     1. Embedding in the relation of two DCT coefficients
     Two mid-frequency quantized coefficients of a sub-block, denoted BQ(x1,y1,z1,t1) and BQ(x2,y2,z2,t2), are chosen, and their relation is adjusted to embed each audio bit. If the bit is 0, BQ(x1,y1,z1,t1) is modified so that its absolute value is greater than or equal to that of BQ(x2,y2,z2,t2); otherwise, BQ(x2,y2,z2,t2) is modified so that its absolute value is greater than that of BQ(x1,y1,z1,t1). Experiments show that audio-video synchronization is achieved well, but for some video sequences the embedding overhead exceeds 3% of the MPEG-2 bit rate.
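The magnitude-ordering rule of scheme 1 can be sketched as a bit embed/extract pair. This is a minimal illustration of the stated rule on two quantized coefficients; the exact adjustment the thesis applies to the modified coefficient is an assumption here:

```python
def embed_bit_pair(c1, c2, bit):
    """Scheme 1 sketch: encode one audio bit in the magnitude order of two
    mid-frequency quantized coefficients. bit 0 -> |c1| >= |c2|;
    bit 1 -> |c2| > |c1|. The adjustment rule itself is hypothetical."""
    if bit == 0 and abs(c1) < abs(c2):
        c1 = (1 if c1 >= 0 else -1) * abs(c2)        # raise |c1| up to |c2|
    elif bit == 1 and abs(c2) <= abs(c1):
        c2 = (1 if c2 >= 0 else -1) * (abs(c1) + 1)  # make |c2| strictly larger
    return c1, c2

def extract_bit_pair(c1, c2):
    """Decoder side: read the bit back from the magnitude relation."""
    return 0 if abs(c1) >= abs(c2) else 1
```

Because only the relative order of two coefficients is changed, the distortion per embedded bit stays small, which matches the reported modest overhead.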
     2. Embedding in a fixed-position DCT coefficient
     A DCT coefficient at a fixed position is chosen. If the audio bit is 1, the coefficient is modified so that its absolute value after quantization is at least 1; otherwise it is modified so that its absolute value after quantization is less than 1. The audio bits are thus embedded. Experiments show that the PSNR is higher than with scheme 1 and the embedding overhead is below 3%.
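Scheme 2's threshold rule can be sketched directly against a scalar quantizer. The quantization step and the choice of forcing values are assumptions for illustration, not the thesis's parameters:

```python
def embed_bit_fixed(coeff, qstep, bit):
    """Scheme 2 sketch: force one chosen DCT coefficient so that after
    quantization (round(coeff / qstep)) its level is nonzero for bit 1
    and zero for bit 0. qstep and the forcing values are hypothetical."""
    if bit:
        if abs(round(coeff / qstep)) < 1:
            coeff = qstep if coeff >= 0 else -qstep  # push past one step
    else:
        coeff = 0.0                                  # quantizes to level 0
    return coeff

def extract_bit_fixed(level):
    """Decoder side: a nonzero quantized level means bit 1."""
    return 1 if abs(level) >= 1 else 0
```

Only coefficients whose quantized level disagrees with the bit are touched, which is consistent with the lower distortion reported relative to scheme 1.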
     3. Embedding in DCT coefficient parity
     Audio bits are embedded into the video frames through the parity of a fixed-position DCT coefficient. Both mono and stereo audio data are embedded in this scheme. Experiments show that the PSNR of reconstructed frames carrying mono audio drops only about 0.2 dB relative to frames carrying stereo audio; this is the best of the three schemes.
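The parity rule of scheme 3 can be sketched on a single quantized level. The tie-breaking choice of adjusting toward smaller magnitude is my assumption, made to minimize the added distortion:

```python
def embed_bit_parity(level, bit):
    """Scheme 3 sketch: match the parity of a quantized DCT level to the
    audio bit; when parities differ, nudge the level by one step toward
    zero (the direction is a hypothetical, distortion-minimizing choice)."""
    if (abs(level) & 1) != bit:
        level += -1 if level > 0 else 1
    return level

def extract_bit_parity(level):
    """Decoder side: the bit is simply the parity of the level."""
    return abs(level) & 1
```

Since each embedded bit perturbs the level by at most one quantization step, reconstruction quality degrades very little, matching the roughly 0.2 dB figure reported above.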
     All three embedded synchronization coding schemes extract the audio bits exactly and achieve synchronized audio-video transmission with very little degradation of reconstructed image quality while saving transmission channel capacity.
