Research on Key Technologies for Free-Viewpoint Video Based on Depth-Image-Based Rendering


English title: Research on Key Technology of Free-View Video on the Basis of Depth Image Rendering
Author: Yu Li (郁理)
Degree level: Doctoral
Discipline: Circuits and Systems
Keywords: free-viewpoint video; stereo matching; DIBR virtual view synthesis; 3D stereoscopic display
Degree year: 2010
Advisor: Guo Li (郭立)
Discipline code: 080902
Degree-granting institution: University of Science and Technology of China
Submission date: 2010-04-01

Abstract

Free-viewpoint video based on depth-image-based rendering (DIBR) lets users view a scene from any viewpoint within a certain range, and is expected to become a next-generation digital video standard. Implementing DIBR free-viewpoint video involves several key technologies: image correction, stereo matching, multi-view video coding and decoding, and virtual view synthesis. These technologies still have many problems. Existing image correction methods do not account for scaling and rotation between viewpoints, and their robustness and efficiency need improvement. Current stereo matching algorithms are inefficient and have difficulty matching accurately in occluded and low-texture regions. Virtual view synthesis methods still handle artifacts and holes poorly, which degrades the quality of the rendered view.
     To address these problems, this dissertation proposes an asymmetric encoding/decoding scheme for DIBR free-viewpoint video. The work covers algorithms for efficient stereo matching, robust image correction, multi-view coding and decoding, and high-quality virtual view synthesis, and also discusses 3D stereoscopic display methods.
     The main work and contributions of this dissertation are as follows:
     1) An encoding/decoding scheme for DIBR free-viewpoint video is proposed. The sender performs the computationally heavy tasks (stereo matching, image correction, and multi-view video coding), while the receiver only decodes the video and renders the virtual view. This asymmetric configuration takes the computing capacity of both sides into account: the receiver needs only lightweight computation, which lowers the requirements on the user's receiving and display device.
     2) To handle rotation, scaling, and occlusion between viewpoints, a color correction method based on SURF feature point matching is proposed. Experiments show that the method is robust. Color matching improves the compression efficiency of multi-view coding, and also improves the rendering quality of the virtual view and the comfort of stereoscopic viewing.
     3) To address the low efficiency of current global stereo matching algorithms and their inaccuracy in occluded regions, an efficient stereo matching method based on hierarchical belief propagation is proposed. First, local matching produces an initial cost, and occluded and mismatched pixels are detected from the initial matching result. Second, plane estimation is used to correct the initial cost, which improves matching in occluded regions. Finally, an improved hierarchical belief propagation quickly estimates the minimum of the global energy, which reduces mismatches in low-texture regions and improves overall smoothness. Experiments show that the method performs well in both matching accuracy and speed.
     4) To address artifacts and holes in DIBR virtual view synthesis, a new view rendering method based on depth and image is proposed. First, a simultaneous rendering mechanism generates the image and the depth of the virtual view at the same time, and background ghosting is erased according to the depth information. Second, depth-based hole filling and boundary processing further remove holes and foreground edge distortion from the rendered image. Experimental results show that, compared with the MPEG 3DV/FTV standard reference method, the PSNR of the virtual view generated by the proposed method is 2 dB higher, and the picture quality is visibly better.
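As a rough illustration of the rendering steps named in item 4 (warping image and depth together, resolving overlaps by depth, filling holes from the background, and scoring the result with PSNR), the following Python sketch works on a single scanline. It is not the dissertation's algorithm. It assumes rectified cameras, integer disparities (larger disparity means nearer to the camera), and a virtual camera placed so that pixels shift left by alpha times their disparity. The function names warp_scanline, fill_holes, and psnr are hypothetical and chosen only for this example.

```python
import math


def warp_scanline(colors, disparities, alpha):
    """Forward-warp one scanline to a virtual view.

    Assumption: rectified cameras, so a pixel with disparity d moves
    horizontally by alpha * d. Color and disparity are warped together,
    and a z-buffer test keeps the nearest sample when two pixels collide.
    """
    width = len(colors)
    out_color = [None] * width
    out_disp = [None] * width
    for x in range(width):
        d = disparities[x]
        xv = x - int(round(alpha * d))
        if 0 <= xv < width:
            # Larger disparity means nearer, so it wins the z-buffer test.
            if out_disp[xv] is None or d > out_disp[xv]:
                out_color[xv] = colors[x]
                out_disp[xv] = d
    return out_color, out_disp


def fill_holes(colors, disps):
    """Fill each run of holes (None) from its background-side neighbor.

    Of the two valid pixels bordering a hole, the one with the smaller
    disparity is treated as background and copied into the hole.
    """
    width = len(colors)
    filled = list(colors)
    x = 0
    while x < width:
        if colors[x] is None:
            start = x
            while x < width and colors[x] is None:
                x += 1
            neighbors = [i for i in (start - 1, x) if 0 <= i < width]
            if neighbors:
                src = min(neighbors, key=lambda i: disps[i])
                for h in range(start, x):
                    filled[h] = colors[src]
        else:
            x += 1
    return filled


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

For example, with gray values [10, 20, 30, 200, 210, 60, 70, 80], disparities [0, 0, 0, 2, 2, 0, 0, 0], and alpha = 1.0, the two foreground samples shift left by two pixels and overwrite the background there, leaving a two-pixel hole that fill_holes fills from the background neighbor on the right. Because PSNR is logarithmic in the mean squared error, a gain of 2 dB corresponds to reducing the MSE by a factor of about 1.58.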

