Performance Study of Seismic Wave Propagation Simulation Using GPU (使用GPU模拟地震波传播的性能研究)

English title: Performance Study of Seismic Wave Propagation Simulation Using GPU
Journal (Chinese name): 系统仿真学报
Journal (English name): Journal of System Simulation
Authors: 刘伟峰; 王永胜; 张天雷; 张兵
Authors (English): LIU Wei-feng (1); WANG Yong-sheng (2); ZHANG Tian-lei (1); ZHANG Bing (3, 4) (1. Information and Technology Research Institute, SINOPEC Exploration & Production Research Institute, Beijing 100083, China; 2. Center of Seismic Data Processing and Interpretation, SINOPEC Exploration & Production Research Institute, Beijing 100083, China; 3. Nanjing Institute of Geophysical and Prospecting, SINOPEC Exploration & Production Research Institute, Nanjing, Jiangsu 210014, China; 4. School of Ocean & Earth Science, State Key Laboratory of Marine Geology, Tongji University, Shanghai 200092, China)
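Keywords (Chinese): 地震波传播的数值模拟 (numerical simulation of seismic wave propagation); 地震波可视化 (seismic wave visualization); 图形处理器 (graphics processing unit); 计算统一设备架构 (Compute Unified Device Architecture)
Keywords (English): numerical simulation of seismic wave propagation; seismic wave visualization; GPU; CUDA
Publication date: 2009-10-23
Institutions: Information and Technology Research Institute, SINOPEC Exploration & Production Research Institute; Center of Seismic Data Processing and Interpretation, SINOPEC Exploration & Production Research Institute; Nanjing Institute of Geophysical and Prospecting, SINOPEC Exploration & Production Research Institute; State Key Laboratory of Marine Geology, School of Ocean & Earth Science, Tongji University
Year: 2009
Issue: S1
Publisher: Journal of System Simulation (系统仿真学报)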

Abstract

High-performance numerical simulation of seismic wave propagation is an important part of seismic research. This paper studies the performance of seismic wave propagation simulation on the GPU architecture by exploiting the parallelism of the elastodynamic equations and their finite-difference discretization. An optimized simulation algorithm for the graphics processing unit (GPU) is presented, comprising a GPU-specific domain decomposition method, whose kernel implementation details are discussed, and two on-chip memory access schemes that maximize memory coalescing for the meshes on the subdomains. Experimental results show that the optimized GPU implementation is more than 42 times faster than a dual-core CPU implementation optimized with Intel Threading Building Blocks (TBB).
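The domain decomposition idea named in the abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' kernel: the 5-point Laplacian is only a stand-in for the velocity-stress finite-difference update, and the function names, the tile size, and the one-cell halo width are assumptions made for illustration. Each tile plays the role of one GPU thread block that stages its subdomain plus halo into a local buffer (the analogue of on-chip memory) using contiguous row reads (the analogue of coalesced access) before applying the stencil.

```python
def laplacian_global(u):
    """Reference 5-point Laplacian on interior points; boundary stays 0."""
    ny, nx = len(u), len(u[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = (u[j][i - 1] + u[j][i + 1]
                         + u[j - 1][i] + u[j + 1][i] - 4.0 * u[j][i])
    return out


def laplacian_tiled(u, tile=4):
    """Same stencil, computed subdomain by subdomain (hypothetical tiling)."""
    ny, nx = len(u), len(u[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j0 in range(1, ny - 1, tile):          # one (j0, i0) pair ~ one thread block
        for i0 in range(1, nx - 1, tile):
            j1 = min(j0 + tile, ny - 1)        # exclusive end of the tile interior
            i1 = min(i0 + tile, nx - 1)
            # Stage the tile plus a one-cell halo; each row is read as one
            # contiguous slice, mimicking a coalesced load into on-chip memory.
            local = [u[j][i0 - 1:i1 + 1] for j in range(j0 - 1, j1 + 1)]
            # Apply the stencil using only the staged local buffer.
            for lj in range(1, j1 - j0 + 1):
                for li in range(1, i1 - i0 + 1):
                    out[j0 + lj - 1][i0 + li - 1] = (
                        local[lj][li - 1] + local[lj][li + 1]
                        + local[lj - 1][li] + local[lj + 1][li]
                        - 4.0 * local[lj][li])
    return out
```

Because every tile reads its halo from the unmodified input field, the tiles are independent of one another and could be processed in any order, which is the property a GPU implementation would rely on to run them in parallel.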

References

[1] Owens J, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn A, Purcell T. A Survey of General-Purpose Computation on Graphics Hardware [J]. Computer Graphics Forum (S1467-8659), 2007, 26(1): 80-113.
[2] Deschizeaux B, Blanc J. Imaging Earth's Subsurface Using CUDA [C]// Nguyen H. GPU Gems 3. Boston: Addison-Wesley, 2008: 831-850.
[3] Komatitsch D, Michéa D, Erlebacher G. Porting a High-order Finite-element Earthquake Modeling Application to NVIDIA Graphics Cards Using CUDA [J]. Journal of Parallel and Distributed Computing (S0743-7315), 2009, 69(5): 451-460.
[4] Virieux J. P-SV Wave Propagation in Heterogeneous Media: Velocity-stress Finite-difference Method [J]. Geophysics (S0016-8033), 1986, 51(4): 889-901.
[5] Reinders J. Intel Threading Building Blocks [M]. Sebastopol: O'Reilly Media, 2007: 180-183.
[6] Wu Enhua (吴恩华). Technology, State of the Art, and Challenges of General-Purpose Computation on Graphics Processing Units [J]. Journal of Software (软件学报), 2004, 15(10): 1493-1504. (in Chinese)
[7] nVidia. GeForce GTX 285 Specification [EB/OL]. (2008). http://www.nvidia.com/object/product_geforce_gtx_285_us.html.
[8] Intel. Microprocessor Export Compliance Metrics [EB/OL]. (2008). http://www.intel.com/support/processors/sb/cs-023143.htm.
[9] Liu Weifeng (刘伟峰), Wang Zhiguang (王智广). Research on Fine-grained Parallel Computing Programming Models [J]. Microelectronics & Computer (微电子学与计算机), 2008, 25(10): 103-106. (in Chinese)
[10] Young E, Jargstorff F. Image Processing & Video Algorithms with CUDA [C]. Santa Clara: nVISION08, 2008.
[11] Andrade D, Brodman J, Fraguela B, Padua D. Hierarchically Tiled Arrays Vs. Intel Threading Building Blocks for Programming Multicore Systems [C]. Goteborg: Programmability Issues for Multi-Core Computers (MULTIPROG'08), in conjunction with HiPEAC'08, 2008.
[12] Andrade D, Fraguela B, Brodman J, Padua D. Task-parallel versus Data-parallel Library-based Programming in Multicore Systems [C]. Weimar: 17th EUROMICRO International Conference on Parallel, Distributed, and Network-based Processing (PDP2009), 2009: 101-110.