Performance Study of Seismic Wave Propagation Simulation Using GPUs
Abstract
High-performance numerical simulation of seismic wave propagation is an important part of seismic research. This paper studies the performance of seismic wave propagation simulation on the graphics processing unit (GPU) architecture, exploiting the parallelism in the elastodynamic equations of wave propagation and in their finite-difference discretization. It presents an optimized GPU simulation algorithm with two components: a GPU-specific domain decomposition method, with implementation details for the GPU kernels, and two on-chip memory access schemes that maximize memory coalescing for the meshes on the subdomains. Experimental results show that the optimized GPU implementation is more than 42 times faster than a dual-core CPU implementation optimized with Intel Threading Building Blocks (TBB).
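The record gives no code, so the following is only a minimal sketch of the general idea behind domain decomposition for a finite-difference wave solver, not the paper's GPU kernels. It assumes a 1D velocity-stress staggered grid with uniform density and modulus. All names in it (`step_reference`, `step_tiled`, `tile`) are hypothetical. Each tile copies its cells plus a one-cell halo into a local buffer, which plays the role that on-chip memory plays on a GPU, and then updates only its own cells. The tiled update should reproduce the whole-grid update exactly.

```python
def step_reference(v, s, dt, dx, rho, mu):
    """One velocity-stress time step over the whole 1D grid (ends held fixed)."""
    n = len(v)
    v_new = v[:]
    for i in range(1, n):
        v_new[i] = v[i] + dt / (rho * dx) * (s[i] - s[i - 1])
    s_new = s[:]
    for i in range(n - 1):
        s_new[i] = s[i] + dt * mu / dx * (v_new[i + 1] - v_new[i])
    return v_new, s_new


def step_tiled(v, s, dt, dx, rho, mu, tile):
    """Same update, computed tile by tile from local buffers with a halo cell.

    Assumption: each tile stands in for one GPU thread block and each local
    list stands in for that block's on-chip buffer. The two loops over tiles
    correspond to two separate passes (velocity first, then stress).
    """
    n = len(v)

    # Pass 1: velocity update needs stress at i and i - 1 (left halo).
    v_new = v[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - 1, 0)
        local_s = s[lo:end]  # tile cells plus one halo cell on the left
        for i in range(max(start, 1), end):
            j = i - lo
            v_new[i] = v[i] + dt / (rho * dx) * (local_s[j] - local_s[j - 1])

    # Pass 2: stress update needs velocity at i and i + 1 (right halo).
    s_new = s[:]
    for start in range(0, n, tile):
        end = min(start + tile, n)
        hi = min(end + 1, n)
        local_v = v_new[start:hi]  # tile cells plus one halo cell on the right
        for i in range(start, min(end, n - 1)):
            j = i - start
            s_new[i] = s[i] + dt * mu / dx * (local_v[j + 1] - local_v[j])
    return v_new, s_new
```

On a real GPU the halo loads are where coalescing matters, since neighboring threads should read neighboring addresses. This CPU sketch only shows the data dependencies that determine the halo width.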
References
[1] Owens J, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn A, Purcell T. A Survey of General-Purpose Computation on Graphics Hardware[J]. Computer Graphics Forum (S1467-8659), 2007, 26(1): 80-113.
[2] Deschizeaux B, Blanc J. Imaging Earth's Subsurface Using CUDA[C]//Nguyen H. GPU Gems 3. Boston: Addison-Wesley, 2008: 831-850.
[3] Komatitsch D, Michéa D, Erlebacher G. Porting a High-order Finite-element Earthquake Modeling Application to NVIDIA Graphics Cards Using CUDA[J]. Journal of Parallel and Distributed Computing (S0743-7315), 2009, 69(5): 451-460.
[4] Virieux J. P-SV Wave Propagation in Heterogeneous Media: Velocity-stress Finite-difference Method[J]. Geophysics (S0016-8033), 1986, 51(4): 889-901.
[5] Reinders J. Intel Threading Building Blocks[M]. Sebastopol: O'Reilly Media, 2007: 180-183.
[6] Wu Enhua. Techniques, Current Status, and Challenges of Using Graphics Processors for General-Purpose Computation[J]. Journal of Software, 2004, 15(10): 1493-1504. (in Chinese)
[7] nVidia. GeForce GTX 285 Specification[EB/OL]. (2008). http://www.nvidia.com/object/product_geforce_gtx_285_us.html.
[8] Intel. Microprocessor Export Compliance Metrics[EB/OL]. (2008). http://www.intel.com/support/processors/sb/cs-023143.htm.
[9] Liu Weifeng, Wang Zhiguang. Research on Fine-Grained Parallel Computing Programming Models[J]. Microelectronics & Computer, 2008, 25(10): 103-106. (in Chinese)
[10] Young E, Jargstorff F. Image Processing & Video Algorithms with CUDA[C]. Santa Clara: nVISION08, 2008.
[11] Andrade D, Brodman J, Fraguela B, Padua D. Hierarchically Tiled Arrays Vs. Intel Threading Building Blocks for Programming Multicore Systems[C]. Goteborg: Programmability Issues for Multi-Core Computers (MULTIPROG'08), in conjunction with HiPEAC'08, 2008.
[12] Andrade D, Fraguela B, Brodman J, Padua D. Task-parallel versus Data-parallel Library-based Programming in Multicore Systems[C]. Weimar: 17th EUROMICRO International Conference on Parallel, Distributed, and Network-based Processing (PDP2009), 2009: 101-110.
