A Parallel and Distributed Programming Framework for Seismic Data Processing
Abstract
This paper presents GeoPF, a parallel and distributed programming framework for seismic data processing. Built on a cluster system, the framework adopts a coarse-grained data-parallel execution model: it schedules processing modules written in serial languages and runs them simultaneously on multiple compute nodes or on multiple CPU cores within a single node. It hides parallel programming details such as the scheduling of compute nodes and their CPU cores, communication, recovery from node failures, and data transfer between modules. In experimental evaluation, GeoPF improved linear speedup from serial to parallel execution, cutting the time to process the same task from 21 h 33 min to 15 min 27 s. Compared with commercial seismic data processing systems, GeoPF shares some characteristics in its processing workflow; the difference is that GeoPF's processing modules are parallelized automatically, whereas most seismic processing modules can only run serially.
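The coarse-grained data-parallel model described in the abstract can be illustrated with a minimal sketch. This is not GeoPF's API, which the abstract does not describe. The names `remove_mean` and `run_data_parallel` are hypothetical, the data layout (a gather as a list of traces) is an assumption, and the example uses Python's standard `concurrent.futures` on a single machine. It shows only the core idea: a module written as ordinary serial code is applied to independent chunks of data by a pool of workers. Multi-node scheduling, failure recovery, and data transfer between modules are outside its scope.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, List, Sequence

# Assumed layout: one gather is a list of traces; one trace is a list of samples.
Gather = List[List[float]]


def remove_mean(gather: Gather) -> Gather:
    """Hypothetical serial module: subtract each trace's mean from its samples."""
    result = []
    for trace in gather:
        mean = sum(trace) / len(trace) if trace else 0.0
        result.append([sample - mean for sample in trace])
    return result


def run_data_parallel(
    module: Callable[[Gather], Gather],
    gathers: Sequence[Gather],
    executor: Executor,
) -> List[Gather]:
    """Apply one serial module to every gather independently.

    Each gather is a coarse-grained unit of work. The module itself contains
    no parallel code; the executor decides where each call runs. Results are
    returned in the same order as the input gathers.
    """
    return list(executor.map(module, gathers))


if __name__ == "__main__":
    gathers = [
        [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]],
        [[10.0, 20.0]],
    ]
    # Two worker processes stand in for two CPU cores of a single node.
    with ProcessPoolExecutor(max_workers=2) as pool:
        processed = run_data_parallel(remove_mean, gathers, pool)
    print(processed)
```

Passing the executor in as a parameter keeps the module and the driver independent of where the work runs, which mirrors the separation the abstract describes between serial processing modules and the framework that schedules them.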
