Abstract
We present an in-depth performance comparison between the Phytium FT-1500A processor and the commercial Intel Xeon processor. At the microbenchmark level, we measure the maximum attainable performance of the two platforms: floating-point performance (FLOPS), memory access latency, and memory bandwidth. At the application level, we select a typical open-source numerical ocean forecasting code and study how to port it to the FT-1500A and to the Intel Xeon. We then evaluate its single-core and multi-core performance on both platforms, analyze the causes of the performance differences, and propose corresponding optimizations. Overall, we conclude that the FT-1500A already has a good software ecosystem (operating system, compilers, and toolchain), which makes porting typical scientific computing programs simple and feasible. Although the FT-1500A shows a noticeable performance gap relative to the commercial Intel Xeon platform, its advantage in power consumption makes it a very promising platform for future applications.
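Latency and bandwidth microbenchmarks of the kind measured here are commonly built from two kernels: a pointer-chasing loop over a random cycle, where every load depends on the previous one, and a STREAM-style triad that streams through large arrays. The C sketch below illustrates both under stated assumptions. It is not the authors' benchmark code. The function names (`now_sec`, `build_cycle`, `chase_latency_ns`, `triad_gbps`) are hypothetical, and it assumes a POSIX system that provides `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
/* Illustrative sketch with hypothetical names, not the paper's benchmark code.
   Assumes a POSIX system providing clock_gettime(CLOCK_MONOTONIC, ...). */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Fill next[] with one random cycle through all n slots (Sattolo's algorithm).
   A random order defeats stride prefetchers, so each hop pays the latency of
   whichever cache level or DRAM holds the n * sizeof(size_t) working set. */
void build_cycle(size_t *next, size_t n, unsigned seed) {
    assert(n > 0);
    srand(seed);
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i-1] guarantees a single cycle */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

/* Average time in ns per dependent load over `steps` hops starting at slot 0. */
double chase_latency_ns(const size_t *next, size_t steps, size_t *sink) {
    assert(steps > 0);
    size_t p = 0;
    double t0 = now_sec();
    for (size_t s = 0; s < steps; s++) p = next[p];   /* each load needs the previous result */
    double t1 = now_sec();
    *sink = p;   /* publish the final index so the loop cannot be optimised away */
    return (t1 - t0) * 1e9 / (double)steps;
}

/* STREAM-style triad a[i] = b[i] + q * c[i]. Returns GB/s counting 24 bytes per
   element (read b, read c, write a); write-allocate traffic is not counted. */
double triad_gbps(double *a, const double *b, const double *c, size_t n, double q) {
    assert(n > 0);
    double t0 = now_sec();
    for (size_t i = 0; i < n; i++) a[i] = b[i] + q * c[i];
    double t1 = now_sec();
    return 3.0 * (double)sizeof(double) * (double)n / ((t1 - t0) * 1e9);
}
```

A driver would typically sweep the pointer-chasing working set from a few KB up to several hundred MB, run many more hops than there are slots, and call the triad on arrays much larger than the last-level cache, with the thread pinned to one core (for example via `taskset`) for stable numbers. A multi-core bandwidth figure would additionally split the triad loop across threads, and peak FLOPS needs a separate kernel with independent fused multiply-add chains, which this sketch does not cover.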