Performance Comparison Between Feiteng Processors and Commercial Processors


Title (English): Performance comparison between FT-1500A and Intel XEON
Authors: FANG Jian-bin (方建滨); DU Qi (杜琦); TANG Tao (唐滔); CHEN Xu-hao (陈顼颢); HUANG Chun (黄春); YANG Can-qun (杨灿群)
Affiliation: School of Computer, National University of Defense Technology (国防科技大学计算机学院)
Keywords: Feiteng processor; micro benchmarking; performance comparison
Journal: Computer Engineering & Science (计算机工程与科学; abbreviation JSJK)
Publication date: 2019-01-15
Publisher: Computer Engineering & Science
Year: 2019
Issue: v.41; No.289
Funding: National Natural Science Foundation of China (61602501)
Language: Chinese
Page: JSJK201901001
Page count: 8
CN: 01
ISSN: 43-1258/TP
Classification number: 5-12

Abstract

This paper presents an in-depth performance comparison between the Feiteng FT-1500A processor and the commercial Intel XEON processor. At the micro-benchmarking level, we measure the maximum achievable performance of the two platforms (floating-point performance, memory access latency, and memory bandwidth). At the application level, we select a typical numerical simulation package for ocean forecasting, study how to port its open-source code to both processors, examine its single-core and multi-core performance on the two platforms, analyze the causes of the performance differences, and propose corresponding optimizations. We conclude that FT-1500A already has a sound ecosystem (operating system, compilers, and toolchain), which makes porting typical scientific computing programs simple and feasible. Although the Feiteng processor trails the commercial platform in performance, its advantage in power consumption makes it a promising platform for future applications.
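The abstract does not say which micro-benchmarks were used, so the following is only an illustrative sketch of one common way to probe memory access latency: pointer chasing over a random single-cycle permutation, so that each load depends on the previous one. All names (`make_chase_ring`, `chase`, `ns_per_access`) are hypothetical and not taken from the paper. In Python the interpreter overhead dominates the timings, so the numbers show only the shape of the method, not hardware latencies; a real measurement would use a compiled language.

```python
import random
import time


def make_chase_ring(n, seed=0):
    """Build a random permutation of 0..n-1 that forms one single cycle.

    Uses Sattolo's algorithm, so following ring[i] repeatedly visits
    every slot exactly once before returning to the start.
    """
    rng = random.Random(seed)
    ring = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        ring[i], ring[j] = ring[j], ring[i]
    return ring


def chase(ring, steps, start=0):
    """Follow the pointer chain for `steps` dependent loads."""
    idx = start
    for _ in range(steps):
        idx = ring[idx]
    return idx


def ns_per_access(n, steps=200_000, seed=0):
    """Average time per dependent access, in nanoseconds (assumed metric)."""
    ring = make_chase_ring(n, seed)
    chase(ring, n)  # warm-up pass over the whole working set
    t0 = time.perf_counter()
    chase(ring, steps)
    return (time.perf_counter() - t0) * 1e9 / steps


if __name__ == "__main__":
    # Growing the working set is what exposes cache-level transitions
    # in a compiled implementation of the same loop.
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(f"{n:>8} elements: {ns_per_access(n):.1f} ns/access")
```

The single-cycle permutation matters because an ordinary shuffle can produce short cycles, which would keep the chase inside a small, cache-resident subset and understate latency.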

