Performance Comparison Between the Feiteng Processor and a Commercial Processor
  • English title: Performance comparison between FT-1500A and Intel XEON
  • Authors: 方建滨; 杜琦; 唐滔; 陈顼颢; 黄春; 杨灿群
  • Authors (romanized): FANG Jian-bin; DU Qi; TANG Tao; CHEN Xu-hao; HUANG Chun; YANG Can-qun
  • Keywords: Feiteng processor; micro-benchmarking; performance comparison
  • Journal code: JSJK
  • Journal: Computer Engineering & Science (计算机工程与科学)
  • Affiliation: School of Computer, National University of Defense Technology
  • Publication date: 2019-01-15
  • Publisher: Computer Engineering & Science (计算机工程与科学)
  • Year: 2019
  • Issue: v.41; No.289
  • Funding: National Natural Science Foundation of China (61602501)
  • Language: Chinese
  • Pages: JSJK201901001
  • Page count: 8
  • CN: 01
  • ISSN: 43-1258/TP
  • Classification number: 5-12
Abstract
This paper presents an in-depth performance comparison between the Feiteng FT-1500A processor and the commercial Intel XEON processor. At the micro-benchmark level, we measure the maximum attainable performance of the two platforms (floating-point performance, memory access latency, and memory bandwidth). At the application level, we select a representative ocean-forecasting numerical simulation package, study how to port this open-source code to both processors, examine its single-core and multi-core performance on the two platforms, analyze the causes of the performance differences, and propose corresponding optimization suggestions. We conclude that the FT-1500A already has a sound ecosystem (operating system, compiler, and toolchain), which makes porting typical scientific computing programs simple and feasible. Although the Feiteng processor still lags behind the commercial platform in performance, its advantage in power consumption makes it a promising platform for future applications.
