Design and Implementation of the Host Interface Unit of the X Stream Processor
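Abstract
As computer applications continue to expand into new domains, streaming media and scientific computing are becoming an important class of microprocessor workload. Stream architecture is a new high-performance computer architecture aimed at streaming media applications and scientific computing. The X processor, developed in the project to which this work belongs, is one implementation of this architecture.
     This thesis takes stream architecture as its research subject and designs and implements the host interface unit of the X stream processor. In communication between the stream processor and the outside world, stream instructions and data are encapsulated as data packets and dispatched to the X stream processor through the host interface, which moves streams between the host processor and the X stream processor.
     The design of the X stream processor host interface consists mainly of module partitioning, logic design, and simulation-based verification. The thesis describes the communication protocol of the host interface unit and the design of its communication modules, discusses the design and implementation of each functional module of the host interface, and presents in detail the hierarchical verification method applied to the implemented host interface and the verification process.
     The host interface structure of the X stream processor is described in Verilog at the RTL level and was simulated and verified bottom-up, layer by layer, on nc_verilog. Directed test vectors, high-coverage random test vectors, and tests with real programs ensured the correctness of the design and the completeness of the testing. Post-tape-out results show that the X processor works correctly and meets its intended goals.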
As the range of computer applications expands, streaming media and scientific computing are becoming an important class of processor workload. Stream architecture, which targets media processing and scientific computing, is a new high-performance architecture. The X processor is an implementation of this architecture.
     This thesis takes stream architecture as its research subject and designs and implements the host interface of the X stream processor. In communication between the stream processor and the host, stream instructions and data are packaged as message packets, and the host interface schedules these packets to the stream processor so that streams can flow between the host and the stream processor.
     The design of the X stream processor host interface mainly comprises module partitioning, logic design, and simulation-based verification. The thesis discusses the communication protocol of the host interface and the design of its communication components, covers the design and implementation of each component, and presents the method and process used to verify the design.
     The host interface of the X stream processor is described in the Verilog hardware description language. Simulation and verification of the design were completed in NC-Verilog using a bottom-up methodology. The design passed directed test vectors, high-coverage random test vectors, and tests with real programs. Together, these tests ensure the correctness of the design and the completeness of the verification.
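The packet-based exchange described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual packet format, so everything here is an assumption made for illustration: the 3-byte header (a 1-byte kind plus a 2-byte big-endian payload length), the two kind codes, and the function names `pack_packet` and `unpack_packets` are all hypothetical and are not taken from the thesis.

```python
import struct

# Hypothetical packet layout (an assumption, not the thesis's format):
#   1 byte  : packet kind (instruction or data)
#   2 bytes : payload length, big-endian
#   N bytes : payload
HEADER = struct.Struct(">BH")
KIND_INSTRUCTION = 0  # hypothetical code for a stream instruction
KIND_DATA = 1         # hypothetical code for stream data


def pack_packet(kind: int, payload: bytes) -> bytes:
    """Wrap a payload in the hypothetical header."""
    if kind not in (KIND_INSTRUCTION, KIND_DATA):
        raise ValueError(f"unknown packet kind: {kind}")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length field")
    return HEADER.pack(kind, len(payload)) + payload


def unpack_packets(stream: bytes) -> list[tuple[int, bytes]]:
    """Split a byte stream back into (kind, payload) pairs."""
    packets = []
    offset = 0
    while offset < len(stream):
        if offset + HEADER.size > len(stream):
            raise ValueError("truncated header")
        kind, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        packets.append((kind, payload))
        offset += length
    return packets
```

In this sketch the host side would concatenate the results of `pack_packet` into one byte stream, and the receiving side would recover the instruction and data packets with `unpack_packets`. The explicit length field is what lets packets of different sizes share a single channel.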
