Research on Some Key Techniques in the Design of Network Processors

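English title: Research on Some Key Techniques in the Design of Network Processors
Author: Zhang Xiaoming (张晓明)
Degree level: Doctoral
Discipline: Computer Science and Technology
Keywords: Network processor ; Design space development ; Packet parallel scheduling ; Packet buffering ; Co-Processor
Degree year: 2006
Advisors: Zhang Minxuan (张民选) ; Sun Zhigang (孙志刚)
Discipline code: 081203
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2006-04-01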

Abstract

As network applications grow, network devices need increasingly intelligent processing capability. Beyond functions such as multi-layer switching, security processing, and traffic management, they need powerful protocol processing and flexible programmability so that new network services can be deployed and configured quickly. Network processors (NPs) based on Application Specific Instruction Processor (ASIP) technology have therefore been widely adopted in network devices and have become one of the core components driving the next-generation Internet.
     This dissertation addresses the system design and implementation of NPs. It studies early-stage design and performance evaluation methods from a system-design perspective and investigates several key implementation techniques in depth. The main contributions are as follows:
     (1) To address the difficulty of evaluating NP designs and choosing among candidate schemes, the YinHe Network Processor Design Framework (YH-NPDF) is proposed. It is a design space exploration framework that draws on the characteristics of Multi-Processor System on Chip (MPSoC) design and of network packet processing. YH-NPDF follows the platform-based design approach. It describes network applications with the Reactive Dataflow Process Network (RDPN) model, maps the application model onto a parameterized model of the NP hardware architecture to evaluate processing performance, and uses the global annealing genetic algorithm to search the design space quickly and select the preferred system design. In application modeling, hardware resource modeling, and design selection, YH-NPDF adapts well to the intelligent packet processing requirements of NP design and development.
     (2) For NPs built from parallel processing elements (PEs), a packet parallel scheduling algorithm based on a Fuzzy Feedback Control Loop (F2CL) is proposed. The algorithm uses the F2CL mechanism to improve load balance among the PEs and keeps a flow cache that stores the scheduling information of packet flows. Packet reordering is controlled in two ways: when the load across PEs becomes unbalanced, the algorithm preferentially adjusts heavily loaded flows; and when a flow times out, subsequent packets of that flow may be remapped to another PE. Simulation results show that, with well-chosen design parameters, the algorithm preserves packet order well while maintaining load balance, and that its overall performance on both criteria is better than that of existing algorithms of the same kind.
     (3) Based on the characteristics of packet buffering in NPs, a multi-channel packet buffer structure using Pipelining Input and Parallel Output (PIPO) is proposed. PIPO schedules the sequence of write requests on the input side in a pipelined fashion and schedules the sequence of read requests on the output side in parallel; both sides apply a memory access policy to improve access efficiency. PIPO's effectiveness, its adaptability to variable packet lengths, and the scalability of its buffer bandwidth are evaluated through theoretical analysis and simulation experiments with extrapolated workloads. Compared with traditional packet buffer scheduling schemes such as first-come, first-served (FCFS), PIPO achieves more efficient memory access and higher buffer bandwidth utilization, with less jitter in instantaneous bandwidth on both the input and output ports.
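     The flow-cache idea in contribution (2) can be illustrated with a minimal Python sketch. It is not the dissertation's F2CL algorithm: the fuzzy feedback controller is replaced here by a crisp load threshold, the preference for heavily loaded flows is not modeled, and every name and parameter (FlowCacheScheduler, flow_timeout, imbalance_ratio, integer flow IDs) is a hypothetical chosen for illustration. The sketch shows only the ordering rule stated above: a flow stays pinned to its PE, and it may be remapped only after it has been idle longer than the timeout.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    pe: int            # processing element the flow is currently pinned to
    last_seen: float   # arrival time of the flow's most recent packet


@dataclass
class FlowCacheScheduler:
    num_pes: int
    flow_timeout: float = 1.0     # hypothetical idle gap after which remapping is allowed
    imbalance_ratio: float = 1.5  # hypothetical crisp stand-in for the fuzzy controller
    load: list = field(default_factory=list, init=False)   # packets queued per PE
    cache: dict = field(default_factory=dict, init=False)  # flow_id -> FlowEntry

    def __post_init__(self):
        self.load = [0] * self.num_pes

    def _least_loaded(self) -> int:
        return min(range(self.num_pes), key=lambda pe: self.load[pe])

    def _overloaded(self, pe: int) -> bool:
        mean = sum(self.load) / self.num_pes
        return self.load[pe] > self.imbalance_ratio * max(mean, 1.0)

    def dispatch(self, flow_id: int, now: float) -> int:
        """Pick a PE for one packet of the given flow arriving at time `now`."""
        entry = self.cache.get(flow_id)
        if entry is None:
            pe = flow_id % self.num_pes      # default hash mapping for a new flow
        elif now - entry.last_seen > self.flow_timeout and self._overloaded(entry.pe):
            pe = self._least_loaded()        # idle flow on a hot PE: remap it
        else:
            pe = entry.pe                    # keep the flow pinned to preserve order
        self.cache[flow_id] = FlowEntry(pe, now)
        self.load[pe] += 1
        return pe

    def complete(self, pe: int) -> None:
        """Record that a PE has finished processing one packet."""
        self.load[pe] -= 1
```

     Remapping only after an idle gap is what limits reordering in this sketch: by then the flow's earlier packets are assumed to have left the old PE. A real design would also need cache eviction and a load measure better than a simple packet count.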
     In addition, an NP prototype based on SoPC (System on Programmable Chip) is implemented on an Altera FPGA. The prototype chip embeds four soft processor cores (Altera Nios II) and supports four 1000 Mbps Ethernet interfaces through software control and co-processor acceleration. Using the prototype, instruction set extension and co-processor sharing mechanisms for parallel processing architectures are analyzed and evaluated in depth, and the F2CL-based packet scheduling algorithm is verified. The work in this dissertation can serve as an important guideline for NP design.

引文

[1] Tim Kogel, Heinrich Meyr. Heterogeneous MP-SoC -The Solution to Energy-Efficient Signal Processing. DAC'04, June 7-11,2004

    [2] V Zivkovic, E Deprettere P Wolf, E Kock. Design space exploration of streaming multiprocessor architectures. In: Proceedings of the IEEE Workshop on Signal Processing Systems (SIPS), 2002. 228-234
    [3] S Rajagopal, Ulrich Rckert.S Rajagopal, Design space exploration for real-time embedded stream processors.IEEE micro, 2004, 24(4): 54-66
    [4] Matthias Grnewald, Jrg-Christian Niemann, Mario Porrmann. A framework for design space exploration of resource efficient network processing on multiprocessor SoCs. In: Proceedings of the the 3rd Workshop on Network Processors & Applications, Madrid Spain: Morgan Kaufmann Publishers, 2004
    [5] Robin Melnick and Keith Morris. "AMCC - nPcore~(TM) "NISC" Architecture." White Paper, AMCC Switching & Network Processing, 2003
    [6] Intel Corp. Intel IXP2800 Network Processor,. http://developer.intel.com/design/network/products/npfamily/ixp2800.htm. 2002
    [7] IBM Corporation. IBM PowerNP NP4GS3 Network Processor Datasheet, http://www.ibm.com ,May 2001

    [8] Motorola Motorola Corporation, "Motorola C-5 DCP Architecture Guide", 2001.
    [9] http://www.idc.com
    [10] Agere Inc. PayloadPlus Routing Switch Processor, Preliminary Product Brief, Lucent Technologies, Microelectronics Group, April 2000.

    [11] EZchip Corporation. EZchip Technologies, "Network Processor Designs for Next-Generation Networking Equipment", White paper, December 1999.

    [12] Broadcom Corporation. Practical System Design and Debug considerations for Multiprocessing in the Embedded Environment. White paper, December 2002.
    [13] AMD Corporation. AMD Alchemy Solutions, AulOOO Processor Family. Product brief, AMD, Inc., 2003.
    [14] Jakob Carlstrom and Thomas Boden Synchronous Dataflow Architecture for Network Processors IEEE MICRO Vol.24, No.5, September/October 2004. pl0-18.
    [15] Agere, Inc. Building Next Generation Network Processors. White Paper, Agere, Inc., Sept. 1999
    [16] Douglas E.Comer, Network Systems Design Using Network Processor. (Edition 1st), Published by Prentice Hall, 2003, pp.337-341
    [17] Solidum system corp. Co-Processors and The Role of Specialized Hardware. Networld INTEROP2000, May 2000.
    [18] Gokhan Memik and William H. Mangione-Smith.A Flexible Accelerator for Layer 7 Networking Applications,2002.
    [19] IDT cop. Classification and Content Inspection Co-Processors. PAX.port Product Family.Technology Report.FLYR-IPC2-00112. 2002.
    [20] T. Henriksson, H. Eriksson, et al. VLSI implementation of CRC-32 for 10 Gigabit Ethernet. In 8th IEEE Int. Conf. on Electronics, Circuits and Systems (ICECS), Sept. 2001.
    [21] 李树国,周润德,冯建华,孙义和. RSA 密码协处理器的实现. 电子学报,Vol.29, No.1,1 Nov.2001
    [22] J. Allen, B. Bass, C. Basso,et al.IBM PowerNP network processor: Hardware, software, and applications. IBM Journal of Research and Development, 47(2/3): 177-194, 2003.
    [23] H. Xie and L. Zhao. Architectural Analysis and Instruction Set Optimization for Network Protocol Processors. IEEE ISSS+CODES, Newport Beach, CA, USA. October 2003. . 225-230
    [24] L. Kencl, JY Le Boudec, T. Wolf et al. Adaptive Load Sharing for Network Processors In IEEE INFOCOM 2002, New York
    [25] C. Sauer, M. Gries, J.I. Gomez, K. Keutzer. Towards a Flexible Network Processor Interface for RapidIO, Hypertransport, and PCI-Express. 3rd Workshop on Network Processors (NP3) at the 10th International Symposium on High Performance Computer Architecture (HPCA10), 26-39, February, 2004
    [26] Sundar Iyer and Nick McKeown. Analysis of the Parallel Packet Switch Architecture IEEE/ACM Transactions on Networking, pp. 314-324, April 2003
    [27] NPF site. www.npforum.com
    [28] Streaming Interface (NPSI) (September 2002) http://www.npforum.org/techinfo/HWStreamingIA.pdf

    [29] Look-Aside (LA-1B) Interface (August 2004) http://www.npforum.org/techinfo/LA-1B_Final_Published_IA.pdf
    [30] RapidIO Trade Association. RapidIO interconnect specification, rev. 1.2. www.rapidio.org, June 2002.
    [31] J. Trodden and D. Anderson. HyperTransport System Architecture. Addision-Wesley, 2003.
    [32] PCI Special Interest Group. PCI Express base specification, rev. 1.0a. www.pcisig.com, Apr. 2003.
    [33] Adrian Cosoroaba. Memory Options Explode for Network Processors COMMUNICATION SYSTEMS DESIGN, MAY2002. www.CommsDesign.com.
    [34] Faraydon Karim, Anh Nguyen, Sujit Dey, Ramesh Rao. On-Chip Communication Architecture for OC-768 Network processors. Annual ACM IEEE Design Automation Conference, In Proceedings of the 38th conference on Design automation, Las Vegas, Nevada, United States, 2001
    [35] K. Lahiri, A. Raghunathan, and S. Dey. System level performance analysis for designing on-chip communication architectures. IEEE Trans. on Computer Aided-Design of Integrated Circuits and Systems, 20(6):768-783,2001
    [36] L. Benini, G. de Micheli. Networks on chip: A New SOC Paradigm. IEEE Computer, Vol. 35, no. 1, Jan. 2002, pp.70-78.
    [37] Chidamber Kulkarni, Christian Sauer, Matthias Gries,Programming Challenges in Network Processor Deployment. In proceedings of the Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'03), 178-187, October, 2003.
    [38] Shah N. Understanding network processors [MS. Thesis]. Berkeley: Department of Electrical Engineering and Computer Sciences, University of California, 2001
    [39] Tsai M, Kulkarni C, Sauer C, et al. A benchmarking methodology for network processors. In proceedings of the Workshop on Network Processors(NP-1).Cambridge,MA:Morgan Kaufmann Publishers,Patrick C,et al.,eds.2002
    [40].Nemirovsky A.Towards characterizing network processors:needs and challenges.XStream Logic,Whitepaper,2000.
    [41]Tilman Wolf and Mark A.Franklin,"Design tradeoffs for embedded network processors," in Proc.of International Conference on Architecture of Computing Systems(ARCS),Karlsruhe,Germany,Apr.2002,vol.2299,pp.149-164,Springer Verlag
    [42]M.Gries.Algorithm-Architecture Trade-offs in Network Processor Design:[Ph D dissertation].Switzerland:Swiss Federal Institute of Technology(ETH)Zurich,2001
    [43]Ning Weng and Tilman Wolf.Pipelining vs.multiprocessors - choosing the right network processor system topology,in Proceedings of Advanced Networking and Communications Hardware Workshop(ANCHOR 2004),Munich,Germany,June 2004
    [44]Steve Melvin,Mario Nemirovsky et al.A Massively Multithreaded Packet Processor.the 2nd Workshop on Network Processors(NP-2)Anaheim,California,February 9,2003
    [45]Patrick Crowley,Marc E.Fiuczynski,Jean-Loup Baer.On the Performance of Multithreaded Architectures for Network Processors Technical Report 2000-10-0,University of Washington.
    [46]D.Tullsen,S.Eggers,and H.Levy.Simultaneous Multithreading:Maximizing On-Chip Parallelism.In proceedings of the 22nd Annual International Symposium on Computer Architecture,pp.392-403.Santa Margherita Ligure,Italy,June 1995.
    [47]E.Seamans and M.Rosenblum,"Parallel Decompositions of a Packet-Processing Workload," Proc.of Advanced Networking and Communications Hardware Workshop(ANCHOR)held in conjunction with the 31st Annual International Symposium on Computer Architecture(ISCA 2004),Munich,Germany,pp.40-48,2004.
    [48]Pradeep H.Rao and S.K.Nandy.Evaluating Compiler Support for Complexity Effective Network Processing.In Proceedings of Workshop on Complexity-Effective Design held in conjunction with the 30th International Symposium on Computer(wced03),2003.
    [49]D.Patterson,and J.Hennessy,Computer Architecture:A Quantitative Approach,3nd.ed.,San Francisco:Morgan Kaufmann Publishers,2002,chapter 5,P444.
    [50]K.G.Coffman and A.M.Odlyzko,"Is there a Moore's Law for data traffic?," Handbook of Massive Data Sets,eds.,Kluwer,2002,pp.47-93.
    [51]Pankaj Gupta and Nick McKeown.Algorithms for Packet Classification,IEEE Network,March 2001.
    [52]W.Feng,L.N.Bhuyan et al.Performance Characterization of a 10 Gigabit Ethernet TOE,13th International Symposium on High Performance Interconnects(Hot-I05),Stanford,CA,August 2005.
    [53]H.Shimonishi and M.Yoshida,An improvement of weighted round robin cell scheduling in ATM networks,in Proceedings of IEEE Globecom' 97,vol.2,1997,pp.1119-1123.
    [54]林闯,周文江等.基于Intel网络处理器的路由器队列管理:设计、实现和分析,计算机学报,第26卷,第9期,2003.9
    [55]徐恪,林闯,吴建平.可编程路由器中基于缓冲队列长度阀值的处理器调度,电子学报,第29卷,第11期,2001.11
    [56]A.K.Parekh and R.G.Gallager.A Generalized Processor Sharing Approach for Flow Control-the Single Node Case.In proceedings of INFOCOM'92,vol.2,May 1992,pp951-924
    [57] Sundar Iyer, Ramana Rao Kompella, and Nick McKeown Analysis of a Memory Architecture for Fast Packet Buffers. IEEE - High Performance Switching and Routing, Dallas, Texas, May 2001, pp. 368-373
    [58] A. Nikologiannis and M. Katevenis. Efficient per-flow queueing in DRAM at OC-192 line rate using out-of-order execution techniques. In Proceedings of the IEEE International Conference on Communications, pages 2048.2052, June 2001
    [59] NPC, http://www.networkprocessors.com
    [60] ACM SIGDA Conference, www.sigda.com
    [61] A Ferrari, A Sangiovanni-Vincentelli. System Design: Traditional Concepts and New Paradigms, Proceedings of the 1999 In: Proc of the IEEE International Conference on Computer Design. Austin Texas: IEEE Computer Society Press, 1999.
    [62] K Keutzer, et al. System-level design: orthogonalization of concerns and platform-based design. IEEE Transactions on CAD, 2000, 19 (12): 1523—1543
    [63] Madhu Sudanan Seshadri, John Bent, Tevfik Kosar. Network processors: Guiding design through analysis. http://www.cs.wisc.edu/~johnbent/Projects/net_proc.pdf, November 2001.
    [64] Matthias Gries. Algorithm-Architecture Trade-offs in Network Processor Design. Diss., Technische Wissenschaften ETH Z¨(?)rich, Nr. 14191, 2001.
    [65] L. Thiele, S. Chakraborty, M. Gries, S. Knzli. Design Space Exploration of Network Processor Architectures. First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architecture (HPCA8), pages 30-41, Cambridge, MA, February2002.
    [66] P. Crowley and J-L. Baer. A Modeling Framework for Network Processor Systems. in Proceedings of 1st Workshop on Network Processors, held in conjunction with the 8th International Symposium on High-Performance Computer Architecture, Cambridge, Massachusetts, 2002.
    [67] Patrick Crowley, Jean-Loup Baer. A Hybrid Framework for Network Processor System Analysis. in Proceedings 1st Workshop on Network Processors, held in conjunction with the 8th International Symposium on High-Performance Computer Architecture, Cambridge, Massachusetts, February 2002.
    [68] E Kohler, R Morris, et al. The click modular router. ACM Transactions on Computer Systems, 2000,18(3):263-297
    [69] Grunewald, Matthias; Niemann, Jorg-Christian; Porrmann, Mario; Ruckert, Ulrich: A framework for design space exploration of resource efficient network processing on multiprocessor SoCs. In: Crowely, Patrick; Franklin, Mark A.; Hadimioglu, Haldun; Onufryk, Peter Z. (editors): Network Processor Design: Issues and Practices volume 3. Morgan Kaufmann Publisher, 2005, Section 12, pages 245-277
    [70] Ramaswamy, R., Weng, N., and Wolf, T. Application analysis and resource mapping for heterogeneous network processor architectures. In Proc. of Third Workshop on NP-3, Feb, 2004
    [71] M. Gries: Methods for Evaluating and Covering the Design Space during Early Design Development, UCB/ERL Technical Memorandum M03/32, Electronics Research Laboratory, University of California at Berkeley, Aug. 2003
    [72] P Paulin, C Pilkington, E Bensoudane. StepNP: A System-Level Exploration Platform for Network Processors. IEEE Design & Test of Computers. 2002, Vol.19, No.6. pages17-26
    [73] Silicon and software systems www.s3group.com/network processing/, 2003.
    [74] Hao Yi, Zhangxi Tan, Chuang Lin, et al. A Context-Based Simulation Tool for Design and Evaluation of Network Processors. Simulation Modelling Practice and Theory, May 2005.

    [75] 阎守孟. 面向网络处理器的软件平台关键技术研究. 博士学位论文,西北工业大学 2005 年10月.
    [76] G. Kahn. The Semantics of a Simple Language for Parallel Programming. In: Proc of Information Processing. Amsterdam, The Netherlands, 1974
    [77] E A Lee. D G Messerschmitt. Synchronous Data Flow. IEEE Transactions on Computers, 1987, 75 (9):1235-1245
    [78] T Murata. Petri Nets: Properties, Analysis and Applications. In: Proc of the IEEE 1989, 77(4)
    [79] E. A. Lee and T. M. Parks, "Dataflow Process Networks," In: Proc of the IEEE, 1995, 83( 5): 773-801. (http://ptolemy.eecs.berkeley.edu/papers/processNets)
    [80] E.A. de Kock et al. YAPI: Application Modeling for Signal Processing Systems. In: Procof 37th Design Automation Conference. Los Angeles, CA, June 2000. 402-405
    [81] Bart Kienhuis, Ed F. Deprettere. Modeling Stream-Based Applications Using the SBF Model of Computation. IEEE Workshop on Signal Processing Systems (SIPS 2001), Antwerp, Belgium, September pages 26-28, 2001
    [82] E. Lee. Overview of the Ptolemy Project. University of California, EECS Dept., Berkeley, CA. Tech Rep: UCB/ERL M01/11, March 2001
    [83] A. Girault, B. Lee, and E. Lee. Hierarchical Finite State Machines with Multiple Concurrency Models. IEEE Transactions on Computer-aided Design of Integrated Circuits and Systems, 1999, 18(6): 742-760
    [84] K. Strehl, et al. FunState - an Internal Design Representation for Codesign. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2001, 9(4):524-544
    [85] F Balarin, M Chiodo, P Giusto, et al. Hardware-Software Co-Design of Embedded Systems: The Polis Approach. In: Proc of International Series in Engineering and Computer Science. Kluwer Academic Publishers, 1997.
    [86] S. Neuendorffer, E. A. Lee. Hierarchical Reconfiguration of Dataflow Models. In: Proc of Second ACM-IEEE International Conference on Formal Methods and Models for Codesign, 2004
    [87] M.C.W. Geilen and T. Basten. Reactive Process Networks. In: Proc of Fourth ACM International Conference on Embedded Software, Pisa, Italy, 2004. 137-146
    [88] Patrick Crowley, M.E. Fiuczynski, J.-L. Baer, & B. N. Bershad, Characterizing processor architectures for programmable network interfaces, in Proceedings of the 2000 International Conference on Supercomputing, May 2000.
    [89] Muthu Venkatachalam, Prashant Chandra, RajYavatkar. A highly flexible, distributed multiprocessor architecture for network processing. IEEE Computer Networks Vol.41 2003, pp563-586
    [90] Nie XN, Gazsi L, Engel F, Fettweis G. A New Network Processor Architecture for high-speed communications. in Proceedings of the IEEE workshop on signal processing systems. IEEE Computer Society Press, 1999. 548-557.
    [91] Niti Madan and Erik Brunvand. A Case for Asynchronous Microengines for Network Processing. In Advanced Networking. and Communications Hardware Workshop (ANCHOR 2004),. June 2004.
    [92] Madhusudanan Seshadri, Mikko Lipasti. A Case for Vector Network Processors. NPC-West2002,2002.
    [93] LI Xudong XU Yang LIU Bin et al.Hardwired Logic and Multithread Design in Network Processors. TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 14/20 pp207-212 Volume 9, Number 2, April 2004
    [94] Yu-Kwong Kwok, I.Ahmad. Static Scheduling algorithms for Allocating Directed Task Groups to Multiprocessors. ACM Computer Surveys, Vol.31, No.4, December 1999. pp406-471.
    [95] A. Srinivasan, et al. Multiprocessor Scheduling in Processor-based Router Plantforms:Issues and Ideas, Network Processor Design: Issues and Practices Volumn 2 November 2003.
    [96] Mark A. Franklin and Seema Datar. Pipeline Task Scheduling on Network Processors. In Workshop on Processor & Applications (NP-3), Madrid Spain, February 2004.
    [97] William Plishker and Kaushik Ravindran et al. Automated Task Allocation on Single Chip, Hardware Multithreaded, Multiprocessor Systems. In Proceedings of Workshop on Embadded Parallel Architectures (WEPA-1). February 2004.
    [98] Yan Shoumeng, Zhou Xingshe, Wang LINGMIN, et al. GA-based Automated Task Assignment on Network Processors. ICPADS2005, July 2005.
    [99] Cao Z, Wang Z, Zegura E. Performance of hashing-based schemes for Internet load balancing. In: Nokia FB, ed. Proc. of the IEEE INFOCOM 2000. Piscataway: IEEE Computer and Communications Societies, 2000. 332-341
    [100] J. Wang and Klara Nahrstedt. Parallel ip packet forwarding for tomorrow's ip routers. In IEEE Workshop on High Performance Switching and Routing, pages 353-357, Dallas, TX, May 2001
    [101]G. Dittmann, A. Herkersdorf, Network processor load balancing for high-speed links. in 2002 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2002), San Diego, CA, USA, July 2002, pp. 727-735
    [102]L. Kencl, J. Le Boudec. Adaptive load sharing for network processors. In IEEE INFOCOM 2002, New York, NY, USA, June 2002, pp. 545-554
    [103] Jiani Guo, Jingnan Yao and Laxmi Bhuyan. An Efficient Packet Scheduling Algorithm in network processors. In Proceedings of IEEE INFOCOM 2005, July 2005
    [104]W. Shi, M. H. MacGregor et al. An Adaptive Load Balancer for Multiprocessor Routers. University of Alberta, Edmonton, AB, T6G 2E8, Canada http://www.cs.ualberta.ca/~pawel/PAPERS/. 2004
    [105] P. Pappu, T. Wolf. Scheduling Processing Resources in Programmable Routers. In Proceedings of IEEE INFOCOM 2002, pp. 104 -112, July 2002
    [106] S. A. Moyer. Access Ordering and Effective Memory Bandwidth. PhD thesis, University of Virginia, De-partment of Computer Science, Apr. 1993. Also as TR CS-93-18.
    [107]S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens. Memory access scheduling. In Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 128-138, June 2000
    [108] S.A. McKee et al., "Design and Evaluation of Dynamic Access Ordering Hardware", Proc. International Conference on Supercomputing, June 1996, pp. 125-132
    [109] Wei-fen Lin and Steven K. Reinhardt Doug Burger Reducing DRAM Latencies with an Integrated Memory Hierarchy Design in the 7th International Symposium on High-Performance Computer Architecture, January 2001.
    [110]B. K. Mathew, S. A. McKee, J. B. Carter, and A. Davis. Design of a parallel vector access unit for SDRAM memory systems. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, pages 39-48, Jan. 2000. pages 39-48
    [111]BK Mathew, SA McKee, JB Carter, A. Davis, Algorithmic Foundations for a Parallel Vector Access Memory System , Proc. 12th ACM Symposium on Parallel Algorithms and Architectures, July 2000.
    [112] S. A. McKee and W. A. Wulf. Access ordering and memory-conscious cache utilization. In Proceedings of the First International Symposium on High Performance Computer Architecture, pages 253-262, Jan. 1995. pages 253-262
    [113] J. H. Aylor, and W. A. Wulf. Access order and effective bandwidth for streams on a Direct Rambus memory. In Proceedings of the Fifth International Sym-posium on High-Performance Computer Architecture,pages 80-89, Jan. 1999.
    [114] J Hasan, S Chandra, T N Vijaykumar. Enhancing Row Locality to Improve Network Processor Throughput. Lucent Technologies, Tech Rep: 10009638- 020318-11TM, Bell Labs, Mar 2002
    [115]S Diego. Efficient Use of Memory Bandwidth to Improve Network Processor Throughput. In: Proc of the IEEE ISCA '03, 2003
    [116]Devavrat Shah, Sundar Iyer, Balaji Prabhakar, and Nick McKeown. Maintaining Statistics Counters in Router Line Cards. IEEE Micro, Jan-Feb 2002, pages. 76-81.
    [117] S Kumar, R Venkatesh, J Philip, S Shukla. Implementing Parallel Packet Buffering. Communication Systems Design Magazine Summary. April 29, 2002. http://www.commsdesign.com/design_corner/OEG20020422S0006.
    [118]M Katevenis, P Vatsolaki, A Efthimiou. Pipelined Memory Shared Buffer for VLSI Switches. In: Proc of the ACM SIGCOMM '95, August 1995. pages 39-48
    [119]Dharmapurikar, S. Kumar, S. Lockwood, J. Crowley, P. Optimizing Memory Bandwidth of a Multi-Channel Packet Buffer. Global Telecommunications Conference, 2005. GLOBECOM '05. IEEE Publication Date: 28 Nov.-2 Dec.
    [120] Tan Zhangxi, Lin Chuang, Hao Yin. Optimization and Benchmark of Cryptographic Algorithms on Network Processors. IEEE Micro, Vol.24 No.5, September/October 2004. pp55-69.
    [121]Haiyong Xie, Li Zhou, and Laxmi Bhuyan. Architectural Analysis of Cryptogrphic Applications for Network Processors , IEEE First Workshop on Network Processors, with HPCA-8, Boston, February 2002 (with H. Xie and L. Zhou).
    [122] IDT Inc. Optimum Search Methods for Switch-Router Databases in Access and Metro Edge Networks. IDT White Paper, 2003.
    [123] Cavium Inc. NITROX Soho Secure Comm Processors Cavium. http://cavium.com/processor_security_NitroxSoho.htm

    [124] 王圣,苏金树. TCP加速技术研究综述,软件学报,Vol.15, No.11, 2004.10. pp 1689-1699.

    [125] Mark A. Franklin and Tilman Wolf. Power Considerations in Network Processor Design. In Proceeding of Second Workshop on Network Processor & Application (NP-2) in Conjunction with HPCA-9, Anaheim, CA, USA. February 2003. pp 10-22

    [126] Yan Luo, Jia Yu, et al. Low Power Network Processor Design using Clock Gating. In Proceeding of the 42nd annual conference on Design Automation (DAC) , June 2005. pp712-715
    [127] Francis Chang, Wu-chang Feng, Kang Li, "Approximate Caches for Packet Classification", in Proc. IEEE INFOCOM 2004, March 2004.
    [128] Francis Chang, Wu-chang Feng, Wu-chi Feng, Kang Li, "Efficient Packet Classification with Digest Caches", in Proc. of the Third Workshop on Network Processors & Applications (NP3), Feburary 2004.
    [129] Chun-Liang Lee, Shuo-Cheng Hu, Pi-Chung Wang Efficient Packet Classification Using Spatial Cutting. IEEE Workshop on High performance routing and switch(HPSR). Hong Kong, 2005.
    [130]Haoyu Song, Sarang Dharmapurikar,Jonathan Turner et al. Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. In ACM SIGCOMM'05, 2005
    [131]Karthik Lakshminarayanan, Anand Rangarajan, Srinivasan. Algorithms for Advanced Packet Classification with Ternary CAMs. In ACM SIGCOMM'05,2005
    [132] Tan Mingfeng, Gong Zhenghu, "High Speed IP Lookup Algorithm with Scalability and Parallelism Based on CAM Array and TCAM," in Proceeding of 2004 IEEE International Conference on Communications, pp. 1085-1089, June 2004
    [133]Karlin S. and Peterson L. VERA: An Extensible Router Architecture.In 4th International Conference on Open Architectures and NetworkProgramming (OPENARCH), April 2001.
    [134]Tammo Spalink, Scott Karlin, Larry Peterson, and Yitzchak Gottlieb.Building a robust software-based router using network processors, in Proceedings of Symposium on Operating Systems Principles (SOSP), 2001.
    [135]G. Memik and W. H. Mangione-Smith. NEPAL: A framework for efficiently structuring applications for network processors. In Proc. of Network Processor Workshop in conjunction with Ninth International Symposium on High Performance Computer Architecture (HPCA-9), Feb. 2003.
    [136] Jens Wagner, Rainer Leupers. C Compiler Design for an Industrial Network Processor. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems, 2001.

    [137]Network Processors and their Impact on Real-time Operating Systems2001q2_p044.pdf
    [138] Yan Shoumeng, Zhou Xingshe. Apacket property-based task scheduling policy for control plan OS in NP-based application. ICESS2005, Springer Verlag, December 2005.
    [139] Mark A. Franklin and Tilman Wolf, A network processor performance and design model with benchmark parameterization, in Proc. of First Network Processor Workshop (NP-1) in conjunction with Eighth International Symposium on High Performance Computer Architecture (HPCA-8), Cambridge, MA, Feb. 2002, pp. 63-74
    [140] Wolf T, Franklin MA. CommBench: a telecommunica-tions benchmark for network processors. In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. Austin, TX, 2000. 154-162.
    [141] Memik G, Mangione-Smith B, Hu W. NetBench: a benchmarking suite for network processors. In:Proceedings of the International Conference on Computer-Aided Design (ICCAD). San Jose: IEEE Computer Society Press, 2001. 39-43.
    [142] B.K. Lee and L.K. John, NpBench: A Benchmark Suite for Control Plane and Data Plane Applications for Network Processors.In Proceeding of the IEEE Int'l Conf. Computer Design (ICCD 03), 2003, pp. 226-233.
    [143]Audenaert S,Chandra P.(NPF Benchmarking Working Group co-chairs),Network processors benchmark framework.NPF Benchmarking Workgroup,http://www.npforum.org/.
    [144]Embedded Microprocessor Benchmark Consortium(EEMBC).http://www.eembc.org/.
    [145]B Kienhuis,et al,An Approach for Quantitative Analysis of Application-specific Dataflow Architectures.In:Proc of International Conference of Application-specific Systems,Architectures and Processors,Zurich Switzerland 1997.338-349
    [146]M.A.Rosien,G.J.Smit and T.Krol.Generating a CDFG from C/C++ Code,Department of Computer Science,University of Twente,Enschede,Netherlands.Tech Rep:Document 38152,2002.URL:http://doc.utwente.nl/fid/1179
    [147]Todd Austin et.al.SimpleScalar Tutorial.http://www.simplescalar.com/docs/simple_tutorial_v4.pdf
    [148]A.Agarwal.Performance tradeoffs in multithreaded processors.IEEE Transactions on Parallel and Distributed Systems,3(5):525-539,Sept.1992
    [149]林闯,计算机网络和计算机系统的性能评价,2001,北京,清华大学出版社
    [150] H. Fatemi, H. Corporaal, T. Basten, R. Kleihorst, and R. Jonker. Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures. In Proceedings of 7th International Conference Advanced Concepts for Intelligent Vision Systems (ACIVS 2005). Antwerp, Belgium, September 2005. pp. 689-696.
    [151] Goldberg D E. Genetic Algorithms in Search, Optimization and Machine Learning. MA: Addison-Wesley Press, 1989.
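    [152] Kang Lishan (康立山), Xie Yun (谢云), et al. Non-Numerical Parallel Algorithms (Vol. 1): Simulated Annealing Algorithm (in Chinese). Beijing: Science Press, 1997
    [153] Zhang Jiangshe (张讲社), Liang Yi (梁怡). The whole annealing genetic algorithm and its necessary and sufficient conditions for convergence [J] (in Chinese). Science in China (Series E), 1997, 27(2): 154-164.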
    [154] Hou, E.S., H. Ren and N. Ansari, Efficient Multiprocessor Scheduling Based on Genetic Algorithms. In: Dynamic, Genetic, and Chaotic Programming, 1992
    [155] Brindle A. Genetic Algorithms for Function Optimization. Ph.D. Dissertation, University of Alberta, 1981
    [156] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, IETF RFC 2001. January 1997.
    [157] J. Bennett, C. Partridge, and N. Shectman. Packet reordering is not pathological network behavior. IEEE/ACM Transactions on Networking, 1999, 7(6):789-798
    [158] K. Thompson, G. Miller, R. Wilder, "Wide area Internet traffic patterns and characteristics," IEEE Network, vol. 11, 1997
    [159] J.S. Marron, Felix Hernandez-Campos, F.D. Smith. Mice and Elephants Visualization of Internet. In Proceedings of 15th Conference on Computational Statistics, Berlin, Germany, August 24-28, 2002.
    [160] Yin Zhang, Lee Breslau, Vern Paxson and Scott Shenker, On the characteristics and origins of internet flow rates. In Proc. of SIGCOMM, Pittsburgh, PA, USA, August 2002.
    [161] Tatsuya Mori, Ryoichi Kawahara, Shozo Naito, Shigeki Goto. On the characteristics of Internet traffic variability: Spikes and Elephants. In Proceedings of IEEE/IPSJ SAINT, pp. 99-106, Tokyo, Japan, Jan 2004.
    [162] T. Kunz. The influence of different workload descriptions on a heuristic load balancing scheme. IEEE Transactions on Software Engineering, Vol. 17, No. 7
    [163] Passive Measurement and Analysis (PMA). http://pma.nlanr.net
    [164] NLANR PMA: Special Traces : Abilene-V http://pma.nlanr.net/Special/ipls5.html
    [165] NLANR PMA PSC trace Site. http://pma.nlanr.net/PMA/Sites/PSC.html
    [166] NLANR PMA FRG trace Site. http://pma.nlanr.net/PMA/Sites/FRG.html
    [167] Maria Gabrani, Gero Dittmann, Andreas Doering, et al. Design Methodology for a Modular Service-Driven Network Processor Architecture. In Computer Networks - Special Issue on Network Processors, Elsevier Science, Vol. 41, No. 5, pp. 623-640, April 2003.
    [168] T. Karagiannis, M. Molle, and M. Faloutsos, A Nonstationary Poisson View of Internet Traffic, in Proceedings of IEEE Infocom 2004, IEEE CS Press, Mar. 2004
    [169] M. E. Crovella and A. Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, ACM SIGMETRICS, pp. 160-169, 1996.
    [170] J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. In D. D. Denison, M. H. Hansen, C. C. Holmes, et al. (eds.), Nonlinear Estimation and Classification. Springer, 2003.
    [171] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, IETF RFC 2001. January 1997
    [172] Packet, bandwidth, application and connection statistics for Abilene-V. http://pma.nlanr.net/Special/ipls5/
    [173] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley & Sons, New York, 1991.
    [174] Rishi Sinha, Christos Papadopoulos, John Heidemann. Internet Packet Size Distributions: Some Observations. University of Southern California, October 5, 2005. http://netweb.usc.edu/~rsinha/pkt-sizes/
    [175] Cost-Effective Flow Table Designs for High-Speed Routers: Architecture and Performance Evaluation. IEEE Transactions on Computers. Vol. 51, No. 9, September 2002. pp. 1089-1099.
    [176] J. Carter and M. Wegman. Universal Classes of Hash Functions. Journal of Computer and System Sciences. Vol. 18, No. 2, 1979. pp. 143-154.
    [177] K. Papagiannaki, N. Taft, S. Bhattacharyya, et al. On the Feasibility of Identifying Elephants in Internet Backbone Traffic. Sprint ATL Research Report Nr. RR01-ATL-110918. Sprint ATL. Nov 2001.
    [178] C. Villamizar, C. Song. High performance TCP in ANSNET. ACM Computer Communications Review, vol. 24, No. 5, October 1995
    [179] Cisco 12000 Series Gigabit Switch Router (GSR) Gigabit Ethernet LineCard. http://www.cisco.com/warp/public/cc/pd/rt/12000/prodlit/gspel_ov.htm
    [180] M-series Routers. http://www.juniper.net/products/dsheet/100042.html
    [181] R. Morris. TCP Behavior with Many Flows. In: Proc. of the 1997 IEEE International Conference on Network Protocols, 1997
    [182] G. Appenzeller, I. Keslassy, N. McKeown. Sizing Router Buffers. In: Proc. of the ACM SIGCOMM'04. Portland: ACM Press, August 2004. 277-291
    [183] Samsung Corporation. K7N323645M NtSRAM. Available at http://www.samsung.com/Products/Semiconductor/SRAM/index.htm
    [184] ELPIDA 1G bits DDR2 EDE1108 AASE Datasheet. Japan: Elpida Memory, Inc. Oct. 2003. http://www.elpida.com
    [185] RDRAM Advance Information: 1066 MHz RDRAM http://www.rambus.com/downloads/RDRAM1066.512i.0117-030.pdf
    [186] Zhichun Zhu, Zhao Zhang, Xiaodong Zhang. Fine-grain Priority Scheduling on Multi-channel Memory Systems. In Proceedings of the 8th International Symposium on High Performance Computer Architecture (HPCA-8), 2002.
    [187] J. Hennessy, D. Patterson. Computer Architecture: A Quantitative Approach. Second Edition. Morgan Kaufmann Publishers, ISBN 1-55860-329-8, 1996.
    [188] Altera Corporation, Nios II Processor Reference Handbook, http://www.altera.com.cn/literature/hb/nios2/n2cpu_nii5vl.pdf, 2004.
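    [189] Liu Da (刘达), Hu Min (胡敏). Development strategy for system on programmable chip (SoPC) (in Chinese). Application of Integrated Circuits (集成电路应用), 2003(1).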
    [190] Altera Corporation, Stratix Device Handbook. http://www.altera.com.cn/literature/hb/stx/stratix_handbook.pdf, 2004
    [191] T. Chiueh and P. Pradhan, "Cache memory design for network processors," in Proc. of 6th Intl. Symp. on High-Performance Computer Architecture, Toulouse, France, Jan. 2000.
    [192] Requirements for IP Version 4 Routers. RFC 1812, June 1995. http://www.faqs.org/rfcs/rfc1812.html
