On-Chip High Performance Embedded Computing: Application Parallel Processing Model and Architecture for Soft Baseband
Abstract
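High performance embedded computing (HPEC), typified by wireless communication, has penetrated everyday life, industrial production, and military technology. Because the computational complexity, power budgets, and real-time constraints of HPEC target applications vary widely and keep tightening, HPEC is shifting from the traditional model centered on general-purpose digital signal processors and general-purpose high performance embedded processors toward domain-optimized HPEC. As wireless protocols continue to evolve and demand for multi-mode communication grows, soft baseband applications, which perform baseband processing with programmable processing technology, are on the rise. With their high computational complexity, strict performance-per-watt constraints, and hard real-time requirements, soft baseband applications have become one of the main drivers of this shift and pose new challenges for HPEC architecture and implementation technology. In particular, the computing demands of soft baseband applications typified by MIMO-OFDM (Multi-Input Multi-Output Orthogonal Frequency Division Multiplexing) systems keep growing as wireless systems evolve, which poses unprecedented challenges for the design of system-on-chip architectures for soft baseband. Research on on-chip HPEC architectures for soft baseband applications typified by MIMO-OFDM systems is therefore of clear significance.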
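     Building on a summary and analysis of the MIMO-OFDM system model, this dissertation studies in depth the application parallel processing model for soft baseband applications typified by MIMO-OFDM baseband systems, a predictable heterogeneous multi-core architecture for soft baseband, a multi-pattern multi-domain conflict-free parallel memory structure for soft baseband, the LDPC (Low Density Parity Check) decoding algorithm and accelerator architecture, and the Reed-Solomon (RS) decoding algorithm and accelerator architecture. The main contributions are as follows: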
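     A. An application parallel processing model for MIMO-OFDM-based soft baseband is presented, comprising a data stream model, a space-time two-dimensional data parallel model, and an "atom" operation model. The data stream model describes the data-flow relationships among the core baseband tasks together with a traffic quantification model; the space-time two-dimensional data parallel model describes the levels and degrees of data-level parallelism specific to the inside of each core task; the "atom" operation model describes the operation sequences specific to each baseband task. Using an instance of the parallel processing model, the dissertation analyzes how the computing pattern of a typical MIMO-OFDM baseband system evolves, including the evolution of core-operation complexity, of "atom"-operation parallelism, and of inter-task communication traffic.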
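     B. A domain-optimized programmable processing unit is proposed and studied. It adopts hybrid scalar/vector processing and uses VLIW (Very Long Instruction Word) and SIMD (Single Instruction, Multiple Data) techniques to support the various computing patterns of the target applications. Because accelerating the "atom" operations of the core algorithms with SIMD involves a large number of inter-cluster data exchanges, the proposed domain-optimized programmable vector processing unit has built-in support for one general-purpose and several special-purpose inter-cluster data exchange networks. The unit is modeled and implemented with an electronic system level (ESL) design methodology, and its speedup over conventional signal processors is analyzed with typical algorithms. The results show a fairly clear speedup on core algorithms closely tied to the target applications.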
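     C. Based on a summary of the main memory access patterns of soft baseband applications, a dedicated memory organization model is proposed. On top of this model, a scalable multi-pattern multi-domain conflict-free parallel memory architecture model and its implementation technique are proposed, supporting conflict-free access for the main access patterns. The memory access path is implemented as a pipeline in combination with the proposed domain-specific programmable processing unit, and the memory behavior of wireless communication target applications is simulated. The implementation results show that the proposed architecture scales well, with implementation overhead roughly equal to that of related memory structures; the simulation results show a clear speedup over conventional parallel memory architectures for wireless communication target applications.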
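     D. LDPC codes, which offer very high error-correcting performance and have recently come into wide use in forward error correction systems, have high decoding complexity. To address this, an RMP-scheduled min-sum algorithm (RMP-Min-Sum) is proposed. RMP-Min-Sum uses row message passing to reduce the number of iterations and uses min-sum decoding to eliminate the nonlinear operations in the decoding algorithm, lowering hardware cost. Simulations show that RMP-Min-Sum achieves decoding performance comparable to the conventional sum-product algorithm with clearly lower complexity. The feasibility of running RMP-Min-Sum decoding in software is examined, and the study shows that current programmable processors cannot decode long-frame LDPC codes efficiently. Finally, a parallel accelerator architecture based on RMP-Min-Sum is given for extended irregular repeat-accumulate (eIRA) LDPC codes; an accelerator is implemented for the DVB-S2 LDPC codes, and the evolution of its parallelism, complexity, and throughput is analyzed. The implementation results show that the RMP-Min-Sum LDPC decoding accelerator substantially reduces hardware overhead while delivering throughput of the same order of magnitude.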
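     E. TD-iBM, a key-polynomial solving algorithm that supports macro-pipeline load balancing in the RS decoding structure, is proposed. TD-iBM time-shares the Galois field multipliers, trading time for area: while maintaining decoding throughput, it improves the balance of the decoder's macro-pipeline, reduces the accelerator's area overhead, and improves decoding efficiency. Decoding accelerators for the RS(255,223) code and its related shortened codes are implemented based on TD-iBM. Experiments show that, compared with the main existing RS decoders, the implemented accelerator has a certain advantage in decoding efficiency.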
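     F. Based on the programmable vector processing unit and several forward error correction accelerators, a predictable multi-core system-on-chip architecture and prototype system are proposed and studied. An on-chip bus serves as the basis of the multi-core interconnect, and a software-controlled time-division multiplexed bus is introduced to meet hard real-time requirements. At very small hardware cost, this bus provides a bus allocation pattern that is predictable and controllable at design time, and designers can support different bus allocation policies by writing different bus scheduling programs. Because the allocation policy is predictable and controllable at design time, the bus also opens a way to reduce the growing bus power consumption. Based on the programmable vector processing unit, the forward error correction accelerators, and the software-controlled time-division multiplexed bus, a prototype of the predictable multi-core system-on-chip architecture is given, together with an optimized prototype implementation for a pipelined mapping of a simplified MIMO-OFDM baseband system.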
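     In summary, targeting soft baseband applications typified by MIMO-OFDM baseband systems, this dissertation studies the application parallel processing model, domain-optimized multi-core architecture and implementation technology, a high-efficiency parallel memory architecture, and improvements to and accelerated implementations of core forward error correction algorithms. The work has some significance and value for advancing the research and practical use of on-chip high performance embedded computing for soft baseband.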
High Performance Embedded Computing (HPEC), including wireless computing, is ubiquitous in everyday life, industrial applications, and military technology. Because the computing complexity, power budgets, and real-time constraints of HPEC applications vary and keep tightening, modern HPEC is shifting from traditional HPEC, based on general-purpose digital signal processors and general-purpose high performance embedded processors, to domain-specific HPEC. Soft baseband, which results from the evolution of wireless protocols and the demand for multi-mode communication, is one of the main applications driving domain-specific HPEC. The architecture design of On-Chip High Performance Embedded Computing (OCHPEC) for soft baseband is challenged by high computing complexity, strict performance-per-watt constraints, and hard real-time constraints, especially when a MIMO-OFDM baseband system is involved. Meeting the computing requirements of evolving wireless protocols is a further issue for OCHPEC architecture design for soft baseband.
     Based on an analysis of the MIMO-OFDM system model, this dissertation focuses on the following research points: a soft baseband application parallel processing model for MIMO-OFDM, a Predictable Heterogeneous Multi-Processor System-on-Chip (PH-MPSoC) architecture, a multi-pattern multi-domain conflict-free parallel memory architecture, a Low Density Parity Check (LDPC) code decoding algorithm and accelerator, and a Reed-Solomon (RS) code decoding algorithm and accelerator. The key contributions are summarized as follows.
     A. An application parallel processing model for MIMO-OFDM-based soft baseband is introduced, comprising a data stream model, a space-time two-dimensional data-level parallel model, and an atom operation model. The data stream model describes the data stream dependencies between the baseband tasks and quantifies the stream traffic; the space-time two-dimensional data-level parallel model describes the data-level parallelism within the baseband tasks; the atom operation model describes the specific operation sequences of MIMO-OFDM baseband tasks. A typical MIMO-OFDM baseband system is analyzed with this model, showing the complexity evolution of the kernel operations, the parallelism evolution of the atom operations, and the evolution of inter-task stream traffic.
     B. A domain-specific programmable processing unit is proposed and studied. It adopts hybrid scalar and vector processing and uses VLIW and SIMD to support the varied computing patterns of the target applications. It provides general and application-specific inter-cluster data exchange networks to accelerate the atom operations when SIMD is used. ESL methodology and tools are used to model and implement the unit, and typical algorithms are used to evaluate its performance. The results show that the proposed unit is more efficient than conventional signal processors on the kernel algorithms of wireless communication.
     C. A domain-specific memory organization model is proposed after analyzing the main memory access patterns. Based on this model, a Multi-Pattern and Multi-Domain conflict-free Parallel Memory Architecture (MPMD-PMA) model for the main memory access patterns is proposed. The implementation details and a pipelined case study of MPMD-PMA are developed on the proposed domain-specific programmable PE, and memory access simulation experiments are run in a behavioral simulator. The implementation results show that the scalability of MPMD-PMA is equivalent to that of a general PMA, and the simulation results show a significant speedup of MPMD-PMA over a general PMA for soft baseband memory access patterns.
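     The idea of conflict-free parallel memory access can be illustrated with a minimal sketch. The abstract does not give the MPMD-PMA address mapping, so the code below uses a classic skewed storage scheme instead, with hypothetical names (`bank_of`, `is_conflict_free`). It shows that one fixed mapping serves row and column accesses without bank conflicts but fails on a diagonal, which is the kind of limitation a multi-pattern design would have to address.

```python
def bank_of(row: int, col: int, num_banks: int) -> int:
    """Skewed mapping (an assumption, not the MPMD-PMA scheme):
    element (row, col) is stored in bank (row + col) mod num_banks."""
    return (row + col) % num_banks


def is_conflict_free(coords, num_banks: int) -> bool:
    """A parallel access is conflict-free when every element sits in a different bank."""
    banks = [bank_of(r, c, num_banks) for r, c in coords]
    return len(set(banks)) == len(banks)


N = 8  # number of memory banks and matrix dimension in this toy example

row_access = [(3, c) for c in range(N)]   # one full row
col_access = [(r, 5) for r in range(N)]   # one full column
diag_access = [(i, i) for i in range(N)]  # main diagonal

print(is_conflict_free(row_access, N))   # row elements land in distinct banks
print(is_conflict_free(col_access, N))   # column elements land in distinct banks
print(is_conflict_free(diag_access, N))  # diagonal elements collide under this mapping
```

     With an even number of banks, the diagonal maps to banks 0, 2, 4, 6 twice each, so that access would need two memory cycles under this particular mapping.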
     D. The RMP-Min-Sum decoding algorithm is proposed to reduce the decoding complexity of LDPC codes, which are among the newest FEC codes. RMP-Min-Sum uses the Row Message Passing pattern to reduce the number of decoding iterations and the Min-Sum algorithm to reduce the complexity of the kernel operations. The simulation results show that RMP-Min-Sum has significant practical value. The cost of implementing RMP-Min-Sum decoding in software has been evaluated, and the results show that it is not an efficient solution for long-frame LDPC codes. A parallel accelerator architecture based on RMP-Min-Sum decoding is then proposed for eIRA-LDPC codes, and a case study of a DVB-S2 LDPC decoder is presented. The results show that the hardware cost of the RMP-Min-Sum parallel accelerator is lower than that of other LDPC decoders.
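     As a rough illustration of the two ingredients named above, the sketch below combines the textbook min-sum check-node update with a generic row-by-row (layered) schedule. It is an assumption-laden toy, not the dissertation's RMP-Min-Sum: the function names, the tiny parity-check matrix, and the LLR values are all hypothetical, and no scaling or offset correction is applied.

```python
def min_sum_check_update(incoming):
    """Textbook min-sum check-node update.
    For each edge: sign = product of the other inputs' signs,
    magnitude = minimum of the other inputs' magnitudes."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out


def layered_min_sum(h_rows, channel_llr, iterations=5):
    """Generic row-by-row (layered) schedule: posteriors are refreshed
    after every check row, so later rows in the same iteration already
    see the updated values. h_rows lists the variable indices of each row."""
    posterior = list(channel_llr)
    # One check-to-variable message per edge, initially zero.
    c2v = [[0.0] * len(row) for row in h_rows]
    for _ in range(iterations):
        for r, row in enumerate(h_rows):
            # Remove this row's previous contribution to get extrinsic inputs.
            v2c = [posterior[v] - c2v[r][k] for k, v in enumerate(row)]
            new_msgs = min_sum_check_update(v2c)
            for k, v in enumerate(row):
                posterior[v] = v2c[k] + new_msgs[k]
            c2v[r] = new_msgs
    # Positive LLR means bit 0, negative means bit 1.
    return [0 if llr >= 0 else 1 for llr in posterior]


# Hypothetical toy code with three parity checks over seven bits.
H_ROWS = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
# All-zero codeword sent; bit 2 is received weakly in error.
LLR = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0]

print(min_sum_check_update([2.0, -0.5, 1.5, -3.0]))
print(layered_min_sum(H_ROWS, LLR))
```

     The check-node update uses only comparisons and sign flips, which is why min-sum avoids the nonlinear functions of the sum-product algorithm.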
     E. TD-iBM, a Key Equation Solver (KES) algorithm that supports a balanced macro-pipeline architecture for RS decoding, is proposed. It applies time-division scheduling of the Galois field multipliers in the iBM-based KES algorithm to reduce the complexity of the KES and balance the macro-pipeline. A decoding accelerator architecture for RS(255,223) and its shortened codes is proposed based on the TD-iBM algorithm. The results show that the TD-iBM-based decoding accelerator is area-efficient.
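     The time-for-area trade behind sharing Galois field multipliers can be sketched as follows. This is not the TD-iBM algorithm itself, whose schedule the abstract does not specify; it only shows that a batch of GF(2^8) products gives the same results on fewer multipliers at the cost of more cycles. The field polynomial 0x11D and the names `gf256_mul` and `time_shared_products` are assumptions chosen for illustration.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two GF(2^8) elements by shift-and-add.
    The reduction polynomial 0x11D is an assumed, commonly used choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly        # reduce back into 8 bits
        b >>= 1
    return result


def time_shared_products(pairs, num_multipliers: int):
    """Compute all products using num_multipliers units per cycle.
    Returns (results, cycles): fewer units means more cycles, same results."""
    results = []
    cycles = 0
    for start in range(0, len(pairs), num_multipliers):
        batch = pairs[start:start + num_multipliers]
        results.extend(gf256_mul(a, b) for a, b in batch)
        cycles += 1
    return results, cycles


pairs = [(2, 0x80), (3, 7), (1, 0x55), (0x53, 1), (0, 9), (4, 4)]

full, full_cycles = time_shared_products(pairs, num_multipliers=6)
shared, shared_cycles = time_shared_products(pairs, num_multipliers=2)

print(full == shared)              # identical products
print(full_cycles, shared_cycles)  # cycle counts for 6 and 2 multipliers
```

     In a macro-pipelined decoder, spending extra cycles in this way is only acceptable when the key equation stage has slack relative to the other stages, which is the balancing argument the abstract makes.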
     F. A Predictable Heterogeneous Multi-Processor System-on-Chip (PH-MPSoC) architecture and prototype are proposed based on the programmable processing unit and several FEC accelerators. In the PH-MPSoC architecture, a Software-Controlled Time Division Multiplexing bus (SC-TDM bus) is introduced to provide predictable interconnection timing. The SC-TDM bus offers a design-time bus scheduling scheme at little hardware cost, and the programmer can introduce different bus scheduling patterns. In addition, the design-time predictable bus scheduling scheme can be used to reduce the power cost of the interconnect. The PH-MPSoC prototype is proposed, and an optimized prototype implementation is designed for a pipelined mapping of a simplified MIMO-OFDM baseband system.
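     Why a time-division multiplexed bus is predictable at design time can be shown with a small sketch. The slot table format, the master names, and both functions are hypothetical; the abstract does not describe the actual SC-TDM scheduling program. Given a repeating slot table, the bus owner in any cycle and each master's worst-case wait follow directly from the table, with no runtime arbitration involved.

```python
def bus_owner(slot_table, cycle: int):
    """The slot table repeats forever; the owner depends only on the cycle number."""
    return slot_table[cycle % len(slot_table)]


def worst_case_wait(slot_table, master) -> int:
    """Largest number of cycles a request from `master` can wait for its
    next owned slot, taken over every possible arrival cycle."""
    n = len(slot_table)
    owned = [i for i, m in enumerate(slot_table) if m == master]
    if not owned:
        raise ValueError(f"{master!r} has no slot in the table")
    worst = 0
    for arrival in range(n):
        # Distance from the arrival cycle to the nearest owned slot, wrapping around.
        wait = min((slot - arrival) % n for slot in owned)
        worst = max(worst, wait)
    return worst


# Hypothetical schedule: the vector unit gets every other slot,
# the two FEC accelerators get one slot each per round.
SLOT_TABLE = ["vpu", "ldpc", "vpu", "rs"]

print(bus_owner(SLOT_TABLE, 5))
print({m: worst_case_wait(SLOT_TABLE, m) for m in ("vpu", "ldpc", "rs")})
```

     Changing the allocation policy in this model means writing a different table, which mirrors the abstract's point that different scheduling patterns can be introduced in software.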
     In summary, this dissertation investigates the soft baseband application parallel processing model, domain-specific MPSoC architecture and implementation technology, a high-efficiency parallel memory architecture, and FEC decoding algorithms and accelerator architectures. The contributions of this dissertation are useful in driving the development of OCHPEC for soft baseband processing.
