Research on Key Software Low-Power Technologies for Large-Scale Parallel Computing Systems

English title: Research of Software Low-power Optimization in Large-scale Parallel Computing System
Author: 董勇 (Dong Yong)
Degree level: Doctoral (PhD)
Discipline: Computer Science and Technology
Keywords: large-scale parallel computing system; power optimization; OpenMP loop scheduling; interconnection network static energy optimization; network topology partition; independent occupancy of routers
Degree year: 2012
Supervisor: 杨学军 (Yang Xuejun)
Discipline code: 0812
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2012-03-01

Abstract

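Power consumption has become one of the major constraints on further performance improvement of large-scale parallel computing systems. Excessive power and energy consumption harms system operation in several ways, including higher failure rates, lower reliability, and higher operating costs. Research on power optimization for large-scale parallel computing systems is therefore of considerable practical importance.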
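     Power optimization for large-scale parallel computing systems is now pursued at every level of system design and implementation, including circuit design, logic design, architecture, system software, and application software. Hardware dynamic voltage scaling (DVS) and component shutdown provide the mechanisms on which software power optimization is built. Software low-power optimization offers several advantages: it does not depend on a particular hardware platform, it is more flexible, and it is more portable. The computing system and the communication system form the core of a large-scale parallel computing system, so they are also the focus of its power optimization. Addressing software power optimization for large-scale parallel computing systems, this thesis studies two topics: loop-scheduling-based energy optimization of compute nodes, and network-topology-partition-based energy optimization of the interconnection network.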
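DVS gives software an effective lever on energy. The reason can be seen from the standard first-order CMOS model. This is a textbook approximation, not a result of the thesis. In the model, alpha is the switching activity, C the switched capacitance, V the supply voltage, f the clock frequency, and W the number of cycles a task needs. The model assumes that supply voltage must scale roughly in proportion to frequency:

```latex
\begin{aligned}
P_{\mathrm{dyn}} &= \alpha C V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3},\\
t &= \frac{W}{f}, \qquad E_{\mathrm{dyn}} = P_{\mathrm{dyn}}\, t \propto W f^{2}.
\end{aligned}
```

Running a task that has slack at half its frequency therefore doubles its run time but cuts its dynamic energy to about a quarter. Static (leakage) energy grows with run time and is left out of this approximation, which is one reason the saving shrinks in practice.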
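     Energy optimization of compute nodes is an important part of energy optimization for large-scale parallel computing systems. This thesis studies compute-node energy optimization based on OpenMP loop scheduling. By combining DVS with scheduling algorithms, it proposes two classes of power optimization algorithms: performance-constrained energy optimization and energy-constrained performance optimization.

For performance-constrained energy optimization, the block round-robin static scheduling algorithm is improved to obtain the Energy Saving Optimal Static Scheduling algorithm (EOSS). Taking further into account the effect of cache misses on memory access latency, the thesis then proposes an improved optimal static scheduling algorithm (IEOSS).

Energy-constrained performance optimization reduces loop execution time through loop scheduling under a limited energy supply. This yields the Energy-Constrained Performance Optimal Static Scheduling algorithm (ECPOSS).

The thesis proves the optimality of EOSS and ECPOSS, and experiments validate the effectiveness of these algorithms.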
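The sketch below illustrates the two optimization directions. It rests on assumptions that are not taken from the thesis:

- Every loop iteration costs the same number of cycles.
- Each thread may run at any frequency in [f_min, f_max].
- Dynamic energy per thread is k * cycles * f^2.
- Static energy is ignored.

It is not EOSS or ECPOSS. It shows only how a fixed deadline translates into per-thread frequencies (performance-constrained energy optimization), and how an energy budget translates into the shortest achievable deadline (energy-constrained performance optimization). All function names are hypothetical.

```python
from typing import List, Tuple


def block_chunks(n_iters: int, n_threads: int) -> List[int]:
    """Iterations per thread for an even block split (first threads get the remainder).

    OpenMP's schedule(static) only promises approximately equal chunks;
    this particular remainder placement is an assumption.
    """
    base, extra = divmod(n_iters, n_threads)
    return [base + 1 if t < extra else base for t in range(n_threads)]


def dynamic_energy(cycles: List[float], freqs: List[float], k: float = 1.0) -> float:
    """Assumed first-order DVS model: each thread spends k * cycles * f**2."""
    return sum(k * w * f * f for w, f in zip(cycles, freqs))


def min_energy_for_deadline(cycles: List[float], deadline: float, f_min: float,
                            f_max: float, k: float = 1.0) -> Tuple[float, List[float]]:
    """Performance-constrained: slowest frequency per thread that still meets the deadline."""
    freqs = []
    for w in cycles:
        f = w / deadline
        if f > f_max * (1 + 1e-12):
            raise ValueError("deadline cannot be met even at f_max")
        freqs.append(min(max(f, f_min), f_max))
    return dynamic_energy(cycles, freqs, k), freqs


def min_deadline_for_budget(cycles: List[float], budget: float, f_min: float,
                            f_max: float, k: float = 1.0, rel_tol: float = 1e-9) -> float:
    """Energy-constrained: shortest deadline whose minimum energy fits the budget.

    Minimum energy never increases as the deadline grows, so bisection applies.
    """
    t_fast = max(cycles) / f_max   # the heaviest thread runs at f_max
    t_slow = max(cycles) / f_min   # every thread ends up at f_min
    if min_energy_for_deadline(cycles, t_slow, f_min, f_max, k)[0] > budget:
        raise ValueError("budget is below the energy needed even at f_min")
    if min_energy_for_deadline(cycles, t_fast, f_min, f_max, k)[0] <= budget:
        return t_fast
    lo, hi = t_fast, t_slow
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if min_energy_for_deadline(cycles, mid, f_min, f_max, k)[0] <= budget:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    F_MIN, F_MAX = 0.5, 2.0                  # GHz, assumed DVS range
    chunks = block_chunks(10, 4)             # [3, 3, 2, 2]
    cycles = [1.0 * n for n in chunks]       # Mcycles, assuming 1 Mcycle per iteration
    deadline = max(cycles) / F_MAX           # ms, loop time with every thread at F_MAX
    e_dvs, freqs = min_energy_for_deadline(cycles, deadline, F_MIN, F_MAX)
    e_full = dynamic_energy(cycles, [F_MAX] * len(cycles))
    print("per-thread GHz:", [round(f, 3) for f in freqs])
    print("dynamic energy saved: %.1f%%" % (100 * (1 - e_dvs / e_full)))
    print("shortest deadline for budget 20:", min_deadline_for_budget(cycles, 20.0, F_MIN, F_MAX))
```

Consider 10 iterations on 4 threads. The split is 3, 3, 2, 2, so the two lighter threads can drop to 2/3 of f_max without delaying the loop. Under this model, that cuts dynamic energy by 2/9 (about 22%) compared with running every thread at f_max.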
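     Energy optimization of the interconnection network is important for whole-system energy optimization. Static energy is the main component of interconnection network energy consumption in large-scale parallel computing systems, and shutting down network components is an important technique for reducing it. This thesis proposes interconnection network energy optimization based on network topology partition.

The thesis first analyzes router occupancy in the spatial and temporal dimensions and introduces the concept of routing-rule-based network topology partition. It then gives topology partition methods for Nd-mesh, Nd-torus, and fat-tree networks under deterministic routing, direction-adaptive routing, and fully adaptive routing. On this basis, it proposes the key techniques for implementing static energy management based on network topology partition, which shut down idle routers in both the spatial and temporal dimensions.

Within the TH-1A software framework, an interconnection network static energy management scheme based on network topology partition is proposed and implemented, and a hybrid virtual-real validation environment is built. Extensive experiments verify the effectiveness of the proposed method.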
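To make direct and indirect router occupancy concrete, the sketch below makes three assumptions:

- The network is a 2D mesh with one compute node per router.
- Routing is dimension-order (X first, then Y).
- Jobs communicate all-to-all among their own nodes.

Under these assumptions, a job directly occupies the routers of its allocated nodes. It indirectly occupies every router on an X-Y path between two of its nodes. Routers outside every job's occupied set, and outside any region that must stay powered (the hypothetical `keep_on` parameter), are candidates for shutdown. This is an illustration of the concept, not the thesis's partition algorithm, and all names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[int, int]  # (x, y) position of a router; one compute node per router assumed


def xy_route(src: Node, dst: Node) -> Set[Node]:
    """Routers on the dimension-order path: travel along X first, then along Y."""
    (x0, y0), (x1, y1) = src, dst
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    path = {(x, y0) for x in range(x0, x1 + sx, sx)}
    path |= {(x1, y) for y in range(y0, y1 + sy, sy)}
    return path


def occupied_routers(job_nodes: Iterable[Node]) -> Set[Node]:
    """Direct occupancy (the job's own routers) plus indirect occupancy (transit routers)."""
    nodes = list(job_nodes)
    occupied = set(nodes)
    # X-then-Y routes differ by direction, so both orderings of each pair are routed.
    for a, b in permutations(nodes, 2):
        occupied |= xy_route(a, b)
    return occupied


def pairwise_independent(occupancy: Dict[str, Set[Node]]) -> bool:
    """True when no router is occupied by more than one job."""
    seen: Set[Node] = set()
    for routers in occupancy.values():
        if seen & routers:
            return False
        seen |= routers
    return True


def routers_to_power_off(width: int, height: int, occupancy: Dict[str, Set[Node]],
                         keep_on: Iterable[Node] = ()) -> Set[Node]:
    """Routers used by no job and not in a region that must stay powered."""
    all_routers = {(x, y) for x in range(width) for y in range(height)}
    in_use = set().union(*occupancy.values())
    return all_routers - in_use - set(keep_on)


if __name__ == "__main__":
    occupancy = {
        "A": occupied_routers([(0, 0), (1, 0), (0, 1), (1, 1)]),  # compact 2x2 block
        "B": occupied_routers([(3, 0), (3, 3)]),                  # ends of one column
        "C": occupied_routers([(0, 3), (2, 2)]),                  # two scattered nodes
    }
    print("independent:", pairwise_independent(occupancy))
    print("can power off:", sorted(routers_to_power_off(4, 4, occupancy)))
```

In this 4x4 example, job C owns only two nodes. Because its X-then-Y routes in the two directions differ, it also ties up four transit routers. The three jobs are independent, and only routers (2, 0) and (2, 1) remain free to switch off.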
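     The main contributions of this thesis are as follows:
     1. An energy-optimization-guided parallel loop scheduling method is proposed. From the two perspectives of performance-constrained energy optimization and energy-constrained performance optimization, two classes of OpenMP loop scheduling energy optimization algorithms are given. They are represented respectively by the Energy Saving Optimal Static Scheduling algorithm (EOSS) and the Energy-Constrained Performance Optimal Static Scheduling algorithm (ECPOSS). The optimality of EOSS and ECPOSS is proved, and experimental evaluation validates the effectiveness of both classes of algorithms.
     2. The idea of router shutdown based on network topology partition is proposed. In the spatial dimension, the direct and indirect occupancy of routers by jobs is analyzed; in the temporal dimension, the continuous occupancy of routers by jobs is analyzed. Building on this, the independent occupancy of routers by multiple jobs is analyzed, leading to the concept of network topology partition, which guides router shutdown.
     3. Network topology partition methods for typical networks and typical routing rules are proposed. Partition theorems and algorithms are given for Nd-mesh, Nd-torus, and fat-tree networks governed by deterministic routing, direction-adaptive routing, and fully adaptive routing.
     4. Key techniques for implementing static energy management based on network topology partition are proposed. They include setting non-shutdown regions, topology-aware resource allocation, and spatial fragment management. Within the TH-1A software framework, an interconnection network static energy management scheme based on network topology partition is proposed and implemented, and a hybrid virtual-real validation environment is built. Experimental results validate the effectiveness of the proposed methods.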
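The set of routers a job occupies depends on the routing rule. Take a 2D mesh and assume fully adaptive routing restricted to minimal (shortest) paths. Then every router in the bounding rectangle of a source/destination pair may carry traffic. A job's occupancy is therefore the union of the bounding rectangles of all its node pairs. The sketch below compares that union with deterministic X-Y routing for the same job. It only illustrates why the partition must be derived per routing rule; it is not the thesis's partition algorithm. If the adaptive rule also allows non-minimal paths, the occupied set is larger still.

```python
from itertools import combinations, permutations
from typing import Iterable, Set, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate in a 2D mesh


def xy_route(src: Node, dst: Node) -> Set[Node]:
    """Deterministic dimension-order route: X first, then Y."""
    (x0, y0), (x1, y1) = src, dst
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    return ({(x, y0) for x in range(x0, x1 + sx, sx)}
            | {(x1, y) for y in range(y0, y1 + sy, sy)})


def bounding_box(a: Node, b: Node) -> Set[Node]:
    """Every router that lies on at least one shortest path between a and b in a mesh."""
    xs = range(min(a[0], b[0]), max(a[0], b[0]) + 1)
    ys = range(min(a[1], b[1]), max(a[1], b[1]) + 1)
    return {(x, y) for x in xs for y in ys}


def occupied_xy(job_nodes: Iterable[Node]) -> Set[Node]:
    """Routers a job may use under X-Y routing (both directions of every pair)."""
    nodes = list(job_nodes)
    occupied = set(nodes)
    for a, b in permutations(nodes, 2):
        occupied |= xy_route(a, b)
    return occupied


def occupied_minimal_adaptive(job_nodes: Iterable[Node]) -> Set[Node]:
    """Routers a job may use when any shortest path is allowed."""
    nodes = list(job_nodes)
    occupied = set(nodes)
    for a, b in combinations(nodes, 2):  # the box is the same in both directions
        occupied |= bounding_box(a, b)
    return occupied


if __name__ == "__main__":
    job = [(0, 0), (3, 3)]  # opposite corners of a 4x4 mesh
    print("X-Y routing:", len(occupied_xy(job)), "routers")
    print("minimal fully adaptive:", len(occupied_minimal_adaptive(job)), "routers")
```

Consider a two-node job at opposite corners of a 4x4 mesh. X-Y routing occupies the 12 perimeter routers, while minimal fully adaptive routing may touch all 16. The rule that admits more paths therefore leaves fewer routers that can safely be turned off.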
Power consumption has become one of the most important constraints on performance enhancement of large-scale parallel computing systems. Excessive power and energy consumption has many negative effects on system operation, including higher system failure rates, reduced reliability, and increased operating costs. Studying power optimization of large-scale parallel computing systems therefore has important practical significance.
     Power optimization of large-scale parallel computing systems has been extended to every level of system design and implementation, including circuit design, logic design, architecture, system software, and applications. Dynamic voltage scaling (DVS) and hardware component shutdown provide the implementation basis for software power optimization. Software-level low-power optimization has many advantages, such as hardware-platform independence, greater flexibility, and good portability. The computing system and the communication system are the crucial components of a large-scale parallel computing system, so they are the focus of power optimization. Focusing on low-power software techniques for large-scale parallel computing systems, this thesis studies loop-scheduling-based energy optimization of compute nodes and network-topology-partition-based energy optimization of the interconnection network.
     The energy optimization of compute nodes is an important part of power optimization of large-scale parallel computing systems. This thesis studies OpenMP loop scheduling based energy optimization of compute nodes. By combining DVS with scheduling algorithms, the thesis proposes two kinds of power optimization algorithms: performance-constrained energy optimization and energy-constrained performance optimization.

Performance-constrained energy optimization improves the block static scheduling algorithm and yields the Energy Saving Optimal Static Scheduling (EOSS) algorithm. The Improved Energy Optimal Static Scheduling (IEOSS) algorithm is then proposed, which considers the impact of cache misses on memory access latency.

Energy-constrained performance optimization is represented by the Energy Constrained Performance Optimal Static Scheduling (ECPOSS) algorithm, which reduces execution time by loop rescheduling within a given energy constraint.

The optimality of EOSS and ECPOSS is proved, and experimental results validate the effectiveness of these algorithms.
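Cache misses matter for frequency selection, and a simple two-part time model shows why. The model is an assumption, not the thesis's IEOSS model. Time at frequency f is c / f for core-bound cycles, plus a memory-stall time m that does not shrink or grow with the core clock. A model that treats all measured time as core cycles overestimates the slowdown caused by lowering f, so it selects a higher frequency than the deadline actually requires. The function names are hypothetical.

```python
def run_time(core_cycles: float, mem_stall: float, f: float) -> float:
    """Assumed model: core work scales with 1/f, memory stall time does not."""
    return core_cycles / f + mem_stall


def naive_min_freq(t_nominal: float, f_nominal: float, deadline: float) -> float:
    """Treats all measured time as core cycles, i.e. t(f) = t_nominal * f_nominal / f."""
    return t_nominal * f_nominal / deadline


def memory_aware_min_freq(core_cycles: float, mem_stall: float, deadline: float) -> float:
    """Lowest f with core_cycles / f + mem_stall <= deadline."""
    slack = deadline - mem_stall
    if slack <= 0:
        raise ValueError("deadline is shorter than the memory stall time")
    return core_cycles / slack


if __name__ == "__main__":
    F_NOM = 2.0                             # GHz, nominal frequency (assumed)
    CORE, STALL = 2.0, 1.0                  # Mcycles of core work, ms of memory stalls (assumed)
    t_nom = run_time(CORE, STALL, F_NOM)    # measured time at F_NOM, in ms
    deadline = 1.25 * t_nom                 # allow a 25% slowdown
    f_naive = naive_min_freq(t_nom, F_NOM, deadline)
    f_mem = memory_aware_min_freq(CORE, STALL, deadline)
    print("naive: %.3f GHz, memory-aware: %.3f GHz" % (f_naive, f_mem))
    print("memory-aware run time: %.3f ms" % run_time(CORE, STALL, f_mem))
```

Here, with a 25% time allowance, the all-cycles model asks for 1.6 GHz. The model that keeps memory stalls fixed shows that about 1.333 GHz still meets the deadline. Under the f^2 energy model, that is about 31% less dynamic energy for the core cycles.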
     The energy optimization of the interconnection network is important for energy optimization of the whole system. Static energy accounts for the main part of interconnection network energy consumption, and the key technique for reducing it is shutting down network components. This thesis proposes energy optimization of the interconnection network based on network topology partition.

First, the thesis analyzes the occupancy characteristics of the interconnection network in the space and time dimensions and introduces the concept of network topology partition based on routing rules. It then proposes network topology partition methods for Nd-mesh, Nd-torus and fat-tree networks under three kinds of routing rules: deterministic routing, oblivious adaptive routing and fully adaptive routing. Finally, it describes the key techniques of static energy management for the interconnection network based on network topology partition, which shut down routers in the space and time dimensions.

Within the software framework of TH-1A, a static energy management scheme for the interconnection network has been proposed and implemented, and a hybrid virtual-real experimental environment has been constructed for it. Extensive experiments demonstrate its effectiveness.
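In the time dimension, a router may be free between the end of one job and the start of the next. It can be turned off during that gap only if the gap is long enough to pay for switching it off and on again. A common way to express this uses a break-even time, stated here as an assumption since the thesis's criterion may differ. With static power P_static and a fixed transition energy E_switch, shutting down saves energy only when the gap exceeds E_switch / P_static. As a simplification, the sketch also requires the gap to exceed the wake-up latency. All names and parameters are hypothetical; time and energy units are arbitrary but must be consistent.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a period during which a job holds the router


def idle_gaps(busy: List[Interval], horizon: Interval) -> List[Interval]:
    """Idle periods of one router within the horizon; overlapping busy periods are merged."""
    start, end = horizon
    gaps: List[Interval] = []
    cursor = start
    for s, e in sorted(busy):
        s = min(s, end)
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def shutdown_windows(busy: List[Interval], horizon: Interval, p_static: float,
                     e_switch: float, wake_latency: float) -> List[Interval]:
    """Idle gaps long enough to repay the off/on energy and to cover the wake-up delay."""
    break_even = e_switch / p_static  # idle time at which saved leakage equals switching cost
    threshold = max(break_even, wake_latency)
    return [(s, e) for s, e in idle_gaps(busy, horizon) if e - s > threshold]


if __name__ == "__main__":
    busy = [(0, 10), (12, 40), (70, 100)]   # three jobs use this router in turn
    print(shutdown_windows(busy, (0, 100), p_static=2.0, e_switch=10.0, wake_latency=1.0))
```

In this example, the break-even time is 10 / 2 = 5 units. The 2-unit gap between 10 and 12 is therefore not worth a shutdown, while the 30-unit gap between 40 and 70 is.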
     The main contributions of this thesis are as follows.
     1. Energy-optimization-guided parallel loop scheduling methods are proposed. From the viewpoints of performance-constrained energy optimization and energy-constrained performance optimization, the thesis presents two kinds of OpenMP loop scheduling energy optimization algorithms. They are represented by the Energy Saving Optimal Static Scheduling (EOSS) and Energy Constrained Performance Optimal Static Scheduling (ECPOSS) algorithms. The optimality of these two algorithms is proved, and experimental results validate the effectiveness of both kinds of algorithms.
     2. The idea of shutting down routers based on interconnection network topology partition is proposed. The thesis analyzes the direct and indirect occupancy of routers by parallel jobs in the space dimension, and their continual occupancy of routers in the time dimension. It also analyzes the independent occupancy of routers by multiple jobs, which leads to the concept of network topology partition used to guide router shutdown.
     3. Network topology partition methods for typical networks with typical routing rules are proposed. Partition methods are presented for Nd-mesh, Nd-torus and fat-tree networks under three kinds of routing rules: deterministic routing, oblivious adaptive routing and fully adaptive routing.
     4. Key techniques for static energy management based on interconnection network topology partition are proposed. They include setting regions in which routers cannot be powered off, a topology-aware resource allocation policy, and spatial fragment management. Within the software framework of TH-1A, the thesis proposes and implements a static energy management scheme for the interconnection network based on network topology partition, and constructs a hybrid virtual-real experimental environment for it. The experimental results validate the effectiveness of the interconnection network energy optimization.
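Topology-aware allocation matters here for a simple reason. Under X-Y routing, a job placed on a contiguous rectangular sub-mesh keeps all of its routes inside that rectangle, so it occupies no routers beyond its own nodes. The rest of the mesh stays free to be powered off. A scattered placement of the same size, by contrast, drags in transit routers. The first-fit rectangular search below is a classic contiguous-allocation heuristic used as a stand-in. The thesis's allocation policy and fragment management may work differently, and the function name is hypothetical.

```python
from typing import Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate in a 2D mesh


def first_fit_submesh(width: int, height: int, busy: Set[Node],
                      w: int, h: int) -> Optional[Set[Node]]:
    """First free w x h rectangle in row-by-row scan order, also trying the rotated h x w shape."""
    for ww, hh in ((w, h), (h, w)):
        for y0 in range(height - hh + 1):
            for x0 in range(width - ww + 1):
                block = {(x, y) for x in range(x0, x0 + ww) for y in range(y0, y0 + hh)}
                if not (block & busy):
                    return block
    return None


if __name__ == "__main__":
    busy = {(0, 0), (1, 0), (0, 1), (1, 1)}        # a 2x2 job already in the corner
    print(sorted(first_fit_submesh(4, 4, busy, 2, 2)))
    print(first_fit_submesh(4, 4, busy, 3, 3))      # 12 routers are free, yet no 3x3 fits
```

The second call returns None even though 12 of the 16 routers are free. This is the kind of spatial fragmentation that fragment management has to deal with.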

