An Efficient Processing In Memory Framework Based on Skyrmion Material


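Title (English): An Efficient Processing In Memory Framework Based on Skyrmion Material
Title (Chinese): 一种基于斯格明子介质的高效存内计算框架
Authors: Liu Bicheng (刘必成); Gu Haifeng (顾海峰); Chen Mingsong (陈铭松); Gu Shouzhen (谷守珍); Chen Wenjie (陈闻杰)
Affiliation: Shanghai Key Laboratory of Trustworthy Computing, East China Normal University (上海市高可信计算重点实验室, 华东师范大学)
Keywords: Skyrmion; non-volatile memory; processing in memory (PIM); racetrack memory; address mapping
Journal: Journal of Computer Research and Development (计算机研究与发展)
Journal code: JFYZ
Publisher: Journal of Computer Research and Development (计算机研究与发展)
Publication date: 2019-04-15
Year: 2019
Volume: 56
Issue: 04
Pages: 124-135
Page count: 12
CN: 11-1777/TP
Record ID: JFYZ201904013
Funding: National Natural Science Foundation of China (61872147, 61702187); National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2017Z01038102-002); Fundamental Research Funds for the Central Universities; Natural Science Foundation of Shanghai (15ZR1410000)
Language: Chinese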

Abstract

Processing in memory (PIM) is an emerging technique that processes data in place inside the storage units. It reduces data movement between computation and storage units and increases parallel data processing, which partly offsets the shortcomings of the von Neumann architecture. Compared with traditional volatile random access memories, racetrack memory (RM) offers high density, non-volatility, and low static power, so it can support efficient PIM. To address the performance and power shortcomings of domain-wall based PIM, this paper proposes a novel non-volatile PIM framework based on the Skyrmion material. The framework uses Skyrmion-based racetrack memory as its storage units and builds its computation units from adders and multipliers composed of Skyrmion-based logic gates. Because the framework does not need extensive CMOS (complementary metal oxide semiconductor) circuitry to assist the computation units, the design complexity is significantly reduced. In addition, the framework optimizes the number of read and write ports of the storage units at the circuit level and improves the memory address mapping at the system level, which substantially raises its efficiency. Experimental results show that, compared with a domain-wall based non-volatile PIM framework, the proposed framework reduces execution time by 48.1% and energy consumption by 42.9% on average.
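The abstract states that the computation units are adders and multipliers composed of logic gates. The following is a minimal sketch of that composition idea only, not the paper's circuit: it assumes a primitive gate set of AND, OR, and NOT (the abstract does not list the gates), models each gate as a plain Boolean function rather than a Skyrmion device, and uses hypothetical names throughout (`gate_and`, `full_adder`, `ripple_carry_add`, `shift_add_multiply`).

```python
# Illustrative sketch: gates are plain Boolean functions, not a device model.
# The gate set (AND, OR, NOT) and every name below are assumptions.

def gate_and(a: int, b: int) -> int:
    return a & b

def gate_or(a: int, b: int) -> int:
    return a | b

def gate_not(a: int) -> int:
    return 1 - a

def gate_xor(a: int, b: int) -> int:
    # XOR composed from the primitives: (a OR b) AND NOT (a AND b)
    return gate_and(gate_or(a, b), gate_not(gate_and(a, b)))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    # One-bit full adder: returns (sum bit, carry-out bit).
    partial = gate_xor(a, b)
    total = gate_xor(partial, carry_in)
    carry_out = gate_or(gate_and(a, b), gate_and(partial, carry_in))
    return total, carry_out

def ripple_carry_add(x: int, y: int, width: int = 8) -> int:
    # Chain `width` full adders; the final carry is discarded,
    # so the result wraps modulo 2**width.
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

def shift_add_multiply(x: int, y: int, width: int = 8) -> int:
    # Shift-and-add multiplier built on the adder above.
    # Operands are `width` bits; the product uses 2 * width bits.
    product = 0
    for i in range(width):
        if (y >> i) & 1:
            product = ripple_carry_add(product, x << i, 2 * width)
    return product
```

For example, `ripple_carry_add(5, 3)` yields 8 and `shift_add_multiply(13, 11)` yields 143. A ripple-carry structure is chosen here only because it is the simplest way to chain full adders; the source does not say which adder or multiplier topology the framework uses.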

