用户名: 密码: 验证码:
DNA数据存储技术研究进展
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Research Progress on DNA Data Storage Technology
  • 作者:张淑芳 ; 彭康 ; 宋香明 ; 张子昱 ; 王汉杰
  • 英文作者:ZHANG Shu-fang;PENG Kang;SONG Xiang-ming;ZHANG Zi-yu;WANG Han-jie;School of Electrical and Information Engineering,Tianjin University;Tianjin International Engineering Institute;School of Life Sciences,Tianjin University;
  • 关键词:数据存储 ; DNA ; 压缩编码 ; 纠错算法 ; 存储密度
  • 英文关键词:Data storage;;DNA;;Compression coding;;Error correction algorithm;;Storage density
  • 中文刊名:JSJA
  • 英文刊名:Computer Science
  • 机构:天津大学电气自动化与信息工程学院;天津大学国际工程师学院;天津大学生命科学学院;
  • 出版日期:2019-06-15
  • 出版单位:计算机科学
  • 年:2019
  • 期:v.46
  • 语种:中文;
  • 页:JSJA201906002
  • 页数:8
  • CN:06
  • ISSN:50-1075/TP
  • 分类号:27-34
摘要
随着计算机技术和网络技术的飞速发展,由此产生的海量数据给传统数据存储方式带来了巨大挑战,因此研究人员开始致力于寻找新一代存储方案。脱氧核糖核酸(Deoxyribonucleic Acid,DNA)作为天然的遗传信息存储介质,具有存储容量大、能耗低和寿命长等优点,有效克服了传统硬盘和计算机存储等方式的不足,故DNA数据存储技术成为信息技术和生物技术交叉领域的研究热点。文中综述了DNA数据存储技术的研究进展,首先对DNA及其存储的理论框架进行了介绍;然后详细阐述了DNA数据存储中的编码技术:二进制数据的压缩编码算法、纠错算法以及二进制数据到DNA 4种碱基的转换方法;最后对现阶段已有的DNA存储方案进行了分析,并对DNA数据存储研究存在的挑战进行了讨论。
        With the rapid development of computer technology and network technology,the massive generated data have brought great challenges to traditional data storage methods,so researchers begin to focus on finding a new generation of storage scheme.As a natural genetic information storage medium,Deoxyribonucleic acid(DNA) has advantages of large storage capacity,low energy consumption and long life,which effectively overcome the shortcomings of traditional storage methods,such as hard disk and computer storage.The DNA data storage method has become a research hotspot in the intersection field of information and biotechnology.This paper reviewed the research progress on DNA data stora-ge technology.Firstly,DNA and its theoretical framework of storage are introduced.Then,the coding technologies in DNA data storage are elaborated,which includes compression coding algorithm of binary data,error correction algorithm and conversion method from binary data to four bases of DNA.Finally,the existing DNA storage schemes are ana-lyzed,and the challenges in DNA data storage research are discussed.
引文
[1] ZHIRNOV V,ZADEGAN R M,SANDHU G S,et al.Nucleic Acid Memory[J].Nature Materials,2016,15(4):366-370.
    [2] GODA K,KITSUREGAWA M.The History of Storage Sys- tems[J].Proceedings of the IEEE,2012,100(13):1433-1440.
    [3] PANDA D,MOLLA K A,BAIG M J,et al.DNA as a digital information storage device:hope or hype?[J].Biotech,2018,8(5):239-247.
    [4] WILLIAMS E D,AYRES R U,HELLER M.The 1.7 Kilogram Microchip:Energy and Material Use in the Production of Semiconductor Devices[J].Environmental Science & Technology,2004,38(6):1915-1916.
    [5] EXTANCE A.How DNA could store all the world’s data[J].Nature,2016,537(7618):22-24.
    [6] BORNHOLT J,LOPEZ R,CARMEAN D M,et al.A DNA- Based Archival Storage System[C]//International Conference on Architectural Support for Programming Languages & Ope-rating Systems.2016.
    [7] HAKAMI H A,CHACZKO Z,KALE A.Review of Big Data Storage Based on DNA Computing[C]//Computer Aided System Engineering.IEEE,2015.
    [8] CHURCH G M,GAO Y,KOSURI S.Next-Generation Digital Information Storage in DNA[J].Science,2012,337(6102):1628.
    [9] SHIPMAN S L,NIVALA J,MACKLIS J D,et al.CRISPR-Cas encoding of a digital movie into the genomes of a population of living bacteria[J].Nature,2017,547(7663):345-349.
    [10] ERLICH Y,ZIELINSKI D.DNA Fountain enables a robust and efficient storage architecture[J].Science,2017,355(6328):950-954.
    [11] GOLDMAN N,BERTONE P,CHEN S,et al.Towards practical,high-capacity,low-maintenance information storage in synthesized DNA[J].Nature,2013,494(7435):77-80.
    [12] BORNHOLT J,LOPEZ R,CARMEAN D M,et al.Toward a DNA-Based Archival Storage System[J].IEEE Micro,2016,PP(99):637-649.
    [13] CASTILLO M.From Hard Drives to Flash Drives to DNA Drives[J].American Journal of Neuroradiology,2014,35(1):1-2.
    [14] BONNET J,COLOTTE M,COUDY D,et al.Chain and conformation stability of solid-state DNA:Implications of room temperature storage[J].Nucleic Acids Research 2009;38(5):1531-1546.
    [15] WANG S W.DNA storage with error correction mechanism [D].Changsha:National University of Defense Technology,2014.(in Chinese)王诗薇.带纠错机制的DNA存储[D].长沙:国防科学技术大学,2014.
    [16] ORLANDO L,GLNOLHAC A,ZHANG G,et al.Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse[J].Nature,2013,499(7456):74-78.
    [17] MILLER W,SCHUSTER S C,WELCH A J,et al.Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change[J].Proceedings of The National Academy of Sciences of The United States of America,2012,109(36):E2382-E2390.
    [18] KIM S,SOLTIS D E,SOLTIS P S,et al.DNA sequences from Miocene fossils:an ndhF sequence of Magnolia latahensis(Magnoliaceae) and an rbcL sequence of Persea psedocarolinensis(Lauraceae)[J].American Journal of Botany,2004,91(4):615-620.
    [19] ALLENTOFT M E,COLLINS M,HARKER D,et al.The half-life of DNA in bone:measuring decay kinetics in 158 dated fossils[J].Proceedings Biological Sciences,2012,279(1748):4724-4733.
    [20] HUFFMAN D A.A method for the construction of minimum-redundancy codes[J].Resonance,2006,11(2):91-99.
    [21] GIBSON D G,VENTER J C.Creation of a bacterial cell con- trolled by a chemically synthesized genome.[J].Science,2010,329(5987):52-56.
    [22] SAVITRI P A I,ADIWIJAYA,MURDIANSYAH D T,et al.Digital medical image compression algorithm using adaptive Huffman coding and graph based quantization based on IWT-SVD[C]//International Conference on Information & Communication Technology.2016.
    [23] PADMAVATI S,MESHARAM V.DCT combined with fractal quadtree decomposition and Huffman coding for image compression[C]//International Conference on Condition Assessment Techniques in Electrical Systems.2016.
    [24] AILENBERG M,ROTSTEIN O.An improved Huffman coding method for archiving text,images,and music characters in DNA[J].BioTechniques,2009,47(3):747-754.
    [25] CAIRE G,SHAMAI S,SHOKROLLAHI A,et al.Universal variable-length data compression of binary sources using fountain codes[C]//Information Theory Workshop.IEEE,2016:123-128.
    [26] BLAWAT M,GAEDKE K,HüTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022.
    [27] BYERS J W,LUBY M,MITZENMACHER M.A digital fountain approach to asynchronous reliable multicast[J].IEEE Journal on Selected Areas in Communications,2002,20(8):1528-1540.
    [28] BING L I,ZHANG L,LIU Y.FPGA hardware implementation of the LZMA compression algorithm[J].Journal of Beijing University of Aeronautics & Astronautics,2015,41(3):375-382.
    [29] LAN C,XU J,ZENG W,et al.Compound image compression using lossless and lossy LZMA in HEVC[C]//IEEE Internatio-nal Conference on Multimedia & Expo.IEEE,2015.
    [30] JI Z,ZHOU J R,ZHU Z X.Bioinformatics Features Based DNA Sequence DATA Compression Algorithm[J].Acta Electronica Sinica,2011,39(5):991-995.(in Chinese)纪震,周家锐,朱泽轩.基于生物信息学特征的DNA序列数据压缩算法[J].电子学报,2011,39(5):991-995.
    [31] PINHO A J,PRATAS D,FERREIRA P J S G.Bacteria DNA sequence compression using a mixture of finite-context models[C]//Statistical Signal Processing Workshop.2011.
    [32] CAO M D,DIX T I,ALLISON L,et al.A Simple Statistical Algorithm for Biological Sequence Compression[C]//Data Compression Conference.2007.
    [33] YIM K Y,YU C S,LI J W,et al.The Essential Component in DNA-Based Information Storage System:Robust Error-Tolerating Module[J].Frontiers in Bioengineering & Biotechnology,2014,2(2):49-53.
    [34] SALOMON D.Data compression:the complete reference[M]//Data Compression:The Complete Reference.New York:Sprin-ger-Verlag,2000.
    [35] SHEN Y F,PAN L.Principle and Method of the Error Detection and Correction of Ternary Hamming Codes [J].Chinese Journal of Computers,2015,38(8):1648-1655.(in Chinese)沈云付,潘磊.三值汉明码检错纠错原理和方法[J].计算机学报,2015,38(8):1648-1655.
    [36] SINGH A K.Error detection and correction by hamming code[C]//International Conference on Global Trends in Signal Processing.2017.
    [37] SONG X M.Research on DNA Information Storage Method Based on Huffman Coding [D].Tianjin:Tianjin University,2018.(in Chinese)宋香明.基于Huffman编码的DNA信息存储方法研究[D].天津:天津大学,2018.
    [38] GRASS R N,HECKEL R,PUDDU M,et al.Robust chemical preservation of digital information on DNA in silica with error-correcting codes[J].Angewandte Chemie International Ed in English,2015,54(8):2552-2555.
    [39] WANG L,WEI Z,YANG W,et al.Multiple channel error-correction algorithms for LCC decoding of Reed-Solomon codes and its high-speed architecture design[J].IET Communications,2017,11(9):1407-1415.
    [40] ALWAN M H,SINGH M,MAHDI H F.Performance comparison of turbo codes with LDPC codes and with BCH codes for forward error correcting codes[C]//Research & Development.2016.
    [41] BALDI M,MATURO N,RICCIUTELLI G,et al.On the error detection capability of combined LDPC and CRC codes for space telecommand transmissions[C]//Computers & Communication.2016.
    [42] CHRISTY,BOGARD,ERIC,et al.DNA media storage[J].Progress in Natural Science,2008,18(5):603-609.
    [43] WONG P C,WONG K K,FOOTE H.Organic data memory using the DNA approach[J].Communications of the ACM,2003,46(1):95-98.
    [44] KASHIWAMURA S,YAMAMOTO M,KAMEDA A,et al.Potential for enlarging DNA memory:the validity of experimental operations of scaled-up nested primer molecular memory[J].Biosystems,2005,80(1):99-112.
    [45] NOZOMU Y,KAZUHIDE S,JUNICHI S,et al.Alignment- Based Approach for Durable Data Storage into Living Organisms[J].Biotechnology Progress,2007,23(2):501-505.
    [46] SCHUSTER S C.Next-generation sequencing transforms today’s biology[J].Nature Methods,2008,5(1):16-18.
    [47] SONG Y,KIM S,HELLER M J,et al.DNA multi-bit non-volatile memory and bit-shifting operations using addressable electrode arrays and electric field-induced hybridization[J].Nature Communications,2018,9(1):1-8.
    [48] HEAVEN D.Now we can store video in living DNA[J].New Scientist,2017,235(3134):11-14.
    [49] YAZDI S M H T,GABRYS R,MILENKOVIC O.Portable and Error-Free DNA-Based Data Storage[J].Scientific Reports,2017,7(1):1-4.
    [50] BLAWAT M,GAEDKE K,HüTTER I,et al.Forward Error Correction for DNA Data Storage [J].Procedia Computer Science,2016,80(5):1011-1022.
    [51] JIANG L,QIU W,ALDIRINI F,et al.Feasibility study of molecular memory device based on DNA using methylation to store information[J].Journal of Applied Physics,2016,120(2):96-100.
    [52] YAZDI S M H T,KIAH H M,GARCIA E R,et al.DNA-Based Storage:Trends and Methods[J].IEEE Transactions on Mole-cular,Biological and Multi-Scale Communications,2015,1(3):230-248.
    [53] TABATABAEI Y S M H,YUAN Y,MA J,et al.A Rewritable,Random-Access DNA-Based Storage System[J].Scientific Reports,2015,5(9):1-10.
    [54] FATIMA A,UL H I,HAIDER A,et al.Trends to store digital data in DNA:an overview[J].Molecular Biology Reports,2018,45(5):1479-1490.
    [55] JAIN S,FARNOUD F,SCHWARTZ M,et al.Duplication-correcting codes for data storage in the DNA of living organisms[J].IEEE Transactions on Information Theory,2016,PP(99):1.
    [56] 张淑芳,宋香明.一种高存储密度的 DNA 信息存储编码方案:201811445344.8[P].2018.
    [57] SILVA P Y D,GANEGODA G U.New Trends of Digital Data Storage in DNA[J].Biomed Research International,2016,2016(5536):1-14.
    [58] CARR P A,CHURCH G M.Genome engineering[J].Nature Biotechnology,2009,27(12):1151-1162.
    [59] SHENDURE J,LIEBERMAN A E.The expanding scope of DNA sequencing[J].Nature Biotechnology,2012,30(11):1084
    [60] PENNISI E.Genome sequencing.Search for pore-fection[J].Science,2012,336(6081):534-537.
    [61] KOSURI S,CHURCH G M.Large-scale de novo DNA synthesis:technologies and applications[J].Nature Methods,2014,11(5):499-507.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700