A Study of the Characteristics of Rh Blood Group Genes Based on Characteristic Sequences and the CGR Method
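Abstract
The Rh blood group system is one of the most complex human blood group systems, and its clinical significance is second only to that of the ABO system. In clinical transfusion practice, pregnancy or transfusion involving ABO or Rh incompatibility can cause transfusion reactions and hemolytic disease of the newborn. The main work of this thesis covers the following aspects:
     1. Based on a classification of a, t, c, and g by chemical structure, the concept of characteristic sequences of a DNA sequence (σ-, τ-, and στ-) is given and extended to protein sequences. This yields a numerical characterization that reduces a protein sequence to a (0,1) sequence. Applied to 18 protein sequences from three structural classes (α-helix, β-sheet, and αβ), it reveals some secondary-structure information for proteins of different structures. From another angle, based on the correlation between amino acid molecular weight and degeneracy, a further characteristic sequence of a DNA sequence (ω-) is given and extended to protein sequences; the numerical characterization reduces a protein sequence to a (0,1,2) sequence. Comparing the numerical characterization plots of the characteristic sequences shows that both the RHD and RHCE genes prefer amino acids with low molecular weight and high degeneracy.
     2. Using the chaos game representation (CGR) of protein sequences based on the classical HP model, the CGR plot of the protein sequence of the RHD gene is given, which can serve as a characteristic map of the protein sequence's secondary structure. The CGR plot of the DNA sequence of the RHD gene is also given, and the corresponding two-step Markov transition probability matrix is computed, which yields the RHD gene's usage preference for the third base of the triplets that encode amino acids.
     3. A new amino acid coding method, quasi-amino acid coding, is proposed, and the relative usage of synonymous codons is computed for 78 human genes under this coding. The results show that these features are not only consistent with the findings of Shi Xiufan et al. but also more pronounced, which indicates that quasi-amino acid coding is more reasonable and scientific. The relative synonymous codon usage under quasi-amino acid coding (QRSCU) and QRFN3’ are then computed for the RHD gene of the human Rh blood group system, giving the RHD gene's codon usage preference and its preference for the base that follows.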
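As a rough illustration of item 1, the sketch below reduces a DNA sequence to a (0,1) sequence under three binary groupings of the bases that are common in the literature (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding). The abstract does not say which grouping corresponds to σ, τ, or στ, nor which class is coded as 1, so the scheme names, the 0/1 assignment, and the function name `characteristic_sequence` are assumptions made for illustration only.

```python
# Hypothetical binary groupings of the four bases; each set lists the
# bases coded as 1, all other bases are coded as 0 (an assumption).
CLASSES = {
    "purine_pyrimidine": set("AG"),  # purines A, G -> 1; pyrimidines C, T -> 0
    "amino_keto": set("AC"),         # amino A, C -> 1; keto G, T -> 0
    "weak_strong": set("AT"),        # weak H-bond A, T -> 1; strong C, G -> 0
}


def characteristic_sequence(dna, scheme):
    """Map a DNA string to a list of 0/1 values under the chosen grouping."""
    ones = CLASSES[scheme]
    return [1 if base in ones else 0 for base in dna.upper()]


if __name__ == "__main__":
    for name in CLASSES:
        print(name, characteristic_sequence("atgc", name))
```

The same idea carries over to protein sequences by replacing the base groupings with groupings of amino acids; the groupings used in the thesis are not given in the abstract.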
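     The innovations of this thesis are:
     1. A new characteristic sequence (the ω-characteristic sequence) is proposed.
     2. Using the newly proposed amino acid coding method, quasi-amino acid coding, QRSCU is established and applied to the study of the Rh blood group system.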
The Rh blood group system is one of the most complex of the known human blood group systems, and its clinical significance is second only to that of the ABO blood group system. In clinical transfusion practice, pregnancy or blood transfusion involving ABO or Rh blood group incompatibility can cause transfusion reactions and hemolytic disease of the fetus and newborn (HDN).
     The main contents are as follows:
     1. Based on a classification of a, t, c and g by chemical structure, the concepts of σ-, τ- and στ-characteristic sequences are presented and extended to protein sequences. A graphical representation is then introduced, and a protein sequence is reduced to a (0,1) sequence. These characteristic sequences are applied to 18 protein sequences from three structural classes (α-helix, β-sheet, αβ), which yields some structural characteristics of protein sequences of different structures. From another point of view, based on the relation between the molecular weight and the degeneracy of the amino acids, another DNA characteristic sequence (ω-) is presented and extended to protein sequences; through the graphical representation, a protein sequence is reduced to a (0,1,2) sequence. Comparing the graphical representations of the RHD and RHCE characteristic sequences shows that both genes prefer amino acids with small molecular weight and high degeneracy.
     2. Using the chaos game representation (CGR) of protein sequences based on the detailed HP model, this paper gives the CGR of the protein sequence of the RHD gene, which can be regarded as a characteristic map of the protein’s secondary structure. It also gives the CGR of the DNA sequence of the RHD gene and computes the corresponding probability matrix for the second-order Markov chain model. The probability matrix shows the RHD gene's usage preference for the third base of the codons in its DNA sequence.
     3. This paper proposes a new coding of the genomic genetic code: quasi-amino acid coding. The relative usage of synonymous codons in 78 human genes under quasi-amino acid coding shows features that are not only consistent with the results of Shi Xiufan et al. but also more pronounced, which suggests that quasi-amino acid coding is more rational and scientific. The paper also computes the relative synonymous codon usage (QRSCU) and the QRFN3’ of the RHD gene under quasi-amino acid coding, obtaining the gene's preference among synonymous codons and its preference for the base that follows the codon.
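The DNA chaos game representation mentioned in item 2 can be sketched as follows. This is the widely used construction in which each base is a corner of the unit square and every new point lies halfway between the previous point and the corner of the current base. The corner assignment, the starting point (0.5, 0.5), and the name `cgr_points` are assumptions for illustration; the protein CGR based on the HP model that the thesis uses is not reproduced here.

```python
# Corner assignment is an assumption (one commonly used convention).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(dna):
    """Return the list of CGR points for a DNA string.

    Starts at the centre of the unit square and moves halfway toward
    the corner of each successive base. Symbols other than A, C, G, T
    are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in dna.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ATGGCC"):
        print(point)
```

Plotting the returned points as a scatter plot gives the CGR image. Counting how many points fall in each cell of a 4 x 4 grid gives dinucleotide frequencies, which is one route from a CGR to transition statistics of the kind the abstract mentions.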
     The innovations are as follows:
     1. A new kind of characteristic sequence (ω-) is proposed.
     2. A new coding of the genomic genetic code, quasi-amino acid coding, is proposed; QRSCU is defined on it and applied to the study of the Rh blood group system.
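For orientation, the sketch below computes the conventional relative synonymous codon usage (RSCU) under the standard genetic code: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the QRSCU of the thesis, whose quasi-amino acid grouping is not defined in the abstract; swapping in a different codon-to-group table would be the natural place to adapt it. The function name `rscu` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for a coding sequence read in frame from position 0.

    Stop codons and amino acids that never occur are left out.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for c in family:
            # observed / (total / family size)
            result[c] = counts[c] * len(family) / total
    return result


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCCAAAAAG").items()):
        print(codon, round(value, 3))
```

A value above 1 means the codon is used more often than expected under equal usage within its family, and a value below 1 means it is used less often.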
References
[1]孙啸.生物信息学基础[M].北京:清华大学出版社,2005.
    [2]赵国屏.生物信息学[M].北京:科学出版社,2002.
    [3]孙向东,刘拥军.蛋白质结构预测[M].北京:科学出版社,2008.
    [4]Trifonov E N.Earliest pages of bioinformatics[J].Bioinformatics,2000,16(1):5-9.
    [5]Backofen R,Gilbert D.Bioinformatics and constraints[J].Constraints,2001,6:141-156.
    [6]钟杨,张亮,赵琼.简明生物信息学[M].北京:高等教育出版社,2001.
    [7]郝柏林.生物信息学[J].中国科学院院刊,2000,(4):260—264.
    [8]凃俐兰.基于快速沃尔什变换的生物序列相似性比对[D]:[硕士学位论文].武汉:华中科技大学计算数学,2004.
    [9]许忠能.生物信息学[M].北京:清华大学出版社,2008.
    [10]蒋彦,王小行,曹毅等.基础生物信息学及应用[M].北京:清华出版社,2003.1—16,102—186.
    [11]蔡禄编著.生物信息学教程[M].北京:化学工业出版社,2006.1—62.
    [12]塞图宝,梅丹尼斯.计算分子生物学导论[M].朱浩等译.北京:科学出版社,2003.1—20.
    [13]李红.中心法则图解[EB/OL]. http://ebio.wjszzx.cn/html/2006-08/3446p15.htm, 2006—08—18
    [14]Dan E.Krane & Michael L.Raymer生物信息学概论[M].北京:清华大学出版社.
    [15]贺平安.DNA序列及蛋白质序列的分析与比较[D]:[博士学位论文].大连:大连理工大学,2004.
    [16]黄延超.生物序列的非线性关联研究[D]:[博士学位论文].武汉:华中科技大学生物医学工程,2003.
    [17]乔纳森·佩夫斯纳.生物信息学与功能基因组学[M].化学工业出版社.
    [18].邹望远,郭曲练.Rh血型与输血的研究进展[J].国外医学输血及血液学分册2004.
    [19].邵超鹏.RHD研究进展和中国人RHD研究现状分析[J].中国输血杂志2003.
    [20].Westhoff CM. Rh血型系统:下一个十年的新面貌[J].国外医学输血及血液学分册2005,28(3):270-273.
    [21].孙志刚,丁梅,王保捷.Rh血型系统的分子生物学研究进展[J].法医学杂志,2005,21(1):65-67.
    [22].兰炯采,周华友,庞桂芝等.RHCE基因结构研究[J].中国输血杂志,2005,18(5):368-371.
    [23].Belinda K.Singleton,Carole A.Green,Neil D.Avent,Peter G.The presence of an RHD pseudogene containing a 37 base pair duplication and a nonsense mutation in Africans with the Rh D-negative blood group phenotype[J].BLOOD,2000,95(1):12-18.
    [24].A. Nandy, S.C. Basak, Simple numerical descriptor for quantifying effect of toxic substances on DNA sequences, J. Chem. Inf. Comput. Sci. 40(2000),915-919.
    [25].A. Nandy, Investigation on evolutionary changes in basic distributions in gene sequences, Internet Elec. J. Mol. Des. 1(2002)10,545-558.
    [26].A.Nandy, P. Nandy, S.C. Basak, Quantitative descriptor for SNP related gene sequences, Internet Elec. J. Mol. Des. 1(2002),367-373.
    [27].A. Nandy, P. Nandy, On the uniqueness of quantitative DNA difference descriptors in 2D graphical representation models[J], Chemical Physics Letters, 368(2003),102-107.
    [28].X.F.Guo, A.Nandy, Numerical characterization of DNA sequences in a 2-D graphical representation scheme of low degeneracy[J], Chemical Physics Letters,369(2003),361-366.
    [29].M. Randic, M. Vracko, N. Lers, D. Plavsic, Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation[J].Chemical Physics Letters,371(2003),202-207.
    [30].朱平,管维红,高雷等.基于氨基酸特征序列的蛋白质结构分析[J].生物信息学,2008,6(3):106-108.
    [31].陈志华,陈惟昌,邱红霞,王自强.氨基酸的分子结构与遗传密码及二维集合分类[J].生物物理学报,2001,1(17):187-194.
    [32].M.Q. Zhang.Identification of protein coding regions in the human genome by quadratic discriminant analysis[J].Proc.Natl.Acad.Sci.USA, 94(1997),565-568.
    [33].陈惟昌,陈志华,王自强等.线粒体遗传密码及基因组遗传密码的对称分析[J].生物物理学报,2002,18(1):87-94.
    [34].马飞,武耀廷,许晓风.遗传密码子和氨基酸若干物理化学特性的相关性研究[J].安徽农业大学学报,2003,30(4):439-445.
    [35].Jonas S. Almeida, João A. Carriço, Peter A. Noble, Madilyn Fletcher. Analysis of genomic sequences by Chaos Game Representation[J].Bioinformatics, 2001,17(5):429-437.
    [36].Zu-Guo Yu,Vo Anh,Ka-Sing Lau.Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses[J].Journal of Theoretical Biology 226(2004)341-348.
    [37].李道国,苗夺谦等.粒度计算研究综述[J].计算机科学2005,32(9).
    [38].Jun Wang,Wei Wang,A computational approach to simplifying the protein folding alphabet[J]. Nature Structural Biology, 1999,6(11): 1033-1038.
    [39].王守源,李晓琴,罗辽复,氨基酸分类与蛋白质二级结构相关性[J].内蒙古大学学报(自然科学版)2002,33(4),423-427.
    [40].沈同王镜岩.生物化学[M].北京:高等教育出版社,1990.
    [41].R.B.Lyngso,C.N.S. Pedersen,Protein folding in the 2D HP model[C].BRICS,1999, 1-15.
    [42].Hao B, Gong W, Ferguson T K, et al. A new UAG-encoded residue in the structure of a methanogen methyltransferase[J]. Science, 2002, 296(5572):1462-1466.
    [43].Srinivasan G, James C M, Krzycki J A. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA[J]. Science, 2002, 296(5572): 1459-1462.
    [44].明镇寰,钟立人.第22种氨基酸和无义密码子的重新诠释[J].生物化学与生物物理进展, 2002, 29(6):831-833.
    [45].Zhu Ping,Tang Xuqing, Xu Zhenyuan.The structure analysis of protein sequences based on the quasi-amino acids code[J]. 2009 Chinese Physics B 1 363.
    [46].朱平,高雷,徐振源.基于拟氨基酸分类下的同义密码子的偏好性和仍与结合强度密切相关[J].物理学报,2009(6).
    [47].石秀凡,黄京飞,梁宠荣,柳树群,谢君,刘次全.人类基因中同义密码子的偏好与密码子-反密码子间的结合强度密切相关吗?[J].科学通报,2000,45(23):2520-2525.
    [48].石峰,莫忠息.生物遗传信息的传输模型及分析[J].数学杂志,2001,21(1):65-70.
    [49].Neil D.Avent and Marion E.Reid.The Rh blood group system:a review [ J ].BLOOD,2000,95(2):375-387.
    [50].史晓红.基于氨基酸分类的基本氨基酸秩序的研究[J].生物数学学报,2005,20(4):491-495.
    [51].靳利霞,唐焕文.氨基酸序列的特征研究[J].计算机与应用化学,2003,20(1):1-5.
    [52].Bohr Henrik, Bohr Jakob, Brunak Søren et al.Protein secondary structure and homology by neural networks: the α-helices in rhodopsin[J].1988,241(12):223-228.
    [53].兰炯采,王从容,魏亚明等.中国汉族Rh血型基因多态性观察[J].Journal of Experimental Hematology,2003,11(6):642-645.
    [54].J.M. Gutiérrez, M.A. Rodríguez, G. Abramson. Multifractal analysis of DNA sequences using a novel chaos-game representation[J].Physica A 300(2001):271-284.
    [55].Jie Feng,Tian-Ming Wang.A 3D graphical representation of RNA secondary structures based on chaos game representation.Chemical Physics 454(2008):355-361.
    [56].Z.G.Yu, V.V.Anh, J.A.Wanliss, S.M.Watson.Chaos game representation of the Dst index and prediction of geomagnetic storm events[J].Chaos,Solitons and Fractals 31(2007)736-746.
    [57].Yingwei Wang, Kathleen Hill, Shiva Singh, Lila Kari. The spectrum of genomic signatures: from dinucleotides to chaos game representation.Gene346(2005):173-185.
    [58].Peter Tino.Multifractal properties of Hao’s geometric representation of DNA sequences[J].Physica A 304(2002):480-494.
    [59].Soumalee Basu,Archana Pan,Chitra Dutta,Jyotirmoy Das.Chaos game representation of proteins.Journal of Molecular Graphics and Modelling[J].1997,15:279-289.
    [60].符维娟,汪源源,卢大儒.基因组序列CGR图形表示的多重分形分析.生物医学工程学杂志[J],2007,24(3):522-525.
    [61].H.Joel Jeffrey.Chaos game representation of gene structure[J].Nucleic Acids Research,18(8) :2163.
    [62].Ohno S.Universal rule for coding sequence construction: TA/CG deficiency-TG/CT excess. Proc Natl Acad Sci USA,1988,85: 9630~9634
    [63].Karlin S, Mrazek J.What drives codon choices in human genes?[J]. J Mol Biol,1996,262: 459~472
    [64].Campbell A, Mrazek J, Karlin S. Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA[J]. Proc Natl Acad Sci USA, 1999,96: 9184~9189
    [65].石秀凡,黄京飞,柳树群等.人类基因同义密码子偏好的特征以及与基因GC含量的关系[J].生物化学与生物物理学进展, 2002,29(3):411-413.
    [66].顾万君,马建民,周童等.不同结构的蛋白编码基因的密码子偏性研究[J].生物物理学报, 2002, 18(1):81-86.
    [67].Clay O, Caccio S, Bernardi G, et al. Human coding and noncoding DNA sequences: compositional correlations[J]. Mol Phylogenet Evol,1996,5: 2~12.
    [68].Shpaer E G. Constraints on codon context in Escherichia coli genes.Their possible role in modulating the efficiency of translation[J].J Mol Biol,1986,188:555~564.
    [69].Bernardi G. The human genome:organization and evolution history[J].Annu Rev Genetics,1995,9:445~476.
    [70].Xie T,Ding D.The relationship between synonymous codon usage and protein structure[J]. FEBS Letters,1998,434:93-96.
    [71].Oresic M,Shalloway D.Specific correlations between relative synonymous codon usage and protein secondary structure[J].Journal of Molecular Biology,1998,281:31-48.
    [72].Komar A A,Lesnik T,Reiss C.Synonymous codon substitutions affect ribosome traffic and protein folding during in vitro translation[J]. FEBS Letters,1999,462:387-391.
