Prediction of Functional Residues in Two Special Types of Proteins and Biological Sequence Alignment
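Abstract
Bioinformatics has attracted close attention from biology, informatics, and related disciplines, both for its exploratory research into the many problems that constrain the development of biology and for the instructive results it has produced; this attention has in turn accelerated the development of bioinformatics itself. The basic framework of bioinformatics is now in place, and the scientific problems it aims to solve have become clearer and better defined. However, judged by the logical rigor of the discipline's own development and by its practical effectiveness on complex problems, much work remains: some methods need further refinement, while others have yet to be designed, developed, and established.
     The main results of this thesis fall into three parts: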
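     Part one: The SPA algorithm is extended to general penalty (or scoring) matrices, and a mathematical proof is given. The extended SPA algorithm has a wider range of application: the scoring matrix can be adjusted to the needs of the problem at hand to obtain the desired alignment. This alleviates the problem, frequent with the Hamming matrix, that the optimal alignment is not unique, and lays a foundation for designing SPA-based multiple sequence alignment algorithms.
     Part two: Sequence-based methods for predicting protein functional residues are designed, enabling fast, large-scale prediction and providing reference and guidance for studying the mechanisms by which proteins carry out their functions. The predictions can serve as a preliminary screen before experimental determination of functional residues, saving considerable time, labor, and material resources and improving efficiency. Feature selection algorithms are used to filter the designed features, which reduces the input dimensionality and speeds up prediction; the selected features also yield biologically meaningful conclusions.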
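     (1) CRpred, a sequence-based method for predicting enzyme catalytic residues, is developed. Its prediction accuracy exceeds that of existing sequence-based methods and is comparable to that of current structure-based methods. Analysis of the selected features shows that: some amino acids (His, Cys, Asp, Arg, Glu, Tyr) have a high catalytic propensity, whereas others (Val, Ala, Ile, Pro, Leu, Met) rarely become catalytic residues; glycine (Gly) provides flexibility to catalytic sites; residue conservation is crucial for predicting catalytic residues, since catalytic residues are usually more conserved than residues in general, and conserved residues of amino acid types with high catalytic propensity are more likely to be catalytic; catalytic residues are associated with specific sequence patterns such as CysXXCys and AspXLysXXAsn; and although catalytic residues prefer a relatively hydrophobic overall environment, locally they are usually surrounded by hydrophilic residues.
     (2) RNA-binding residues are defined by interatomic distance, and RBRpred, a sequence-based method for predicting the RNA-binding residues of proteins, is designed. RBRpred improves prediction accuracy over existing sequence-based methods. Feature selection leads to the following conclusions: arginine (Arg) and lysine (Lys), whose side chains are positively charged, are readily attracted to the negatively charged phosphate groups of RNA and bind to form stable structures; glycine (Gly), being smaller than other amino acids, can increase the flexibility of RNA-binding sites; glutamic acid (Glu), whose side chain is negatively charged, and the hydrophobic residues leucine (Leu), valine (Val), alanine (Ala), and phenylalanine (Phe) rarely occur in RNA-binding sites. Sequence conservation is very important for predicting RNA-binding residues. Among the three secondary structure types, residues in coils, especially in long coil segments, are more flexible and interact readily with RNA molecules, whereas helices are more rigid and residues within them are less likely to be RNA-binding residues. Residues with larger relative solvent accessibility are more likely to be RNA-binding residues.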
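     Part three: Generalized error-correcting codes are applied to DNA computing. To address the mutation errors that can arise in DNA computing, a DNA operating system with automatic error correction is designed, offering one approach to correcting mutation errors in DNA computing.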
Bioinformatics concentrates on solving the problems that constrain the development of biology. Progress so far has been encouraging: a number of results of great interest to researchers in biology, informatics, and related subjects have been obtained, which has in turn facilitated the rapid development of bioinformatics itself. At present, the basic framework of bioinformatics has been established, and the main problems to be solved are better defined. However, given the need for logical rigor as the subject evolves and the goal of practical solutions to complex problems, much remains to be done: new methods for solving practical problems are needed, and existing methods must be further improved. This thesis focuses on problems related to biological sequence alignment and protein functional residue prediction.
     This thesis contains three main results:
     First, the SPA algorithm is extended to general penalty/scoring matrices, and a mathematical proof is given. The extended SPA algorithm has a wider range of uses: by adjusting the penalty/scoring matrix to suit practical demands, it can produce an appropriate alignment. It also reduces the number of cases in which the optimal alignment is not unique, as often happens when the Hamming matrix is used, and thereby lays a foundation for designing SPA-based multiple sequence alignment algorithms.
     Second, sequence-based methods for the large-scale, fast prediction of protein functional residues are designed. This work supports the study of how proteins perform their functions and can be used to screen candidate functional residues before experimental determination. Since experiments can be costly and time-consuming, such screening saves time, labor, and material resources. Feature selection is performed, which not only reduces the dimensionality of the input and the computation time but also reveals biological insights from the selected features.
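The non-uniqueness issue mentioned above can be illustrated with a small sketch. Note that this is not the SPA algorithm, whose details are not given in this abstract; it is a standard global-alignment dynamic program (Needleman-Wunsch style) that also counts how many optimal alignment paths exist. The function name `align_stats`, the two scoring functions, and the gap values are illustrative assumptions.

```python
def align_stats(a, b, score, gap):
    """Return (best_score, number_of_optimal_paths) for global alignment.

    score(x, y) gives the substitution score; gap is the score of one gap
    (usually negative). This is a textbook dynamic program, NOT the SPA
    algorithm; it only shows how the scoring scheme affects ties.
    """
    n, m = len(a), len(b)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    count[0][0] = 1
    for i in range(1, n + 1):          # first column: a[:i] against gaps only
        best[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, m + 1):          # first row: b[:j] against gaps only
        best[0][j] = j * gap
        count[0][j] = 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (best[i - 1][j - 1] + score(a[i - 1], b[j - 1]), count[i - 1][j - 1]),
                (best[i - 1][j] + gap, count[i - 1][j]),   # gap in b
                (best[i][j - 1] + gap, count[i][j - 1]),   # gap in a
            ]
            top = max(s for s, _ in options)
            best[i][j] = top
            # Sum the path counts of every predecessor that achieves the optimum.
            count[i][j] = sum(c for s, c in options if s == top)
    return best[n][m], count[n][m]


def hamming_score(x, y):
    # Hamming-style scheme (assumed): match 0, mismatch -1, used with gap -1.
    return 0 if x == y else -1


def general_score(x, y):
    # One possible general scheme (assumed): match +1, mismatch -1, used with gap -2.
    return 1 if x == y else -1


print(align_stats("AC", "CA", hamming_score, -1))
print(align_stats("AC", "CA", general_score, -2))
```

On this toy pair, the Hamming-style scheme admits several equally good alignments, while the general scheme with a heavier gap penalty leaves a single optimum.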
     (1) A sequence-based catalytic residue predictor, CRpred, is proposed. Its prediction quality is comparable to that of modern structure-based methods and exceeds that of state-of-the-art sequence-based methods. Analysis of the selected features indicates the following four characteristics: a) Amino acids vary in their propensity to become catalytic residues, from high (His, Cys, Asp, Arg, Glu and Tyr) to low (Val, Ala, Ile, Pro, Leu and Met), with glycine (Gly) providing flexibility for catalytic sites; b) The most important factor contributing to accurate predictions is residue conservation. Catalytic residues, irrespective of type, tend to be more conserved than the general population of residues, and highly conserved amino acids with high catalytic propensity are likely to form catalytic sites; c) Certain sequence motifs associated with catalytic reactions, such as CysXXCys and AspXLysXXAsn, are found to contribute to the prediction; and d) Although catalytic residues prefer a relatively hydrophobic neighborhood, they are likely to be surrounded locally (with respect to the sequence) by hydrophilic residues.
     (2) RNA-binding residues are identified according to a distance-based cutoff definition, and a new predictor, RBRpred, is designed to predict RNA-binding residues from protein sequence; it improves prediction quality over current sequence-based methods. The four findings obtained through feature selection are as follows: a) The positively charged amino acids Arg and Lys show a higher propensity to form RNA-binding sites, owing to their ability to interact with the negatively charged phosphate backbone of RNA; the small size of Gly provides flexibility for protein-RNA interactions; and Asp (with its negatively charged side chain), together with several hydrophobic residues (such as Leu, Val, Ala and Phe), is not preferred in RNA-binding sites; b) Sequence conservation plays a fundamental role in predicting RNA-binding residues; c) Coil residues, especially those in long coil segments, are more flexible and can easily interact with RNA, whereas helices are more rigid, so residues in helices are less likely to bind RNA; and d) Residues with higher relative solvent accessibility are more likely to be in RNA-binding sites.
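A distance-based cutoff definition of this kind can be sketched as follows. The abstract does not state the cutoff value or which atoms are compared, so the 3.5 Å default, the function name `rna_binding_residues`, and the toy coordinates below are all assumptions made for illustration.

```python
import math


def rna_binding_residues(protein_atoms, rna_atoms, cutoff=3.5):
    """Label residues as RNA-binding using a distance cutoff.

    protein_atoms: dict mapping a residue id to a list of (x, y, z) atom
                   coordinates in angstroms.
    rna_atoms:     list of (x, y, z) coordinates of RNA atoms.
    cutoff:        distance threshold in angstroms (3.5 is an assumed value,
                   not taken from the thesis).
    A residue counts as binding if any of its atoms lies within the cutoff
    of any RNA atom.
    """
    binding = set()
    for res_id, atoms in protein_atoms.items():
        if any(math.dist(p, r) <= cutoff for p in atoms for r in rna_atoms):
            binding.add(res_id)
    return binding


# Toy coordinates (hypothetical, not from a real structure).
protein = {
    "ARG10": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "LEU11": [(10.0, 0.0, 0.0)],
}
rna = [(3.0, 0.0, 0.0)]
print(rna_binding_residues(protein, rna))
```

Labels produced this way would serve as the training targets for a sequence-based predictor; the choice of cutoff changes how many residues are labeled as binding.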
     Third, generalized error-correcting codes are applied to DNA computing. Focusing on mutation errors in DNA computing, a DNA operating system with error correction is designed in order to solve the Hamiltonian circuit problem.
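As a rough illustration of how mutation (substitution) errors in DNA words might be corrected, the sketch below uses classic nearest-codeword decoding under Hamming distance. It is not the generalized error-correcting code of the thesis, whose construction is not described in this abstract; the codebook, the function names, and the example words are hypothetical.

```python
def hamming_distance(u, v):
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(u, v))


def decode(word, codebook):
    """Return the codeword nearest to `word`, or None if it is too far away.

    A code with minimum distance d can correct up to t = (d - 1) // 2
    substitutions; words farther than t from every codeword are rejected.
    """
    d_min = min(
        hamming_distance(u, v)
        for i, u in enumerate(codebook)
        for v in codebook[i + 1:]
    )
    t = (d_min - 1) // 2
    nearest = min(codebook, key=lambda c: hamming_distance(word, c))
    return nearest if hamming_distance(word, nearest) <= t else None


# Hypothetical codebook of DNA words with pairwise distance 5 (so t = 2).
CODEBOOK = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"]

print(decode("AACAA", CODEBOOK))  # one substitution
print(decode("CCGGC", CODEBOOK))  # two substitutions
print(decode("ACGTA", CODEBOOK))  # too many substitutions to correct
```

Handling insertions and deletions as well would require a different distance, such as the Levenshtein distance, which this sketch does not cover.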
