Research on Machine Learning-Based Methods for Characterizing and Predicting Protein Binding Sites
Abstract
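With the successful completion of the genome sequencing projects for human and many other species, the steadily growing body of genome sequence data provides coding information for millions of proteins. As the embodiment of genetic information, proteins are the principal carriers and executors of life processes. In living cells, proteins carry out specific functions by interacting with other biomolecules, but the residues that take part directly in these interactions make up only a portion of the protein, and these binding sites are crucial to protein function. Analyzing and identifying the sites at which proteins bind other molecules is therefore the basis for studying the mechanisms by which proteins realize their functions.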
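     Over the past decade, researchers have turned to computational methods for predicting functional residues on proteins, in particular machine learning-based methods that predict functional residues from protein sequence or structure information. This dissertation uses amino acid properties to explore the shared and distinctive physicochemical characteristics of the sites at which proteins bind different types of molecules, and on that basis proposes a classification method for predicting the binding sites between proteins and other types of molecules (such as heme binding sites). It then designs effective features and feature representations, drawn mainly from the three-dimensional and topological structure of proteins, to describe and predict DNA-binding residues. The main research content is summarized as follows: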
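     1. Using amino acid physicochemical properties, we analyze the specific characteristics of the sites at which proteins bind different types of molecules (proteins, DNA/RNA, and heme), and propose a classification method that predicts heme binding sites from sequence information. This work starts from the simplest and most intuitive, yet highly interpretable, physicochemical features and analyzes the physicochemical characteristics associated with the binding sites for different types of molecules. The results show that binding sites for different types of binding molecules have different properties. We then propose a simple, intuitive feature selection method and an integrative sequence profile encoding scheme, yielding a new method that predicts heme binding sites from the integrative sequence profile. Cross-validation on the training set and independent validation on the test set both show that our method substantially improves prediction accuracy over results previously reported in the literature.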
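     2. Feature design and analysis for DNA-binding residue prediction models. This work first constructs a benchmark dataset that integrates structural data for proteins before and after DNA binding. It then introduces new structural features, including temperature factor, packing density, and topological features, to describe binding residues on DNA-bound proteins and their unbound counterparts. The analysis of binding residues with these new features can provide useful information to molecular biologists.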
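     3. A DNA-binding residue prediction model based on a dimensionality reduction strategy. Building on our preceding work on feature design and analysis for DNA-binding residues, we further propose a weighting factor that quantifies the distance-dependent contribution of surrounding amino acids to the central amino acid. We then reduce dimensionality by extracting weighted average features over the surface patch, and on this basis obtain a new method that predicts DNA-binding residues from the reduced set of weighted average features. Experimental results show that the new method proposed in this chapter achieves higher efficiency and prediction accuracy than the machine learning methods in the existing literature, and that the weighted average dimensionality reduction strategy can be extended to the prediction of other types of binding residues.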
With the completion of the genome sequencing projects for human and other species, the growing availability of genome sequence data provides encoding information for hundreds of thousands of proteins. As the products of genetic information, proteins are the carriers of the most important biological activities and the executors of cellular functions. In biological cells, proteins perform specific functions by interacting with other molecules. However, only a fraction of the residues on a protein participate directly in these interactions, and these interacting residues play crucial roles in various biological functions. Therefore, the characterization and identification of functional residues, or binding sites, provide important clues for exploring protein function.
     In the last decade, researchers have focused on developing computational methods to predict functional residues on proteins. In particular, machine learning-based methods have been applied to predict binding residues from sequence-derived or structure-derived features. In this dissertation, we first exploit amino acid indices to analyze the physicochemical attributes specific to the different types of molecules (such as protein, DNA/RNA, and heme) that bind to proteins, and we propose a new classification method to predict heme binding residues from the sequences of heme-binding proteins. More importantly, we explore and design effective structural and topological features to characterize and predict DNA-binding residues. The research topics are outlined as follows:
     1. We exploit amino acid indices to analyze the physicochemical attributes specific to the different types of molecules (protein, DNA/RNA, and heme) that bind to proteins, and propose a new sequence-based method to predict heme binding residues. Our results show that the different types of binding residues have their own relevant attributes. We then propose an intuitive feature selection scheme and a novel integrative sequence profile, which is generated by coupling the PSSM with the selected physicochemical properties. Evaluation by 5-fold cross-validation on the training set and on an independent test set demonstrates that the proposed approach outperforms conventional methods based on PSSM profiles for the prediction of heme binding residues.
     2. Feature design and analysis of DNA-binding residues in the prediction models. In this section, we first build benchmark datasets that consist of DNA-binding protein structures in both their holo and apo forms. We then introduce novel features, such as temperature factor, packing density, and betweenness centrality, to describe DNA-binding residues on bound and unbound structures. The statistical results derived from the new features can provide useful information and knowledge to molecular biologists.
     3. We propose a new method that uses a dimensionality reduction strategy to predict DNA-binding residues. In the previous section, the methods for predicting DNA-binding residues incorporated data on neighboring residues by concatenating a number of properties, which results in high-dimensional feature vectors. To overcome this limitation, we first introduce a novel weighting factor that quantifies the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. Then, a weighted average scheme (dimensionality reduction) is proposed to represent the surface patch of the residue under consideration. Based on these strategies, we exploit a reduced set of weighted average features to improve the prediction of DNA-binding residues from structures. Experimental results indicate that our approach predicts DNA-binding residues with high accuracy and high efficiency using a reduced set of weighted average features, and that it compares favorably with the two previous methods. We believe that the weighted average scheme can potentially be extended to predict other functional sites, such as protein-protein and protein-RNA interaction residues.
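
The weighted average scheme in item 3 can be illustrated with a minimal Python sketch. The abstract does not give the form of the weighting factor, so the Gaussian decay, the `sigma` and `cutoff` values, and the names `distance_weight` and `patch_features` are all assumptions made for illustration, not the dissertation's actual definitions. The sketch only shows the idea: instead of concatenating the features of every neighbor, the surface patch is summarized by one distance-weighted average per feature.

```python
import math
from typing import List, Sequence, Tuple

# A residue is (xyz coordinate, per-residue feature vector). Hypothetical layout.
Residue = Tuple[Sequence[float], Sequence[float]]


def distance_weight(d: float, sigma: float = 5.0) -> float:
    """Assumed weighting factor: Gaussian decay with distance d (in angstroms)."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def patch_features(
    center: Residue,
    neighbors: List[Residue],
    cutoff: float = 10.0,   # assumed patch radius
    sigma: float = 5.0,     # assumed decay scale
) -> List[float]:
    """Weighted average of feature vectors over the patch around `center`.

    The output has the same length as a single residue's feature vector,
    no matter how many neighbors fall inside the patch.
    """
    c_xyz, c_feat = center
    totals = list(c_feat)   # the central residue is at distance 0, weight 1
    weight_sum = 1.0
    for xyz, feat in neighbors:
        d = math.dist(c_xyz, xyz)
        if d > cutoff:
            continue        # residue lies outside the patch
        w = distance_weight(d, sigma)
        weight_sum += w
        for i, value in enumerate(feat):
            totals[i] += w * value
    return [t / weight_sum for t in totals]
```

With a window of k neighbors and m properties per residue, concatenation yields a vector of length (k + 1) × m, whereas this sketch always returns m values; that is the sense in which the scheme reduces dimensionality.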
