A Study of Factors Affecting the Accuracy of Protein Secondary Structure Prediction
Abstract
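Deriving a protein's three-dimensional structure from its primary sequence is one of the most important problems in bioinformatics today. Computational methods are widely used to predict protein secondary structure, and their development falls broadly into two stages. The first stage started from mathematical statistics and relied on information about individual amino acids, as in the Chou-Fasman and GOR (Garnier-Osguthorpe-Robson) methods. The second stage is based on evolutionary information: tools such as BLAST search sequence databases and build multiple alignments for the query sequence to obtain homology information, and PSI-BLAST in particular yields this evolutionary information as a PSSM (position-specific scoring matrix). This study investigates how amino acid properties can be used to improve PSSM-based prediction and raise its accuracy.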
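Both the first-stage statistical methods and the HEC propensity factor rest on per-residue conformational preferences. Chou-Fasman conformational parameters are frequency ratios: how often a residue type occurs in a given state, relative to how often residues in general occur in that state. The minimal sketch below estimates such ratios as P(state | residue) / P(state) from labeled training data. The function name and the input format (parallel lists of sequence strings and H/E/C label strings) are illustrative assumptions; the abstract does not specify the propensity table this study used.

```python
from collections import Counter
from typing import Dict, List

STATES = "HEC"  # helix, strand, coil


def conformational_propensities(sequences: List[str],
                                labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Chou-Fasman-style propensity P(state | residue) / P(state).

    A value above 1 means the residue type adopts that state more often
    than the average residue does; below 1 means less often.
    """
    pair_counts: Counter = Counter()     # (residue, state) -> count
    residue_counts: Counter = Counter()  # residue -> count
    state_counts: Counter = Counter()    # state -> count
    total = 0
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and label strings differ in length")
        for aa, ss in zip(seq, lab):
            pair_counts[(aa, ss)] += 1
            residue_counts[aa] += 1
            state_counts[ss] += 1
            total += 1
    table: Dict[str, Dict[str, float]] = {}
    for aa, n_aa in residue_counts.items():
        table[aa] = {
            s: (pair_counts[(aa, s)] / n_aa) / (state_counts[s] / total)
            if state_counts[s] else 0.0  # state absent from the data
            for s in STATES
        }
    return table
```

For example, `conformational_propensities(["AALG", "AAVG"], ["HHCC", "HHEC"])` gives alanine a helix propensity of 2.0, because it is always helical while only half of all residues are. Real estimates need a large, non-redundant set of proteins with known structure.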
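Using an SVM (support vector machine) as the classifier, two physicochemical factors, a hydrophobicity factor and HEC (helix, strand, coil) propensity, were each added to the PSSM as per-residue features for predicting protein secondary structure. The study also designed an improved way of applying the SVM, a two-layer SVM, so that the physicochemical factors and the two-layer SVM together raise prediction accuracy. Correlation analysis of the results showed that the added hydrophobicity factor and HEC propensity were weakly positively correlated with Q3 (three-state per-residue accuracy) and significantly positively correlated with SOV (segment overlap), indicating that amino acid hydrophobicity and HEC propensity play some role in the formation of protein secondary structure. In the two-layer SVM experiments, both the absolute accuracy values and the correlation analysis favoured the two-layer network, and the improved SVM clearly optimized the prediction process. Compared with the widely used PSSM-based method, Q3 and SOV increased by 2.76% and 1.25%, respectively.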
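The abstract does not detail how the two SVM layers are connected. One common cascade design, used for example in the structure-to-structure stage of the PHD neural-network predictor, feeds the second classifier a window of the first classifier's per-residue H/E/C outputs so that it can correct implausible local patterns such as an isolated one-residue helix. The sketch below, which assumes that design rather than reporting this study's method, builds only the second-layer inputs from first-layer scores (for instance, decision values of three one-versus-rest SVMs). The function name, window size and zero padding are hypothetical choices, and training the SVMs themselves (for example with LIBSVM) is omitted.

```python
from typing import List, Sequence


def second_layer_features(first_layer_scores: Sequence[Sequence[float]],
                          window: int = 5) -> List[List[float]]:
    """Build second-layer inputs from first-layer per-residue H/E/C scores.

    Each residue receives the scores of the `window` residues centred on it,
    concatenated in sequence order; positions beyond either chain end are
    padded with zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred")
    half = window // 2
    n = len(first_layer_scores)
    n_scores = len(first_layer_scores[0]) if n else 0
    padding = [0.0] * n_scores
    features: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(first_layer_scores[j] if 0 <= j < n else padding)
        features.append(row)
    return features
```

With three scores per residue and `window=3`, each second-layer vector has nine values; the first residue's vector starts with three padding zeros.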
One of the most persistent problems in bioinformatics has been determining a protein's unique tertiary structure from its primary structure. Most current protein secondary structure prediction programs use multiple sequence alignments to capture local sequence patterns as input for machine learning techniques. However, such local sequence patterns ignore the intrinsic propensities of amino acids for the three secondary structure states, namely α-helices, β-strands, and others (often referred to as coils). We therefore propose an approach that integrates multiple sequence alignment profiles with amino acid propensities in the input coding scheme for machine learning. Position-specific scoring matrices (PSSMs) from PSI-BLAST were combined with amino acid conformational parameters and hydrophobicity properties to predict protein secondary structure with support vector machines (SVMs).
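The input coding described above can be sketched as follows, under stated assumptions: each residue contributes its 20 PSSM scores squashed into (0, 1) by a logistic function, one hydrophobicity value, and its three H/E/C propensities, and a sliding window centred on the target residue concatenates these 24-value vectors. The Kyte-Doolittle hydropathy scale, the logistic scaling, the default window of 13, the zero padding at the chain ends and all function names are illustrative choices, not settings reported in the abstract. The propensity argument is any residue-to-{state: value} dictionary; residues missing from it fall back to a neutral value of 1.0.

```python
import math
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale (an assumed choice of hydrophobicity factor).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
NEUTRAL = {"H": 1.0, "E": 1.0, "C": 1.0}  # for residues missing from the table


def residue_vector(pssm_row: Sequence[float], aa: str,
                   propensity: Dict[str, Dict[str, float]]) -> List[float]:
    """24 values: 20 logistic-scaled PSSM scores, 1 hydrophobicity, 3 propensities."""
    if len(pssm_row) != 20:
        raise ValueError("each PSSM row must hold 20 scores")
    scaled = [1.0 / (1.0 + math.exp(-x)) for x in pssm_row]
    hydro = KYTE_DOOLITTLE.get(aa, 0.0) / 4.5  # divide by max |value| -> [-1, 1]
    prop = propensity.get(aa, NEUTRAL)
    return scaled + [hydro, prop["H"], prop["E"], prop["C"]]


def window_features(seq: str, pssm: Sequence[Sequence[float]],
                    propensity: Dict[str, Dict[str, float]],
                    window: int = 13) -> List[List[float]]:
    """One feature vector per residue: a centred window of residue vectors."""
    if len(seq) != len(pssm):
        raise ValueError("need one PSSM row per residue")
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    per_residue = [residue_vector(row, aa, propensity) for aa, row in zip(seq, pssm)]
    padding = [0.0] * 24  # zero vector for positions past either chain end
    features: List[List[float]] = []
    for i in range(len(seq)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(seq) else padding)
        features.append(row)
    return features
```

With the default window of 13, each residue becomes a 312-dimensional SVM input (13 × 24). Raw PSSM log-odds scores are typically small integers, so the logistic scaling does not overflow in practice.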
This work describes an SVM-based method that combines hydrophobicity and HEC propensity with the PSSM to predict protein secondary structure and that also uses a two-layer SVM. Correlation coefficient analysis showed that hydrophobicity and HEC propensity were only weakly related to Q3 but clearly related to SOV. The two-layer SVM improved both Q3 and SOV. Compared with the widely used PSSM-based method, the integrated method increased Q3 and SOV by 2.76% and 1.25%, respectively.
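Q3 is the percentage of residues whose three-state label is predicted correctly, and the correlation analysis relates an input factor to an accuracy measure. The minimal sketch below computes Q3 and a Pearson correlation coefficient. The abstract does not say which correlation coefficient or which paired variables were used, so Pearson is an assumption; SOV, a segment-overlap measure that scores whole helices and strands rather than single residues, is considerably more involved and is not reproduced here.

```python
import math
from typing import Sequence


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted H/E/C state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty state strings")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)


print(q3("HHHEECCC", "HHHEEECC"))  # 7 of 8 residues match -> 87.5
```

Because Q3 counts residues independently, a prediction can score well on Q3 while fragmenting segments, which is one reason a factor can correlate weakly with Q3 yet strongly with SOV.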