Analysis of Gene Sequence Models Based on the SAS System
Abstract
Bioinformatics is a discipline that applies the viewpoints, theories, and methods of mathematics and information science, with the computer as its tool, to acquire, process, store, distribute, analyze, and interpret biological information. It combines mathematics, biology, medicine, computer science, physics, and other disciplines. As an important tool for mathematical statistics, the SAS system also plays a major role in bioinformatics: cluster analysis, discriminant analysis, principal component analysis, and time series models are increasingly applied to bioinformatics and offer a broader set of methods and ideas for studying its problems.
     The main work of this thesis covers the following aspects:
     1. Based on the evolution of xylanase molecules, and taking the content of several important amino acids in xylanase as variables, a time series experiment is designed and an ARIMA model is used to analyze and forecast amino acid content. The modeling steps are described in detail, together with the prerequisites for modeling and the choice of parameters. Evolutionary trend plots are obtained for the selected amino acids, and analysis of these plots shows how their content changes across evolutionary stages. This yields the evolutionary stability characteristics of the two xylanase families and the differences in glycine between the two families during evolution. The results can be extended to the study of synonymous codon bias in the two families.
     2. The cluster analysis methods of the SAS system are used to study the viruses of acute hemorrhagic conjunctivitis, a disease for which several different pathogens have tested positive as the cause. Taking the amino acid classification of the classical protein HP model as the basis and the CLUSTER procedure as the main procedure, Ward's method and the centroid method are each used to cluster the viruses and the four classes of amino acids, giving clustering dendrograms. The clustering results show how amino acid content differs among the viruses, from which the differences in their codon bias are briefly analyzed.
     3. MEGA software is used to analyze the homology and evolution of the hemagglutinin of influenza A virus. On the basis of homology, a phylogenetic tree of the 16 hemagglutinin subtypes of influenza A virus is obtained. Based on this tree and combined with the RSCU method, the evolutionary features of the hemagglutinins that infect humans are analyzed. By combining the phylogenetic tree with the BLAST method, the current status and trends of influenza A virus in China are analyzed.
     The innovations of this thesis are:
     1. In the study of xylanase, the ARIMA model is introduced to analyze the evolutionary trends of particular amino acids.
     2. Building on the construction of the phylogenetic tree, the analysis is improved by combining it with the RSCU and BLAST methods, an approach of high practical value.
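     The RSCU (relative synonymous codon usage) measure named in item 3 is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal Python sketch of that definition, not the SAS or MEGA workflow used in the thesis. It assumes the standard genetic code and an in-frame coding sequence; the function name rscu and the toy sequence are illustrative only.

```python
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper).

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). Stop codons, single-codon amino acids (Met, Trp),
    and amino acids absent from the sequence are skipped.
    """
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


# Toy example: four glycine codons, three GGT and one GGC.
values = rscu("GGTGGTGGTGGC")
print(values["GGT"], values["GGC"], values["GGA"])
```

     A value above 1 indicates that a codon is used more often than expected under equal usage, and a value below 1 that it is used less often. Within each amino acid the values sum to the number of synonymous codons.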