Studying Gene Duplication, Divergence, and Pleiotropy through Sequence Analysis of Gene Families
Abstract
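A gene family is a group of genes that share a common ancestor and have similar sequences. Sequence analysis of gene families has spread into nearly every area of modern biology and has become a routine research tool. It is widely used to study the origin and divergence of species, evolutionary mechanisms, and the detection of natural selection pressure. More recently, the sequence information of gene families has also been found to have potential value for systems biology.
     This dissertation attempts to mine the information contained in gene families from three aspects: the timing of gene duplication, the functional divergence of duplicate genes, and gene pleiotropy. Although only sequence data from vertebrates, fruit flies, and yeast are used here, the methods can be extended to related studies in other species. The main content and conclusions are summarized as follows: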
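     1. In the human genome, the age distribution of gene duplications shows two peaks plus an ancient gene set. Because of functional constraints, different functional categories differ in how duplicates are retained, while the categories also influence one another and so maintain similar retention patterns. Genes with certain functions are known to have duplicated extensively early in vertebrate evolution, but the correlation of duplication times across functional categories has not been studied. We therefore developed a reliable pipeline for estimating gene duplication times, used it to date the duplications of most human genes, and analyzed the correlations among functional categories and among expression locations. Under the GO classification, functionally related categories cluster together. Among all GO categories, two functional groups differ clearly from the whole-genome pattern: one related to development and one related to organismal physiological processes. Using a more rigorous dating method, we studied three signal-transduction-related gene superfamilies: transcription factors, protein kinases, and G-protein receptors. Their age distributions resemble that of the GO signal transduction category. We also compared the age distributions of duplicate genes expressed in different cellular locations. From the nucleus to the extracellular space, the age distributions are similar to that of the whole genome. For extracellularly expressed genes, the rate of duplication has increased somewhat over the last 600 million years.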
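     2. Gene duplication and the subsequent functional divergence are regarded as the source of functional diversity in genomes. Although several models describe patterns of functional divergence after duplication, a suitable measure of functional distance between duplicate genes is still needed. We propose a simple method for measuring the functional distance between two orthologous gene clusters. Applying a new statistical test to vertebrate gene families in which two rounds of duplication produced three orthologous clusters, we found two patterns of functional divergence after duplication. These two patterns indicate that divergence played different roles following each of the two duplication rounds. Functional distance analysis offers a simple measure of the level of functional divergence between orthologous clusters and should help clarify the mechanisms of functional innovation in functional genomics.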
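     3. Gene pleiotropy, the ability of one gene to affect multiple phenotypes at once, is a very common property of genes. Biologists have long recognized its importance, yet its extent has not been rigorously examined. In theory, Fisher's model assumes universal pleiotropy: a mutation in one gene can potentially affect all traits. Experimental results, on the other hand, indicate that a gene usually affects only a few distinct phenotypes. We estimated the pleiotropy of 321 vertebrate genes and found that a gene typically affects only 6-7 molecular phenotypes (corresponding to the dimensions of organismal fitness). The estimated pleiotropy is also positively correlated with the number of GO biological processes and with expression breadth, which indicates that this measure is biologically meaningful. Moreover, because our results give gene pleiotropy a definite value, the assumption of universal pleiotropy in theoretical work is open to question.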
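     4. We introduce Genepleio, a visual software tool for computing gene pleiotropy, evaluate its estimation error, and suggest ways to reduce that error. To broaden the use of the pleiotropy measure, we also provide a method for computing site-level pleiotropy. Using Genepleio, we examined the distribution of gene pleiotropy in three groups of species (vertebrates, fruit flies, and yeast) and found that vertebrates and fruit flies have similar distributions, with mean pleiotropy lower than in yeast. A further analysis of human disease-related genes found that the pleiotropy of most disease genes is only slightly below the average level.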
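The abstract does not describe the internals of the duplication-time pipeline mentioned in point 1. As a rough illustration only, the sketch below dates a duplication under a strict molecular clock by scaling the distance between two paralogs against a calibration split of known age, then bins the resulting ages into an age distribution. The function names, the calibration approach, and the 50-million-year bin width are assumptions made for this example and are not taken from the dissertation.

```python
from collections import Counter


def duplication_age(d_paralogs, d_calibration, t_calibration_mya):
    """Estimate a duplication age (million years ago) under a strict clock.

    Assumption: sequence distance grows linearly with time, so the age is
    the paralog distance scaled by a calibration pair whose split time is known.
    """
    if d_calibration <= 0:
        raise ValueError("calibration distance must be positive")
    return d_paralogs / d_calibration * t_calibration_mya


def age_histogram(ages_mya, bin_width_mya=50):
    """Count duplication events per age bin, keyed by the bin's lower edge."""
    counts = Counter(int(age // bin_width_mya) * bin_width_mya for age in ages_mya)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical inputs: a paralog pair twice as diverged as a pair of
    # orthologs whose species split 300 million years ago.
    print(duplication_age(0.4, 0.2, 300))
    print(age_histogram([10, 40, 60, 520]))
```

A real pipeline would also need to handle rate variation among lineages and saturation of distances for old duplications, which a single linear clock ignores.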
A gene family is a set of genes that share a common ancestor and sequence similarity. Sequence analysis of gene families has permeated every area of biology and become a routine tool. It is widely used to investigate species origin and divergence, evolutionary mechanisms, and the detection of natural selection.
     In this dissertation, we try to uncover the information in gene families from three aspects: duplication time, functional divergence, and gene pleiotropy. Although we use sequences only from vertebrates, flies, and yeasts, the methods can be extended to similar research on other species. Our main research content and conclusions are as follows:
     1. In the human genome, the age distribution of gene duplications shows two waves of duplication and an ancient component. Because of functional constraints, gene duplications in different functional categories are likely to show different age distributions. However, functional categories are correlated and affect one another, which results in similar gene retention patterns. Some functional categories are known to be preferentially retained at particular stages of evolution, but the correlation between categories has not been investigated. We therefore developed a pipeline to estimate the age of duplication events and used it to date most duplication events in human and zebrafish. We analyzed the retention patterns of duplicate genes in different GO functional categories and found two distinct patterns: one cluster associated with development and signaling, the other with organismal physiological processes. To examine the first cluster in more detail, we used a stricter method to estimate the duplication ages of genes from three signaling-related gene superfamilies. Their age distributions are similar to that of the GO "signal transduction" category. We also compared the age distributions of genes in different subcellular localizations and found that, from the nucleus to the extracellular space, the patterns are largely similar. In addition, gene duplication has accelerated somewhat over the last 600 million years. In summary, gene duplication patterns are consistent across function-related categories as well as across subcellular localizations.
     2. Gene duplication and the subsequent functional differentiation are considered the source of functional diversification of genomes. Although several models have been proposed to describe patterns of functional divergence after gene duplication, an appropriate measure of functional distance between duplicates is still needed. We propose a simple method to measure the functional distance between any two subfamilies. We performed a new statistical test on ten three-cluster vertebrate gene families generated by two rounds of whole-genome duplication and found two patterns of functional divergence after gene duplication, indicating that the two rounds may have played distinct roles in functional diversification. Functional distance analysis may provide a simple measure of the level of functional divergence between gene clusters after duplication and shed further light on the mechanisms of functional innovation in functional genomics.
     3. Biologists have long recognized the importance of gene pleiotropy, in which a single gene affects multiple traits and which is one of the most commonly observed attributes of genes. Yet the extent of gene pleiotropy remains seriously under-explored. In theory, Fisher's model assumes universal pleiotropy, that is, a mutation can potentially affect all phenotypic traits. Experimental assays of a gene, on the other hand, usually reveal only a few distinct phenotypes. We estimated the effective gene pleiotropy of 321 vertebrate genes and found that a gene typically affects 6-7 molecular phenotypes, which correspond to the components of organismal fitness. The positive correlation of gene pleiotropy with the number of Gene Ontology biological processes, as well as with expression breadth, provides a biological basis for the sequence-based estimation of gene pleiotropy. Because the degree of gene pleiotropy is limited to a definite number of molecular phenotypes, theoretical analyses that assume universal pleiotropy should be treated with some caution.
     4. We introduce a software tool, Genepleio, for calculating gene pleiotropy, assess its estimation error, and suggest ways to improve the estimate. To widen the applicability of the pleiotropy measure, we also designed a method to estimate site-specific pleiotropy. Using Genepleio, we studied the distribution of gene pleiotropy in vertebrates, flies, and yeasts. We found that vertebrates and flies have similar pleiotropy distributions, with average pleiotropy below that of yeasts. We also calculated the pleiotropy of disease-related genes and found it to be only slightly below that of other genes.
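The abstract does not give the formula behind the functional distance of point 2. The sketch below is one plausible reading, stated as an assumption: the distance between two clusters is taken as d = -ln(1 - theta), where theta in [0, 1) is a coefficient of functional divergence estimated elsewhere. For a three-cluster family the three pairwise distances are then split into one cluster-specific component per cluster, assuming the distances are additive (d_XY = b_X + b_Y). All names here are hypothetical, and the dissertation's actual measure and test may differ.

```python
import math


def functional_distance(theta):
    """Convert a divergence coefficient theta in [0, 1) to an additive distance.

    Assumption: distance is defined as -ln(1 - theta).
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


def cluster_components(d_ab, d_ac, d_bc):
    """Solve d_XY = b_X + b_Y for three clusters A, B, C.

    Returns (b_a, b_b, b_c), the cluster-specific share of each distance.
    """
    b_a = (d_ab + d_ac - d_bc) / 2.0
    b_b = (d_ab + d_bc - d_ac) / 2.0
    b_c = (d_ac + d_bc - d_ab) / 2.0
    return b_a, b_b, b_c


if __name__ == "__main__":
    # Hypothetical pairwise distances for one three-cluster family.
    print(cluster_components(0.3, 0.5, 0.6))
```

Under this reading, a component near zero would suggest that a cluster has kept close to the ancestral function, while a large component would point to cluster-specific change.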
