Research on Gene (Genotype) Classification Techniques Based on Phenotypic and Microarray Data
Abstract
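Segregation analysis (SA) is a statistical genetic method that detects major genes and estimates their effects directly from the phenotypes of a quantitative trait in a segregating population; it is the basis for subsequent QTL mapping and genomic analysis. Under the genetic assumption that major genes and minor genes (polygenes) act independently, individuals with the same major-gene genotype follow a continuous normal distribution, while different major-gene genotypes form a mixture of several normal distributions with different means and a common variance. Segregation analysis therefore estimates major-gene effects and tests genetic hypotheses by constructing a Gaussian mixture model, estimating its parameters by maximum likelihood, and computing likelihood ratio test statistics.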
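As a rough sketch of the model just described, the mixture density and the likelihood ratio statistic can be written as follows. The symbols here ($w_k$ for mixing proportions, $\mu_k$ for genotype means, $\sigma^2$ for the common variance, $K$ for the number of major-gene genotypes, $\theta_0$ and $\theta_1$ for the parameters under the null and alternative hypotheses) are notation introduced for illustration and are not taken from the original.

```latex
f(y_i \mid \theta) = \sum_{k=1}^{K} w_k \,
  \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left[-\frac{(y_i - \mu_k)^2}{2\sigma^2}\right],
\qquad \sum_{k=1}^{K} w_k = 1

\ln L(\theta) = \sum_{i=1}^{n} \ln f(y_i \mid \theta),
\qquad
\mathrm{LRT} = -2\left[\ln L(\hat\theta_0) - \ln L(\hat\theta_1)\right]
```

For a single major gene segregating in an F2 population, for example, $K = 3$ and the Mendelian expected proportions are $1/4 : 1/2 : 1/4$.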
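     However, existing segregation analysis methods all work on a single trait, and their statistical power for detecting major genes is low. This study therefore proposes a joint analysis method for major genes affecting multiple traits, multivariate segregation analysis (MSA). Because MSA makes full use of the genetic and residual correlations among several quantitative traits, it is expected to raise the power of major-gene detection and to help dissect the genetic architecture of complex traits. MSA builds a mixture of multivariate Gaussian distributions; estimates the segregation proportions, major-gene effects, and residual variation by maximum likelihood implemented with the EM algorithm; tests genetic hypotheses about the major gene with likelihood ratio test statistics; and uses the Bayesian information criterion (BIC) under three candidate models (pleiotropy, independent inheritance, and close linkage) to decide whether the traits are controlled by one pleiotropic major gene or by closely linked genes. To verify feasibility, two simulation experiments were set up using an F2 population as the example. Simulation experiment 1 examined the statistical power of MSA and the accuracy and precision of its estimates of major-gene effects and residual variation under different major-gene heritabilities and sample sizes. Simulation experiment 2 examined the ability of MSA to distinguish a pleiotropic major gene from closely linked major genes under different heritabilities. The computer simulations showed the following. (1) Whether the major gene controls the expression of several traits simultaneously or only one of them, MSA significantly improves the ability to detect it, because joint analysis fully exploits the correlation among traits. (2) MSA significantly improves the accuracy and precision of major-gene effect estimates; in general, once the detection power for a major gene exceeds 50%, the accuracy and precision of the corresponding estimates reach a satisfactory level. (3) MSA can also effectively distinguish whether multiple traits are controlled by one major gene or by several closely linked major genes. (4) Of the two key factors affecting detection power, heritability has a clearly larger effect than sample size. The analysis procedure was demonstrated with plant height and tiller number of 597 F2 plants from the rice cross Duonieai × Zhonghua 11. The results indicate that plant height and tiller number in this cross are controlled by the same major gene. Its additive and dominance effects on plant height were -21.3 cm and 40.6 cm, respectively, showing overdominance; its additive and dominance effects on tiller number were 22.7 and -25.3, respectively, showing close to complete dominance.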
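The BIC comparison of the three candidate models can be sketched as below. This is a minimal illustration, not the authors' code: the log-likelihoods and parameter counts are hypothetical placeholders, and only the sample size of 597 comes from the abstract. The sketch uses the convention BIC = -2 ln L + k ln n, under which the smallest value is preferred; some software uses the opposite sign.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 ln L + k ln n; smaller is better under this convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical maximised log-likelihoods and parameter counts (illustration only).
candidates = {
    "pleiotropy": (-4210.0, 9),
    "linkage": (-4205.0, 12),
    "independent": (-4260.0, 11),
}
n_obs = 597  # F2 sample size reported in the abstract

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # model with the smallest BIC
```

With these made-up inputs the linkage model fits slightly better but carries three more parameters, so the penalty term tips the choice toward the pleiotropic model.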
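     MSA not only estimates the genetic parameters of the model but also yields the posterior probability that each individual belongs to each major-gene genotype. This study therefore proposes a new method that classifies individuals by their Bayesian posterior probabilities, that is, a model-based unsupervised dynamic clustering method. The method likewise estimates the parameters of each cluster by maximum likelihood implemented with the EM algorithm, and assigns each individual to a cluster according to its Bayesian posterior probability of cluster membership. Simulation showed the following. (1) The method generally gives unbiased estimates of the cluster parameters and determines the optimal number of clusters from the BIC values of the candidate models, which resolves the difficulty of choosing the number of clusters in traditional dynamic clustering. (2) It is more robust than centroid-based dynamic clustering (k-means) and dynamic clustering by the minimum square sum within groups (MinSSw). (3) Raising the discrimination criterion effectively lowers the misclassified rate (MR). The feasibility of the method was verified with Fisher's Iris data; the analysis showed that unsupervised dynamic clustering that maximizes the likelihood function is particularly suited to Gaussian-distributed data, with a misclassified rate significantly lower than those of k-means and MinSSw.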
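The idea of EM estimation followed by posterior-probability assignment can be sketched on a one-dimensional toy case. This is a simplified illustration under stated assumptions, not the multivariate method of the thesis: the data are made up, the clusters share one variance, and treating a "stricter criterion" as a higher posterior threshold that leaves uncertain individuals unassigned is an interpretation, not something the abstract spells out.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, var):
    """Posterior probability of each cluster for each observation."""
    post = []
    for y in data:
        dens = [w * normal_pdf(y, m, var) for w, m in zip(weights, means)]
        total = sum(dens)
        post.append([d / total for d in dens])
    return post

def fit_mixture(data, init_means, n_iter=100):
    """EM for a 1-D Gaussian mixture with one variance shared by all clusters."""
    n, k = len(data), len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    grand_mean = sum(data) / n
    var = sum((y - grand_mean) ** 2 for y in data) / n  # crude starting value
    for _ in range(n_iter):
        post = e_step(data, weights, means, var)
        sizes = [sum(p[j] for p in post) for j in range(k)]
        weights = [s / n for s in sizes]
        means = [sum(p[j] * y for p, y in zip(post, data)) / sizes[j] for j in range(k)]
        var = sum(p[j] * (y - means[j]) ** 2
                  for p, y in zip(post, data) for j in range(k)) / n
    return weights, means, var, e_step(data, weights, means, var)

def classify(post, threshold=0.5):
    """Assign each individual to its most probable cluster, or None if below threshold."""
    labels = []
    for p in post:
        best = max(range(len(p)), key=lambda j: p[j])
        labels.append(best if p[best] >= threshold else None)
    return labels

# Toy data: two well-separated groups (made up for illustration).
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
weights, means, var, post = fit_mixture(data, init_means=[2.0, 8.0])
labels = classify(post, threshold=0.9)
```

Raising `threshold` trades coverage for reliability: an individual whose largest posterior is, say, 0.55 is left unassigned at a threshold of 0.9 rather than risk a wrong label.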
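     DNA microarray technology is one of the main tools of functional genomics in the post-genomic era; a single experiment can measure the expression levels of thousands of genes across different experimental conditions or tissues. Gene clustering, which groups genes with similar expression patterns into the same cluster, is a useful tool for extracting the biological information latent in expression profile data and is the most widely used class of methods in microarray data analysis. Depending on whether prior information is available, clustering is divided into unsupervised and supervised clustering. To explore whether the model-based clustering method above can be applied to high-dimensional microarray expression data, cluster analyses were carried out on computer-simulated data, yeast cell-cycle microarray data, and human cancer cell line NCI-60 microarray data, and the results were compared with those of the k-nearest neighbour method (KNN), binary support vector machines (SVMs), and multicategory SVMs (MC-SVMs). False positives (FP), false negatives (FN), clustering accuracy, and Matthews' correlation coefficient (MCC) were used to compare the strengths, weaknesses, and suitable applications of the supervised clustering methods. The results showed the following. (1) For expression profiles of thousands of genes, model-based clustering is the most accurate. When the training sample is small, model-based supervised clustering, which uses the prior information of both known and unknown genes to guide the assignment of unknown genes, is more accurate than model-based discriminant classification, which uses only the information of known genes, but it runs more slowly. (2) By comparison, MC-SVMs are more robust and the most widely applicable, and they are insensitive to high dimensionality. They are suitable for clustering expression profiles of thousands of genes, where their accuracy is second only to model-based supervised clustering, and also for clustering a few dozen samples using thousands of genes as variables, where their accuracy is the highest. (3) Among the MC-SVM variants, OVO (one-versus-one) and DAGSVM (directed acyclic graph SVM) are preferable when the sample size is large; OVR (one-versus-rest), WW (the method of Weston and Watkins), and CS (the method of Crammer and Singer) give higher clustering accuracy and MCC when the sample size is small; with moderate sample sizes the five MC-SVMs perform alike. (4) It is advisable to try at least two methods, chosen according to the characteristics of the data and the needs of the experiment, in order to obtain the best clustering result.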
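For reference, accuracy and MCC can be computed from the four cells of a binary confusion matrix as sketched below. The counts in the example are invented for illustration and are not results from the thesis; returning 0 when the denominator vanishes is a common convention that is assumed here.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column total is zero
    return (tp * tn - fp * fn) / denom

# Invented counts, for illustration only.
example_accuracy = accuracy(tp=90, tn=80, fp=20, fn=10)
example_mcc = mcc(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, MCC uses all four cells symmetrically, so it stays informative when the two classes are very unequal in size.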
Segregation analysis (SA) is a statistical genetic method that uses the phenotypes of quantitative traits in a segregating population directly to detect major genes and estimate their effects. It is an important tool for planning further studies such as quantitative trait locus (QTL) mapping or more sophisticated genomic analyses. Under the assumption that major gene effects and polygenic effects are independent, individuals with the same major gene genotype are expected to be normally distributed, whereas individuals with different major gene genotypes follow a mixture of normal distributions with different means and the same variance. In SA, major gene effects are therefore estimated and genetic hypotheses tested by constructing a Gaussian mixture model, estimating its parameters by maximum likelihood (ML), and calculating likelihood ratio test (LRT) statistics.
     However, current SA methods analyze a single trait and typically have low statistical power. In this study, we propose a joint analysis method for multiple traits, multivariate segregation analysis (MSA), which uses the genetic and residual correlations among multiple quantitative traits to detect major genes. The method is expected not only to increase statistical power but also to allow dissection of the genetic architecture underlying a trait complex. In MSA, the observed phenotypes of multiple correlated traits are fitted to a multivariate Gaussian mixture model. The segregation proportions, major gene effects, and residual variation are estimated under the ML framework via the expectation-maximization (EM) algorithm. Genetic hypotheses about major genes are tested using LRT statistics. Pleiotropy is distinguished from close linkage by comparing three candidate models with the Bayesian information criterion (BIC): the complete pleiotropic model, the linkage model, and the non-linkage (independent) model. Two simulation experiments based on the F2 mating design were performed to validate the method. The first investigated the statistical power of MSA and the accuracy and precision of its estimates of genetic effects and residual variation under varying heritabilities and sample sizes. The second examined the efficacy of MSA in separating pleiotropy from close linkage under varying heritabilities. The simulations showed the following. (1) MSA increases the statistical power of major gene detection because it makes full use of the correlation among traits, whether the major gene controls the expression of multiple traits simultaneously or of only one of them. (2) MSA improves the precision and accuracy of major gene effect estimates. In general, once the statistical power of major gene detection exceeds 50%, the precision and accuracy reach a satisfactory level. (3) MSA can effectively separate pleiotropy from close linkage. (4) Although both heritability and sample size are key factors affecting the statistical power of major gene detection, increasing heritability improves power much more than increasing sample size. Plant height and tiller number in an F2 population of the rice cross Duonieai × Zhonghua 11 were used as an illustration. The results indicated that the genetic difference in these two traits in this cross involves only one pleiotropic major gene. The additive and dominance effects of the major gene were estimated as -21.3 cm and 40.6 cm on plant height, and 22.7 and -25.3 on tiller number, respectively. The major gene shows overdominance for plant height and close to complete dominance for tiller number.
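The dominance descriptions follow from the ratio of the dominance effect to the additive effect, |d/a|, the usual degree of dominance. The sketch below applies it to the estimates reported above. The tolerance `tol` used to label a ratio as "close to" 1 is a hypothetical cutoff chosen for illustration; the abstract does not state the rule the authors used.

```python
def degree_of_dominance(additive, dominance):
    """Ratio |d/a|: about 1 is complete dominance, above 1 is overdominance."""
    return abs(dominance) / abs(additive)

def describe(ratio, tol=0.2):
    """Label a dominance ratio; tol is an illustrative cutoff, not from the source."""
    if ratio > 1.0 + tol:
        return "overdominance"
    if ratio >= 1.0 - tol:
        return "near-complete dominance"
    if ratio > tol:
        return "partial dominance"
    return "near-additive"

# Estimates reported in the abstract: (additive, dominance).
plant_height = degree_of_dominance(-21.3, 40.6)   # about 1.91
tiller_number = degree_of_dominance(22.7, -25.3)  # about 1.11
labels = (describe(plant_height), describe(tiller_number))
```

The opposite signs of the additive effects on the two traits are consistent with a single allele that reduces plant height while increasing tiller number.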
     MSA not only estimates the genetic parameters of the model but also calculates the posterior probability that each individual belongs to each major gene genotype. We therefore introduce a new method, a model-based unsupervised dynamic clustering method, which classifies individuals according to their Bayesian posterior probabilities. In this method, the parameters of the clusters are likewise estimated by ML via the EM algorithm, and individuals are classified by their Bayesian posterior probabilities. The simulation experiments demonstrated the following. (1) The proposed method gives unbiased estimates of the cluster parameters and determines the optimal number of clusters by BIC, which resolves the difficulty of deciding the number of clusters in traditional dynamic clustering methods. (2) The proposed method is more robust than the k-means method and the minimum square sum within groups (MinSSw) method. (3) The misclassified rate (MR) can be reduced by using a stricter discrimination criterion. The proposed method was further validated on Fisher's Iris dataset. The result indicated that unsupervised dynamic clustering by maximizing the likelihood function is especially suited to data generated from a Gaussian distribution: its MR was significantly lower than those of the k-means and MinSSw methods.
     DNA microarray technology is a chief tool for functional genomics research in the post-genomic era; it allows simultaneous monitoring of the expression levels of thousands of genes under varying experimental conditions or in different biological tissues. Grouping genes with similar expression patterns, called gene clustering, has proved to be a useful tool for extracting the biological information underlying gene expression data, and it is the most widely used method of microarray data analysis. Depending on whether prior knowledge is used, clustering methods are classified as unsupervised or supervised. To explore the feasibility of applying the above model-based clustering method to high-dimensional microarray expression data, several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering, k-nearest-neighbor (KNN), binary support vector machines (SVMs), and multicategory support vector machines (MC-SVMs), were used to classify computer-simulated data, yeast cell cycle microarray data, and microarray data from 60 human cancer cell lines (NCI-60). False positives, false negatives, true positives, true negatives, clustering accuracy, and Matthews' correlation coefficient (MCC) were compared among these supervised methods. The results are as follows. (1) In classifying expression data for thousands of genes, the model-based clustering methods achieve the highest clustering accuracy. Furthermore, when the number of training samples is very small, the model-based supervised method, which uses the prior knowledge of both known-function and unknown-function genes to guide the classification of unknown-function genes, is more accurate than the model-based discrimination method, which uses only the information of known-function genes. In computational speed, however, the discrimination method is faster than the model-based supervised method. (2) In general, MC-SVMs are more robust and more widely applicable, and they are less sensitive to the curse of dimensionality. On expression data for thousands of genes their clustering accuracy is second only to that of the model-based method, and on a small number of high-dimensional gene expression samples they outperform the other techniques. (3) Among the MC-SVMs, OVO (one-versus-one) and DAGSVM (directed acyclic graph SVM) perform better on large samples; OVR (one-versus-rest), WW (the method of Weston and Watkins), and CS (the method of Crammer and Singer) yield better results on small samples; and the five MC-SVM methods perform very similarly on moderate sample sizes. (4) We recommend that at least two candidate methods, chosen according to the features of the real data and the experimental conditions, be run and compared to obtain a better clustering result.
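As a small aid to the comparison of decomposition schemes, the sketch below enumerates the binary problems that one-versus-rest and one-versus-one training would set up for a given list of classes. The class labels are hypothetical. DAGSVM trains the same pairwise classifiers as one-versus-one but evaluates only K - 1 of them per prediction, whereas the Weston and Watkins and the Crammer and Singer methods instead solve a single joint optimization problem.

```python
from itertools import combinations

def ovr_tasks(classes):
    """One-versus-rest: one binary problem per class (K problems)."""
    return [(c, "rest") for c in classes]

def ovo_tasks(classes):
    """One-versus-one: one binary problem per unordered pair (K(K-1)/2 problems)."""
    return list(combinations(classes, 2))

classes = ["A", "B", "C", "D"]  # hypothetical class labels
n_ovr = len(ovr_tasks(classes))
n_ovo = len(ovo_tasks(classes))
dag_evaluations_per_prediction = len(classes) - 1
```

Each pairwise problem sees only the samples of two classes, which is one plausible reason the pairwise schemes need larger samples to do well.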
引文
[1] Nilsson-Ehle H. Kreuzungsuntersuchungen an Hafer und Weizen [J]. Lunds Univ, 1909, series 2, 5 (2): 1-122
    [2] Fisher R A. The correlations between relatives on the supposition of Mendelian inheritance [J]. Trans. Roy. Soc. Edinb., 1918, 52: 399-433
    [3] Fisher R A, Immer F R, Tedin O. The genetical interpretation of statistics of the third degree in the study of quantitative inheritance [J]. Genetics, 1932, 17: 107-124
    [4] Mather K. Variation and selection of polygenic characters [J]. J. Genetics., 1941, 41: 159-193
    [5] Kempthorne O. The correlation between relatives in a random mating population [C]. In: Proceedings of the Royal Society of London. Biological Sciences, 1954, 103-113
    [6] Kempthorne O. The theoretical values of correlation between relatives in a random mating populations [J]. Genetics, 1955, 134: 351-360
    [7] Kempthorne O. An Introduction to Genetics Statistics [M]. New York: John Wiley & sons, 1957
    [8] Elston R, Stewart J. A general model for the genetic analysis of pedigree data [J]. Hum. Hered., 1971, 21: 523-542
    [9] Elston R C, Stewart J. The analysis of quantitative traits for simple genetic models from parental F1 and backcross data [J]. Genetics, 1973, 73: 695-711
    [10] Morton N E, MacLean C J. Analysis of family resemblance. III. Complex segregation of quantitative traits [J]. Am. J Hum. Genet., 1974, 26: 489-503
    [11] Elkind Y, Cahaner A. A mixed model for the effects of single gene, polygenes and their interaction on quantitative traits.1.The model and experimental design [J]. Theor. Appl. Genet., 1986, 72: 377-383
    [12] 莫惠栋. 质量-数量性状的遗传分析 I.遗传组成和主基因基因型鉴别 [J]. 作物学报, 1993, 19 (1): 1-6
    [13] 姜长鉴, 莫惠栋. 质量-数量性状的遗传分析Ⅳ. 极大似然法的应用 [J]. 作物学报, 1995, 21 (6): 641-648
    [14] 莫惠栋. 质量-数量性状的遗传分析Ⅱ.世代平均数和遗传方差 [J]. 作物学报, 1993, 19 (3): 193-200
    [15] 莫惠栋, 徐辰武. 质量-数量性状的遗传分析Ⅲ. 受三倍体遗传控制的胚乳性状 [J]. 作物学报, 1994, 20 (5): 513-519
    [16] Jiang C J, Pan X B, Gu M H. The use of mixture models to detect effects of major genes on quantitative characters in a plant breeding experiment [J]. Genetics, 1994, 136: 383-394
    [17] 盖钧镒, 章元明, 王建康. 植物数量性状遗传分析体系 [M]. 北京: 科学出版社, 2001. 380
    [18] Wang J, Podlich D W, Cooper M, DeLacy I H. Power of the joint segregation analysis method for testing mixed major-gene and polygene inheritance models of quantitative traits [J]. Theo. Appl. Genet., 2001, 103: 804-816
    [19] 章元明, 盖钧镒, 戚存扣. 数量性状分离分析的精确度及其改善途径 [J]. 作物学报, 2001, 27 (6): 787-793
    [20] 徐辰武, 胡治球, 王学枫, 王伟. 胚乳性状主基因的分离分析方法 [J]. 中国农业科学, 2005, 38 (7): 1317-1323
    [21] 田佺, 杨润清, 宁海龙, 李文斌. 动态性状主基因的分离分析新方法 [J]. 上海交通大学学报(农业科学版), 2005, 23 (4): 333-359
    [22] Chien K L, Chen W J, Hsu H C, Su T C, Chen M F, Lee Y T. Segregation analysis of apolipoprotein A1 levels in families of adolescents: A community-based study in Taiwan [J]. BMC Genet., 2006, 7: 4
    [23] Schmitz S, Cherny S S, Fulker D W. Increase in power through multivariate analyses [J]. Behav. Genet., 1998, 28: 357-363
    [24] Xiang Z Y, Yang Y N, Ma X J, Ding W. Microarray expression profiling: Analysis and applications [J]. Current Opinion in Drug Discovery & Development, 2003, 6 (3): 1367-6733
    [25] Holstege F C P, Jennings E G, Wyrick J J, Lee T I, Hengartner C J, Green M R, Golub T R, Lander E S, Young R A. Dissecting the regulatory circuitry of a eukaryotic genome [J]. cell, 1998, 95: 717-728
    [26] Kathiresan A, Lafitte H R, Chen J X, Mansueto L, Bruskiewith R, Bennett J. Gene expression microarray and their application in drought stress research [J]. Field Crops Research, 2006, 97: 101-110
    [27] Li Q, Chen F, Sun L X, Zhang Z Q, Yang Y N, He Z H. Expression profiling of rice genes in early defense responses to blast and bacterial blight pathogens using cDNA microarray [J]. Physiol. Mol. Plant Pathol., 2006, 68: 51-60
    [28] Luo F, Khan L, Yen I L, Bastani F. A dynamical growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles [J]. Bioinformatics, 2004, 20 (16): 2605-2617
    [29] Carr D B, Somogyi R, Michaels G. Templates for looking at the gene expression clustering [J]. Statistical Computing & Statistical Graphics Newsletter, 1997, 8: 20-29
    [30] Cho R J, Campbell M J, Winzeler E A, Steinmetz L, Conway A, Wodicka L, Wolfsberg T G, Gabrielian A E, Landsman D, Lockhart D J, Davis R W. A genome-wide transcriptional analysis of the mitotic cell cycle [J]. Mol. Cell, 1998, 2: 65-78
    [31] Hughes T R, Marton M J, Jones A R, Roland Stoughton C J R, Armour C D, Bennett H A, Coffey E, Dai H, He Y D, Kidd M J, King A M, Meyer M R, Slade D, Lum P Y, Stepaniants S B, Shoemaker D D, Gachotte D, Kalpana C, Simon J, Bard M, Friend S H. Functional discovery via a compendium of expression profiles [J]. Cell, 2000, 102: 109-126
    [32] Szabo A, Boucher K, Carroll W L, Klebanov L B, Tsodikov A D, Yakovlev A Y. Variable selection and pattern recognitiion with gene expression data generated by the Microarray technology [J]. Math. Biosci., 2002, 176: 71-98
    [33] Eisen M B, Brown P O. DNA arrays for analysis of gene expression [J]. Meth. Enzymol., 1999, 303: 179-205
    [34] Lonnstedt I, Speed T P. Replicated microarray data [J]. Stat. Sinica, 2002, 12: 31-46
    [35] Tavazoie S, Hughes J D, Campbell M J, Cho R J, Church G M. Systematic determination of genetic network architecture [J]. Nature Genetics, 1999, 22 (3): 281-285
    [36] Herrero J, Valencia A, Dopazo J. A hierarchical unsupervised growing neural network for clustering gene expression patterns [J]. Bioinformatics, 2001, 17 (2): 126-136
    [37] Han J, Kamber M. Data Mining: Concepts and Techniques [M]. Morgan Kaufmann Publishers, 2000
    [38] Brown M P S, Grundy W N, Lin D, Cristianini N, Sugnet C W, Furey T S, Ares M, Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines [J]. Proc. Natl. Acad. Sci., 2000, 97: 262-267
    [39] Dasgupta A, Raftery A E. Detecting features in spatial point processes with clutter via model-based clustering [J]. J. Am. Stat. Assoc., 1998, 93: 294-302
    [40] Mather K, Jinks J L. Introduction to Biometrical Genetics [M]. London: Chapman and Hall Ltd., 1977
    [41] Mather K, Jinks J L. Biometrical Genetics. 3rd edn [M]. London: Chapman & Hall, 1982
    [42] Galton F. Regression towards mediocrity in hereditary stature [J]. J of Anthropological Institute, 1885, 15: 246-263
    [43] Pearson K. Skew variation in homogeneous material [J]. Philos. Trans., 1895, A: 186-343
    [44] Pearson K. Supplement to a memoir on skew variation [J]. Philos. Trans., 1901, A: 197-443
    [45] Pearson K. Second supplement to a memoir on skew variation [J]. Philos. Trans., 1916, A: 216-429
    [46] Emerson R A, East E M. The inheritance of quantitative characters in maize [J]. Bull. Agr. Exp. Sta. Nebrask, 1913, Res Bull 2
    [47] East E M. Studies on size inheritance in Nicotiana [J]. Genetics, 1915, 1: 164-176
    [48] Wright S. Evolution in Mendelian populations [J]. Genetics, 1931, 16: 97-159
    [49] Wright S. The analysis of variance and the correlations between relatives with respect to deviation from an optimum [J]. J. Genetics., 1935, 30: 243-256
    [50] Haldane J B S. The causes of evolution [M]. New York: Harper Brothers, 1932. 222
    [51] Falconer D S. Introduction to Quantitative Genetics (3rd edn) [M]. London: Longman, 1989
    [52] Haldane J B S. A mathematical theory of natural and artifical selection. PartⅠ [J]. Camb. Phil. Soc., 1924, 23: 19-41
    [53] Haldane J B S. A mathematical theory of natural and artifical selection. Part Ⅱ [J]. Camb. Phil. Soc., 1926, 23: 363-372
    [54] Haldane J B S. A mathematical theory of artifical and natural selection. Part Ⅳ [J]. Camb. Phil. Soc., 1926, 23: 607-615
    [55] Haldane J B S. A mathematical theory of natural and artifical selection. Part Ⅴ [J]. Camb. Phil. Soc., 1927, 23: 838-844
    [56] Wright S. System of mating.Ⅰ.The biometric relation between parent and offspring [J]. Genetics, 1921, 6: 111-123
    [57] Mather K. Biometrical Genetics [M]. London Methuen and Co. Lfd., 1949
    [58] 李竞雄. 玉米杂种优势研究的回顾和展望 [C]. In: 植物遗传理论和应用研讨会论文集. 中国遗传学会, 1990, 1-7
    [59] Robertson A. The nature of quantitative genetic variation [C]. In: Heritage from Mendel. Madison: Univ. Wisc., 1967, 265-280
    [60] 顾铭洪, 朱立宏. 几个矮秆籼稻矮秆基因等位关系的初步分析 [J]. 遗传, 1979, 1 (6): 10-13
    [61] 卢永根, 曾世雄, 李镇邦. 我国籼稻矮生性基因源的表现型和遗传规律的研究 [J]. 遗传学报, 1979, 6 (3): 311-321
    [62] 吴竞仑, 蒋荷. 四个云南半矮秆籼稻在育种上的应用潜力 [J]. 江苏农业学报, 1991, 7 (3): 46-49
    [63] Simmonds N W. Principles of Crop Improvement [M]. New York: Longman Inc, 1979
    [64] 朱立宏, 陆维忠, 谢岳峰. 主要农作物抗病性遗传研究进展 [M]. 南京: 江苏科学技术出版社, 1990. 83-95
    [65] 高明尉. 野败型杂交籼稻基因型的初步分析 [J]. 遗传学报, 1981, 8 (1): 66-74
    [66] 周天理, 沈锦骅, 叶复初. 野败型杂交籼稻的育性基因分析 [J]. 作物学报, 1983, 9 (4): 241-247
    [67] 莫惠栋, 顾铭洪. 谷类作物品质性状遗传研究进展 [M]. 南京: 江苏科学技术出版社,1990. 75-81
    [68] 张勤. 主效基因及其在家畜育种中的意义 [J]. 中国畜牧杂志, 1993, 29 (1): 57-59
    [69] 张泽, 鲁成, 李发德, 向仲怀. 家蚕产卵量的主基因探测 [J]. 遗传, 1997, 19(增刊): 81-82
    [70] Elkind Y, Cahaner A. A mixed model for the effects of single gene, polygenes and their interaction on quantitative traits: 2. The effects of the major gene and polygenes on tomato fruit softness [J]. Heredity, 1990, 64: 205-213
    [71] 徐辰武, 莫惠栋. 胚乳性状的质量-数量遗传分析 [J]. 江苏农学院学报, 1995, 16 (1): 9-13
    [72] 盖钧镒, 管荣展, 王建康. 植物数量性状 QTL 体系检测的遗传试验方法 [J]. 世界科技研究与发展, 1999, 21 (1): 34-40
    [73] 杨永华, 盖钧镒, 马育华. 春夏秋播种季节条件下大豆生育期遗传的差异表现 [J]. 中国农业科学, 1994, 27 (3): 1-6
    [74] 耿社民. 畜禽数量性状主效基因及基因集团的检测 [J]. 黄牛杂志, 1998, 24 (6): 4-6
    [75] Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits [M]. Massachusetts:Sinauer Associates, Inc: Baker & Taylor Books, 1998
    [76] 徐华, 孙少华. 畜禽主基因的研究现状与检测方法 [J]. 当代畜牧, 2004, 2: 46-49
    [77] Carmelli D, Karlin S, Willians R. A class of indices to assess major gene versus polygenic inheritance in distributed variables [C]. In: The Genetic Analysis of Common Diseases: Applications to Predictive Factors in Coronary Heart Disease. New York: Alan R Liss, 1979, 259-270
    [78] Karlin S, Carmelli D, Williams R. Index measures for assessing the model of inheritance of continuously distributed traits: 1. Theory and justification [J]. Theor. Popul. Biol., 1979, 16: 81-106
    [79] Karlin S, Williams P, Carmelli D. Structured exploratory data analysis (SEDA) for determining mode of inheritance of quantitative traits.Ⅰ. Simulation studies on the effect of background distributions [J]. Amer. J. Hum. Genet., 1981, (33): 2
    [80] Loisel P, Goffinet B, Monod H, Montes G. Detecting a major in an F2 population [J]. Biometrics, 1994, 50: 512-516
    [81] Haseman J K, Elston R C. The investigation of linkage between a quantitative trait and a marker locus [J]. Behav. Genet., 1972, 2: 3-19
    [82] 戚存扣, 盖钧镒, 章元明. 甘蓝型油菜芥酸含量的主基因+多基因遗传 [J]. 遗传学报, 2001, 28 (2): 182-187
    [83] 杜雄明, 汪若海, 刘国强, 傅怀勤, 潘家驹, 张天真. 棉花纤维相关性状的主基因-多基因混合遗传分析 [J]. 棉花学报, 1999, 11 (2): 73-78
    [84] Tanksley S D. Mapping polygenes [J]. Annu. Rev. Genet., 1993, 27: 205-223
    [85] Dempster A P, Laird M N, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm [J]. J.R. statist. soc. B (Methodological), 1977, 39 (1): 1-38
    [86] Tan W Y, Chang W C. Convolution approach to the genetic analysis of quantitative characters of self-fertilized populations [J]. Biometrics, 1972, 28: 1073-1090
    [87] 王建康, 盖钧镒. 利用杂种 F2 世代鉴定数量性状主基因-多基因混合模型并估计其遗传效应 [J]. 遗传学报, 1997, 24 (5): 432-440
    [88] 王建康, 盖钧镒. 数量性状主基因-多基因混合遗传的 P1、P2、F1、F2 和 F2:3 联合分析方法 [J]. 作物学报, 1998, 24 (6): 651-659
    [89] Gai J Y, Wang J K. Identification and estimation of a QTL model and its effects [J]. Theor. Appl. Genet., 1998, 97 (7): 1162-1168
    [90] 盖钧镒. QTL 混合遗传模型扩展至 2 对主基因+多基因时的多世代联合分析 [J]. 作物学报, 2000, 26 (4): 385-391
    [91] 章元明, 盖钧镒. 数量性状主基因+多基因混合遗传分析中鉴定多基因存在的 IECM 算法 [J]. 生物数学学报, 1999, 14 (4): 429-434
    [92] 章元明, 盖钧镒. 数量性状分离分析中分布参数估计的 IECM 算法 [J]. 作物学报, 2000, 26 (6): 699-705
    [93] 章元明, 盖钧镒, 戚存扣. 植物数量性状遗传体系检测中回交或自交家系重复试验数据的分析方法 [J]. 遗传, 2001, 23 (4): 329-332
    [94] 章元明, 盖钧镒. 利用 DH 或 RIL 群体检测 QTL 体系并估计其遗传效应 [J]. 遗传学报, 2000, 27 (7): 634-640
    [95] 章元明, 盖钧镒. 利用 P1、F1、P2、F2 和 F2:3 家系五世代联合分离分析的拓展 [J]. 生物数学学报, 2002, 17 (3): 363-368
    [96] 章元明, 盖钧镒, 王建康. 利用回交 B1 和 B2 以及 F2 群体鉴定数量性状两对主基因+多基因混合遗传模型 [J]. 生物数学学报, 2000, 15 (3): 358-366
    [97] 章元明, 盖钧镒, 王永军. 利用 P1、P2 和 DH 或 RIL 群体联合分离分析的拓展 [J]. 遗传, 2001, 23 (5): 467-470
    [98] 章元明, 盖钧镒, 张孟臣. 利用 P1、F1、P2 和 F2 或 F2:3 世代联合的数量性状分离分析 [J]. 西南农业大学学报, 2000, 22 (1): 6-9
    [99] 戴君惕, 何觉民, 张淑芝. 生态遗传雄性不育理论与两系杂交植物Ⅴ.雄性育性广义空间 [J]. 湖南农业大学学报, 1997, 23 (1): 9-14
    [100] 胡中立, 章志宏. 质量-数量性状的遗传参数估计 II:利用 DH 群体或 RIL 群体 [J]. 武汉大学学报, 1998, 44 (6 ): 784-788
    [101] Schena M, Shalon D, Davis R W, O B P. Quantitative monitoring of gene expression patterns with a complementary DNA microarray [J]. Science, 1995, 270 (5235): 467-470
    [102] Shimizu M, Hochadel J F, Fulmer B A, Waalkes M P. Effect of glutathione depletion and metallothionein gene expression on arsenic-induced cytotoxicity and c-myc expression in vitro [J]. Toxicol. Sci., 1998, 45 (2): 204-211
    [103] Suzuki K, Nakajima K, Otaki N, Kimura M. Metallothionein in developing human brain [J]. Biol. signals., 1998, 3 (4): 188-192
    [104] Alon U, Barkai N, Notterman D A, Gish K, Ybarra S, Mack D, Levine A J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J]. Proc. Natl. Acad. Sci., 1999, 96: 6745-6750
    [105] Eisen M B, Spellman P T, Brown P O, Botstein D. Cluster analysis and display of genome-wide expression patterns [J]. Proc. Natl. Acad. Sci. USA., 1998, 95 (2525): 14863-14868
    [106] Sokal R R, Sneath P H A. Principles of numerical taxonomy [M]. San Francisco: CA: W H Freeman, 1963. 359
    [107] Heyer L J, Kruglyak L, Yooseph S. Exploring expression data: identification and analysis of coexpressed genes [J]. Genome Res, 1999, 9: 1106-1115
    [108] Michaels G S, Carr D B, Askenazi M, Fuhrman S, Wen X, Somogyi R. Cluster analysis and data visualization of large-scale gene expression data [J]. Pacific Symp. Biocomput., 1998, (3): 42-53
    [109] Sokal R R, Michener C D. A statistical method for evaluating systematic relationships [J]. Univ. Kans. Sci. Bull., 1958, 38: 1409-1438
    [110] Weinstein J N, Myers T G, M O C P, Friend S H, Fornace Jr. A J, Kohn K W, Fojo T, Bates S E,Rubinstein L V, Anderson N L, Buolamwini J K. An information-intensive approach to the molecular pharmacology of cancer [J]. Science, 1997, 275 (5298): 343-349
    [111] Wen X L, Fuhrman S, Michaels G S, Carr D B, Smith S, Barker J L, Somogyi R. Large-scale temporal gene expression mapping of central nervous system development [J]. Proc. Natl. Acad. Sci., 1998, 95: 334-339
    [112] Kaufman L, Rousseeuw P. Finding Groups in Data: An introduction to cluster Analysis [M]. New York: John Wiley & Sons, 1990
    [113] Quackenbush J. Computational analysis of microarray data [J]. Nat. Rev. Genet., 2001, 2: 418-427
    [114] Susmita D, Somnath D. Comparisons and validation of statistical clustering technique for microarray gene expression data [J]. Bioinformatics, 2003, 19 (4): 459-466
    [115] MacQueen J B. Some methods for classification and analysis of multivariate observations [C]. In: J L L a N. Proceedings of the 5th Berkeley Symposium on Mathematics Statistic Problem. 1967, 281-297
    [116] Hartigan J A, Wong M A. A k-means clustering algorithm [J]. Appl. Stat., 1979, 28: 100-108
    [117] Speed T. Statistical Analysis of Gene Expression Microarray Data [J]. Chapman & Hall/CRC, 2003
    [118] Hastie T, Tibshirani R, Eisen M B. "Gene shaving" as a method for identifying distinct sets of genes with similar expression patterns [J]. Genome Biol., 2000, 1 (2): 1-21
    [119] Hastie T, Tibshirani R, Eisen M B, Brown P, Scherf U, Weinstein J, Alzadeh A, Staudt L, Botstein D. Gene shaving: a new class of clustering methods for expression arrays [C]. In: Technical Report. Department of Statistics, Stanford University, Stanford, CA: 2000b
    [120] Audrey P G, Michael B E. Exploring the conditional coregulation of yeast gene expression through fuzzy K- means clustering [J]. Genome Biol., 2002, 3 (11): 1-22
    [121] Kohonen T. Self-Organized formation of topologically correct feature maps [J]. Biol. Cybernetics, 1982, 43: 59-69
    [122] Kohonen T. Self Organizing Maps [M]. Springer, Berlin: 1995
    [123] Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E S, Golub T R. Interpreting patterns of gene expression with self-organizing maps: Method and application to hematopoietic differentiation [J]. Proc. Natl. Acad. Sci., 1999, 96: 2907-2912
    [124] VanOsdol W W, Myers T G, Paull K D, Kohn K W, Weinstein J N. Use of the Kohonen self-organizing map to study the mechanisms of action of chemotherapeutic-agents [J]. J .Natl. Cancer. Inst., 1994, 86 (24): 1853-1859
    [125] Mangiameli P, Chen S K, West D. A comparison of SOM neural network and hierachical clustering methods [J]. Eur. J Oper. Res., 1996, 93 (2): 402-417
    [126] Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a dataset via the gap statistic [J]. J Royal. Stat. Soc. B., 2001, 63: 411-423
    [127] Mavroudi S, Papadimitrious S, Bezerianos A. Gene expression data analysis with a dynamically extended self-organized map that exploits class information [J]. Bioinformatics, 2002, 18: 1446-1453
    [128] Fritzke B. Growing cell structures—a self-organizing network for unsupervised and supervised learning [J]. Neural Networks, 1994, 7 (9): 1441-1460
    [129] Kohonen T. Self-organizing maps, Second Edition [M]. Berlin: Springer-Verlag, 1997. 145-152
    [130] Xu Y, Olman V, Xu D. Clustering gene expression data using a graph-theoretic approach: anapplication of minimum spanning trees [J]. Bioinformatics, 2002, 18 (4): 536-545
    [131] Lukashin A V, Fuchs R. Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters [J]. Bioinformatics, 2001, 17: 405-414
    [132] Cowgill M C, Harvey R J, Watson L T. A genetic algorithm approach to cluster analysis [J]. Comput. Math. App., 1999, 37: 99-108
    [133] Holland J H. Outline for a logical theory of adaptive systems [J]. J Assoc. Comput. Mach., 1962, 9 (3): 297-314
    [134] Pan H Y, Zhu J, Han D F. Genetic algorithms applied to mutli-class clustering for gene expression data [J]. Geno, Prot. & Bioinfo., 2003, 1 (4): 279-287
    [135] Maulik L, Bandyopadhyay S. Genetic algorithm-based clustering technique [J]. Pattern Recognition, 2000, 33: 1455-1465
    [136] Gad G, Erel L, Eytan D. Coupled two-way clustering analysis of gene microarray data [J]. Proc. Natl. Acad. Sci., 2000, 97 (22): 12079-12084
    [137] Blatt M, Wiseman S, Domany E. Superparamagnetic clusterring of data [J]. Phys. Rev. Lett., 1996, 76: 3251-3255
    [138] Tang C, Zhang L, Zhang A D, Ramanathan M. Interrelated two-way clustering: An unsupervised approach for gene expression data analysis [C]. In: Proceeding 2nd Annual IEEE International Symposium on Bioinformatics and Bioengineering. Los Alamitos: IEEE Comput. Soc., 2001, 41-48
    [139] Lazzeroni L, Owen A B. Plaid models for gene expression data [J]. Stat. Sinica., 2002, 12: 61-86
    [140] Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data [J]. J. Am. Stats. Assoc., 2002, 97: 77-87
    [141] Breiman L. Bagging predictors [J]. Machine Learning, 1996, 23 (2): 123-140
    [142] Hastie T, Tibshirani R, Friedman J H. The elements of statistical learning: Data mining, inference, and prediction [J]. New York: Springer-Verlag, 2001: 397-399
    [143] Furey T S, Cristianini N, Duffy N, Bednarski D W, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue sampling using microarray expression data [J]. Bioinformatics, 2000, 16 (10): 906-914
    [144] Riply B D. Pattern Recognition and Neural Network [M]. Cambridge: Cambridge University Press, 1996
    [145] Ghosh D, Chinnaiyan A M. Mixture modelling of gene expression data from Microarray experiments [J]. Bioinformatics, 2002, 18 (2): 275-286
    [146] Yeung K Y, Fraley C, Murua A, Raftery A E, Ruzzo W L. Model-based clustering and data transformations for gene expression data [J]. Bioinformatics, 2001, 17 (10): 977-987
    [147] McLachlan G J, Bean R W, Peel D. A mixture model-based approach to the clustering of microarray expression data [J]. Bioinformatics, 2002, 18 (3): 413-422
    [148] Efron B, Tibshirani R, Storey J D, Tusher V. Empirical Bayes analysis of a Microarray experiment [J]. J. Am. Stats. Assoc., 2001, 96: 1151-1160
    [149] Yeung K Y, Medvedovic M, Bumgarner R E. Clustering gene expression data with repeated measurements [J]. Genome. Biol., 2003, 4 (5): R34:1
    [150] Qu Y, Xu S Z. Supervised cluster analysis for Microarray data based on multivariate Gaussian mixture [J]. Bioinformatics, 2004, 20: 1905-1913
    [151] Ji X, Li-Ling J, Sun Z R. Mining gene expression data using a novel approach based onhidden Markov models [J]. FEBS. Lett., 2003, 542 (123): 125-131
    [152] Schliep A, Schonhuth A, Steinhoff C. Using hidden Markov models to analyze gene expression time course data [J]. Bioinformatics, 2003, 19 (Suppl.1): 255-263
    [153] Liao L, Noble W S. Combining pairwise sequence similarity and support vector machines for remote protein homology detection [C]. In: Proceedings of RECOMB. 2002, 225-232
    [154] Yang R, Tian Q, Xu S Z. Mapping quantitative trait loci for longitudinal traits in line crosses [J]. Genetics, 2006, 173: 2339-2356
    [155] Zeng W, Li B L. Simple tests for detecting segregation of major genes with phenotypic data from a diallel mating [J]. For. Sci., 2003, 49: 268-278
    [156] Elston R. The genetic analysis of quantitative trait differences between two homozygous lines [J]. Genetics, 1984, 108: 733-744
    [157] Tan W, D’Angelo H. Statistical analysis of joint effects of major genes and polygenes in quantitative genetics [J]. Biom. J., 1979, 21: 179-192
    [158] Zhang Y M, Gai J, Yang Y H. The EIM algorithm in the joint segregation analysis of quantitative traits [J]. Genet. Res., 2003, 81: 157-163
    [159] Aultchenko Y S, Veprev S G, Aksenovich T I. An example of complex segregation analysis of plant pedigree: reversion of cytoplasm type in sugar beet (Beta vulgaris L.) [J]. Ann. Hum. Genet., 1999, 63: 351-353
    [160] Tourjee K R, Harding J, Byrne T G. Complex segregation analysis of Gerbera flower color [J]. Heredity, 1995, 74: 303-310
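    [161] Wang J K, Gai J Y. Joint analysis of the P1, P2, F1, F2 and F2:3 generations under mixed major gene plus polygene inheritance of quantitative traits [J]. Acta Agronomica Sinica, 1998, 24 (6): 651-659 (in Chinese)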
    [162] Huang J L, Cao Z Q, Ma H Y, Zhang Z. Power of the maximum likelihood method for detecting major genes [J]. Acta Agronomica Sinica, 2003, 29 (1): 133-137 (in Chinese)
    [163] Jiang C J, Zeng Z B. Multiple trait analysis of genetic mapping for quantitative trait loci [J]. Genetics, 1995, 140: 1111-1127
    [164] Wylie M P, Holtzman J. The non-line of sight problem in mobile location estimation [C]. In: Proc. IEEE International Conference on Universal Personal Communications. 1996, 827-831
    [165] Zhang Y T, Fang K T. Introduction to Multivariate Statistical Analysis [M]. Beijing: Science Press, 1983. 401-457 (in Chinese)
    [166] Johnson R A, Wichern D W. Applied Multivariate Statistical Analysis [M]. Prentice Hall, 1982. 532-560
    [167] Wu W, Xiong H, Shekhar S. Clustering and Information Retrieval [M]. Norwell, Mass: Kluwer Academic Publishers, 2004
    [168] Leszczynski J. Computational Materials Science [M]. Amsterdam, Boston: Elsevier, 2004
    [169] Lee M L T. Analysis of Microarray Gene Expression Data [M]. Boston: Kluwer Academic, 2004
    [170] Banks D L. Classification, clustering, and data mining applications [C]. In: Proceedings of the Meeting of the International Federation of Classification Societies (IFCS). Chicago: Illinois Institute of Technology, 2004, 15-18
    [171] Hartigan J A. Clustering Algorithms [M]. New York: Wiley, 1975
    [172] Selim S Z, Alsultan K. A simulated annealing algorithm for the clustering problem [J]. Pattern Recognition, 1991, 24 (10): 1003-1008
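    [173] Tang L X, Yang Z H, Wang M G. Improving the K-means algorithm in cluster analysis with a genetic algorithm [J]. Mathematical Statistics and Applied Probability, 1997, 12 (4): 350-356 (in Chinese)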
    [174] Holland J H. Genetic algorithms [J]. Scientific American, 1992: 66-72
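    [175] Gordon A D, Henderson J T. An algorithm for Euclidean sum of squares classification [J]. Biometrics, 1977, 33: 355-362
    [176] Gu S L. An algorithm for achieving the global optimum in dynamic clustering [J]. Journal of Jiangsu Agricultural College, 1996, 17 (1): 57-65 (in Chinese)
    [177] Fan J C, Mei C L. Data Analysis [M]. Beijing: Science Press, 2002: 159-187 (in Chinese)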
    [178] Titterington D M, Smith A F M, Makov U E. Statistical Analysis of Finite Mixture Distributions [M]. New York: John Wiley & Sons, 1985
    [179] McLachlan G J, Basford K E. Mixture Models: Inference and Applications to Clustering [M]. New York: Marcel Dekker, 1988
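    [180] Xiao J, Hu Z Q, Tang Z X, Sui J M, Li X, Xu C W. A joint analysis method for major genes of multiple correlated quantitative traits [J]. Scientia Agricultura Sinica, 2005, 38 (9): 1717-1724 (in Chinese)
    [181] Wang C B, Liu X H. Cluster analysis of gene expression data [J]. Foreign Medical Sciences (Clinical Biochemistry and Laboratory Medicine), 2004, 25 (4): 359-362 (in Chinese)
    [182] Jiang D, Tang C, Zhang A. Cluster analysis for gene expression data: A survey [J]. IEEE Trans. Knowledge and Data Eng., 2004, 16 (11): 1370-1386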
    [183] Duda R. Pattern Classification [M]. Wiley, 2001
    [184] Mitchell T M. Machine Learning [M]. New York: McGraw Hill, 1997
    [185] Vapnik V. Statistical Learning Theory [M]. New York: Wiley, 1998
    [186] Brown M P S, Grundy W N, Lin D, Cristianini N, Sugnet C W, Furey T S, Ares M J, Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines [J]. Proceedings of the National Academy of Sciences of the United States of America, 2000, 97 (1): 262-267
    [187] Mukherjee S. Classifying Microarray Data Using Support Vector Machines, Understanding And Using Microarray Analysis Techniques: A Practical Guide [M]. Boston, MA: Kluwer Academic Publishers, 2003
    [188] Lee Y, Lee C K. Classification of multiple cancer types by multicategory support vector machines using gene expression data [J]. Bioinformatics, 2003, 19 (9): 1132-1139
    [189] Statnikov A, Aliferis C F, Tsamardinos I, Hardin D, Levy S. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis [J]. Bioinformatics, 2005, 21 (5): 631-643
    [190] Kressel U. Pairwise classification and support vector machines [C]. In: Advances in Kernel Methods: Support Vector Learning (Chapter 15). Cambridge, MA, USA: MIT Press, 1999
    [191] Friedman J. Another approach to polychotomous classification [R]. Technical Report. Stanford University, CA, 1996
    [192] Platt J C, Cristianini N, Shawe-Taylor J. Large margin DAGs for multiclass classification [C]. In: Advances in Neural Information Processing Systems 12. MIT Press, 2000, 547-553
    [193] Hsu C W, Lin C J. A comparison of methods for multi-class support vector machines [J]. IEEE Transactions on Neural Networks, 2002, 13 (2): 415-425
    [194] Platt J. Fast training of support vector machines using sequential minimal optimization [C]. In: Schölkopf B, Burges C, Smola A eds. Advances in Kernel Methods: Support Vector Learning. Cambridge, MA, USA: MIT Press, 1999, 185-208
    [195] Staunton J E, Slonim D K, Coller H A, Tamayo P, Angelo M J, Park J, Scherf U, Lee J K, Reinhold W O, Weinstein J N, Mesirov J P, Lander E S, Golub T R. Chemosensitivity prediction by transcriptional profiling [J]. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98 (19): 10787-10792
    [196] Berrar D P, Downes C S, Dubitzky W. Multiclass cancer classification using gene expression profiling and probabilistic neural networks [C]. In: Proceedings of the Pacific Symposium on Biocomputing (PSB). Hawaii, USA: 2003, 5-16
    [197] Romualdi C, Campanaro S, Campagna D, Celegato B, Cannata N, Toppo S, Valle G, Lanfranchi G. Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification [J]. Hum. Mol. Genet., 2003, 12 (8): 823-836
    [198] Gurzi P, Masulli F, Spalvieri A, Sotgiu M L, Biella G. Rough annealing by two-step clustering, with application to neuronal signals [J]. J. Neurosci. Methods, 1998, 85 (1): 81-87
    [199] Wright S. Physiological and evolutionary theories of dominance [J]. Am. Nat., 1934, 68: 24-53
