Application of Gene Expression Data in Tumor Diagnosis and Gene Function Prediction
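Abstract
Introduction
     The postgenomic era has brought a large number of high-throughput methods, which in turn generate massive volumes of gene expression data. Reliable and accurate classification is essential for the diagnosis and treatment of cancer. Microarrays can measure the expression of thousands of genes in each sample simultaneously, which not only makes objective and accurate tumor classification possible but also provides data to support clinicians in choosing an appropriate form of treatment.
     In gene expression data the number of genes is usually far larger than the number of observations, so traditional statistical methods sometimes fail; it is therefore necessary to analyze which method should be used under which conditions to extract the most useful information. Although earlier studies have examined feature gene selection methods and applied them to tumor classification, most concentrate on a single method or a single dataset and lack a statistical foundation. A systematic comparison and analysis of the performance of the various methods across multiple datasets is therefore needed.
     As genome and postgenome projects continue to advance, more and more biological information is becoming available. Rational use of this information can not only effectively suppress the influence of noise but also avoid the one-sided picture obtained from a single experiment alone. However, only a small part of the literature recognizes the importance of prior information.
     Cluster analysis is an effective data analysis tool. Studies have shown that genes participating in the same biological process have the same function, so cluster analysis of gene expression data has become a principal method of gene function prediction. Most existing clustering methods, however, ignore the known functions of genes. As gene annotation databases continue to improve, integrating known gene functions into the clustering process is a sensible choice, especially when the data are noisy. Cluster analysis usually first defines a distance between gene expression profiles and then clusters the genes according to this measured distance. If the distance is derived purely from the biological experiment, without taking existing prior knowledge into account, it is neither complete nor accurate.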
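     Objectives
     To select suitable feature genes and compare the strengths and weaknesses of different methods for tumor classification with gene expression data; to add prior information to tumor gene expression data in order to improve the accuracy of tumor classification; and to combine known biological functions in order to improve the accuracy and interpretability of cluster analysis of gene expression data.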
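     Methods
     Five classic gene expression datasets were used: two-class lung cancer, colon cancer, multi-class lung cancer, childhood tumors and brain tumors. Feature genes were selected with the nearest shrunken centroid method (PAM), shrunken centroids regularized discriminant analysis (SCRDA) and the multiple testing procedure (MTP), and discriminant analysis was then carried out on each resulting feature gene set. The discriminant methods were K-nearest neighbors (KNN), linear discriminant analysis (LDA), C-classification support vector machine (C-SVM), shrinkage linear discriminant analysis (SLDA), shrinkage diagonal discriminant analysis (SDDA), PAM, SCRDA and back-propagation artificial neural network (BP-ANN).
     The malignant pleural mesothelioma and lung adenocarcinoma gene expression dataset was used. Some of the lung adenocarcinoma related genes reported in the journal Cancer Research were retrieved, their positions in the original dataset were located, and they were tested with MTP; non-significant genes were discarded and significant genes retained. The retained genes were then combined with the significant genes obtained by PAM and by SCRDA, respectively, to form feature gene sets, which were used for discriminant analysis.
     Drawing on accumulated knowledge of gene functional relationships, we propose incorporating the known functions of genes into a new distance matrix. The new distance equals the sum of the measured distance and the functional distance. The algorithm proceeds in two steps. In the first step, the new distance is used in a distance-based cluster analysis (such as K-medoids or hierarchical clustering). In the second step, the clustering result of the first step is used to predict the function of genes whose function is unknown, deciding whether each has a known function or a new one.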
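     Results
     When the number of genes exceeds the number of samples, conventional LDA cannot be carried out. In both the two-class and the multi-class data, SCRDA selected markedly more genes than PAM; SDA, PAM and SCRDA were more accurate than conventional LDA; among the machine learning methods, SVM was more accurate than BP-ANN; and the accuracy of KNN fell when all genes were used instead of a subset.
     For the classification methods that combined prior information with the gene sets obtained by PAM and SCRDA, test-set accuracy failed to improve for only a few methods and improved to some degree for all the others. Apart from a few methods such as PCR, training-set accuracy improved in every case, and the corresponding standard deviations decreased.
     Simulation experiments and a study of yeast data confirmed that the method integrating functional distance is more effective than the standard method.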
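     Conclusions
     Feature gene selection has some influence on the classification methods. PAM generally uses fewer feature genes than SCRDA, which in turn uses fewer than MTP.
     The improved discriminant methods, SLDA in particular, perform well in tumor classification and are superior to conventional LDA, while the differences among the improved methods are not pronounced. Among the machine learning methods, SVM is better than BP-ANN, but attention must be paid to the choice of kernel function and parameters.
     Adding prior information to discriminant analysis effectively improves discriminant ability and reduces the influence of noise in gene expression data. The idea has practical prospects in both methodology and application.
     Combining biological function with gene expression can, to some extent, improve the accuracy and interpretability of cluster analysis of gene expression data, and has practical value.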
Introduction
     The postgenomic era has produced a multitude of high-throughput methodologies that generate massive volumes of gene expression data. Microarrays can determine the expression levels of thousands of genes simultaneously and have greatly facilitated the discovery of new biological knowledge. Microarray experiments may lead to a more complete understanding of the molecular variation among tumors and hence to a more accurate and informative classification. However, this kind of knowledge is often difficult to grasp, and turning raw microarray data into biological understanding is by no means a simple task. Even a simple, small-scale microarray experiment generates thousands to millions of data points.
     One feature of microarray data is that the number of tumor samples collected tends to be much smaller than the number of genes. The number of samples is typically on the order of tens or hundreds, while each chip typically carries thousands of genes. In statistical terms this is the 'large p, small n' problem, i.e. the number of predictor variables is much larger than the number of samples. Microarrays thus present a new challenge for statistical methods. Traditional statistical methods for classification or prediction do not work well when the number of variables p (genes) far exceeds the number of samples n. An appropriate choice among existing statistical methods, or the development of new ones, is therefore needed for the analysis of gene expression microarray data.
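     To make the 'large p, small n' failure concrete, consider the pooled within-class covariance estimate on which LDA relies. The notation below is standard textbook notation, not taken from the original text: K is the number of classes, x_i the expression vector of sample i, y_i its class and \hat{\mu}_k the mean of class k.

```latex
\hat{\Sigma} \;=\; \frac{1}{n-K}\sum_{k=1}^{K}\;\sum_{i:\,y_i=k}
   (x_i-\hat{\mu}_k)(x_i-\hat{\mu}_k)^{\top},
\qquad
\operatorname{rank}(\hat{\Sigma}) \;\le\; n-K \;<\; p .
```

     Because \hat{\Sigma} is a p x p matrix of rank at most n - K, it is singular whenever p > n - K, so the inverse \hat{\Sigma}^{-1} that appears in the LDA discriminant score cannot be computed. Shrinkage and regularized variants avoid this by replacing \hat{\Sigma} with a well-conditioned estimate.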
     A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. Gene expression microarrays provide a high-throughput platform for discovering genomic biomarkers for cancer diagnosis and prognosis. Many gene expression signatures have been identified in recent years for accurate classification of tumor subtypes or for prognosis of patient survival. Current methods for classifying human malignancies mostly rely on a variety of feature selection methods and classifiers for selecting informative genes. Many previous studies focused on one method or a single dataset. Cancer, however, is not a single disease: there are many different kinds of cancer, arising in different organs and tissues through the accumulated mutation of multiple genes. From a statistical point of view, an evaluation of the most commonly employed methods may give more accurate results if it is based on a collection of multiple datasets.
     Rational use of the available biological information can not only effectively remove or suppress noise in gene chips but also avoid the one-sided results of a single experiment. However, only some studies have recognized the importance of prior information in tumor classification. Together with the application of discriminant techniques, we propose a method that incorporates prior knowledge into tumor classification based on gene expression data. The main problem is how to incorporate prior biological knowledge and where to obtain it. For the purposes of this study, prior knowledge is any information about lung adenocarcinoma related genes that has been confirmed in the literature. Prior knowledge is viewed here as a means of directing the classifier using known lung adenocarcinoma genes.
     Cluster analysis was first applied to microarray data in the late 1990s and has become an increasingly important tool for gene expression analysis. Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied to gene function discovery. Cluster diagrams display the "terminal branches" as a list of genes that behave similarly across multiple experiments. Most clustering analyses first define a distance metric between gene expression profiles and then cluster the genes according to that distance. A distance derived from the experiment alone, ignoring prior knowledge such as pathway or GO annotations, is neither complete nor accurate. In general, given the relatively high noise levels of genomic data, incorporating biological knowledge into statistical analysis is recognized as a reliable way to maximize statistical efficiency and enhance the interpretability of the results.
     Objectives
     In the present study, we evaluated the most commonly employed discriminant methods and explored their features and applications in order to improve the accuracy of tumor/tissue classification. We then incorporated prior biological knowledge into tumor classification, again to improve classification accuracy. Finally, we present a new methodology that combines prior knowledge about gene functions with cluster analysis of gene expression data in order to improve the accuracy and interpretability of the cluster results.
     Methods
     The performance of several popular discrimination methods for gene expression data was studied with five publicly available cancer microarray datasets. The nearest shrunken centroid method (PAM), shrunken centroids regularized discriminant analysis (SCRDA) and the multiple testing procedure (MTP) were used for feature gene selection. The classification methods were K-nearest neighbor classifiers (KNN), linear discriminant analysis (LDA), SCRDA, PAM, C-classification support vector machine (C-SVM), shrinkage linear discriminant analysis (SLDA), shrinkage diagonal discriminant analysis (SDDA) and back-propagation artificial neural network (BP-ANN). The five datasets were (1) MPM and ADCA (malignant pleural mesothelioma versus lung adenocarcinoma), (2) colon cancer, (3) multi-class lung cancer, (4) multi-class childhood cancer and (5) multi-class brain cancer. The performance of the above discrimination methods as selectors of significant genes was also studied.
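     The core idea behind nearest shrunken centroid selection can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure used in the study: it soft-thresholds raw centroid offsets and omits the per-gene standardization and class priors of the full method, and the threshold `delta` and the toy data are hypothetical.

```python
from collections import defaultdict

def soft_threshold(value, delta):
    """Shrink value toward zero by delta; values within delta of zero become 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0

def shrunken_centroids(samples, labels, delta):
    """Return the overall centroid, per-class shrunken offsets, and surviving gene indices."""
    n_genes = len(samples[0])
    overall = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    offsets = {}
    for y, rows in by_class.items():
        centroid = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        offsets[y] = [soft_threshold(centroid[g] - overall[g], delta)
                      for g in range(n_genes)]
    # A gene is kept only if at least one class offset survives the shrinkage.
    selected = [g for g in range(n_genes)
                if any(offsets[y][g] != 0.0 for y in offsets)]
    return overall, offsets, selected

def predict(x, overall, offsets):
    """Assign x to the class whose shrunken centroid is closest (squared Euclidean)."""
    def sq_dist(y):
        return sum((x[g] - (overall[g] + offsets[y][g])) ** 2 for g in range(len(x)))
    return min(offsets, key=sq_dist)

# Toy data (hypothetical): 4 samples, 3 genes; only gene 0 separates the classes.
samples = [
    [5.0, 1.0, 2.0], [5.2, 1.1, 2.1],   # class "A"
    [1.0, 1.0, 2.2], [0.8, 0.9, 1.9],   # class "B"
]
labels = ["A", "A", "B", "B"]
overall, offsets, selected = shrunken_centroids(samples, labels, delta=0.5)
```

     A larger `delta` removes more genes, which is why the number of feature genes depends on how the threshold is tuned.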
     A well-known public dataset, the malignant pleural mesothelioma and lung cancer gene expression dataset, was used in this study. Information about genes associated with lung adenocarcinoma was retrieved from the journal Cancer Research. The location and expression level of these genes in the dataset were obtained, and differential expression was analyzed by multiple t-tests. Significant genes were retained and combined with the genes chosen by the PAM or SCRDA gene selection method to form a feature (gene) set, which was then used for discriminant analysis. The methods were K-nearest neighbor classifiers (KNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), shrunken centroids regularized discriminant analysis (SCRDA), the nearest shrunken centroid method (PAM), partial least squares (PLS), generalized partial least squares (GPLS), principal component regression (PCR), ridge regression (RR), C-classification support vector machine (C-SVM), shrinkage linear discriminant analysis (SLDA), shrinkage diagonal discriminant analysis (SDDA) and back-propagation artificial neural network (BP-ANN).
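     The construction of the combined feature set can be sketched as follows. This is a minimal illustration under stated assumptions: a plain Welch t statistic with a fixed cutoff stands in for the multiple testing procedure, and the gene identifiers, expression values and `t_cutoff` are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for two lists of expression values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def build_feature_set(expr, groups, literature_genes, selected_genes, t_cutoff):
    """Union of data-driven genes and literature genes that pass the t filter.

    expr:   {gene: [expression value per sample]}
    groups: class label (0 or 1) per sample, aligned with the value lists
    """
    kept_prior = []
    for gene in literature_genes:
        if gene not in expr:      # reported in the literature but absent from the chip
            continue
        a = [v for v, g in zip(expr[gene], groups) if g == 0]
        b = [v for v, g in zip(expr[gene], groups) if g == 1]
        if abs(welch_t(a, b)) >= t_cutoff:
            kept_prior.append(gene)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(list(selected_genes) + kept_prior))

# Toy data (hypothetical): three genes measured on six samples.
expr = {
    "g1": [8.1, 7.9, 8.3, 2.0, 2.2, 1.9],
    "g2": [5.0, 5.2, 4.9, 5.1, 4.8, 5.0],   # not differentially expressed
    "g3": [1.0, 1.2, 0.9, 6.0, 6.3, 5.8],
}
groups = [0, 0, 0, 1, 1, 1]
features = build_feature_set(
    expr, groups,
    literature_genes=["g2", "g3", "g9"],   # stand-ins for literature-derived genes
    selected_genes=["g1"],                 # stand-in for PAM or SCRDA output
    t_cutoff=4.0,
)
```

     The resulting list would then be passed to whichever classifier is under evaluation.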
     To take advantage of accumulating gene functional annotations, we proposed incorporating known gene functions into a new distance metric, which equals the sum of the measured distance and the biological (functional) distance. A two-step procedure was used. First, this shrinkage distance metric was used in a distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the first-step clustering of the genes with known functions, the expression-based distance metric was used to cluster the remaining genes of unknown function, assigning each of them either to one of the clusters obtained in the first step or to a new cluster. These procedures were performed in R 2.8.0 (R Foundation for Statistical Computing, Vienna, Austria).
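     The two-step procedure can be sketched in Python as below. Several choices are assumptions made only for illustration and are not stated in the text: the functional distance is 0 when two genes share an annotation and a fixed `penalty` otherwise, K-medoids is solved by exhaustive search (feasible only for tiny inputs), and an unannotated gene opens a new cluster when it lies farther than `radius` from every medoid.

```python
from itertools import combinations
from math import dist

def combined_distance(expr_a, expr_b, funcs_a, funcs_b, penalty):
    """Expression distance plus an assumed functional distance (0 if any annotation is shared)."""
    functional = 0.0 if funcs_a & funcs_b else penalty
    return dist(expr_a, expr_b) + functional

def k_medoids_exhaustive(genes, k, d):
    """Try every set of k medoids and keep the one with the lowest total distance."""
    best = None
    for medoids in combinations(genes, k):
        cost = sum(min(d(g, m) for m in medoids) for g in genes)
        if best is None or cost < best[0]:
            best = (cost, medoids)
    medoids = best[1]
    return {g: min(medoids, key=lambda m: d(g, m)) for g in genes}

def assign_unknown(unknown, known_clusters, expr, radius):
    """Step 2: attach each unannotated gene to the nearest medoid, or open a new cluster."""
    medoids = sorted(set(known_clusters.values()))
    result = dict(known_clusters)          # step-1 assignments are kept unchanged
    for g in unknown:
        nearest = min(medoids, key=lambda m: dist(expr[g], expr[m]))
        close = dist(expr[g], expr[nearest]) <= radius
        result[g] = nearest if close else "new:" + g
    return result

# Toy data (hypothetical): four annotated genes and two unannotated ones.
expr = {
    "a": (0.0, 0.0), "b": (0.2, 0.1), "c": (5.0, 5.0), "d": (5.1, 4.8),
    "u1": (0.1, 0.2), "u2": (10.0, 0.0),
}
funcs = {"a": {"F1"}, "b": {"F1"}, "c": {"F2"}, "d": {"F2"}}
known = ["a", "b", "c", "d"]

def d(g, h):
    return combined_distance(expr[g], expr[h], funcs[g], funcs[h], penalty=1.0)

step1 = k_medoids_exhaustive(known, 2, d)
final = assign_unknown(["u1", "u2"], step1, expr, radius=1.0)
```

     Each cluster is labelled by its medoid gene. In this toy run `u1` joins the cluster that contains `a` and `b`, while `u2` is far from both medoids and starts a cluster of its own.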
     Results
     The conventional method, LDA, could not be applied when the number of genes exceeded the sample size. SCRDA used many more genes than PAM for all cancer datasets.
     In both the two-class and the multi-class diagnosis problems, SDA, SCRDA and PAM all had better classification accuracy and stability than LDA. SVM achieved higher accuracy than BP-ANN. The performance of KNN was clearly worse with all genes than with selected feature genes.
     Compared with the conventional methods, the new method that incorporates prior knowledge performed better in all but a few special cases. In most cases its average accuracy was higher in both the training and the test sets, and its standard deviations were usually smaller.
     A simulation study and an application to gene function prediction in yeast demonstrated the advantage of our proposal over the standard method.
     Conclusions
     Variable selection did have an impact on the performance of the classifiers, especially KNN. PAM, SCRDA and MTP differed clearly in gene selection: PAM selected fewer genes than SCRDA, and SCRDA selected fewer genes than MTP. Regularized discriminant methods, especially SLDA, were superior to conventional LDA. Given the same genes, PAM, SCRDA and SDA did not differ noticeably in performance. SVM performed better than BP-ANN in some circumstances, but the selection of the kernel and its parameters requires careful attention.
     Incorporating prior knowledge into discriminant analysis effectively improved discriminant capacity and reduced the impact of noise. The idea is promising both in practice and in methodology.
     Combining prior knowledge with gene expression profiling can improve the accuracy and interpretability of cluster results, and the approach has good prospects for application.
