Gene Selection Method Based on Approximate Reducts
Abstract
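To support the subtyping, diagnosis, and pathological study of diseases such as cancer, gene microarray data are used to identify disease-related genes. Because gene microarray data form a typical inconsistent decision system, a theory of approximate distribution reducts is proposed, built on a proof that an inconsistent system is consistent with respect to its approximate distribution sets. The relationships between reducts on different approximate distribution sets are discussed, and a gene selection method based on approximate reducts is proposed. The method is validated on two real gene expression datasets. The experimental results show that it lowers the correlation within the feature gene set while preserving classification ability, and thereby significantly reduces the number of feature genes.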
Identifying disease-related genes from gene expression profiles (DNA microarrays) is of great practical importance for subtype discovery, diagnosis, and pathology studies of diseases such as cancer. Gene selection is a critical preprocessing step in DNA microarray data analysis. Gene sets selected from DNA microarray data by the usual ranking methods typically contain many highly correlated genes and remain high-dimensional. Because DNA microarray datasets are typical inconsistent decision systems, we introduce the idea that an inconsistent decision system is consistent with respect to its approximate distribution, and we propose a set of notions of approximate distribution reducts. After discussing the relations between the lower and upper approximation reducts of an inconsistent decision system, we obtain a gene selection method based on approximate distribution reducts. Experimental results on two publicly available DNA microarray datasets, lung cancer and NCI60 (9-tumors), show that the proposed method achieves equivalent classification performance with a significantly reduced number of selected genes.
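The idea that an inconsistent decision system can still be treated consistently through its lower and upper approximations can be illustrated with a small sketch. The abstract does not give the paper's formal definitions of approximate distribution reducts, so the code below assumes the classical rough-set lower and upper approximations and a simple greedy attribute-dropping loop as a stand-in. The toy data, the gene names `g0` to `g2`, and all function names are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group object indices into indiscernibility classes on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def approximations(rows, labels, attrs):
    """Return {label: (lower, upper)} as frozensets of object indices."""
    classes = sorted(set(labels))
    lower = {c: set() for c in classes}
    upper = {c: set() for c in classes}
    for block in partition(rows, attrs):
        block_labels = {labels[i] for i in block}
        for c in block_labels:
            upper[c].update(block)          # block overlaps class c
            if len(block_labels) == 1:
                lower[c].update(block)      # block lies entirely in class c
    return {c: (frozenset(lower[c]), frozenset(upper[c])) for c in classes}

def greedy_reduct(rows, labels, all_attrs):
    """Assumed heuristic: drop each attribute if both approximations survive."""
    target = approximations(rows, labels, all_attrs)
    kept = list(all_attrs)
    for a in all_attrs:
        trial = [b for b in kept if b != a]
        if trial and approximations(rows, labels, trial) == target:
            kept = trial
    return kept

# Toy discretized expression table. Objects 0 and 1 agree on every gene but
# carry different labels, so the decision system is inconsistent.
# g2 duplicates g0, mimicking two highly correlated genes.
rows = [
    {"g0": 1, "g1": 0, "g2": 1},
    {"g0": 1, "g1": 0, "g2": 1},
    {"g0": 0, "g1": 1, "g2": 0},
    {"g0": 1, "g1": 1, "g2": 1},
    {"g0": 0, "g1": 0, "g2": 0},
]
labels = ["tumor", "normal", "normal", "tumor", "normal"]
attrs = ["g0", "g1", "g2"]

full = approximations(rows, labels, attrs)
reduct = greedy_reduct(rows, labels, attrs)
print(full["tumor"], reduct)
```

In this toy table the conflicting pair lands in the upper approximation of both classes and in neither lower approximation, and the greedy loop discards one of the two duplicated genes while leaving every approximation unchanged. This mirrors, in a much simplified form, the abstract's claim that redundancy among selected genes can be reduced without losing classification ability.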
