Impacts of SNP chip genotyping call rate and genotyping error rate on genotype imputation accuracy in cattle
  • English title: Impacts of SNP genotyping call rate and SNP genotyping error rate on imputation accuracy in Holstein cattle
  • Authors: 李智; 何俊; 蒋隽; Richard G. Tait Jr.; Stewart Bauck; 过伟; 吴晓林
  • Authors (English): Zhi Li; Jun He; Jun Jiang; Richard G. Tait Jr.; Stewart Bauck; Wei Guo; Xiao-Lin Wu
  • Keywords: SNP chip; genotyping; imputation accuracy; call rate; error rate
  • Chinese journal title: YCZZ
  • English journal title: Hereditas
  • Institutions: College of Animal Science and Technology, Hunan Agricultural University; Department of Animal Science, University of Wyoming; Biostatistics and Bioinformatics, Neogen GeneSeek; Department of Animal Sciences, University of Wisconsin
  • Publication date: 2019-05-30 14:14
  • Publisher: 遗传 (Hereditas)
  • Year: 2019
  • Issue: v.41
  • Funding: Hunan Provincial Hundred Talents Program; Hunan Provincial Key Research and Development Program (No. 2018NK2081); Hunan Collaborative Innovation Center for Livestock and Poultry Safety Program; Changsha Science and Technology Plan Key Project (No. kq1801014)
  • Language: Chinese
  • Pages: YCZZ201907007
  • Page count: 9
  • CN: 07
  • ISSN: 11-1913/R
  • Classification number: 82-90
Abstract
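SNP chips are widely used in genetic research and production practice for animals and plants, so the accuracy of their genotype calls is critical. In practice, however, a certain number of genotypes are often missing and must be estimated (imputed). In addition, for various reasons, SNP genotypes that are absent from one chip often have to be imputed from another chip, or high-density SNP genotypes have to be imputed from low-density ones. Genotype imputation accuracy therefore directly affects the accuracy and reliability of downstream data analysis. To better understand the factors that influence imputation accuracy, this study used 50K SNP chip genotype data from 20,116 U.S. Holstein cattle to evaluate the effects of SNP genotyping call rate and genotyping error rate on downstream imputation accuracy under two scenarios: one in which the two factors are uncorrelated and one in which they are correlated. In the uncorrelated scenario, the simulated call rate was lowered from 100% to 50% and the simulated error rate was raised from 0% to 50%. In the correlated scenario, the relationship between call rate and error rate was determined by a linear regression equation fitted to these two variables in a real dataset, so that the simulated call rate fell from 100% to 50% while the error rate rose from 0% to 13.35%. Imputation accuracy was then assessed by 5-fold cross-validation. The results showed that when call rate and error rate in the original data occurred independently, the imputation error rate was little affected by the original call rate (P>0.05) but increased significantly as the original genotyping error rate rose (P<0.01). When call rate and error rate in the original data were negatively correlated, the imputation error rate increased significantly as the original call rate fell (P<0.01). In both scenarios, an SNP genotyping call rate above 90% is recommended to keep imputation accuracy at or above 98%. These results can serve as a reference for improving quality control in practical SNP genotyping and downstream data analysis.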
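The two simulation scenarios in the abstract can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the study's code: genotypes are assumed to be coded 0/1/2, `MISSING = -1` is a hypothetical no-call code, and the function names `degrade_genotypes` and `linked_error_rate` are invented here. The abstract does not give the fitted regression equation, so the correlated scenario is approximated by a straight line through the two reported endpoints, with the maximum error rate passed in as a parameter.

```python
import numpy as np

MISSING = -1  # hypothetical code for a failed genotype call (assumption)


def linked_error_rate(call_rate, max_error_rate, min_call_rate=0.5):
    """Correlated scenario, approximated (assumption) by a straight line:
    error rate 0 at call rate 1.0, max_error_rate at min_call_rate."""
    if not min_call_rate <= call_rate <= 1.0:
        raise ValueError("call_rate must lie in [min_call_rate, 1.0]")
    return max_error_rate * (1.0 - call_rate) / (1.0 - min_call_rate)


def degrade_genotypes(genotypes, call_rate, error_rate, rng):
    """Return a copy of a 0/1/2 genotype matrix with simulated errors and no-calls.

    Each genotype is miscalled with probability error_rate and, independently,
    set to MISSING with probability 1 - call_rate.
    """
    g = np.array(genotypes, dtype=np.int8, copy=True)

    # Miscalls: shift the true genotype by 1 or 2 (mod 3), so the
    # erroneous call always differs from the true one.
    is_error = rng.random(g.shape) < error_rate
    shift = rng.integers(1, 3, size=g.shape)  # values 1 or 2
    g[is_error] = (g[is_error] + shift[is_error]) % 3

    # No-calls: drawn independently of the miscalls.
    is_missing = rng.random(g.shape) >= call_rate
    g[is_missing] = MISSING
    return g


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = rng.integers(0, 3, size=(100, 1000))  # toy data, not real genotypes

    # Scenario 1: call rate and error rate are chosen independently.
    independent = degrade_genotypes(truth, call_rate=0.9, error_rate=0.05, rng=rng)

    # Scenario 2: error rate follows from the call rate (0.10 is a placeholder).
    rate = linked_error_rate(0.9, max_error_rate=0.10)
    linked = degrade_genotypes(truth, call_rate=0.9, error_rate=rate, rng=rng)
    print((independent != MISSING).mean(), (linked != MISSING).mean())
```

Because miscalls are drawn before and independently of the no-calls, the expected error rate among the genotypes that remain called equals `error_rate`.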
Single nucleotide polymorphism (SNP) chips have been widely used in genetic studies and breeding applications in animal and plant species, so the quality of SNP genotypes is of paramount importance. In practice, some genotypes often fail and have to be imputed. In other situations, ungenotyped loci need to be imputed between different chips, or high-density genotypes need to be imputed from low-density genotypes. Under these circumstances, the validity and reliability of subsequent data analyses depend on the accuracy of the imputed genotypes. To better understand the factors affecting imputation accuracy, the present study investigated the impacts of SNP genotyping call rate and SNP genotyping error rate on the accuracy of genotype imputation under two scenarios in 20,116 U.S. Holstein cattle, each genotyped with a GGP 50K SNP chip. In scenario 1, the two factors were independent of each other: the simulated genotyping call rate varied from 50% to 100% and the simulated genotyping error rate varied from 0% to 50%. In scenario 2, genotyping error rate was correlated with genotyping call rate, and the relationship was established by fitting a linear regression model between the two variables on a real dataset. That is, the simulated SNP call rate varied from 100% to 50% whereas the SNP genotyping error rate changed from 0% to 13.55%. Finally, 5-fold cross-validation was used to assess the subsequent imputation accuracy. The results showed that when the original SNP genotyping call rate was independent of the SNP genotyping error rate, imputation accuracy did not change significantly with the original genotyping call rate (P>0.05), but it decreased significantly as the genotyping error rate increased (P<0.01). However, when the original genotyping call rate was negatively correlated with the genotyping error rate, the imputation error increased as the original genotyping error rate rose. In both scenarios, the genotyping call rate needs to be no less than 0.90 to obtain a genotype imputation accuracy of 98% or higher. The present results can provide guidance for establishing quality assurance criteria for SNP genotyping in practice.
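The 5-fold cross-validation step could be organized as in the Python sketch below. It is a hedged illustration, not the study's pipeline: the abstract does not name the imputation software, so `most_common_genotype` is a deliberately naive stand-in that fills each SNP with the most frequent genotype among the reference animals. The function names, the 0/1/2 genotype coding, and the `masked_fraction` parameter are all assumptions made for this example.

```python
import numpy as np


def most_common_genotype(reference):
    """Naive stand-in imputer (assumption): for each SNP, the most frequent
    genotype (0, 1 or 2) among the reference animals."""
    counts = np.stack([(reference == code).sum(axis=0) for code in (0, 1, 2)])
    return counts.argmax(axis=0)


def cross_validated_accuracy(genotypes, n_folds=5, masked_fraction=0.2, seed=0):
    """Per-fold concordance between imputed and true genotypes.

    Animals (rows) are split into n_folds groups. Each group in turn is the
    validation set: a random fraction of its genotypes is hidden, filled in
    from the remaining animals, and compared with the true values.
    """
    g = np.asarray(genotypes)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(g.shape[0]), n_folds)

    accuracies = []
    for fold in folds:
        is_validation = np.zeros(g.shape[0], dtype=bool)
        is_validation[fold] = True

        # "Train" the stand-in imputer on the other folds only.
        fill = most_common_genotype(g[~is_validation])

        # Hide a random subset of genotypes in the validation animals.
        truth = g[fold]
        hidden = rng.random(truth.shape) < masked_fraction
        imputed = np.broadcast_to(fill, truth.shape)

        accuracies.append(float((imputed[hidden] == truth[hidden]).mean()))
    return accuracies


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    toy = rng.integers(0, 3, size=(100, 200))  # toy data, not real genotypes
    print(cross_validated_accuracy(toy))
```

Replacing `most_common_genotype` with a call to real imputation software would leave the fold structure and the concordance calculation unchanged.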
