Research on Key Learning Techniques for Computer-Aided Medical Image Diagnosis
Abstract
Computer-aided diagnosis (CAD), in which computer technology assists radiologists in case diagnosis, plays an increasingly important role in early breast-cancer screening and can help reduce mortality among breast-cancer patients. In clinical practice, labeled case samples are hard to collect, and negative cases far outnumber positive ones, so CAD applications face learning on small, imbalanced datasets. Imbalanced and small-sample learning concerns the performance of learning algorithms on datasets with severely skewed class distributions and underrepresented data. It matters in many real-world applications: although classical machine learning and data mining have succeeded widely in practice, learning from small and imbalanced data remains a major challenge for researchers. This dissertation systematically explains the main causes of performance degradation under small-sample and imbalanced conditions and reviews the current effective solutions. Building on the observation that common under-sampling methods tend to discard class information when handling imbalanced samples, it focuses on how to treat imbalanced data soundly and effectively. Two new under-sampling methods are proposed that extract the samples richest in class information, addressing the information loss that under-sampling causes. For the small-sample problem, a new class-labeling algorithm is proposed that enlarges the training set by automatically labeling unlabeled samples while effectively reducing the labeling errors this process tends to introduce.
This dissertation focuses on learning techniques for small-sample and imbalanced data, centered on resampling imbalanced datasets and class labeling of unlabeled samples. Its main contributions are:
(1) To address the small-sample learning problem caused by the difficulty of collecting labeled cases in CAD, this dissertation enlarges the training set from the abundant unlabeled samples. Labeling, however, is error-prone, and mislabeled samples act as noise that markedly degrades learning performance. Targeting mislabeling in semi-supervised learning, we propose a Hybrid Class Labeling algorithm that assigns labels from three different perspectives: geometric distance, probability distribution, and semantic concept. The three labelers rest on different principles and differ substantially; only unlabeled samples on which all three agree are added to the training set. To further limit the influence of any remaining mislabeled samples, the algorithm introduces pseudo-label membership degrees into SVM (Support Vector Machine) training, so that each sample's contribution to learning is weighted by its membership. Experiments on the UCI Breast-cancer dataset show that the algorithm handles the small-sample problem effectively; compared with any single labeling technique, it produces fewer mislabeled samples and significantly better learning performance.
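The unanimous-vote idea can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the geometric, probabilistic, and semantic views are stood in for by a nearest-centroid rule, a diagonal-Gaussian likelihood, and a k-NN vote respectively (the third is an assumption, since a true semantic labeler depends on domain features), and the membership-weighted SVM step is omitted.

```python
import numpy as np

def nearest_centroid_label(X_lab, y_lab, x):
    # Geometric view: assign the class whose centroid is closest.
    classes = np.unique(y_lab)
    dists = [np.linalg.norm(x - X_lab[y_lab == c].mean(axis=0)) for c in classes]
    return classes[int(np.argmin(dists))]

def gaussian_label(X_lab, y_lab, x):
    # Probabilistic view: diagonal-Gaussian log-likelihood per class.
    classes = np.unique(y_lab)
    ll = []
    for c in classes:
        Xc = X_lab[y_lab == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-6
        ll.append(-0.5 * np.sum((x - mu) ** 2 / var + np.log(var)))
    return classes[int(np.argmax(ll))]

def knn_label(X_lab, y_lab, x, k=3):
    # Stand-in for the semantic view: majority vote of the k nearest samples.
    nearest = np.argsort(np.linalg.norm(X_lab - x, axis=1))[:k]
    vals, counts = np.unique(y_lab[nearest], return_counts=True)
    return vals[int(np.argmax(counts))]

def hybrid_label(X_lab, y_lab, X_unlab):
    # Accept a pseudo-label only when all three labelers agree.
    keep_X, keep_y = [], []
    for x in X_unlab:
        votes = {nearest_centroid_label(X_lab, y_lab, x),
                 gaussian_label(X_lab, y_lab, x),
                 knn_label(X_lab, y_lab, x)}
        if len(votes) == 1:          # unanimous agreement
            keep_X.append(x)
            keep_y.append(votes.pop())
    return np.array(keep_X), np.array(keep_y)
```

Unanimously labeled samples would then join the training set; in the dissertation each pseudo-labeled sample additionally carries a membership degree that scales its contribution to SVM training (e.g. via a per-sample cost), which this sketch leaves out.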
(2) To address the loss of useful class information during common under-sampling, this dissertation proposes a new under-sampling method based on the convex hull (CH). The convex hull of a dataset is the smallest convex set containing all of its samples; every point lies inside the polygon or polytope formed by the hull vertices. Inspired by this geometric property, the algorithm samples the majority class to obtain its convex hull and replaces the majority training samples with the compact set of hull vertices, thereby balancing the training set. In real applications the two classes often overlap, and so do their convex hulls; representing the majority-class boundary by its convex hull then becomes a challenge for learning, inviting overfitting and reduced generalization. Since both the reduced convex hull (RCH) and the scaled convex hull (SCH) lose boundary information during hull reduction, we propose the hierarchy reduced convex hull (HRCH). Motivated by the marked structural difference and complementarity between RCH and SCH, HRCH fuses the two; compared with other reduced hulls it carries more diverse, complementary class information and loses less of it during reduction. By sampling the majority class with different values of the reduction and scaling factors, the algorithm obtains several HRCH structures, each combined with the minority-class samples to form a training set. The resulting learners are combined by ensemble learning into the final classifier. Experimental comparison with four reference algorithms shows better classification performance and robustness.
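The core hull-vertex idea can be illustrated in two dimensions with Andrew's monotone-chain algorithm. This is a sketch of the plain convex-hull under-sampling baseline only; the RCH/SCH/HRCH refinements and the ensemble step from the dissertation are omitted, and the function names are illustrative.

```python
import numpy as np

def convex_hull_vertices(points):
    """Andrew's monotone chain: hull vertices of 2-D points, counter-clockwise."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return np.array(pts)
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                      # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def undersample_by_hull(X_maj, X_min, y_maj=0, y_min=1):
    """Balance the set by replacing the majority class with its hull vertices."""
    hull = convex_hull_vertices(X_maj)
    X = np.vstack([hull, X_min])
    y = np.array([y_maj] * len(hull) + [y_min] * len(X_min))
    return X, y
```

In higher dimensions one would use a dedicated hull routine (e.g. `scipy.spatial.ConvexHull`); the point of the sketch is only that interior majority samples are dropped while the boundary structure survives.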
(3) Again targeting the class-information loss of under-sampling, this dissertation further proposes RKNN, a new under-sampling method based on reverse k nearest neighbors. Unlike the widely used k nearest neighbors, reverse k nearest neighbors examine a neighborhood from a global perspective: a point's reverse k nearest neighbors depend not only on its surrounding points but also on all remaining points in the dataset, and any change in the data distribution changes each point's reverse-neighbor relation, so the relation reflects the complete distribution of the sample set. By propagating neighborhood relations through reverse nearest neighbors, the method overcomes the limitation of nearest-neighbor queries, which consider only the query point's local distribution. Applied to the majority class, the algorithm uses reverse k nearest neighbors to remove noise, unstable boundary samples, and redundant samples, retaining the most informative and reliable samples for training. It balances the training set while markedly easing the class-information loss caused by under-sampling. Experiments on the UCI Breast-cancer dataset confirm its effectiveness on imbalanced learning, and RKNN outperforms under-sampling based on k nearest neighbors.
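A minimal sketch of reverse-k-NN-based under-sampling, under the simplest selection rule I can state from the description: keep the majority samples that appear most often in other samples' k-nearest-neighbor lists (isolated noise points collect few or no reverse neighbors). The dissertation's actual criteria for noise, boundary, and redundant samples may differ.

```python
import numpy as np

def reverse_knn_counts(X, k=3):
    """count[i] = how many points have point i among their k nearest neighbors."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)        # a point is not its own neighbor
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        for j in np.argsort(D[i])[:k]: # the k nearest neighbors of point i
            counts[j] += 1             # ... each receives one "supporter" vote
    return counts

def rknn_undersample(X_maj, n_keep, k=3):
    """Keep the n_keep majority samples with the most reverse-k-NN supporters."""
    counts = reverse_knn_counts(X_maj, k)
    keep = np.argsort(-counts)[:n_keep]   # descending by reverse-neighbor count
    return X_maj[np.sort(keep)]
```

A far-away outlier is never anyone's near neighbor, so its count is zero and it is discarded first, which matches the intuition that reverse neighbors expose globally unsupported points.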
