Research on Semi-Supervised Learning for Text Data
Abstract
With the rapid development of computer and storage technology, electronic text data have become massive and disordered. To obtain useful information from such data, people need text classification technology to organize text data efficiently. Traditional approaches include supervised classification and unsupervised clustering. Supervised classification requires a large number of labeled examples, but labeling text data on a large scale is time-consuming and impractical. Meanwhile, the performance of unsupervised clustering still needs improvement because it lacks the guidance of labeled data. Semi-supervised learning, which learns from very few labeled examples and a large number of unlabeled ones, has therefore emerged and attracted wide attention. This dissertation studies three problems in semi-supervised text classification: data labeling, text representation, and learning model design. Our main contributions are as follows:
     (1) As labeling text data is time-consuming, this dissertation discusses how to select text data for labeling and how to label the selected data reasonably. To make the distribution of the labeled data more consistent with that of the original data, a sampling method is proposed that avoids selecting the K nearest neighbors of already-labeled data as new labeling candidates. This reduces the locally dense labeling that random selection can produce, so data located in different regions have more opportunities to be labeled. When labeling the selected data, we exploit the rich category information carried by the words of each document: by marking a few keywords of a document while assigning its label, we can easily obtain keywords for every category. Unlabeled documents that match these category keywords then receive the corresponding labels, which serve as additional supervised information.
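The neighbor-avoiding selection step in (1) can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the Euclidean metric, the value of k, and all names (`select_for_labeling`, etc.) are assumptions.

```python
import numpy as np

def select_for_labeling(X, labeled_idx, n_select, k=5, rng=None):
    """Pick points to label next, skipping the k nearest neighbors of
    already-labeled points so new labels spread across regions.
    Hypothetical sketch; metric and k are assumptions."""
    rng = np.random.default_rng(rng)
    n = len(X)
    excluded = set(labeled_idx)
    for i in labeled_idx:
        d = np.linalg.norm(X - X[i], axis=1)
        excluded.update(np.argsort(d)[1:k + 1].tolist())  # k nearest neighbors
    candidates = [j for j in range(n) if j not in excluded]
    if len(candidates) < n_select:               # fall back if too few remain
        candidates = [j for j in range(n) if j not in set(labeled_idx)]
    return rng.choice(candidates, size=n_select, replace=False)
```

Because the neighborhoods of labeled points are excluded, a second round of labeling is pushed toward regions the first round did not cover.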
     (2) Through investigation, we find that most noise words in text classification are distributed roughly uniformly across the classes. This dissertation therefore proposes a new term weighting method, tf.sdf. The proposed method emphasizes terms that are unevenly distributed among the classes and weakens terms that are uniformly distributed, thereby reducing the harmful effect of noise words. Moreover, to represent text data reasonably when only very few labeled examples are available, this dissertation combines tf.sdf with a base classifier and proposes a semi-supervised learning framework in which text representation and classification interact: a reasonable text representation improves classification performance, and better classification results in turn make the representation more appropriate.
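The weighting idea in (2) can be illustrated as below. The exact sdf formula is not given in this abstract, so the standard deviation of each term's per-class document frequency is used as a stand-in; the function name and details are hypothetical.

```python
import numpy as np

def tf_sdf(tf, doc_class, n_classes):
    """Illustrative tf.sdf-style weighting: terms spread unevenly
    across classes (high std. dev. of per-class document frequency)
    get boosted; uniformly spread, noise-like terms do not.
    The precise sdf definition is an assumption here."""
    df = np.zeros((n_classes, tf.shape[1]))
    for c in range(n_classes):
        docs_c = tf[doc_class == c]
        if len(docs_c):
            df[c] = (docs_c > 0).mean(axis=0)   # per-class document frequency
    sdf = df.std(axis=0)                        # high for uneven terms
    return tf * (1.0 + sdf)                     # scale raw term frequencies
```

A term occurring only in one class keeps a high weight, while a term occurring equally often in every class gets sdf = 0 and is left at its raw frequency.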
     (3) Considering that different kinds of pairwise constraints play different roles in non-negative matrix factorization (NMF), this dissertation proposes a constrained NMF method with multi-type penalties. In the new method, must-link constraints mainly control the distance between data points in the compressed representation, while cannot-link constraints mainly control the similarity of the class-indicator (encoding) vectors. Experimental results on real-world text data sets show that the proposed method improves semi-supervised text clustering.
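A plain penalty-based sketch of the idea in (3) is given below. It is not the thesis's exact multi-type penalty scheme: the penalty forms, projected-gradient optimizer, learning rate, and names are all assumptions.

```python
import numpy as np

def constrained_nmf(X, k, must, cannot, lam=1.0, mu=1.0,
                    lr=1e-3, iters=500, seed=0):
    """Projected-gradient NMF sketch with pairwise-constraint penalties:
    must-link pairs pull their encoding columns together, cannot-link
    pairs are penalised for having similar encodings. A generic penalty
    formulation, not the thesis's exact method."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        R = W @ H - X                       # reconstruction residual
        gW = R @ H.T
        gH = W.T @ R
        for i, j in must:                   # grad of lam*||H[:,i]-H[:,j]||^2
            diff = H[:, i] - H[:, j]
            gH[:, i] += 2 * lam * diff
            gH[:, j] -= 2 * lam * diff
        for i, j in cannot:                 # grad of mu*(H[:,i] . H[:,j])
            gH[:, i] += mu * H[:, j]
            gH[:, j] += mu * H[:, i]
        W = np.maximum(W - lr * gW, 0.0)    # project back to nonnegative
        H = np.maximum(H - lr * gH, 0.0)
    return W, H
```

The two penalty terms act on different quantities, mirroring the abstract's point that the two constraint types should be penalized differently.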
     (4) To enlarge the range of applications of NMF, this dissertation proposes a novel factorization method based on the similarity matrix, which uses prior knowledge in the form of pairwise constraints to guide the decomposition, and whose convergence is proved theoretically. Because similarity matrix factorization applies more widely than NMF on raw data, we test the proposed method on general UCI data sets, text data sets, and social network data sets. Experimental results indicate that the proposed method outperforms other semi-supervised clustering algorithms.
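Factorizing a similarity matrix directly, as in (4), can be sketched with the standard symmetric-NMF multiplicative update; the pairwise constraints of the thesis method are omitted here, so this shows only the base decomposition such a method builds on.

```python
import numpy as np

def sym_nmf(S, k, iters=200, seed=0, eps=1e-9):
    """Symmetric NMF sketch: factor a similarity matrix S ~ H H^T with
    H >= 0; rows of H act as soft cluster indicators. Uses the standard
    multiplicative update (constraints omitted)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.random((n, k))
    for _ in range(iters):
        num = S @ H
        den = H @ (H.T @ H) + eps           # eps avoids division by zero
        H *= 0.5 * (1.0 + num / den)        # update keeps H nonnegative
    return H
```

Since only a similarity matrix is needed, the same factorization applies to feature vectors, documents, and graphs alike, which is exactly why the thesis tests it on UCI, text, and social network data.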
