Visual Object Representation Based on Salient Local Features


English title: Visual Object Representation Based on Salient Local Features
Author: Wang Yanjie (王彦杰)
Thesis level: Doctoral
Discipline: Computer Application Technology
Keywords: Visual object representation ; Salient local feature ; Visual word ; Object recognition ; Object localization ; Image classification
Degree year: 2010
Advisor: Jia Yunde (贾云得)
Discipline code: 081203
Degree-granting institution: Beijing Institute of Technology
Submission date: 2010-06-01

Abstract

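Visual object representation links low-level image information to high-level semantic concepts and plays a key role in computer vision tasks such as image perception and scene understanding. Representations based on local features are expressive and relatively robust to occlusion and background clutter, and they have attracted considerable attention in recent years. Starting from the statistical modeling and discriminative learning of local features, this dissertation studies local-feature-based visual object representation. The main topics are the statistical modeling and learning of visual words, category salient local feature detection, and the cooperative representation of universal and category visual words. Statistically modeling visual words makes it possible to measure how local features vary, and thus to represent visual objects more accurately. Detecting category salient local features allows objects to be located quickly in an image and lays the groundwork for class-specific visual object representations. Combining category and universal visual words exploits the strengths of both, improves the class discriminability of the object representation, and is better suited to image classification.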
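     This dissertation proposes a method for the statistical modeling and discriminative learning of visual words. Local features belonging to the same visual word are assumed to follow a Gaussian mixture distribution. The mixture is estimated from samples with an improved max-min posterior pseudo-probabilities discriminative learning method, and the number of mixture components is estimated with an expectation-maximization method based on minimum description length. On this basis, the posterior pseudo-probability is used to measure the similarity between a local feature and a visual word, and a soft histogram representation over visual words is built. Two soft histogram strategies are given: a classification-based soft histogram, which associates each local feature with the visual word of maximum similarity, and a completely soft histogram, which assigns each local feature to all visual words according to its similarities. Experiments on the Caltech-4 and PASCAL VOC 2006 image databases show that visual words modeled by Gaussian mixture models outperform those modeled by a single Gaussian and those based on cluster centers; that discriminative learning improves the accuracy of similarity measurement and object recognition; and that, under the same conditions, the soft histogram representation outperforms the hard histogram representation.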
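The two soft histogram strategies can be illustrated with a minimal sketch. It is not the dissertation's implementation: the posterior pseudo-probability learned by the max-min method is replaced here by a stand-in isotropic Gaussian similarity around a single center per visual word, and the function names, the `sigma` parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def similarities(features, centers, sigma=1.0):
    """Stand-in similarity of every local feature to every visual word.

    Assumption: an isotropic Gaussian kernel around one center per word,
    used in place of the learned posterior pseudo-probability.
    """
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def classification_based_soft_histogram(sim):
    """Each feature contributes only to its most similar word,
    weighted by that maximum similarity."""
    best = sim.argmax(axis=1)
    hist = np.zeros(sim.shape[1])
    np.add.at(hist, best, sim[np.arange(len(best)), best])
    return hist / hist.sum()

def completely_soft_histogram(sim):
    """Each feature contributes to all words in proportion to its similarities."""
    hist = sim.sum(axis=0)
    return hist / hist.sum()

# Toy data: three visual words, features sampled near the first two only.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
features = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers[:2]])

sim = similarities(features, centers)
h_cls = classification_based_soft_histogram(sim)
h_soft = completely_soft_histogram(sim)
print("classification-based:", np.round(h_cls, 3))
print("completely soft:     ", np.round(h_soft, 3))
```

In this toy setting the classification-based histogram leaves the third bin empty, because no feature is closest to the third word, while the completely soft histogram spreads a small amount of mass to it.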
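     This dissertation proposes a method for detecting category salient local features. The method defines the category saliency of a local feature by combining appearance and context information, and sequentially extracts local features that are salient in both appearance and context. Appearance saliency is measured by the posterior probability that a local feature belongs to an object category. This posterior probability is approximated by a posterior pseudo-probability that is easy to compute directly, whose unknown parameters are learned by the max-min posterior pseudo-probabilities discriminative learning method from training images that carry only category labels. On top of salient local feature detection, a neighborhood co-occurrence star model is built, and context saliency is defined from the co-occurrence patterns of visual words in the neighborhood of a local feature. The detection algorithm is applied to object recognition and localization: valid candidate windows are selected quickly according to the distribution of salient local features within the initial candidate windows, and a large number of irrelevant initial candidate windows are discarded, which improves both the computational efficiency and the quality of object localization. Salient local feature detection and object localization experiments on the INRIA horse and PASCAL VOC 2006 databases show that the detected category salient local features are discriminative for object categories and yield good localization results.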
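The window pruning step can be sketched as follows. The abstract does not state the actual selection criterion, so the rule used here (a minimum count and a minimum ratio of salient features inside a window) is a hypothetical stand-in, as are the function name `select_windows`, the thresholds, and the toy coordinates.

```python
import numpy as np

def select_windows(points, salient, windows, min_count=3, min_ratio=0.6):
    """Keep candidate windows whose contents are dominated by salient features.

    Hypothetical criterion: a window (x0, y0, x1, y1) is kept when it contains
    at least `min_count` salient features and they make up at least
    `min_ratio` of all features falling inside it.
    """
    kept = []
    for (x0, y0, x1, y1) in windows:
        inside = ((points[:, 0] >= x0) & (points[:, 0] <= x1) &
                  (points[:, 1] >= y0) & (points[:, 1] <= y1))
        n_inside = int(inside.sum())
        n_salient = int((inside & salient).sum())
        if n_salient >= min_count and n_salient >= min_ratio * n_inside:
            kept.append((x0, y0, x1, y1))
    return kept

# Toy feature locations and their saliency flags (illustrative values).
points = np.array([[10, 10], [12, 14], [15, 11], [13, 12],
                   [80, 80], [85, 82], [90, 90]], dtype=float)
salient = np.array([True, True, True, False, False, True, False])
windows = [(0, 0, 20, 20), (70, 70, 100, 100), (0, 0, 100, 100)]

kept = select_windows(points, salient, windows)
print(kept)
```

Only the first window survives in this example: the second contains a single salient feature, and the third is too diluted by non-salient ones.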
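     This dissertation proposes a visual object representation in which category and universal visual words cooperate. Local features belonging to an object category are assumed to follow a Gaussian mixture distribution, and each Gaussian component is treated as a category visual word. All category visual words are clustered with k-means to produce universal visual words, and the correspondence between universal and category visual words is recorded. To represent an image, the average posterior pseudo-probability that the image's local features belong to each universal visual word is computed and assigned to the corresponding category visual words; this value reflects how likely the universal visual word and its corresponding category visual words are to appear in the image. Arranging the likelihoods of all category visual words of one object category in a specified order gives a class-specific feature vector. The proposed image representation has strong class discriminability. Object recognition experiments on the PASCAL VOC 2006 and Caltech-101 databases and image annotation and retrieval experiments on the Corel-5K database all achieved good image classification results.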
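The cooperation between category and universal visual words can be sketched under simplifying assumptions. Category words are represented only by component means (given directly rather than fitted), the posterior pseudo-probability is replaced by a softmax over negative squared distances, and the class names, helper functions, and data are illustrative rather than taken from the dissertation.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns centers and the cluster index of each row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def mean_posterior(features, universal):
    """Average stand-in posterior of the image's features over universal words.

    Assumption: softmax of negative half squared distance, in place of the
    learned posterior pseudo-probability.
    """
    d2 = ((features[:, None, :] - universal[None, :, :]) ** 2).sum(axis=2)
    logits = -0.5 * d2
    logits -= logits.max(axis=1, keepdims=True)
    post = np.exp(logits)
    post /= post.sum(axis=1, keepdims=True)
    return post.mean(axis=0)

# Category visual words per class (illustrative component means).
category_words = {
    "horse": np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0]]),
    "car": np.array([[4.1, 3.9], [8.0, 0.0]]),
}
all_words = np.vstack(list(category_words.values()))

# Universal words come from clustering all category words; the returned
# labels record which universal word each category word corresponds to.
universal, word_to_universal = kmeans(all_words, k=3)

# Row indices of each class's category words inside all_words.
word_ids, start = {}, 0
for name, words in category_words.items():
    word_ids[name] = np.arange(start, start + len(words))
    start += len(words)

# Local features of one toy image.
rng = np.random.default_rng(1)
image_features = np.array([0.1, 0.0]) + 0.2 * rng.standard_normal((30, 2))

# Each category word inherits the value of its universal word; listing a
# class's category words in order gives that class's feature vector.
usage = mean_posterior(image_features, universal)
vectors = {name: usage[word_to_universal[ids]] for name, ids in word_ids.items()}
for name, vec in vectors.items():
    print(name, np.round(vec, 3))
```

The resulting vectors have different lengths for different classes (three entries for "horse", two for "car" here), which matches the statement that the feature vectors of one image vary with the class in both dimensionality and elements.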
Visual object representation bridges the gap between low-level image features and high-level semantic concepts. It plays an important role in computer vision tasks such as image recognition and scene understanding. Visual object representation based on local features is expressive and robust, and it has attracted a lot of attention in recent years. This dissertation focuses on the statistical modeling and discriminative learning of category local features for visual object representation, including the statistical modeling and discriminative learning of visual words, category salient local feature detection, and the cooperation between category and universal visual words. Statistically modeling visual words improves the accuracy of visual object representation. Detecting category salient local features allows object positions in images to be located rapidly. The cooperation between category and universal visual words gives object classifiers better discriminative ability.
     An approach to the statistical modeling and discriminative learning of visual words is proposed. The local features of each visual word are assumed to follow a Gaussian mixture model (GMM), which is learned from the training data by Max-Min posterior Pseudo-probabilities (MMP), a discriminative learning method. The similarities between each visual word and the corresponding local features are computed, summed, and normalized to construct a soft histogram. Two representation methods are considered in the object recognition experiments to evaluate the proposed algorithm. The first is the classification-based soft histogram, in which each local feature is assigned only to the visual word with maximum similarity. The second is the completely soft histogram, in which each local feature is assigned to all the visual words. The experiments are conducted on the Caltech-4 and PASCAL VOC 2006 databases.
     An algorithm is presented to detect category salient local features in images. It considers both the category appearance saliency and the category context saliency of local features. First, the category appearance saliency of a local feature is determined by the posterior probability that it belongs to a specific object category. Then, the local features with category appearance saliency are verified by contextual information in their neighborhood. Specifically, a co-occurrence star model is constructed to measure the category context saliency of local features based on the co-occurrence relationship between visual words. The proposed algorithm is applied to object localization and recognition. The experimental results on the INRIA horse, PASCAL VOC 2006, and Caltech-101 datasets show that the algorithm improves the efficiency and effectiveness of object localization and the accuracy of object recognition.
     An image representation and classification algorithm based on the cooperation between category and universal visual words is described. Category visual words are generated by assuming that the local features in the training images of a class follow a GMM. The number of visual words for a class is determined automatically by the minimum description length criterion. All the category visual words are clustered to obtain universal visual words. A category-specific image representation is defined through the cooperation between the two types of visual words. The resulting feature vectors of an image vary from class to class in both their dimensionalities and their elements. The proposed method is integrated into MMP learning to perform image classification. The corresponding image classifier is evaluated on object categorization and automatic image annotation. Experimental results on the PASCAL VOC 2006, Caltech-101, and Corel-5K datasets show that the proposed method is effective and promising.

引文

[1] I. Biederman. An invitation to cognitive science, visual cognition, visual object recognition [M]. MIT Press, 1995.
    [2] I. Biederman. Recognition-by-components: a theory of human image understanding[J]. Psychological Review, 1987, 94:115-147.
    [3]贾云得.机器视觉[M].北京:科学出版社,2000.
    [4] D. Forsyth, and J. Ponce. Computer vision: a modern approach[M]. Prentice Hall, 1998.
    [5] T. Tuytelaars, and K. Mikolajczyk. Local invariant feature detectors: A survey[J]. Foundations and Trends in Computer Graphics and Vision, 2008, 3(3):177-280.
    [6] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: a comprehensive study[J]. International Journal of Computer Vision, 2007,73(2):213-238.
    [7] S. Lazebnik. Local, semi-local and global models for texture, object and scene recognition[D]. PhD thesis,University of Illinois at Urbana-Champaign, 2006.
    [8] C. Harris, and M. Stephens. A combined corner and edge detector[C]. In Proc. of 4th Alvey Vision Conference, 1988:147-151.
    [9] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points[C]. In Proc. of 6th International Conference on Computer Vision, New Delhi, India, 1998:230–235.
    [10] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors[J]. International Journal of Computer Vision, 2000, 37(2):151-172.
    [11] K. Mikolajczyk, and C. Schmid. Scale and affine invariant interest point detectors[J]. International Journal of Computer Vision, 2004, 60(1):63-86.
    [12] T. Lindeberg. Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention[J]. International Journal of Computer Vision, 1993, 11(3):283-318.
    [13] T. Lindeberg. Feature detection with automatic scale selection[J]. International Journal of Computer Vision, 1998, 30(2):79-116.
    [14] T. Lindeberg, and J. Garding. Shape-adapted smoothing in estimation of 3-D shape cues from affinedeformations of local 2-D brightness structure[J]. Image and Vision Computing, 1997, 15(6):415-434.
    [15] T. Tuytelaars, and L. Van Gool. Content-based image retrieval based on local affinely invariant regions[C]. In Proc. of 3rd International Conference on Visual Information Systems, Amsterdam, Netherlands, 1999:493-500.
    [16] T. Tuytelaars, and L. van Gool. Matching widely separated views based on affine invariant regions[J]. International Journal of Computer Vision, 2004, 59(1):61-85.
    [17] J. Canny. A computational approach to edge detection[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1986, 8(6):679-698.
    [18] P.R. Beaudet. Rotationally invariant image operators[C]. In Proc. of 4th International Joint Conference on Pattern Recognition, Kyoto, Japan, 1978:579-583.
    [19] D. Lowe. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2):91-110.
    [20] D. G. Lowe. Object recognition from local scale-invariant features[C]. In Proc. of 7th International Conference on Computer Vision, Los Alamitos, CA, USA, 1999:1150-1157.
    [21] T. Kadir, and M. Brady. Scale, saliency and image description[J]. International Journal of Computer Vision, 2001, 45(2):83-105.
    [22] T. Kadir, M. Brady, and A. Zisserman. An affine invariant method for selecting salient regions in images[C]. In Proc. 8th European Conference on Computer Vision, Prague, Czech, 2004:345-457.
    [23] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions[C]. In Proc. of 13th British Machine Vision Computing, Cardiff, UK, 2002:384-393.
    [24] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22(10):761-767.
    [25] P.E. Forssen. Maximally stable colour regions for recognition and matching[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:1220-1227.
    [26] I. Laptev. Improving object detection with boosted histograms[J]. Image and Vision Computing, 2009, 27(5):534-544.
    [27] B. Leibe, and B. Schiele. Interleaved object categorization and segmentation[C]. In Proc. of British Machine Vision Computing, Norwich, UK, 2003.
    [28] K. Levi and Y. Weiss. Learning object detection from a small number of examples: the importanceof good features[C]. IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2004:53-60.
    [29] K. Mikolajczyk, and C. Schmid. A performance evaluation of local descriptors[C]. IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2003: 257-263.
    [30] A. Johnson, and M. Hebert. Object recognition by matching oriented points[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997: 684-689.
    [31] S. Lazebnik, C. Schmid, and J. Ponce. Sparse texture representation using affine-invariant neighborhoods[C]. IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2003:319-324.
    [32] R. Zabih, and J. Woodfill. Non-parametric local transforms for computing visual correspondance[C]. In Proc. 3rd European Conference on Computer Vision, Stockholm, Sweden, 1994: 151-158.
    [33] A. Ashbrook, N. Thacker, P. Rockett, and C. Brown. Robust recognition of scaled shapes using pairwise geometric histograms[C]. In Proc. of British Machine Vision Computing, Birmingham, UK, 1995:503-512.
    [34] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002, 24(4):509-522.
    [35] D. Gabor. Theory of communication[J]. Journal of the Institute of Electrical Engineers, 1946, 93(3): 429-457.
    [36] J.K.M. Vetterli. Wavelets and Subband Coding[M]. Prentice Hall, 1995.
    [37] J. Koenderink, and A. van Doorn. Representation of local geometry in the visual system[J]. Biological Cybernetics, 1987, 55(6):367-75.
    [38] L. Florack, B. Haar, J. Koenderink, and M. Viergever. General intensity transformations and second order invariants[C]. In Proc. of 7th Scandinavian Conference on Image Analysis, Aalborg, Denmark, 1991: 338-345.
    [39] W. Freeman, and E. Adelson. The design and use of steerable filters. IEEE Trans. on Pattern Analysisand Machine Intelligence, 1991, 13(9):891-906.
    [40] A. Baumberg. Reliable feature matching across widely separated views[C]. IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, USA, 2000: 774-781.
    [41] F. Schaffalitzky, and A. Zisserman. Multi-view matching for unordered image sets[C]. In Proc. 7th European Conference on Computer Vision, Copenhagen, Denmark, 2002:414-431.
    [42] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints[C]. In Proc. of ECCV Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, 2004.
    [43] J. Sivic, and A. Zisserman. Video google: A text retrieval approach to object matching in videos[C]. In Proc. of 9th International Conference on Computer Vision, Nice, France, 2003:1470-1477.
    [44] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features[C]. In Proc. of 14th International Conference on Computer Communications and Networks, 2005.
    [45] T. Joachims. Text categorization with suport vector machines: learning with many relevant features[C]. In Proc. of 10th European Conference on Machine Learning, 1998.
    [46] A. Bosch, A. Zissrman, and X. Munoz. Scene classification using a hybrid generative discriminative approach[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2008, 30(4):712-727.
    [47] J. Liu, and M. Shah. Scene modeling using co-clustering[C]. In Proc. of 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:214-220.
    [48] M. Marsza?ek, C. Schmid, H. Harzallah, and J. van de Weijer. Learning object representations for visual object class recognition[R]. pascal voc, 2007.
    [49] E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification[C]. In Proc. of 9th European Conference on Computer Vision, Graz, Austria, 2006:390-503.
    [50] P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez, and T. Tuytelaars. A thousand words in a scene[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(9):1575-1589.
    [51] K. van de Sande, T. Gevers, and C. Snoek. Evaluation of color descriptors for object and scene recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, US, 2008.
    [52] E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Describing visual scenes using transformed objects and parts[J]. International Journal of Computer Vision, 2008, 77(1-3):291-330.
    [53] L. Yang, R. Jin, C. Pantofaru, and R. Sukthankar. Discriminative cluster refinement: Improving object category recognition given limited training data[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:2536-2543.
    [54] F. Jurie, and B. Triggs. Creating efficient codebooks for visual recognition. In Proc. of the 10th International Conference on Computer Vision, Beijing, China, 2005: 604-610.
    [55] D. Nister, and H. Stewenius. Scalable recognition with a vocabulary tree[C]. IEEE Conference onComputer Vision and Pattern Recognition, New York, USA, 2006: 2161-2168.
    [56] F. Moosmann, B. Tiggs, and, F. Jurie. Randomized clustering forests for building fast and discriminative visual vocabularies[C]. In Proc. of Neural Information Processing Systems, 2006.
    [57] B. Leibe, K. Mikolajczyk, and B. Schiele. Efficient clustering and matching for object class recognition[C]. In Proc. of British Machine Vision Computing, Edinburgh, UK, 2006.
    [58] T. Tuytelaars, and C. Schmid. Vector quantizing feature space with a regular lattice[C]. In Proc. of 11 International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:670-677.
    [59] M. Boutell, J. Luo, and C. Brown. Factor-graphs for region-based whole-scene classification[C]. In Proc. of CVPR SLAM Workshop, 2006.
    [60] J. van Gemert, J. Geusebroek, C. Veenman, C. Snoek, and A. Smeulders. Robust scene categorization by learning image statistics in context[C]. In Proc. of CVPR SLAM Workshop, 2006.
    [61] A. Mojsilovic, J. Gomes, and B. Rogowitz. Semantic-friendly indexing and quering of images based on the extraction of the objective semantic cues[J]. International Journal of Computer Vision, 2004, 56(1-2):79-107.
    [62] J. Vogel, and B. Schiele. Semantic modeling of natural scenes for content-based image retrieval[J]. International Journal of Computer Vision, 2007, 72(2):133-157.
    [63] J.C. van Gemert, C.J. Veenman, A.W.M. Smeulders, and J.M. Geusebroek. Visual word ambiguity[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009.
    [64] Y.G. Jiang, C.W. Ngo, and J. Yang. Towards optimal bag-of-features for object categorization and semantic video retrieval[C]. In Proc. of 6th ACM International Conference on Image and Video Retrieval, Amsterdam, Netherlands, 2007:494-501.
    [65] A. Agarwal, and B. Triggs. Multilevel image coding with hyperfeatures[J]. International Journal of Computer Vision, 2008, 78(1):15-27.
    [66] D. Batra, R. Sukthankar, and T. Chen. Learning class-specific affinities for image labelling[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [67] K. Mikolajczyk, B. Leibe, and B. Schiele. Multiple object class detection with a generative model[C]. IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 2006:26-33.
    [68] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases[C]. IEEE Conference on Computer Vision and PatternRecognition, Anchorage, AK, USA, 2008.
    [69] J. Farquhar, S. Szedmak, H. Meng, and J. Shawe-Taylor. Improving“Bag-of-keypoints”image categorization[R]. University of Southampton, 2005.
    [70] F. Perronnin. Universal and adapted vocabularies for generic visual categorization[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2008, 30(7):1243-1256.
    [71] F. Perronnin, C. Dance, G. Csurka, and M. Bressan. Adapted vocabularies for generic visual categorization[C]. In Proc. of 9th European Conference on Computer Vision, Graz, Austria, 2006:364-475.
    [72] H. Jegou, M. Douze, and C. Schmid. On the burstiness of visual elements[C]. IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009:1169-1176.
    [73] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [74] G. Schindler, M.Brown, and R. Szeliski. City-scale location recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:1455-1467.
    [75] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary[C]. In Proc. of 10th International Conference on Computer Vision, Beijing, China, 2005:1800-1807.
    [76] L. Yang, R. Jin, R. Sukthankar, and F. Jurie. Unifying discriminative visual codebook generation with classifier training for object category recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [77] M. Marszalek, and C. Schmid. Spatial weighting for bag-of-featrues[C]. IEEE Conference on Computer Vision and Pattern Recognition, New York, USA, 2006:2118-2125.
    [78] S. Lazebnik, C. Schimid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories[C]. IEEE Conference on Computer Vision and Pattern Recognition, New York, USA, 2006:2169-2178.
    [79] J. Phibin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:1703-1710.
    [80] H. Jegou, M. Douze, C. Schmid, and P. Perez. Aggregating local descriptors into a compact image representation[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco,USA, 2010.
    [81] A. Oliva, and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope[J]. International Journal of Computer Vision, 2001, 42(3):145-175.
    [82] M. Douze, H. Jegou, H. Singh, L. Amsaleg, and C. Schmid. Evlatuation of GIST descriptors for web-scale image search[C]. In Proc. of ACM International Conference on Image and Video Retrieval, Santorini, Fira, Greece, 2009.
    [83] B. Kulis, and K. Grauman. Kernelized locality-sensitive hashing for scalable image search[C]. In Proc. of 12th International Conference on Computer Vision, 2009.
    [84] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large databases for recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [85] Y. Weiss, A. Torralba, and R. Fergus. Spetral hashing[C]. In Proc. of Neural Information Processing Systems, 2008.
    [86] H. Jegou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search[J]. International Journal of Computer Vision, 2010, 87(3):316-336.
    [87] H. Jeou, H. Harzallah, and C. Schmid. A contextual dissimilarity measure for accurate and efficient image search[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007.
    [88] F. Fraundorfer, H. Stewenius, and D. Nister. A binning scheme for fast hard drive based image search[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:1719-1724.
    [89] M.A. Fischler, and R.A. Elschlager. The representation and matching of pictorial structures[J]. IEEE Trans. on Computers, 1973, C-22(1):67-92.
    [90] M. Burl, and P. Perona. Recognition of planar object classes[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 1996:223-230.
    [91] M. Burl, M. Weber, and P. Perona. A probabilistic approach to object recognition using local photometry and global geometry[C]. In Proc. of 5th European Conference on Computer Vision, Freiburg, Germany, 1998:628-641.
    [92] M. Weber. Unsupervised Learning of Models for Object Recognition[D]. PhD thesis, California Institute of Technology, 2000.
    [93] M. Weber, W. Einhauser, M. Welling, and P. Perona. Viewpoint-invariant learning anddetection ofhuman heads[C]. In Proc. of 4th IEEE International Conference Automatic Face and Gesture Recognition, Grenoble, France, 2000:20-27.
    [94] M. Weber, M. Welling, and P. Perona. Towards automatic discovery of object categories[C]. IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, USA, 2000:101-108.
    [95] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition[C]. In Proc. of 6th European Conference on Computer Vision, Dublin, Ireland, 2000:18-32.
    [96] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning[C]. IEEE Conference on Computer Vision and Pattern Recognition, Wisconsin, USA, 2003:264-271.
    [97] F. Li, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories[C]. In Proc. of 9th International Conference on Computer Vision, Nice, France, 1998:1134-1141.
    [98] S. Agarwal, and D. Roth. Learning a sparse representation for object detection[C]. In Proc. of 7th European Conference on Computer Vision, Copenhagen, Denmar, 2002:113-130.
    [99] P. Feltzenswalb, and D. Hutenlocher. Pictorial structures for object recognition[J]. International Journal of Computer Vision, 2005, 61(1):55-79.
    [100] R. Fergus, P. Perona, and A. Zisserman. A sparse object category model for efficient learning and exhaustive recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:380-387.
    [101] N. Loeff, H. Arora, A. Sorokin, and D. Forsyth. Efficient unsupervised learning for localization and detection in object categories In Proc. of Neural Information Processing Systems, 2005.
    [102] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model[C]. In Proc. of ECCV Workshop on Statistical Learning in Computer Vision, Prague, Czech Republic, 2004.
    [103] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection with interleaved categorization and segmentation[J]. International Journal of Computer Vision, 2008, 77(1-3):259-289.
    [104] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:10-17.
    [105] G. Carneiro, and D. Lowe. Sparse flexible models of local features[C] In Proc. of 9th EuropeanConference on Computer Vision, Graz, Austria, 2006:29-43.
    [106] S. Lazebnik, C. Schimid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition[C]. In Proc. of 10th International Conference on Computer Vision, Beijing, China, 2005:832-838.
    [107] G. Bouchard, and B. Triggs. Hierarchical part-based visual object categorization[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:710-715.
    [108] A. Kushal, C. Schmid, and J. Ponce. Flexible object models for category-level 3D object recogniton[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:1447-1454.
    [109] R. Fergus. Visual Object Category Recognition[D]. PhD thesis,University of Oxford, 2005.
    [110]林谅.基于统计建模和计算的视觉物体识别问题研究[D].北京:北京理工大学,2008.
    [111] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel[C]. In Proc. of 6th ACM International Conference on Image and Video Retrieval, Amsterdam, Netherlands, 2007:401-408.
    [112] K. Grauman and T. Darrell. The pyramid match kernel: discriminative classification with sets of image features[C]. In Proc. of 10th International Conference on Computer Vision, Beijing, China, 2005:1458-1465.
    [113] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset[R]. Technical Report, California Institute of Technology, 2005.
    [114] A. Kumar and C. Sminchinsescu. Support kernel machines for object recognition[C]. In Proc. of 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:1771-1778.
    [115] Y. Lin, T. Liu, and C. Fuh. Local ensemble kernel learning for object category recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:952-959.
    [116] M. Varma, and D. Ray. Learning the discriminative power invariance trade-off[C]. In Proc. of 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:285-292.
    [117] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Generic Object Recognition with Boosting[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2006, 28(3):416-431.
    [118] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition[C]. In Proc. of 8th European Conference on Computer Vision, Prague, Czech, 2004:71-84.
    [119] A. Bosch, A. Zisserman, and X. Munoz. Image classification using random forests and ferns[C]. In Proc. of 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:1779-1786.
    [120] A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell. Active learning with gaussian processes for object categorization[C]. In Proc. of 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007:1-8.
    [121] C.K.I. Williams, and D. Barber. Bayesian classification with Gaussian processes[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1998, 20(12):1342-1351.
    [122] A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell. Gaussian Processes for Object Categorization[J]. International Journal of Computer Vision, 2010, 88(2):169-188.
    [123] H. Kuck, and N. de Freitas. Learning about individuals from group statistics[C]. In Proc. Of Uncertainty in Artificial Intelligence, Edinburgh, Scotland, 2005.
    [124] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis[J]. Machine learning, 2001, 42(1-2):177-196.
    [125] J. Sivic, B. Russell, A. Efros, A. Zisserman, and B. Freeeman. Discovering objects and their location in images[C]. In Proc. of 10th International Conference on Computer Vision,. Beijing, China, 2005: 370-377.
    [126] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning object categories from google’s image search[C]. In Proc. of 10th International Conference on Computer Vision,. Beijing, China, 2005:1816-1823.
    [127] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation In Proc. of Neural Information Processing Systems, 2003.
    [128] L. Fei-Fei, and P. Perona. A Bayesian hierarchical model for learning natural scene categories[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:524-531.
    [129] C.H. Lampert, M.B. Blaschko, and T. Hofmann. Beyond sliding windows: object localization by efficient subwindow search[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [130] O. Chum, and A. Zisserman. An exemplar model for learning object class[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, 2007:620-627.
    [131] V. Ferrari, L. Fevrier, F. Jurie, and C. Schimid. Groups of adjacent contour segments for object detection[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2008, 30(1):36-51.
    [132] N. Dalal, and B. Triggs. Histograms of oriented gradients for human detection[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:886-893.
    [133] H. A. Rowley, S. Baluja, and T. Kanade. Human Face detection in visual scenes[C]. In Proc. of Neural Information Processing Systems, 1996.
    [134] G. Heitz, and D. Koller. Learning spatial context: Using stuff to find things[C]. In Proc. of 10th European Conference on Computer Vision, Marseille, France, 2008:30-43.
    [135] P. Viola, and M. J. Jones. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2):137-154.
    [136] M. Inoue. Image retrieval: research and use in the information explosion[J]. Progress in Informatics. 2009, 6:3-14.
    [137] M. Swain, and B. Ballard. Color indexing[J]. International Journal of Computer Vision, 1991, 7(1):11-32.
    [138] M. Flickner, H. Sawhney, et al. Query by image and video content: The QBIC system[J]. IEEE Computer, 1995, 28(9):23-32.
    [139] J. Huang, S. Kumar, M. Mitra, W.J. Zhu, and R. Zabih. Spatial color indexing and applications[J]. International Journal of Computer Vision, 1999, 35(3):245-268.
    [140] B. manjunath, and W.Y. Ma. Texture features for browsing and retrieval of iamge data[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1996, 18(8):837-842.
    [141] J. Wang, G. Wiederhold, O. Firschein, and S. Wei. Content-based image indexing and searching using Daubechies’wavelets[J]. International Journal on Digital Libraries, 1998, 1(4):311-328.
    [142] C. Schmid, and R. Mohr. Local grayvalue invariants for image retrieval[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1997, 19(5):530-535.
    [143] T. Tuytelaars, L. Van Gool. Content-based image retrieval based on local affinely invariant regions[C]. In proc. of International Conference on Visual Information Systems, 1999.
    [144] J. Shi, and J. Malik. Normalized cuts and image segmentation[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000, 22(8):888-905.
    [145] C. Carson, S. Belongie, H. Greenspan, and J. Malik. Image segmentation using expectation maximization and its application to image querying[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002, 24(8):1026-1038.
    [146] M. Bar, and S. Ullman. Spatial context in recognition[J]. Perception, 1993, 25:343-352.
    [147] P. Lipson, E. Grimson, and P. Sinha. Configuration based scene classification and image indexing[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997:1007-1013.
    [148] J. Yuan, Y. Wu, and M. Yang. Discovery of collocation patterns: from visual words to visual phrases[C]. IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 2007:2088-2095.
    [149] Y.T. Zheng, M. Zhao, S.Y. Neo, T.S. Chua, and Q. Tian. Visual synset: towards a higher-level visual representation[C]. IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 2008.
    [150] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio. Robust object recognition with cortex-like mechanisms[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(3):411-426.
    [151] C. Siagian, and L. Itti. Rapid biologically-inspired scene classification using features shared with visual attention[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(2):300-312.
    [152] E. Meyers, L. Wolf. Using biologically inspired features for face processing[J]. International Journal of Computer Vision, 2008, 76(1):93-104.
    [153] P. Moerland. A comparison of mixture models for density estimation[C]. In Proc. of 9th International Conference on Artificial Neural Networks, Edinburgh, UK, 1999:25-30.
    [154] G.J. McLachlan, and K.E. Basford. Mixture models: Inference and applications to clustering[M]. Statistics: Textbooks and monographs. New York: Dekker, 1988.
    [155] A. Dempster, N. Laird, D. Rubin. Maximum likelihood from incomplete data via the EM algorithm[J]. Journal of the Royal Statistical Society, 1977, 39(1):1-38.
    [156] X. Liu, Y. Jia, X. Chen, Y. Deng, and H. Fu. Image classification using the Max-Min Posterior Pseudo-probabilities method[R]. Technical Report, BIT-CS-20080001, Beijing Institute of Technology. http://www.mcislab.org.cn/member/~xiabi/papers/2008_1.PDF , 2008.
    [157] P.J. Escamilla-Ambrosio, and N. Lieven. Soft-histogram degradation analysis of a tie bar of rotor-head structure[J]. Journal of Aircraft, 2008, 45(6):2161-2164.
    [158] G. Carneiro, A.B. Chan, P.J. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(3):394-410.
    [159] V.N. Vapnik. The nature of statistical learning theory [M]. New York: Wiley, 1998.
    [160] B.H. Juang, and S. Katagiri. Discriminative learning for minimum error classification[J]. IEEE Trans. on Acoust, Speech, Signal Process, 1992, 40(12): 3043-3054.
    [161] P.F. Brown. The acoustic-modeling problem in automatic speech recognition[D]. PhD thesis, Carnegie-Mellon University, 1987.
    [162] H. Jiang, and X. Li. Incorporating training errors for large margin HMMs under semi-definite programming framework[C]. In Proc. of International Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, USA, 2007:629-632.
    [163] M.H. Hansen, and B. Yu. Model selection and the principle of minimum description length[J]. Journal of American Statistical Association, 2001, 96: 746-774.
    [164] M. Everingham, A. Zisserman, C.K.I. Williams, and L. van Gool. The PASCAL visual object classes challenge 2006 (VOC2006) results[R]. Technical Report, 2006.
    [165] G. Dorko and C. Schmid. Object class recognition using discriminative local features[J]. IEEE PAMI, (Submitted), 2004.
    [166] D. Gao and N. Vasconcelos. Discriminant saliency for visual recognition from cluttered scenes[C]. In Proc. of Neural Information Processing Systems, 2004.
    [167] F. Jurie, and C. Schmid. Scale-invariant shape features for recognition of object categories[C]. IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2004:90-96.
    [168] T. Dietterich, R. Lathrop, and T. Lozano-Perez. Solving the multiple-instance problem with axis-parallel rectangles[J]. Artificial Intelligence, 1997, 89:31-71.
    [169] Y. Chen, J. Bi, and J.Z. Wang. MILES: Multiple-instance learning via embedded instance selection[J]. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2006, 28(12):1931-1947.
    [170] S. Ray, and M. Craven. Supervised versus multiple instance learning: an empirical comparison[C]. International Conference on Machine Learning, Bonn, Germany, 2005.
    [171] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories[J]. Computer Vision and Image Understanding, 2007, 106(1):59-70.
    [172] P. Duygulu, K. Barnard, J. F. G. Freitas, and D. A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary[C]. In Proc. of 7th European Conference on Computer Vision, Copenhagen, Denmark, 2002:97-112.
    [173] K. Grauman, and T. Darrell. Efficient image matching with distributions of local invariant features[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:627-634.
    [174] A.C. Berg, T.L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondence[C]. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005:26-33.
    [175] S.L. Feng, R. Manmatha and V. Lavrenko. Multiple Bernoulli relevance models for image and video annotation[C]. In Proc. of IEEE Conf. Computer Vision and Pattern Recognition, Washington, DC, USA, June 2004:1002-1009.
    [176] Y. Mori, H. Takahashi and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words[C]. In Proc. of Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999.
    [177] V. Lavrenko, R. Manmatha, and J. Jeon. A model for learning the semantics of pictures[C]. In Proc. of Neural Information Processing Systems, 2003.
