Research on Constructing a Small-Semantic-Gap Lexicon for Image Retrieval
Abstract
In recent years, image retrieval has become an important research topic in multimedia information retrieval. The "semantic gap", that is, the inconsistency between the information one can extract from visual data and the user's own interpretation of that data, is a deep-rooted problem in image retrieval. In concept-based image retrieval, building effective semantic concept models through object recognition or automatic annotation is aimed at narrowing this gap. Defining a good lexicon of semantic concepts is the first, and a crucial, step of data collection and model building in these methods.
     The semantic gap inherent in a concept varies from concept to concept, and current information-processing and image-understanding methods fall far short of extracting abstract (deep) semantics from images. A more realistic approach is to identify concepts with relatively small semantic gaps that computers can learn easily; such concepts are better suited to training concept-detection models and, in turn, to semantic recognition and automatic annotation. Identifying a small-semantic-gap lexicon is therefore important for concept-based image retrieval, and it involves two main questions: 1) how to define "small", that is, how to measure the semantic gap effectively; and 2) how to find such concepts automatically. This thesis answers these two questions in a novel way and ultimately constructs a small-semantic-gap lexicon, which can offer useful guidance on data collection, feature selection, retrieval-model construction, and image annotation in large-scale image retrieval research.
     This thesis first presents the basic framework for constructing the lexicon: 1) extract textual semantic features and several types of low-level visual features from 2.4 million web images, and build effective indexes for each; 2) under different semantic gap models, compute for every image a visual-textual confidence score that measures the consistency between the distributions of the image and its neighbours in the visual feature space and in the textual feature space; 3) cluster the images with the highest confidence scores using the affinity propagation algorithm; 4) extract keywords from the textual content of the clusters; the keywords with the highest relatedness are the concepts with the smallest semantic gaps.
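Step 2 above can be made concrete with a small sketch. The thesis's exact confidence formula is not given in this abstract, so the score below is one plausible stand-in: the overlap between an image's k nearest neighbours in the visual feature space and in the textual feature space. The one-dimensional toy features are illustrative assumptions.

```python
def sqdist(a, b):
    # squared Euclidean distance between two feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(i, feats, k):
    # indices of the k nearest neighbours of item i (excluding itself)
    order = sorted((j for j in range(len(feats)) if j != i),
                   key=lambda j: sqdist(feats[i], feats[j]))
    return set(order[:k])

def confidence(i, visual, textual, k):
    # 1.0 = the two feature spaces agree perfectly on the neighbourhood
    return len(knn(i, visual, k) & knn(i, textual, k)) / k

visual       = [[0.0], [0.1], [0.2], [5.0]]
consistent   = [[0.0], [0.1], [0.2], [5.0]]   # text agrees with vision
inconsistent = [[0.0], [5.0], [5.1], [0.1]]   # text disagrees with vision
hi = confidence(0, visual, consistent, 2)     # -> 1.0
lo = confidence(0, visual, inconsistent, 2)   # -> 0.5
```

Images whose score is high are those whose visual and textual neighbourhoods coincide, which is exactly the property the framework selects for before clustering.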
     Because the semantic gap differs across visual feature spaces, this thesis starts from several low-level visual spaces and builds corresponding small-semantic-gap word lists based on colour features, texture features, and combined colour-texture features. Comparing and analysing their similarities and differences yields a visual-feature-based small-semantic-gap lexicon, which offers effective guidance for selecting features for semantic concepts in image retrieval.
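As a minimal illustration of the kind of low-level colour feature involved, the sketch below computes a quantised RGB colour histogram; the thesis's actual colour and texture descriptors are not specified in this abstract, so this is an assumed, standard choice.

```python
def color_histogram(pixels, bins=4):
    """Map 8-bit RGB pixels to a normalised bins**3-dimensional histogram."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # quantise each channel into `bins` levels, then flatten the 3-D index
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

# toy image: 3 black pixels and 1 white pixel
h = color_histogram([(0, 0, 0)] * 3 + [(255, 255, 255)])
```

Distances between such histograms are what "visual similarity" means in the colour feature space discussed above.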
     To address the inconsistency between image distributions in the visual space and in the textual space, this thesis proposes two dual semantic gap models: a textual diffusion model and a visual diffusion model. In essence, the two models correspond to content-based retrieval and text-based retrieval respectively. The combined small-semantic-gap lexicon obtained from the two models can suggest a suitable retrieval mode for each semantic concept and can be applied to improving image annotation.
     This thesis proposes using the affinity propagation algorithm to solve the large-scale image clustering problem. The algorithm has four advantages: 1) it does not require the number of clusters to be fixed in advance; 2) its input is a similarity matrix, which, for image clustering that must combine visual and textual similarity, is more reasonable and effective than high-dimensional data points; 3) it also handles the case where the similarity between two images is asymmetric; 4) it processes large-scale data sets effectively.
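A minimal self-contained sketch of affinity propagation illustrates the advantages listed above: the input is a (possibly asymmetric) similarity matrix, and the number of clusters emerges from the data rather than being set in advance. This is a compact rendering of the standard message-passing updates, not the thesis's production code; the toy data and the median-similarity preference are illustrative assumptions.

```python
def affinity_propagation(S, damping=0.6, iters=300):
    """Cluster items given a similarity matrix S.

    S[i][k] is how well item k would serve as an exemplar for item i;
    the diagonal holds each item's "preference" to be an exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):  # responsibility updates
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(vals)
            best_k = vals.index(best)
            second = max(v for k, v in enumerate(vals) if k != best_k)
            for k in range(n):
                rival = second if k == best_k else best
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):  # availability updates
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:  # self-availability
                    new = total - pos[k]
                else:
                    new = min(0.0, R[k][k] + total - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels

# Toy demo: two well-separated groups on a line; similarity = -squared
# distance, preference = the median similarity (a common default).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
S = [[-(a - b) ** 2 for b in points] for a in points]
offdiag = sorted(S[i][j] for i in range(6) for j in range(6) if i != j)
pref = offdiag[len(offdiag) // 2]
for i in range(6):
    S[i][i] = pref
exemplars, labels = affinity_propagation(S)
```

Raising the shared preference value yields more, smaller clusters; lowering it yields fewer, which is how the granularity of the image clusters can be tuned without ever specifying a cluster count.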
     Extensive experimental results show that the individual word lists in the constructed small-semantic-gap lexicon are mutually independent and complementary, and that they play an important role in data collection, low-level feature selection, retrieval-mode selection, and image annotation in large-scale image retrieval research, offering a new direction for the development of concept-based image retrieval.
Recently, image retrieval has become one of the most important research topics in multimedia information retrieval (MIR). A fundamental challenge in image retrieval is the semantic gap: the lack of coincidence between the information that one can extract from visual data and the interpretation that the same data have for a user in a given situation. To reduce the semantic gap, a promising paradigm of concept-based image retrieval focuses on modeling high-level semantic concepts, either by object recognition or by image annotation. Across these approaches, the first step is to select a good lexicon of concepts that are relatively easy for computers to learn, and then to collect training data to model the concepts.
     Semantic gaps are in fact not uniform across semantic concepts, and it is inappropriate to ignore the differences. Choosing concepts with small semantic gaps is worthwhile because they help train better high-level concept models. Two main problems arise: how to define a "small semantic gap", that is, how to measure the semantic gap; and how to find such concepts automatically. To address these two problems, this thesis quantitatively analyzes the semantic gap and proposes a novel framework for constructing a family of lexica with small semantic gaps, based on different low-level visual features and different semantic gap models.
     The lexica are constructed as follows. First, for a database of 2.4 million large-scale web images, we extract a textual feature from each image's rich surrounding text; several types of low-level visual features are also extracted and indexed. Second, under different semantic gap models, we compute for every image a content-context confidence score that measures the consistency between the image's distributions in the visual feature space and in the textual feature space. Third, we cluster the images with the highest confidence scores using the affinity propagation algorithm. Finally, we extract keywords from the textual information of the clusters by measuring each keyword's degree of relatedness; the keywords with the highest scores constitute the final family of lexica of high-level concepts with small semantic gaps.
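The final keyword-extraction step can be sketched as follows. The exact "related degree" measure is not specified in this abstract, so the sketch ranks a cluster's words with a standard tf-idf style score, which is an assumed stand-in: frequent within the cluster, rare across the whole corpus.

```python
import math

def ranked_keywords(cluster_docs, all_docs):
    """Rank words of a cluster by (term frequency in cluster) x idf."""
    n = len(all_docs)
    df = {}  # document frequency of each word over the whole corpus
    for doc in all_docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    scores = {}
    for doc in cluster_docs:
        for w in doc:
            scores[w] = scores.get(w, 0.0) + math.log(n / df[w])
    return sorted(scores, key=scores.get, reverse=True)

# toy corpus of surrounding-text word lists; cluster = first three docs
corpus = [["sunset", "sky"], ["sunset", "sea"], ["sunset", "sky"],
          ["car", "road", "sky"], ["car", "city"], ["grass", "green"]]
top = ranked_keywords(corpus[:3], corpus)
```

In the framework, the top-ranked words of each high-confidence cluster become candidate concepts for the small-semantic-gap lexica.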
     Semantic gaps also differ across visual feature spaces. Based on color, co-occurrence texture, and wavelet texture features, we construct visual-feature-based lexica with small semantic gaps; these lexica guide feature selection for concept-based image retrieval. Based on two different semantic gap models, the loose-textual and the loose-visual model, we construct model-based lexica with small semantic gaps; these suggest a suitable search mode for each concept and help improve the performance of image annotation.
     In this thesis, we choose the affinity propagation algorithm to cluster the large-scale image database, for four reasons: 1) it does not require the number of clusters to be set in advance; 2) its input is a similarity matrix, which suits our setting better than high-dimensional data points because visual and textual similarity must be considered together; 3) it allows the similarity between two images to be asymmetric; 4) it is effective and efficient on large-scale databases. The experimental results demonstrate the validity of the constructed family of lexica, which are mutually independent and complementary.
     They provide helpful suggestions for data collection, semantic concept model construction, low-level feature selection, search-mode selection, and image annotation in large-scale image retrieval.
