Scene classification algorithm based on adaptive regional supervision
  • Authors: CHEN Zhihong; HU Haifeng; MA Shuiping; YU Aobo
  • Affiliation: School of Electronics and Information Technology, Sun Yat-sen University
  • Keywords: deep convolution neural network; heat map; adaptive supervision; scene classification
  • Journal: Acta Scientiarum Naturalium Universitatis Sunyatseni (Journal of Sun Yat-sen University, Natural Science Edition)
  • Publication date: 2019-03-15
  • Year: 2019
  • Issue: 02
  • Funding: National Natural Science Foundation of China (61673402, 61273270, 60802069); Natural Science Foundation of Guangdong Province (2017A030311029, 2016B010123005, 2017B090909005); Science and Technology Program of Guangzhou (201704020180, 201604020024)
  • Language: Chinese
  • Pages: 15-20
  • Page count: 6
  • CN: 44-1241/N
  • ISSN: 0529-6579
  • CLC number: TP391.41; TP183
Abstract
The deep convolutional neural network (DCNN) is currently a popular approach to scene classification. As networks become deeper and wider, however, they also become harder to train. Random cropping (crop sampling) can reduce the difficulty of training, but it also weakens the correlation between the image fed into the network and its target label. To address this, a scene classification algorithm based on an adaptive regional supervision mechanism is proposed. The algorithm consists of three parts: a heat-map generation layer, an adaptively supervised cropping layer, and a classification layer. The heat-map generation layer produces a heat map for each image, the adaptively supervised cropping layer crops the image adaptively according to that heat map, and the classification layer then classifies the cropped scene image, which strengthens the correlation between the cropped image and the target label. Experiments on two classic scene classification datasets, 15-Scene and MIT Indoor, show that the proposed algorithm outperforms the original network architecture in both training efficiency and recognition performance, and that it achieves higher recognition accuracy and better robustness on complex scenes.
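
To make the cropping step concrete, the following is a minimal sketch in NumPy of how a heat map can steer the crop, under stated assumptions: the heat map is taken here to be a class activation map computed from the last convolutional feature map and the classifier weights (the abstract does not specify how the heat-map generation layer is built), and `class_activation_map`, `adaptive_crop`, `fc_weights`, `threshold`, and `min_size` are illustrative names and parameters rather than the authors' actual implementation.

```python
import numpy as np


def class_activation_map(feature_map, fc_weights, class_idx):
    """Build an (H, W) heat map by weighting the (C, H, W) feature map
    with the classifier weights of the predicted class (a CAM-style
    stand-in for the heat-map generation layer)."""
    cam = np.tensordot(fc_weights[class_idx], feature_map, axes=([0], [0]))
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam  # values normalised to [0, 1]


def adaptive_crop(image, heat_map, threshold=0.5, min_size=64):
    """Crop `image` (H, W, 3) to the bounding box of heat-map responses
    above `threshold`, falling back to a centre crop if nothing fires."""
    h_img, w_img = image.shape[:2]

    # Nearest-neighbour upsampling of the coarse heat map to image size.
    ys = np.arange(h_img) * heat_map.shape[0] // h_img
    xs = np.arange(w_img) * heat_map.shape[1] // w_img
    cam_full = heat_map[np.ix_(ys, xs)]

    mask = cam_full >= threshold
    if not mask.any():  # nothing above threshold: fall back to a centre crop
        y0 = max(0, (h_img - min_size) // 2)
        x0 = max(0, (w_img - min_size) // 2)
        return image[y0:y0 + min_size, x0:x0 + min_size]

    rows, cols = np.where(mask)
    y0, y1 = rows.min(), rows.max() + 1
    x0, x1 = cols.min(), cols.max() + 1

    # Enforce a minimum crop size so the classifier still sees some context.
    if y1 - y0 < min_size:
        y0 = max(0, min(y0, h_img - min_size))
        y1 = min(h_img, y0 + min_size)
    if x1 - x0 < min_size:
        x0 = max(0, min(x0, w_img - min_size))
        x1 = min(w_img, x0 + min_size)
    return image[y0:y1, x0:x1]
```

In this sketch the crop handed to the classification layer is the bounding box of the strongest heat-map responses rather than a random window, which is the property that keeps the cropped input correlated with its target label.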
