Multi-class vehicle detection in surveillance video based on deep learning (基于深度学习的监控视频中多类别车辆检测)

English title: Multi-class vehicle detection in surveillance video based on deep learning
Authors: XU Zihao (徐子豪); HUANG Weiquan (黄伟泉); WANG Yin (王胤)
Affiliations: Department of Computer Science and Technology, Tongji University; Key Laboratory of Embedded Systems and Service Computing, Ministry of Education (Tongji University)
Keywords: deep learning (深度学习); vehicle detection (车辆检测); dilated convolution (空洞卷积); feature pyramid (特征金字塔); focal loss (焦点损失)
Journal code: JSJY
Journal: Journal of Computer Applications (计算机应用)
Publication date: 2018-09-29 09:44
Publisher: Journal of Computer Applications (计算机应用)
Year: 2019
Volume/No.: v.39; No.343
Funding: Science and Technology Commission of Shanghai Municipality project (17511104502)
Language: Chinese
Page: JSJY201903015
Number of pages: 6
Issue: 03
CN: 51-1307/TP
Classification number: 84-89

Abstract

Traditional machine learning methods for detecting vehicles in traffic surveillance video are sensitive to objective factors such as video quality, shooting angle and weather, which leads to cumbersome preprocessing, poor generalization and poor robustness. To address these problems, two deep learning models for multi-class vehicle detection were proposed: an improved Faster R-CNN (Faster Regions with Convolutional Neural Network) and an improved SSD (Single Shot multibox Detector), both combining dilated convolution, feature pyramid and focal loss. Firstly, a dataset was built from 851 labeled images captured from surveillance video at different times. Secondly, the two improved models and the original models were trained under the same training strategy. Finally, the average accuracy of each model was evaluated. Experimental results show that, compared with the original Faster R-CNN and SSD, the average accuracies of the improved Faster R-CNN and SSD increase by 0.8 and 1.7 percentage points respectively. Both deep learning methods are better suited than traditional methods to vehicle detection in complex conditions. Faster R-CNN has higher accuracy but lower speed and is better suited to offline video processing, while SSD has lower accuracy but higher speed and is better suited to real-time video detection.
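Of the three components named in the abstract, focal loss is the easiest to illustrate in isolation. The sketch below is a minimal binary form following the general definition FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not the authors' implementation: the function names are hypothetical, and the defaults gamma = 2.0 and alpha = 0.25 are assumed from the focal loss literature, since this record does not state the settings used in the article.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction.

    p: predicted probability of the positive class, in (0, 1)
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    eps = 1e-12                       # guard against log(0)
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p    # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    gamma and alpha defaults are assumptions, not values from this article.
    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples so that hard examples dominate the total.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma = 0 the modulating factor disappears and the function reduces to alpha-weighted cross-entropy, which is a convenient sanity check when adapting the sketch to a detector's classification head.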

