Application Research of Computers (《计算机应用研究》)

Review of object detection based on convolutional neural networks (original Chinese title: 基于卷积神经网络的目标检测研究综述)

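Authors Li Xudong, Ye Mao, Li Tao
Affiliations 1. Robotics Research Center, University of Electronic Science & Technology of China, Chengdu 611731, China; 2. Dept. of Information Engineering, Henan Radio & Television University, Zhengzhou 450008, China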
Article ID 1001-3695(2017)10-2881-06
DOI 10.3969/j.issn.1001-3695.2017.10.001
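Abstract With growing training data and improved computing performance, object detection based on convolutional neural networks (CNNs) has broken through the bottleneck of traditional object detection and become the mainstream approach. Studying how to use CNNs effectively for object detection is therefore of significant value. This paper first reviews how CNNs solve the problems of traditional object detection. It then introduces the basic structure of CNNs and describes current research progress and commonly used CNN architectures. Next, it analyzes and discusses in detail two lines of approach to applying CNNs to object detection and points out their current shortcomings. Finally, it summarizes CNN-based object detection and its future directions.
Keywords convolutional neural network (CNN); object detection; deep learning
Funding National Natural Science Foundation of China (61375038)
Article URL http://www.arocmag.com/article/01-2017-10-001.html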
Received 2016/10/24
Revised 2016/12/20
Pages 2881-2886, 2891
CLC number TP183
Document code A