《计算机应用研究》 | Application Research of Computers

基于改进的深度卷积神经网络的人体动作识别方法

Action recognition based on improved deep convolutional neural network

Authors  陈胜娣, 魏维, 何冰倩, 陈思宇, 刘基缘
Affiliation  成都信息工程大学 计算机学院, 成都 610225
Article ID  1001-3695(2019)03-061-0945-05
DOI 10.19734/j.issn.1001-3695.2017.10.1018
Abstract  To address the complex feature extraction and low recognition rates of existing action recognition algorithms, this paper proposes a network structure that combines batch normalization with the GoogLeNet network model, bringing the batch normalization idea from image classification into action recognition to improve the training algorithm: the network inputs built from the video action training samples are normalized over each mini-batch. The method takes RGB images as the input of the spatial network and optical flow fields as the input of the temporal network, and then fuses the spatial and temporal networks to obtain the final action recognition result. Experiments on the UCF101 and HMDB51 datasets achieve accuracies of 93.50% and 68.32%, respectively. The results show that the improved network architecture attains high recognition accuracy for human action recognition in video.
Keywords  action recognition; batch normalization; deep learning; convolutional neural network
Funding  Key Scientific Research Project of the Education Department of Sichuan Province (2017Z026)
Article URL  http://www.arocmag.com/article/01-2019-03-061.html
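The abstract above centres on normalizing each mini-batch of training inputs, i.e. the batch normalization transform of Ioffe and Szegedy ([26] in the reference list). The following is a minimal NumPy sketch of that transform for illustration only; the function name, tensor shapes, and the toy mini-batch are assumptions rather than the authors' code, and the paper applies the idea inside a GoogLeNet-style network rather than to a bare matrix.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Mini-batch normalization of x with shape (N, C): per-channel zero mean
    and unit variance, followed by a learnable scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                    # per-channel mini-batch mean
    var = x.var(axis=0)                    # per-channel mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta            # learnable scale and shift

# Toy usage: a mini-batch of 8 samples with 4 channels.
x = np.random.randn(8, 4).astype(np.float32)
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))       # close to 0 and 1 per channel
```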
English title  Action recognition based on improved deep convolutional neural network
Authors (English)  Chen Shengdi, Wei Wei, He Bingqian, Chen Siyu, Liu Jiyuan
Affiliation (English)  College of Computer Science & Technology, Chengdu University of Information Technology, Chengdu 610225, China
English abstract  Aiming at the problems of complex feature extraction and low accuracy in human action recognition, this paper proposed a network structure combining the batch normalization algorithm with the GoogLeNet network model. Applying the batch normalization idea from image classification to action recognition, it improved the training algorithm by normalizing the network inputs of the training samples over each mini-batch. For the convolutional network, RGB images served as the spatial input and stacked optical flow fields as the temporal input; the spatial and temporal networks were then fused to obtain the final action recognition result. The architecture was trained and evaluated on the standard video action benchmarks UCF101 and HMDB51, achieving accuracies of 93.50% and 68.32%, respectively. The results show that the improved convolutional neural network markedly improves the recognition rate and has clear advantages for action recognition.
English keywords  action recognition; batch normalization; deep learning; convolutional neural network
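The abstracts also describe the two-stream layout: a spatial network fed with RGB frames, a temporal network fed with stacked optical flow, and a fusion of the two streams. The sketch below shows that layout only schematically; the TinyStream module, the 10-frame flow stack, and averaging of softmax scores as the fusion rule are assumptions for illustration, whereas the paper itself uses a batch-normalized GoogLeNet for each stream.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out):
    # Convolution followed by batch normalization, as in the improved network.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class TinyStream(nn.Module):
    """Stand-in for one stream (the paper uses a BN-augmented GoogLeNet)."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(in_channels, 32),
            nn.MaxPool2d(2),
            conv_bn_relu(32, 64),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

num_classes = 101                         # e.g. UCF101
L = 10                                    # number of stacked flow frames (assumed)
spatial = TinyStream(in_channels=3, num_classes=num_classes)       # RGB frame
temporal = TinyStream(in_channels=2 * L, num_classes=num_classes)  # x/y flow stack

rgb = torch.randn(4, 3, 224, 224)         # mini-batch of RGB frames
flow = torch.randn(4, 2 * L, 224, 224)    # mini-batch of stacked optical flow

# Late fusion: average the per-stream class scores (one common choice).
scores = (spatial(rgb).softmax(dim=1) + temporal(flow).softmax(dim=1)) / 2
pred = scores.argmax(dim=1)
print(pred.shape)                         # torch.Size([4])
```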
References
  [1] Jhuang H, Serre T, Wolf L, et al. A biologically inspired system for action recognition[C] //Proc of IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE Press, 2007:1-8.
[2] Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C] //Proc of Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE Press, 2014:1725-1732.
[3] 凌佩佩, 邱崧, 蔡茗名, 等. 结合特权信息的人体动作识别[J] . 中国图象图形学报, 2017, 22(4):482-491. (Ling Peipei, Qiu Song, Cai Mingming, et al. Human action recognition based on privileged information[J] . Journal of Image and Graphics, 2017, 22(4):482-491. )
[4] Wang Heng, Schmid C. Action recognition with improved trajectories[C] //Proc of IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE Press, 2013:3551-3558.
[5] 申晓霞, 张桦, 高赞, 等. 基于深度信息和RGB图像的行为识别算法[J] . 模式识别与人工智能, 2013, 26(8):722-728. (Shen Xiaoxia, Zhang Hua, Gao Zan, et al. Behavior recognition algorithm based on depth information and RGB image[J] . Pattern Recognition and Artificial Intelligence, 2013, 26(8):722-728. )
[6] 张杰, 吴剑章, 汤嘉立, 等. 基于时空图像分割和交互区域检测的人体动作识别方法[J] . 计算机应用研究, 2017, 34(1):302-305, 320. (Zhang Jie, Wu Jianzhang, Tang Jiali, et al. Human action recognition method based on spatio-temporal image segmentation and interactive area detection[J] . Application Research of Computers, 2017, 34(1):302-305, 320. )
[7] Shotton J, Fitzgibbon A, Cook M, et al. Real-time human pose recognition in parts from single depth images[C] //Proc of Computer Vision and Pattern Recognition Conference. Piscataway, NJ:IEEE Press, 2011:1297-1304.
[8] Ofli F, Chaudhry R, Kurillo G, et al. Sequence of the most informative joints (SMIJ):a new representation for human skeletal action recognition[J] . Journal of Visual Communication and Image Representation, 2014, 25(1):24-38.
[9] Chen Chen, Jafari R, Kehtarnavaz N. Action recognition from depth sequences using depth motion maps-based local binary patterns[C] //Proc of Applications of Computer Vision. Piscataway, NJ:IEEE Press, 2015:1092-1099.
[10] 赵晓健, 曾晓勤. 基于稠密光流轨迹和稀疏编码算法的行为识别方法[J] . 计算机应用, 2016, 36(1):181-187. (Zhao Xiaojian, Zeng Xiaoqin. Action recognition method based on dense optical flow trajectory and sparse coding algorithm[J] . Journal of Computer Applications, 2016, 36(1):181-187. )
[11] 李亚玮, 金立左, 孙长银, 等. 基于光流约束自编码器的动作识别[J] . 东南大学学报:自然科学版, 2017, 47(4):691-696. (Li Yawei, Jin Lizuo, Sun Changyin, et al. Action recognition based on optical flow constrained auto-encoder[J] . Journal of Southeast University:Natural Science, 2017, 47(4):691-696. )
[12] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[J] . Advances in Neural Information Processing Systems, 2014, 1(4):568-576.
[13] He Kaiming, Zhang Xiangyu, Ren Shaoqin, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J] . IEEE Trans on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916.
[14] Wang Keze, Wang Xiaolong, Lin Liang, et al. 3D human activity recognition with reconfigurable convolutional neural networks[C] //Proc of ACM International Conference on Multimedia. New York:ACM Press, 2015:97-106.
[15] Wang Limin, Xiong Yuanjun, Wang Zhe, et al. Towards good practices for very deep two-stream ConvNets[EB/OL] . 2015-07-08. https://arxiv.org/abs/1507.02159.
[16] 王忠民, 曹洪江, 范琳. 一种基于卷积神经网络深度学习的人体行为识别方法[J] . 计算机科学, 2016, 43(11A):56-58. (Wang Zhongmin, Cao Hongjiang, Fan Lin. Method on human activity recognition based on convolutional neural networks[J] . Computer Science, 2016, 43(11A):56-58. )
[17] 韩敏捷. 基于深度学习框架的多模态动作识别[J] . 计算机与现代化, 2017(7):48-52. (Han Minjie. Multi-modal action recognition based on deep learning framework[J] . Computer and Modernization, 2017(7):48-52. )
[18] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL] . 2014-09-04. [2015-04-10] . https://arxiv.org/abs/1409.1556.
[19] Szegedy C, Liu Wei, Jia Yangqing, et al. Going deeper with convolutions[C] //Proc of Computer Vision and Pattern Recognition Conference. Piscataway, NJ:IEEE Press, 2015.
[20] LeCun Y, Boser B, Denker J S, et al. Backpropagation applied to handwritten zip code recognition[J] . Neural Computation, 1989, 1(4):541-551.
[21] Shimodaira H. Improving predictive inference under covariate shift by weighting the log-likelihood function[J] . Journal of Statistical Planning and Inference, 2000, 90(2):227-244.
[22] Wiesler S, Richard A, Schluter R, et al. Mean-normalized stochastic gradient for large-scale deep learning[C] //Proc of IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE Press, 2014:180-184.
[23] Raiko T, Valpola H, LeCun Y. Deep learning made easier by linear transformations in perceptrons[C] //Proc of the 15th International Conference on Artificial Intelligence and Statistics. 2012:924-932.
[24] Povey D, Zhang Xiaohui, Khudanpur S. Parallel training of deep neural networks with natural gradient and parameter averaging[EB/OL] . (2014-10-29). https://hgpu.org/?p=12992.
[25] Desjardins G, Simonyan K, Pascanu R, et al. Natural neural networks[J] . Computer Science, 2015, 22(8):847-856.
[26] Ioffe S, Szegedy C. Batch normalization:accelerating deep network training by reducing internal covariate shift[C] //Proc of the 32nd International Conference on Machine Learning. 2015:448-456.
[27] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J] . Proceedings of the IEEE, 1998, 86(11):2278-2324.
[28] Deng Jia, Dong Wei, Socher R, et al. ImageNet:a large-scale hierarchical image database[C] //Proc of Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE Press, 2009:248-255.
[29] Pérez J S. TV-L1 optical flow estimation[J] . Image Processing on Line, 2013, 2(4):137-150.
[30] Soomro K, Zamir A R, Shah M. UCF101:a dataset of 101 human actions classes from videos in the wild[EB/OL] . 2012-12-04. https://arxiv.org/abs/1212.0402.
[31] Kuehne H, Jhuang H, Stiefelhagen R, et al. HMDB51:a large video database for human motion recognition[C] //Proc of IEEE International Conference on Computer Vision. Piscataway, NJ:IEEE Press, 2012:2556-2563.
[32] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J] . Computer Science, 2012, 3(4):212-223.
[33] Wang Heng, Schmid C. LEAR-INRIA submission for the THUMOS workshop[C] //Proc of ICCV Workshop on THUMOS Challenge. Piscataway, NJ:IEEE Press, 2013.
[34] Peng Xiaojiang, Wang Limin, Wang Xingxing, et al. Bag of visual words and fusion methods for action recognition:comprehensive study and good practice[J] . Computer Vision and Image Understanding, 2016, 150(C):109-125.
[35] Zhu Wangjiang, Hu Jie, Sun Gang, et al. A key volume mining deep framework for action recognition[C] //Proc of Computer Vision and Pattern Recognition Conference. Piscataway, NJ:IEEE Press, 2016:1991-1999.
Received  2017/10/23
Revised  2017/12/11
Pages  945-949, 953
CLC number  TP391.4
Document code  A