A Guided Method for Improving the Video Human Action Classification in Convolutional Neural Networks

MAO Lin; CHEN Siyu; YANG Dawei

doi:10.13203/j.whugis20190101

Volume 46 Issue 8

Aug. 2021



Citation: MAO Lin, CHEN Siyu, YANG Dawei. A Guided Method for Improving the Video Human Action Classification in Convolutional Neural Networks[J]. Geomatics and Information Science of Wuhan University, 2021, 46(8): 1241-1246. DOI: 10.13203/j.whugis20190101


1. College of Mechanical and Electronic Engineering, Dalian Minzu University, Dalian 116600, China

Funds: The Natural Science Foundation of Liaoning Province (20170540192, 20180550866)


Author Bio:
MAO Lin, PhD, associate professor, specializes in multi-sensor information fusion and target tracking. E-mail: maolin@dlnu.edu.cn
Received Date: May 12, 2019
Published Date: August 04, 2021

Abstract

Objectives: To improve the ability of convolutional neural networks (CNNs) to capture temporal dynamic information in video, this paper proposes a dominant layer optimization module.
Methods: The module uses a dominant layer to guide and optimize the gradient updates of the convolutional layer weights, and uses the maximum mean discrepancy (MMD) computed in a reproducing kernel Hilbert space to assist the difference estimation.
Results: Over continued training, the network learns temporal dynamic information more effectively, and the similarity in dynamic information between the features learned by the convolutional layers and the input data increases.
Conclusions: The module improves the performance of CNN models on video human action classification.
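
The kernel two-sample statistic named in the Methods, usually called the maximum mean discrepancy (MMD), can be illustrated with a short NumPy sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the biased estimator, and the synthetic feature arrays are all assumptions chosen for illustration, and the abstract does not say which kernel or estimator the paper uses.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=2.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Squared Euclidean distance between every pair of rows.
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd_squared(x, y, sigma=2.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical stand-ins for feature vectors (one row per sample).
rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=(200, 8))
same_dist = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution
shifted = rng.normal(1.0, 1.0, size=(200, 8))    # mean shifted by 1

mmd_same = mmd_squared(features, same_dist)
mmd_shifted = mmd_squared(features, shifted)
print(mmd_same, mmd_shifted)
```

In this sketch the estimate stays close to zero when both samples come from the same distribution and grows when one of them is shifted, which is the property that makes MMD usable as a difference measure between two sets of features.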


