Extracting Spatio-temporal Features via Multi-layer Independent Subspace Analysis for Action Recognition
-
Abstract
Human action recognition plays an important role in fields such as video surveillance and medical diagnosis. Current methods either extend hand-crafted two-dimensional features to three dimensions or extract spatio-temporal features along trajectories. Building on deep learning, this paper proposes a multilayer neural network that operates on three-dimensional (space-time) volumes and learns rich spatio-temporal features from large amounts of video. First, we use independent subspace analysis (ISA) to build a two-layer stacked convolutional neural network whose weights are learned from the training database. The resulting spatio-temporal features are then quantized into visual words with K-means clustering, and a non-linear support vector machine (SVM) classifies the frequency histograms of visual words into action classes. We apply our algorithm to the Hollywood2 database, extracting spatio-temporal features for its 12 human action classes. The results show that the weights learned by the ISA network resemble Gabor filters: they are clearly selective for frequency and orientation and robust to phase variation, consistent with the human visual system.
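The processing chain summarized in the abstract (ISA feature extraction, K-means quantization into visual words, histogram classification with a non-linear SVM) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard ISA response p_i(x) = sqrt(sum_k V_ik (sum_j W_kj x_j)^2), in which squared linear filter outputs are summed within fixed subspaces, and it applies a single ISA layer to already flattened (assumed whitened) space-time patches, leaving out the convolutional two-layer stacking. The helper `subspace_pooling_matrix`, the group size, the patch dimension, the random data, and the exponential chi-square kernel with parameter `gamma` are all hypothetical choices; the abstract does not name the SVM kernel.

```python
import numpy as np


def isa_response(x, W, V):
    """ISA layer response: sqrt of subspace-pooled squared filter outputs.

    x: (n, d) flattened input patches (assumed whitened)
    W: (k, d) learned linear filters
    V: (m, k) fixed 0/1 matrix assigning the k filters to m subspaces
    Returns an (n, m) array of non-negative pooled activations.
    """
    simple = (x @ W.T) ** 2           # squared linear responses, shape (n, k)
    return np.sqrt(simple @ V.T)      # sum within each subspace, then sqrt


def subspace_pooling_matrix(k, group_size):
    """Hypothetical helper: pool adjacent filters in groups of `group_size`."""
    m = k // group_size
    V = np.zeros((m, k))
    for i in range(m):
        V[i, i * group_size:(i + 1) * group_size] = 1.0
    return V


def assign_words(features, centers):
    """Index of the nearest center (squared Euclidean) for each feature row."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(features, n_words, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns an (n_words, m) codebook."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), n_words, replace=False)
    centers = features[idx].astype(float)
    for _ in range(n_iter):
        labels = assign_words(features, centers)
        for c in range(n_words):
            members = features[labels == c]
            if len(members):          # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers


def bow_histogram(features, centers):
    """Normalized frequency histogram of visual words for one video."""
    counts = np.bincount(assign_words(features, centers), minlength=len(centers))
    return counts / max(counts.sum(), 1)


def chi2_kernel(H1, H2, gamma=1.0):
    """Exponential chi-square kernel exp(-gamma * sum (a-b)^2 / (a+b)).

    Bins where a + b == 0 contribute nothing to the distance.
    """
    a, b = H1[:, None, :], H2[None, :, :]
    den = a + b
    terms = np.where(den > 0, (a - b) ** 2 / np.where(den > 0, den, 1.0), 0.0)
    return np.exp(-gamma * terms.sum(axis=2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 48))      # 8 hypothetical filters on 48-dim patches
    V = subspace_pooling_matrix(8, 2)     # 4 subspaces of 2 filters each
    video_a = isa_response(rng.standard_normal((100, 48)), W, V)
    video_b = isa_response(rng.standard_normal((80, 48)), W, V)
    codebook = kmeans(np.vstack([video_a, video_b]), n_words=5)
    H = np.stack([bow_histogram(v, codebook) for v in (video_a, video_b)])
    print(chi2_kernel(H, H))              # 2x2 kernel matrix for an SVM
```

The resulting kernel matrix could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. K-means is written out as plain Lloyd iterations only to keep the sketch self-contained; a library implementation would normally be used on real feature volumes.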
-