Abstract:
Objectives Object tracking is a research focus in computer vision. Methods based on correlation filters perform well in object tracking, but the hand-crafted feature descriptors they rely on limit the quality of feature extraction. Convolutional neural networks (CNNs) are widely used in computer vision, natural language processing and other fields; by learning from training samples, they tune their network weights to extract deep features of images. To obtain a more robust feature representation, we use a CNN to extract image features for object tracking.
Methods Combining a CNN with correlation filters, we propose an object tracking method based on multi-stage features and asymmetric-dilated convolution. A ResNet50 network embedded with asymmetric-dilated convolution blocks serves as the feature extraction network. It outputs feature maps from multiple stages of the network, which the correlation filters use to detect and localize the object.
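How the per-stage correlation responses are combined is not specified here. As a minimal sketch of one common option, assuming each stage yields a response map of the same size and that the maps are merged by a weighted sum before taking the peak, the localization step might look as follows. The function names `fuse_responses` and `locate_peak` and the weights are hypothetical and are not taken from the proposed method.

```python
def fuse_responses(responses, weights):
    """Weighted sum of same-sized 2-D response maps (lists of lists).

    Assumption: one response map per network stage, already resized
    to a common resolution. The weights are illustrative only.
    """
    if len(responses) != len(weights):
        raise ValueError("one weight per response map is required")
    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for resp, w in zip(responses, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * resp[r][c]
    return fused


def locate_peak(response):
    """Return the (row, col) position of the maximum response value."""
    positions = (
        (r, c)
        for r in range(len(response))
        for c in range(len(response[0]))
    )
    return max(positions, key=lambda rc: response[rc[0]][rc[1]])


if __name__ == "__main__":
    # Toy response maps standing in for a deep and a shallow stage.
    deep = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.3, 0.2]]
    shallow = [[0.0, 0.0, 0.0], [0.0, 0.5, 1.0], [0.0, 0.0, 0.0]]
    fused = fuse_responses([deep, shallow], [1.0, 0.5])
    print(locate_peak(fused))
```

In this sketch the stage weights decide which stage dominates the estimate: a larger weight on the deep stage favors its peak, while a larger weight on the shallow stage shifts the estimate toward the shallow peak.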
Results The proposed method is tested on the OTB100 video dataset. The distance precision reaches 85.38% at a distance threshold of 20 pixels, and the overlap precision reaches 80.42% at an overlap threshold of 50%.
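The two metrics can be illustrated with a short sketch. It assumes the usual OTB conventions: boxes are given as (x, y, width, height), distance precision is the fraction of frames whose center location error is within the distance threshold, and overlap precision is the fraction of frames whose intersection-over-union (IoU) with the ground truth exceeds the overlap threshold. The function names and the handling of values exactly at the threshold are assumptions, not details taken from the experiments.

```python
import math


def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    px, py = pred[0] + pred[2] / 2.0, pred[1] + pred[3] / 2.0
    gx, gy = gt[0] + gt[2] / 2.0, gt[1] + gt[3] / 2.0
    return math.hypot(px - gx, py - gy)


def iou(pred, gt):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(pred[0] + pred[2], gt[0] + gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[1] + pred[3], gt[1] + gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def distance_precision(preds, gts, threshold=20.0):
    """Fraction of frames with center error <= threshold (in pixels)."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


def overlap_precision(preds, gts, threshold=0.5):
    """Fraction of frames with IoU > threshold."""
    hits = sum(iou(p, g) > threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    # Three toy frames: a perfect hit, a small drift, and a lost target.
    gts = [(0, 0, 10, 10)] * 3
    preds = [(0, 0, 10, 10), (5, 0, 10, 10), (100, 100, 10, 10)]
    print(distance_precision(preds, gts), overlap_precision(preds, gts))
```

In the toy frames, the drifted box still counts toward distance precision because its center is only 5 pixels off, but it fails the overlap test because its IoU is one third. This shows why the two figures can differ on the same sequence.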
Conclusions The experimental results verify the accuracy of the proposed method, which is relatively robust under conditions such as complex backgrounds, occlusion and rotational deformation.