Abstract:
Objectives RGB (red, green, blue) appearance and motion features are crucial for detecting abnormal events in ship video. These features must be extracted from video accurately and then applied to ship video abnormal event detection. Because frame-level annotation is prohibitively expensive, the model must be trainable without frame-level annotations, using only video-level labels. In addition, the scarcity of ship video abnormal event databases must be addressed.
Methods We acquire a large number of surveillance videos of ships on the sea surface and, after processing, construct a ship video abnormal event dataset. We then propose a ship abnormal event detection model based on a deep spatiotemporal autoencoder network and multi-instance learning. It adopts a deep multi-instance ranking framework, so only video-level labels are required and no frame-level annotations are needed. In the spatial autoencoder, the deconvolution layers in the decoder are replaced with pixel shuffle layers to obtain more accurate RGB features. In the temporal autoencoder, a variance-based attention mechanism is introduced to highlight regions with large motion variation, so that fast-moving objects incur a larger motion loss.
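To make the two architectural changes concrete, the following is a minimal PyTorch sketch, not the paper's actual code: an upsampling block that uses a pixel shuffle layer in place of a transposed convolution, and a motion reconstruction loss weighted by a variance-based attention map. The class and function names, tensor shapes, and the mean-normalization of the attention weights are our illustrative assumptions; the abstract only specifies the general mechanisms.

    import torch
    import torch.nn as nn

    class PixelShuffleUp(nn.Module):
        # Decoder upsampling block: a convolution expands channels by r*r,
        # then PixelShuffle rearranges them into an r-times larger feature
        # map, replacing the deconvolution (transposed-convolution) layer.
        def __init__(self, in_ch, out_ch, r=2):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(r)

        def forward(self, x):
            return self.shuffle(self.conv(x))

    def variance_attention_motion_loss(pred, target, eps=1e-8):
        # pred, target: (B, T, C, H, W) motion volumes (e.g. optical flow).
        # The temporal variance of the target motion highlights locations
        # whose motion changes strongly; normalizing by its spatial mean
        # keeps the average weight at 1, so fast-moving regions contribute
        # a larger share of the reconstruction loss.
        var = target.var(dim=1)                                   # (B, C, H, W)
        attn = var / (var.mean(dim=(-2, -1), keepdim=True) + eps)
        return (attn.unsqueeze(1) * (pred - target).pow(2)).mean()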
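The deep multi-instance ranking framework can likewise be sketched as a ranking objective over video-level bags, in the spirit of standard MIL ranking formulations (e.g. Sultani et al., 2018). The hinge margin, smoothness and sparsity terms, and the hyperparameter values below are assumptions for illustration; the abstract does not give the exact loss.

    import torch
    import torch.nn.functional as F

    def mil_ranking_loss(scores_anom, scores_norm, lam_smooth=8e-5, lam_sparse=8e-5):
        # scores_anom, scores_norm: (T,) per-segment anomaly scores in [0, 1]
        # for one anomalous video (positive bag) and one normal video
        # (negative bag); only the video-level labels are used.
        # The highest-scoring segment of the anomalous bag should rank
        # above the highest-scoring segment of the normal bag.
        hinge = F.relu(1.0 - scores_anom.max() + scores_norm.max())
        smooth = (scores_anom[1:] - scores_anom[:-1]).pow(2).sum()  # temporal smoothness
        sparse = scores_anom.sum()                                  # anomalies are sparse
        return hinge + lam_smooth * smooth + lam_sparse * sparse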
Results We compare the proposed method with two baseline methods and a previous state-of-the-art method. The experimental results show that the proposed method achieves higher detection accuracy. We also observe that the variance-based attention significantly improves the detection of fast-motion events, such as unexpected stopping and overspeed.
Conclusions These results show that RGB and motion features play an important role in ship abnormal event detection and demonstrate the necessity of the multi-instance ranking model.