Citation: HU Jianlang, GUO Chi, LUO Yarong. Unsupervised Dense and Continuous Optical Flow Estimation Based on Image and Event Data[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20230390


Unsupervised Dense and Continuous Optical Flow Estimation Based on Image and Event Data

  • Abstract: To obtain dense and continuous optical flow and to handle long-time-interval flow estimation, this paper proposes a multi-modal, multi-scale recurrent optical flow estimation network based on images and events. The network fuses multi-scale features extracted from a single input image and the corresponding event stream, and refines the output flow in a coarse-to-fine, iteratively recurrent manner. To remove the dependence on annotated optical flow data, the network is trained in an unsupervised way, and a dynamic loss filtering mechanism is designed to adaptively filter out unreliable supervisory gradient signals during training, enabling more effective network training. A comprehensive comparative analysis of the proposed network and training strategy is carried out on the MVSEC dataset. The results show that the proposed method achieves higher flow estimation accuracy, especially for long-time-interval dense optical flow estimation: tested on three indoor sequences, it obtains the best results in mean endpoint error and outlier percentage, namely 1.43, 1.87, and 1.68, and 7.54%, 14.36%, and 11.46%, respectively, all outperforming DCEIFlow. This shows that the proposed method not only achieves dense and continuous optical flow estimation, but also has a clear advantage in long-time-interval optical flow estimation.
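The abstracts report two evaluation metrics, mean (average) endpoint error and outlier percentage, but do not define them. For reference, the LaTeX sketch below gives the definitions commonly used for optical flow evaluation on MVSEC; the 3-pixel / 5%-of-magnitude outlier criterion is the usual convention in that literature and is an assumption here, not a statement quoted from the paper.

```latex
% Assumed metric definitions (standard MVSEC-style evaluation, not quoted from the paper).
% f_pred, f_gt : predicted and ground-truth flow vectors at pixel p; P : set of evaluated pixels.
\begin{align}
\mathrm{EPE}(\mathbf{p}) &= \left\lVert \mathbf{f}_{\mathrm{pred}}(\mathbf{p}) - \mathbf{f}_{\mathrm{gt}}(\mathbf{p}) \right\rVert_2 \\
\mathrm{AEE} &= \frac{1}{\lvert \mathcal{P} \rvert} \sum_{\mathbf{p} \in \mathcal{P}} \mathrm{EPE}(\mathbf{p}) \\
\%\mathrm{Outlier} &= \frac{100}{\lvert \mathcal{P} \rvert}
  \Bigl\lvert \bigl\{ \mathbf{p} \in \mathcal{P} :
  \mathrm{EPE}(\mathbf{p}) > 3\ \mathrm{px} \ \wedge\
  \mathrm{EPE}(\mathbf{p}) > 0.05 \, \lVert \mathbf{f}_{\mathrm{gt}}(\mathbf{p}) \rVert_2 \bigr\} \Bigr\rvert
\end{align}
```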


    Abstract: Objectives: Dense and continuous optical flow plays an important role in many applications, including robot navigation, autonomous driving, motion planning, and visual odometry. Current related works mainly rely on shutter cameras and event cameras to produce optical flow. However, dense and continuous flow estimation remains a challenge due to the fixed frame rate of shutter cameras and the sparsity of event data. In addition, existing approaches focus on how to integrate images and event data, but neglect long-time-interval optical flow estimation. Methods: To this end, we propose a multi-scale recurrent optical flow estimation framework that fuses events and images. The network architecture contains three components: a multi-scale feature extractor, an image-event feature fusion module, and a flow recurrent updater. The multi-scale feature extractor is a CNN-based downsampler that maps the input image and event data into features at different scales. The image-event feature fusion module fuses features from the two modalities. The flow recurrent updater is a recurrent residual flow optimizer that incorporates a pyramid scheme, estimating flow in a coarse-to-fine manner while refining the flow features. Furthermore, to avoid expensive flow annotations and to train the network effectively, we train it in an unsupervised way and design a novel training strategy, a dynamic loss filtering mechanism, to filter out redundant and unreliable supervisory signals. Results: We conduct a series of experiments on the MVSEC dataset. The results show that the proposed method performs well on both indoor and outdoor sequences. In particular, for long-time-interval dense optical flow estimation, the proposed method, tested on three indoor sequences, achieves the best performance in mean endpoint error and outlier percentage, namely 1.43, 1.87, and 1.68, and 7.54%, 14.36%, and 11.46%, respectively. Conclusions: The proposed method can not only perform dense and continuous optical flow estimation, but also has a remarkable advantage in long-time-interval optical flow estimation.
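The abstract describes the dynamic loss filtering mechanism only at a high level: unreliable supervisory signals are adaptively removed during unsupervised training. As an illustration of what such a step could look like, the PyTorch-style sketch below masks a per-pixel photometric residual (between the first image and the second image warped by the estimated flow) using a batch-statistics threshold. The function names, the choice of a photometric loss, and the mean-plus-k-standard-deviations rule are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an adaptive loss filtering step for unsupervised flow training.
# The thresholding rule and all names below are illustrative assumptions.
import torch
import torch.nn.functional as F


def warp_image(img, flow):
    """Backward-warp img (N, C, H, W) with flow (N, 2, H, W) using bilinear sampling."""
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img.device, dtype=img.dtype),
        torch.arange(w, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # sampling x-coordinates in img
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # sampling y-coordinates in img
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img, grid, align_corners=True)


def filtered_photometric_loss(img1, img2, flow, k=1.5):
    """Per-pixel photometric loss with an adaptive (dynamic) reliability mask.

    Pixels whose residual exceeds mean + k * std of the current batch are treated
    as unreliable (e.g. occlusions, noise) and excluded from the gradient signal.
    """
    img2_warped = warp_image(img2, flow)
    residual = (img1 - img2_warped).abs().mean(dim=1, keepdim=True)  # (N, 1, H, W)
    threshold = residual.mean() + k * residual.std()
    mask = (residual < threshold).float().detach()  # non-differentiable reliability mask
    return (residual * mask).sum() / mask.sum().clamp(min=1.0)
```

In a coarse-to-fine pipeline such as the one described in the Methods, a loss of this form could be evaluated at each pyramid scale and at each recurrent iteration, with the mask recomputed every time, so that unreliable pixels never contribute gradients.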
