Unsupervised Dense and Continuous Optical Flow Estimation Based on Image and Event Data

HU Jianlang, GUO Chi, LUO Yarong

Citation: HU Jianlang, GUO Chi, LUO Yarong. Unsupervised Dense and Continuous Optical Flow Estimation Based on Image and Event Data[J]. Geomatics and Information Science of Wuhan University. DOI: 10.13203/j.whugis20230390


Foundation items:

Hubei Major Science and Technology Project (2022AAA009); China Postdoctoral Science Foundation (2023TQ0248); Open Fund of Hubei Luojia Laboratory (230100007).

Details
    About the author:

    HU Jianlang, Ph.D. candidate, majors in intelligent navigation and optical flow estimation. hujianlang123@whu.edu.cn

    Corresponding author:

    LUO Yarong, postdoctoral researcher. yarongluo@whu.edu.cn

Unsupervised Dense and Continuous Optical Flow Estimation Based on Image and Event Data

  • Abstract: To obtain dense and continuous optical flow and to enable long-time-interval flow estimation, this paper proposes a multi-modal, multi-scale recurrent optical flow estimation network based on images and events. The network fuses multi-scale features extracted from a single input image and the corresponding event stream, and refines the output flow in a coarse-to-fine, iteratively recurrent manner. To remove the dependence on annotated optical flow data, the network is trained in an unsupervised manner, and a dynamic loss filtering mechanism is designed to adaptively filter out unreliable supervisory gradient signals during training, enabling more effective network training. Comprehensive comparative experiments on the MVSEC dataset show that the proposed method achieves higher flow estimation accuracy, especially for long-time-interval dense flow estimation: tested on three indoor sequences, it achieves the best results in average endpoint error and outlier percentage, namely 1.43, 1.87, and 1.68, and 7.54%, 14.36%, and 11.46%, respectively, all better than the DCEIFlow method. This demonstrates that the proposed method not only performs dense and continuous optical flow estimation, but also holds a clear advantage in long-time-interval flow estimation.
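The dynamic loss filtering idea described in the abstract can be sketched in a few lines. The sketch below is illustrative only, not the paper's actual implementation: the function name is hypothetical, and we assume a simple adaptive rule (discard per-pixel loss values above mean + k standard deviations) as one plausible way to suppress unreliable supervisory signals such as occluded or noisy pixels.

```python
import numpy as np

def filtered_photometric_loss(per_pixel_loss, k=1.0):
    """Hypothetical dynamic loss filtering sketch: discard per-pixel
    loss values above an adaptive threshold (mean + k*std) so that
    unreliable residuals do not dominate the training gradient."""
    loss = np.asarray(per_pixel_loss, dtype=np.float64).ravel()
    thresh = loss.mean() + k * loss.std()
    mask = loss <= thresh            # keep only "reliable" residuals
    return loss[mask].mean()
```

With a loss map that is mostly small but contains one large occlusion-induced residual, the filtered loss stays close to the reliable level instead of being inflated by the outlier.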
    Abstract: Objectives: Dense and continuous optical flow plays an important role in many applications, including robot navigation, autonomous driving, motion planning, and visual odometry. Current related works mainly use shutter cameras and event cameras to estimate optical flow. However, dense and continuous flow estimation remains a challenge due to the fixed frame rate of shutter cameras and the sparsity of event data. In addition, existing approaches focus on how to integrate images and event data, but neglect long-time-interval optical flow estimation. Methods: To this end, we propose a multi-scale recurrent optical flow estimation framework that fuses events and images. The network architecture contains three components: a multi-scale feature extractor, an image-event feature fusion module, and a flow recurrent updater. The multi-scale feature extractor is a CNN-based downsampler that maps the input image and event data into features at different scales. The image-event feature fusion module fuses features from the two different data modalities. The flow recurrent updater is a recurrent residual flow optimizer that incorporates pyramid methods, estimating flow in a coarse-to-fine manner and performing flow feature refinement. Furthermore, to avoid expensive flow annotations and achieve effective network training, we train the network in an unsupervised way and design a novel training strategy, a dynamic loss filtering mechanism, to filter out redundant and unreliable supervisory signals. Results: We conduct a series of experiments on the MVSEC dataset. The results show that the proposed method performs well on both indoor and outdoor sequences. In particular, for long-time-interval dense optical flow estimation, the proposed method, tested on three indoor sequences, achieves the best performance in mean endpoint error and outlier percentage, which are 1.43, 1.87, and 1.68, and 7.54%, 14.36%, and 11.46%, respectively. Conclusions: The proposed method can not only perform dense and continuous optical flow estimation, but also has a remarkable advantage in long-time-interval optical flow estimation.
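To make the coarse-to-fine recurrent updating in the flow updater concrete, here is a minimal sketch under assumptions: the per-scale residual arrays stand in for the network's learned flow corrections, and the nearest-neighbor upsampling and function names are hypothetical, not the paper's implementation. The key idea shown is that the flow estimate is upsampled (with its values scaled to the new resolution) and refined by a residual at each finer scale.

```python
import numpy as np

def upsample_flow(flow, factor=2):
    # Nearest-neighbor upsampling; flow vectors are scaled because
    # pixel displacements grow with spatial resolution.
    return np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1) * factor

def coarse_to_fine(residuals):
    """Sketch of a coarse-to-fine recurrent updater: start from zero
    flow at the coarsest scale; at each finer scale, upsample the
    current estimate and add the predicted residual correction.
    `residuals` is a coarse-to-fine list of (H, W, 2) arrays."""
    flow = np.zeros_like(residuals[0])
    for r in residuals:
        if flow.shape != r.shape:
            flow = upsample_flow(flow)
        flow = flow + r
    return flow
```

For example, with three pyramid levels of residuals at 4x4, 8x8, and 16x16 resolution, the final estimate has the full 16x16 resolution, each coarser correction having been propagated upward and refined.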
Publication history
  • Received: 2024-05-21
  • Published online: 2024-06-23
