Abstract:
Objectives: Depth estimation is a fundamental task in computer vision and 3D reconstruction, with extensive applications in autonomous driving, robotics, and virtual reality. Traditional stereo matching algorithms often fail in weakly textured or reflective regions, leading to inaccurate disparity estimation. Active stereo matching, which projects structured light such as speckle patterns onto the scene, can alleviate these issues by enriching image textures. However, existing active stereo methods that rely on guided cost volumes still suffer from two main limitations: sparse and unevenly distributed guidance signals, and redundant information in the constructed cost volumes. To address these challenges, a novel active stereo matching framework named SGDM (Speckle-Guided Diffusion Matching) is proposed.
Method: The proposed SGDM framework integrates three key modules to enhance the utilization of sparse guidance and suppress redundant cost-volume information. (1) A lightweight sparse disparity filling module expands the initially sparse disparity map derived from stereo pairs illuminated by speckle structured light, increasing the number and spatial uniformity of effective disparity points. (2) A variable-weight Gaussian guidance module adaptively adjusts the modulation strength according to the confidence of each disparity point, thereby improving the reliability and precision of the guided cost volume. (3) A diffusion filter iteratively refines the cost volume through a denoising process, suppressing redundant information and noise to yield a cleaner cost volume and a more discriminative disparity estimate. These components can be seamlessly integrated into existing stereo matching networks without modifying their original architectures.
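The variable-weight Gaussian guidance in step (2) can be illustrated with a minimal NumPy sketch. This is a hypothetical interface written for clarity, not the paper's implementation: the function name, the modulation form (dividing the cost by a confidence-weighted Gaussian peaked at each hinted disparity, so the cost minimum deepens near the hint), and the defaults `k` and `c` are all assumptions.

```python
import numpy as np

def gaussian_guide_cost_volume(cost, hints, weights, k=10.0, c=1.0):
    """Modulate a stereo cost volume with confidence-weighted Gaussian
    peaks centred on sparse disparity hints (illustrative sketch only).

    cost    : (H, W, D) matching cost volume, lower cost = better match
    hints   : (H, W) sparse disparity hints, NaN where no hint exists
    weights : (H, W) per-hint confidence in [0, 1]
    k, c    : peak gain and Gaussian width (assumed defaults)
    """
    D = cost.shape[-1]
    d = np.arange(D, dtype=np.float64)            # disparity axis
    valid = ~np.isnan(hints)                      # mask of guided pixels
    # Gaussian bump per valid hint, scaled by its confidence weight.
    gauss = np.exp(-(d[None, None, :] - np.nan_to_num(hints)[..., None]) ** 2
                   / (2.0 * c ** 2))
    mod = 1.0 + k * weights[..., None] * gauss    # >= 1 everywhere
    mod = np.where(valid[..., None], mod, 1.0)    # unguided pixels untouched
    # Dividing by the modulation deepens the cost minimum near the hint,
    # steering the subsequent disparity regression toward the guidance.
    return cost / mod
```

Scaling the peak by the per-pixel confidence `weights` is what makes the guidance "variable-weight": unreliable hints modulate the volume only weakly, while high-confidence hints dominate their disparity bin.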
Results: Comprehensive experiments were conducted on both synthetic and real-world datasets, including Scene Flow, KITTI, SimStereo, ETH3D, and Middlebury. The results show that the proposed SGDM framework substantially enhances the accuracy and robustness of stereo matching networks. Specifically, when SGDM is integrated into ACVNet, the average end-point error (EPE) on the Scene Flow dataset decreases by approximately 16%, and the proportion of pixels with disparity errors greater than one pixel on SimStereo is reduced from 16.60% to 4.64%. Furthermore, incorporating SGDM into RAFT leads to an average error reduction of 26% on the ETH3D and Middlebury datasets, demonstrating strong cross-domain generalization. Visual comparisons further confirm that SGDM improves disparity estimation in weakly textured and occluded regions, yielding sharper and more detailed disparity maps.
Conclusion: The proposed SGDM framework effectively addresses two critical limitations in active stereo matching: sparse and uneven guidance signals, and redundant information within the cost volume. By integrating speckle-guided semi-dense disparity priors, a variable-weight Gaussian guidance mechanism, and a diffusion-based denoising filter, SGDM achieves highly accurate, noise-resilient, and generalizable depth estimation. Experimental evaluations across multiple datasets verify that SGDM consistently enhances both matching precision and cross-domain robustness when embedded into mainstream stereo matching networks. In addition to its algorithmic contributions, SGDM offers a scalable and flexible design that can be readily incorporated into various 3D perception systems. Overall, the proposed framework provides a unified and effective solution for advancing active stereo matching, paving the way for broader applications in autonomous driving, robotic perception, and precision 3D reconstruction.