Large-Scale Scene Relocalization via 3D Gaussian Splatting in Urban Environments

  • Abstract: To obtain accurate pose information in complex environments, we propose a large-scale relocalization framework based on 3D Gaussian splatting (3DGS) that achieves centimeter-level localization with a monocular camera. First, a high-precision 3DGS map of the complex urban environment is built from multi-sensor data. Then, a deep-learning-based feature extraction and matching method is integrated into the visual front end to achieve efficient and robust keypoint detection and matching. Finally, an initial pose is estimated by solving 3D-to-2D point correspondences and refined through iterative rendering-based optimization to obtain an accurate pose. The method is evaluated on three road scenes selected from the open-source KITTI street-view dataset; the average localization accuracy is 0.026 m on urban roads, 0.029 m on tree-lined avenues, and 0.081 m on high-traffic highways, outperforming other mainstream methods.
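The front end pairs learned keypoints between the query image and the map. SuperGlue's attention-based matcher is beyond the scope of an abstract, but the mutual-nearest-neighbour baseline it improves upon can be sketched as follows (a minimal NumPy illustration with made-up toy descriptors, not the paper's implementation):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two descriptor sets by mutual nearest neighbour.

    desc_a: (Na, D), desc_b: (Nb, D), L2-normalised descriptors.
    Returns (i, j) index pairs where a[i] and b[j] pick each other.
    """
    sim = desc_a @ desc_b.T                  # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)               # best b for each a
    nn_ba = sim.argmax(axis=0)               # best a for each b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy example: the same three unit descriptors in shuffled order.
a = np.eye(3)
b = a[[2, 0, 1]]
print(mutual_nn_matches(a, b))  # -> [(0, 1), (1, 2), (2, 0)]
```

The mutual check discards one-sided matches, which is the simplest way to suppress the outliers that learned matchers such as SuperGlue handle with context aggregation.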


    Abstract: Objectives: In the field of autonomous driving, acquiring high-precision pose information is critical to safety and reliability. Although the global navigation satellite system (GNSS) can provide absolute pose references, its positioning accuracy degrades significantly in complex scenarios due to signal obstruction and multipath effects, making it difficult to meet the stringent requirements of autonomous driving systems. To address this challenge, we conducted a systematic exploration of precise pose acquisition in large-scale, complex, and dynamic environments. Methods: First, based on multi-sensor fusion, we developed a reconstruction system tailored to dynamic urban environments, focusing on three representative large-scale outdoor scenarios: urban roads, tree-lined avenues, and high-traffic highways. A high-precision 3D Gaussian splatting (3DGS) map was constructed for these scenarios. In the visual front end, we integrated the SuperPoint feature extraction algorithm with the SuperGlue feature matching algorithm to achieve high-quality keypoint detection and matching. Additionally, we introduced the perspective-n-point (PnP) algorithm, which estimates the initial camera pose by matching feature points with corresponding points in the 3D map. To optimize positioning results and improve the system's real-time performance, we proposed an iterative optimization strategy based on progressive rendering. This strategy exploits the geometric structure and radiance-field characteristics of the 3DGS map, combining step-by-step rendering and optimization to continuously refine the camera pose estimate, ultimately achieving progressive optimization of real-time, high-precision positioning results. 
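The PnP step estimates the camera pose that minimises the reprojection error of matched 3D map points against their 2D keypoints. The cost being minimised can be sketched as follows (a self-contained NumPy illustration with arbitrary toy values for the intrinsics and points; the paper's actual solver additionally uses RANSAC-style robustness and rendering-based refinement):

```python
import numpy as np

def project(pts_w, K, R, t):
    """Project world-frame 3D points to pixels under pose (R, t)."""
    pts_c = (R @ pts_w.T).T + t             # world -> camera frame
    uv = (K @ pts_c.T).T
    return uv[:, :2] / uv[:, 2:3]           # perspective division

def mean_reproj_error(pts_w, pts_px, K, R, t):
    """Mean pixel distance that a PnP solver drives toward zero."""
    return np.linalg.norm(project(pts_w, K, R, t) - pts_px, axis=1).mean()

# Toy check: points reprojected with the true pose have ~zero error.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
pts_w = np.array([[0.1, 0.2, 4.0], [-0.3, 0.1, 5.0], [0.2, -0.2, 6.0]])
pts_px = project(pts_w, K, R, t)
assert mean_reproj_error(pts_w, pts_px, K, R, t) < 1e-9
```

A perturbed pose increases this error, which is exactly the signal the progressive rendering-and-optimization loop described above uses to refine the estimate.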
Results: Experimental validation demonstrates the following: (1) In dynamic urban environments, the reconstruction system performs strongly, rendering continuous and realistic lighting and texture effects in real time; in particular, it effectively handles lighting changes and occlusions caused by dynamic objects in complex urban settings. (2) The SuperPoint and SuperGlue feature extraction and matching algorithms significantly outperform traditional methods in both the quantity and quality of feature points: the number of extracted feature points is greatly increased, and the feature point distribution rate reaches 70%, more than twice that of conventional methods. (3) The system demonstrates excellent positioning accuracy in three typical outdoor scenarios. On urban roads, tree-lined avenues, and high-traffic highways, it achieves average positioning accuracies of 0.026 m, 0.029 m, and 0.081 m, respectively, significantly better than other representative methods. In addition, absolute displacement and rotation errors are smaller than 10 cm and 1°, respectively, in 96% of cases, indicating high stability in positioning accuracy and the ability to provide reliable real-time positioning in various environments. (4) Ablation experiments further validate the system design: the complete model achieves the best results in both typical scenarios, with average absolute translation and rotation errors of 3.22 cm and 0.69°, respectively, outperforming variants without dynamic-occlusion handling, SuperPoint feature extraction, or the iterative optimization strategy. (5) Finally, the system's real-time performance was verified. 
Timing analysis shows that the system processes each image frame in approximately 1 s, which meets real-time requirements and suits real-time application scenarios. Conclusions: In large-scale outdoor environments, visual relocalization based on a 3DGS map enables high-precision localization, effectively compensating for the limitations of existing positioning methods under GNSS signal occlusion or in complex environments. Through multi-sensor fusion and precise map construction, the system performs exceptionally well in dynamic urban scenarios; in particular, lighting changes and dynamic-object occlusion are handled effectively, significantly enhancing robustness and adaptability. The front-end feature extraction and matching methods greatly improve the system's stability and accuracy; in complex dynamic scenes, feature points are distributed more uniformly, further enhancing positioning precision. Compared with other representative methods, the system achieves the best performance, and ablation experiments further validate the necessity and effectiveness of these design choices. The system's processing speed meets real-time requirements, ensuring efficient and reliable real-time localization.
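The reported accuracies rest on the standard absolute translation and rotation error metrics. They can be computed as follows (a minimal NumPy sketch of the metric definitions, with a made-up 1° yaw offset as the test case):

```python
import numpy as np

def translation_error(t_est, t_gt):
    """Absolute translation error: Euclidean distance in metres."""
    return float(np.linalg.norm(t_est - t_gt))

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two rotation matrices."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# A 1-degree yaw offset should read back as 1 degree.
a = np.radians(1.0)
R_gt = np.eye(3)
R_est = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0,        0.0,       1.0]])
assert abs(rotation_error_deg(R_est, R_gt) - 1.0) < 1e-6
```

The paper's thresholds of 10 cm and 1° correspond to `translation_error < 0.10` and `rotation_error_deg < 1.0` on a per-frame basis.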
