A Lightweight Road Markings Network for Vehicle-Borne Images Combining Visual State Space Model and Wavelet Transform

ZHENG Kang; REN Fu

doi:10.13203/j.whugis20250175

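Abstract: Efficient and accurate extraction of road markings is a core task in vehicle localization and environmental perception, and it underpins the construction of high-definition maps. Vehicle-borne images are typically large in volume and costly to process, which makes it hard for road marking extraction methods to balance accuracy against efficiency. To address this, a lightweight road markings network (LRMSNet) is proposed. It adopts an encoder-decoder framework that integrates convolutional neural networks, attention mechanisms, wavelet transform, and a visual state space model. First, a lightweight bottleneck module captures multi-scale semantic information while reducing the parameter count. Next, an efficient visual state space pyramid combines multi-scale feature fusion with linear-complexity computation to strengthen global context modeling. Finally, a feature optimization module refines the skip connections to better preserve detail. Experiments show that LRMSNet reaches a mean intersection over union of 68.89% on the public CamVid dataset and 63.11% on a self-collected Wuhan road marking dataset, with only 1.37 million parameters and 7.98 GFLOPs of computation; it uses fewer parameters than mainstream models while clearly outperforming them. This study offers an efficient solution for fast, accurate road marking extraction, with significant application value for high-definition maps and intelligent transportation.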
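The linear-complexity property attributed to the visual state space component can be illustrated with a generic state space recurrence. The sketch below is not LRMSNet's pyramid module: it is a minimal, fixed-parameter, one-dimensional scan with a diagonal state, and the function name `ssm_scan` and the parameters `a`, `b`, `c` are assumptions chosen for illustration. Visual state space models typically make these parameters input-dependent and scan the image along several directions, which this sketch omits.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    """
    n = len(a)
    if len(b) != n or len(c) != n:
        raise ValueError("a, b and c must have the same length")
    h = [0.0] * n  # hidden state, initialised to zero
    y: List[float] = []
    for x_t in x:  # a single pass, so cost grows linearly with len(x)
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        y.append(sum(c[i] * h[i] for i in range(n)))
    return y


# Impulse response of a one-dimensional state with decay 0.5 (illustrative values)
y = ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
print(y)  # [1.0, 0.5, 0.25, 0.125]
```

Each output depends on every earlier input through the carried state `h`, yet the loop touches each element once. That is the sense in which such a recurrence models long-range context at linear cost, in contrast to pairwise attention, whose cost grows quadratically with sequence length.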

Abstract: Objectives: Efficient and accurate road marking extraction is fundamental for vehicle localization, environmental perception, and high-definition map construction. However, vehicle-borne images are usually large in volume and costly to process, making it difficult for existing methods to achieve a favorable balance between segmentation accuracy and computational efficiency. To address this challenge, this paper proposes a lightweight road markings network (LRMSNet) to accurately extract road markings from vehicle-borne images with low parameter complexity and computational cost. The proposed method aims to provide fast and reliable road marking perception for high-definition mapping and intelligent transportation applications.

Methods: LRMSNet adopts an encoder-decoder architecture and integrates convolutional neural networks, attention mechanisms, wavelet transform, and visual state space modeling into a unified framework. First, a lightweight bottleneck module is designed to capture multi-scale semantic information while reducing the number of model parameters. Second, an efficient visual state space pyramid is proposed to enhance global context modeling through multi-scale feature aggregation and linear-complexity computation. Third, a feature optimization module is introduced to refine skip connections between the encoder and decoder, thereby improving the preservation of spatial details and boundary information. These designs allow LRMSNet to simultaneously strengthen local detail representation and long-range dependency modeling under limited computational overhead.

Results: Experiments are conducted on the public CamVid dataset and a self-collected Wuhan road marking dataset. LRMSNet achieves mean intersection over union scores of 68.89% and 63.11% on the two datasets, respectively. The model contains only 1.37 million parameters and requires 7.98 GFLOPs, demonstrating a lightweight computational profile. Compared with mainstream semantic segmentation models, LRMSNet substantially reduces the number of parameters while achieving superior segmentation performance, especially for fine-grained road markings.

Conclusions: LRMSNet provides an efficient and accurate solution for road marking extraction from vehicle-borne images. By combining lightweight multi-scale representation, efficient global context modeling, and optimized feature fusion, the proposed network achieves a strong balance between segmentation accuracy and computational efficiency. This study offers practical technical support for rapid road marking extraction and has significant application potential in high-definition map generation, autonomous driving, and intelligent transportation systems.
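For readers unfamiliar with the metric behind the reported scores, mean intersection over union can be computed from a confusion matrix as sketched below. This is the standard definition only; the abstract does not state the evaluation protocol (for example, which classes are ignored), so the handling of absent classes here is an assumption, as are the function name `mean_iou` and the toy label maps.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1
    ious: List[float] = []
    for cls in range(num_classes):
        tp = confusion[cls][cls]
        fp = sum(confusion[t][cls] for t in range(num_classes)) - tp
        fn = sum(confusion[cls]) - tp
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened 2x3 label maps with three classes (made-up data)
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))  # 0.5
```

In the toy example the per-class values are 1/3, 2/3 and 1/2, which average to 0.5. Because every class counts equally, thin classes with few pixels weigh as much as large ones in the final score.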
