Automatic Generation of Urban Large-Scale Road Networks Driven by the DFIF-LinkNet Model

尹圆圆; 杨雪

doi:10.13203/j.whugis20250317



Abstract

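Abstract: Existing methods for automatically generating large-scale urban road networks face two main problems: reliance on a single data source leaves the extracted road information incomplete, and multi-source data fusion strategies fail to fully exploit the complementary information among multimodal data. To address this, a road information extraction model with a dual-encoder, dual-decoder structure, the Dual-modal Feature Interaction and Fusion LinkNet (DFIF-LinkNet), is proposed for end-to-end road network generation from fused remote sensing imagery and vehicle trajectories. At the front end, the model uses independent image and trajectory autoencoders to extract features from each modality. In the middle, a Multimodal Feature Enhancement Module (MFEM) connects the two autoencoders and lets image and trajectory features enhance each other through cross-modal information interaction. At the back end, a fusion mechanism based on gating weights integrates the road predictions output by the two autoencoders to produce the final road network map. Applied to the Beijing and Rome datasets, the model effectively mines the complementary information in remote sensing imagery and trajectory data and significantly improves road network generation quality, reaching road extraction F1 scores of 78.36% and 81.60% on the two datasets, respectively.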
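The back-end gated fusion step can be illustrated with a minimal sketch. The abstract does not give the exact gating formula, so the version below is an assumption: a per-pixel gate `g = sigmoid(z)` blends the image-branch and trajectory-branch road probabilities as `g * p_img + (1 - g) * p_traj`. The names `gated_fusion`, `p_img`, `p_traj`, and `gate_logits` are hypothetical, and in a trained model the gate logits would be predicted by the network rather than set by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a gate weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(p_img, p_traj, gate_logits):
    """Blend two road-probability maps pixel by pixel (illustrative only).

    fused = g * p_img + (1 - g) * p_traj, with g = sigmoid(gate_logit).
    All three inputs are equally sized 2D lists; this layout is an
    assumption made for the sketch, not the paper's data format.
    """
    fused = []
    for row_img, row_traj, row_gate in zip(p_img, p_traj, gate_logits):
        fused_row = []
        for a, b, z in zip(row_img, row_traj, row_gate):
            g = sigmoid(z)  # weight given to the image branch
            fused_row.append(g * a + (1.0 - g) * b)
        fused.append(fused_row)
    return fused


# Toy 2x2 maps: the gate favors the trajectory branch at (0, 1), for
# example under tree cover, and the image branch at (1, 0).
p_img = [[0.9, 0.2], [0.1, 0.6]]
p_traj = [[0.8, 0.7], [0.0, 0.4]]
gate_logits = [[0.0, -4.0], [4.0, 0.0]]

fused = gated_fusion(p_img, p_traj, gate_logits)
for row in fused:
    print([round(v, 3) for v in row])
```

A convex combination keeps every fused value between the two branch predictions, so the gate only decides which modality to trust at each pixel and cannot push the output outside the range the branches already agree on.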

Abstract: Objectives: Existing methods for large-scale urban road network generation face significant limitations in both data utilization and feature representation, which can be broadly categorized into two main challenges. First, many approaches rely solely on a single data source, such as high-resolution remote sensing imagery or trajectory data, resulting in limited road information extraction. This limitation is especially pronounced in scenarios involving shadow occlusions, dense vegetation, or heavily built-up areas, where road discontinuities and omissions frequently occur. Second, although multi-source data fusion methods partially address the drawbacks of single-source approaches, they often fail to fully exploit the complementary relationships among heterogeneous modalities. Existing fusion strategies tend to be coarse-grained, yielding only limited enhancement of feature representations. Methods: To overcome these challenges, a dual-encoder, dual-decoder architecture termed Dual-modal Feature Interaction and Fusion LinkNet (DFIF-LinkNet) is proposed to achieve end-to-end road network generation by integrating high-resolution remote sensing imagery with trajectory data. The architecture is specifically designed to leverage the spatial and semantic complementarity of visual and trajectory information. At the front end, independent image and trajectory autoencoders extract modality-specific representations. The intermediate stage incorporates a Multimodal Feature Enhancement Module (MFEM), which establishes bidirectional feature interactions between the two autoencoders. Through cross-modal attention mechanisms, MFEM facilitates mutual reinforcement and enhancement of visual and trajectory features. At the back end, a gated fusion mechanism adaptively weights the road predictions from the two autoencoders, producing a structurally coherent and spatially continuous final road network map.
Results: Empirical evaluation on the Beijing and Rome datasets demonstrates that the proposed framework effectively harnesses complementary information from remote sensing and trajectory data, significantly improving both the accuracy and completeness of generated road networks. Specifically, the model achieves F1 scores of 78.36% and 81.60% on the Beijing and Rome datasets, respectively. Conclusions: Compared with existing road extraction approaches, DFIF-LinkNet not only reduces false positives in non-road regions but also substantially mitigates omission errors, particularly in areas obstructed by trees or complex urban features. The resulting road networks exhibit high consistency with ground-truth maps, offering robust and generalizable outputs that provide reliable support for urban traffic planning and high-precision map updating.
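For readers unfamiliar with the reported metric, the sketch below shows how an F1 score can be computed from a predicted and a ground-truth binary road mask. It assumes a strict pixel-wise comparison; the abstract does not state the evaluation protocol (for example, whether a tolerance buffer around road centerlines is used), so this is an illustration and not the authors' evaluation code. The function name `road_f1` and the toy masks are hypothetical.

```python
def road_f1(pred, truth):
    """Return (precision, recall, f1) for two equally sized binary masks.

    A pixel value of 1 means "road" and 0 means "background".
    """
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1  # road predicted, road present
            elif p and not t:
                fp += 1  # false alarm in a non-road region
            elif t and not p:
                fn += 1  # omitted road pixel
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy masks: two correct road pixels, one false positive, one omission.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

precision, recall, f1 = road_f1(pred, truth)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it drops when either false positives in non-road regions or omitted road pixels increase, which are the two error types the conclusions discuss.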
