Automatic Generation of Urban Large-Scale Road Networks Driven by the DFIF-LinkNet Model
Abstract
Objectives: Existing methods for large-scale urban road network generation are limited in both data utilization and feature representation, which poses two main challenges. First, many approaches rely on a single data source, such as high-resolution remote sensing imagery or trajectory data, which restricts the road information that can be extracted. This limitation is especially pronounced under shadow occlusion, dense vegetation, or heavily built-up areas, where road discontinuities and omissions frequently occur. Second, although multi-source data fusion methods partially address the drawbacks of single-source approaches, they often fail to fully exploit the complementary relationships among heterogeneous modalities. Existing fusion strategies tend to be coarse-grained and yield only limited enhancement of feature representations. Methods: To overcome these challenges, a dual-encoder, dual-decoder architecture termed Dual-modal Feature Interaction and Fusion LinkNet (DFIF-LinkNet) is proposed to achieve end-to-end road network generation by integrating high-resolution remote sensing imagery with trajectory data. The architecture is designed to leverage the spatial and semantic complementarity of visual and trajectory information. At the front end, independent image and trajectory encoders extract modality-specific representations. In the intermediate stage, a Multimodal Feature Enhancement Module (MFEM) establishes bidirectional feature interactions between the two encoders; through cross-modal attention, the visual and trajectory features reinforce and enhance each other. At the back end, a gated fusion mechanism adaptively weights the road predictions from the image and trajectory branches, producing a structurally coherent and spatially continuous final road network map.
Results: Empirical evaluation on the Beijing and Rome datasets demonstrates that the proposed framework effectively harnesses complementary information from remote sensing and trajectory data, significantly improving both the accuracy and completeness of generated road networks. Specifically, the model achieves F1 scores of 78.36% and 81.60% on the Beijing and Rome datasets, respectively. Conclusions: Compared with existing road extraction approaches, DFIF-LinkNet not only reduces false positives in non-road regions but also substantially mitigates omission errors, particularly in areas obstructed by trees or complex urban features. The resulting road networks exhibit high consistency with ground-truth maps, offering robust and generalizable outputs that provide reliable support for urban traffic planning and high-precision map updating.
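To make the back-end gated fusion step more concrete, the following is a minimal illustrative sketch, not the DFIF-LinkNet implementation. It assumes that each branch outputs a per-pixel road-probability map and that the gate is a sigmoid of a weighted combination of the two predictions. The function name gated_fusion and the scalar parameters w_img, w_traj, and bias are hypothetical stand-ins for what would be learned layers in the actual model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(p_img, p_traj, w_img=1.0, w_traj=1.0, bias=0.0):
    """Fuse two per-pixel road-probability maps with a per-pixel gate.

    p_img, p_traj: equally shaped 2D lists of probabilities in [0, 1],
    standing in for the image-branch and trajectory-branch predictions.
    w_img, w_traj, bias: hypothetical scalar gate parameters (assumed
    here for illustration; a trained model would learn its gate).
    """
    if len(p_img) != len(p_traj) or any(
        len(a) != len(b) for a, b in zip(p_img, p_traj)
    ):
        raise ValueError("probability maps must have the same shape")

    fused = []
    for row_img, row_traj in zip(p_img, p_traj):
        fused_row = []
        for a, b in zip(row_img, row_traj):
            # Gate in (0, 1): the weight given to the image-branch prediction.
            g = sigmoid(w_img * a + w_traj * b + bias)
            # Convex combination, so the result lies between a and b.
            fused_row.append(g * a + (1.0 - g) * b)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    image_pred = [[0.9, 0.2], [0.1, 0.5]]
    trajectory_pred = [[0.3, 0.8], [0.1, 0.5]]
    for row in gated_fusion(image_pred, trajectory_pred):
        print([round(v, 3) for v in row])
```

Because the fused value is a convex combination, it always lies between the two branch predictions at each pixel; where the branches agree the output is unchanged, and where they disagree the gate decides which branch dominates.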