Damaged Building Detection Method in Multi-view Scenes Based on Geometric Prior Constraints

  • Abstract: Buildings are among the most critical disaster-affected assets, and detecting damaged buildings and mapping them into geographic space are key to emergency rescue. High-point multi-view remote sensing equipment offers an effective means of extracting damaged-building information, but faces problems such as inconsistent feature representation across viewpoints and incomplete building information. This paper therefore proposes a damaged-building detection method for multi-view scenes under geometric prior constraints. First, a multi-view damaged-building instance segmentation dataset is constructed from long-range high-point monitoring cameras, unmanned aerial vehicles, and web-collected imagery. Second, a cross-view-domain building feature synchronization model combining stage-aware feature enhancement and fine-grained feature alignment is designed to synchronize fine-scale building features across viewpoints. Third, an instance segmentation model incorporating Canny edge detection and entropy-based disorder measurement is proposed for accurate detection of damaged buildings. Finally, a geometrically constrained spatial mapping method based on vertical/horizontal field-of-view segmentation is designed to project the detection results into the geographic scene. Experimental results show that, compared with existing methods, the proposed method detects buildings and damaged buildings more accurately across viewpoints, reaching a bounding-box mean average precision of 50.33% and a pixel-level (mask) mean average precision of 46.69%; at an intersection-over-union threshold of 50% between predicted and ground-truth bounding boxes, it achieves mean average precisions of 83.10% and 81.91% on the object detection and instance segmentation tasks, respectively. For spatial mapping, the method projects image-detected damaged buildings onto their real geographic locations with good accuracy, providing technical support for real-time emergency rescue command.

     

    Abstract:
    Objectives Buildings are among the assets most severely affected by disasters. Intelligent detection of damaged-building information, coupled with the alignment of 2D imagery to 3D scenes, is pivotal in emergency rescue efforts.
    Methods This paper proposes a method for detecting damaged buildings in multi-view scenes under geometric prior constraints across various fields of view. First, a large volume of imagery collected from long-range high-point monitoring cameras, drones, and online sources, supplemented by simulated image generation, is used to construct a multi-view instance segmentation dataset for damaged buildings. The dataset covers several perspectives, including ground-level views, high-point ground views, and low-altitude high-point views. Then, a cross-view-domain building feature synchronization model combining stage-aware feature enhancement and fine-grained feature alignment is designed. The model synchronizes fine-scale building features across viewpoints and consists of a multi-branch stage-aware feature enhancement module and a deep-shallow feature synchronization module that aggregates perception with spatial-channel attention. It enhances deep features rich in semantic information, such as building shape and category, while attending to shallow features rich in spatial information, such as building edges, textures, and lines, thereby addressing the spatial and semantic discrepancies among features from different perspectives. Subsequently, building on a transformer attention network, an instance segmentation model that incorporates Canny edge detection and entropy-based disorder measurement is proposed; it further strengthens the feature representation of damaged buildings and achieves their precise detection. The network consists of three modules: a pixel-level feature extraction module, a transformer attention extraction module, and a detection module. Its outputs include both the pixel-level segmentation category and a bounding box for each detected building.
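The abstract names Canny edge detection and an entropy-based "disorder" measure as auxiliary cues but gives no implementation details. A minimal sketch of the two cues, with a plain gradient-magnitude map standing in for the Canny step (which in practice would come from an OpenCV-style detector), might look like:

```python
import numpy as np

def local_entropy(gray, bins=16, win=9):
    """Local-entropy ('disorder') map of a grayscale image.

    Damaged facades tend to show fragmented texture and high local entropy,
    so such a map can be fed to the segmentation network as an extra channel.
    Illustrative sketch only; the paper's exact formulation is not given here.
    """
    h, w = gray.shape
    q = (gray.astype(np.int64) * bins) // 256      # quantize to `bins` levels
    pad = win // 2
    qp = np.pad(q, pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            patch = qp[i:i + win, j:j + win].ravel()
            p = np.bincount(patch, minlength=bins) / patch.size
            p = p[p > 0]
            out[i, j] = -(p * np.log2(p)).sum()    # Shannon entropy in bits
    return out

def gradient_edges(gray, thresh=30.0):
    """Simple gradient-magnitude edge map (a stand-in for a Canny detector)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return (np.hypot(gx, gy) > thresh).astype(np.float64)
```

A uniform region yields zero entropy everywhere, while rubble-like texture yields high values; thresholding the gradient magnitude marks the strong edges a full Canny pass would refine with hysteresis.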
These detection results provide rich data support for the subsequent matching and mapping between 2D imagery and the 3D geographic scene. Finally, a geometrically constrained position matching method based on vertical and horizontal field-of-view segmentation is designed. The method projects the detection results into a 3D geographic information scene and targets cameras with rich angular information, including ground cameras, high-point gimbal cameras, and drone cameras. Using each camera's actual geographic location and the horizontal and tilt angles of the observed targets, it infers the real geographic locations of the detections and then performs scene matching and mapping on a 3D model using ray-cutting techniques.
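The angular inference above can be sketched with a flat-terrain simplification: the pixel's offset from the image center is converted to angular offsets via the horizontal/vertical fields of view, added to the camera's pan and tilt, and the resulting ray is intersected with the ground plane. The paper instead intersects the ray with a 3D terrain model ("ray cutting"); all names and parameters below are illustrative, not from the paper.

```python
import math

def pixel_to_ground(u, v, img_w, img_h, hfov_deg, vfov_deg,
                    cam_east, cam_north, cam_height,
                    pan_deg, tilt_deg):
    """Project an image pixel to a ground point (flat-terrain simplification).

    pan_deg:  camera azimuth, clockwise from north.
    tilt_deg: depression angle of the optical axis (positive looks down).
    Returns (east, north) in the same local units as the camera position.
    """
    # Angular offset of the pixel from the optical axis via FOV segmentation.
    d_az = (u - img_w / 2) / img_w * hfov_deg
    d_tilt = (v - img_h / 2) / img_h * vfov_deg   # +v points down in the image

    az = math.radians(pan_deg + d_az)
    depression = math.radians(tilt_deg + d_tilt)
    if depression <= 0:
        raise ValueError("viewing ray does not hit the ground plane")

    ground_range = cam_height / math.tan(depression)
    east = cam_east + ground_range * math.sin(az)
    north = cam_north + ground_range * math.cos(az)
    return east, north
```

For example, a camera 100 m up, looking due north with a 45° depression, maps the image center to a point 100 m north of the camera; replacing the ground plane with a terrain mesh turns the last step into the ray-cutting query the paper describes.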
    Results The empirical findings show that the proposed method outperforms current approaches in detecting buildings and damaged structures across multiple perspectives. The mean average precision of bounding boxes (bbox_mAP) and of segmentation masks (seg_mAP) reach 50.33% and 46.69%, respectively. At an intersection-over-union (IoU) threshold of 50%, bbox_mAP50 and seg_mAP50 reach 83.10% and 81.91%, respectively, a significant improvement in detection accuracy. In the case studies, high-point long-range monitoring cameras capture the image data, and instance segmentation of damaged and normal buildings yields the detection results. The detected buildings are then matched to a 3D geographic scene with the proposed geographic matching method, which uses vertical and horizontal field-of-view segmentation and enables precise matching and mapping of 2D detection results into the 3D geographic scene.
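The mAP50 figures count a prediction as correct when its IoU with a ground-truth box exceeds 0.5. The underlying IoU for axis-aligned boxes is standard:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For segmentation masks the same ratio is taken over pixel sets rather than rectangles, which is why bbox_mAP and seg_mAP can differ for the same detector.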
    Conclusions The proposed method not only effectively improves the detection of damaged buildings in high-point multi-view images, but also accurately matches the detections to the 3D geographic scene, providing technical support for emergency rescue command at disaster sites.

     
