Abstract
Objectives: Buildings are among the assets most severely affected by disasters. Intelligent detection of damaged-building information, together with the alignment of two-dimensional detections to three-dimensional scenes, is pivotal in emergency rescue efforts. This paper introduces a method for extracting damaged-building information from multiple perspectives, leveraging geometric prior constraints across different fields of view.

Methods: Data were collected with long-distance high-point monitoring cameras, drones, and online gathering, and a large volume of additional data was produced through simulated image generation, yielding a multi-view (damaged) building instance segmentation dataset that covers ground-level views, high-point ground views, and low-altitude high-point views. A cross-view building feature synchronization model combining phase-aware feature enhancement and fine-grained feature alignment was then designed. It synchronizes fine-level building features across perspectives through two modules: a multi-branch phase-aware feature enhancement module, and a deep-shallow feature synchronization module that aggregates perception with spatial-channel attention. The aim is to strengthen deep features rich in semantic information, such as building shape and category, while focusing shallow features rich in spatial information, such as building edges, textures, and lines, on the regions of interest, thereby bridging the spatial and semantic gaps between features from different perspectives. Next, an instance segmentation model based on a Transformer attention network that incorporates Canny edge detection and entropy-based disorder is proposed, further enhancing the feature representation of damaged buildings and enabling their precise detection. The network consists of three modules: a pixel-level feature extraction module, a Transformer attention extraction module, and a detection module. The detection results include both the pixel-level segmentation category and a bounding box for each building, providing rich data support for the subsequent 2D-to-3D geographic scene matching and mapping. Finally, a geometric-constraint position matching method based on segmenting the camera's vertical and horizontal fields of view is designed to project detections onto a three-dimensional geographic scene. It is tailored to cameras rich in angular information, including ground cameras, high-point gimbal cameras, and drone cameras: from the camera's actual geographic location and the horizontal and tilt angles of the observed buildings, it infers the real geographic locations of the detected targets and then performs scene matching and mapping against a three-dimensional model using ray casting.
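To make the Canny-edge and entropy cues concrete, the following minimal sketch shows one way an edge map and a local-entropy "disorder" map can be computed and stacked as auxiliary channels for a segmentation backbone. This is an illustration, not the paper's implementation; the window size, quantization bin count, and Canny thresholds are assumptions.

```python
import cv2
import numpy as np

def edge_entropy_channels(bgr, win=9, bins=16):
    """Build two auxiliary channels for a building image: a Canny edge map
    and a local Shannon-entropy ("disorder") map. Sketch only: the paper's
    model fuses such cues with learned features; win/bins/thresholds here
    are assumed values."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200).astype(np.float32) / 255.0

    # Local entropy: histogram the grey levels inside each win x win patch.
    q = (gray // (256 // bins)).astype(np.int32)   # quantize to `bins` levels
    h, w = q.shape
    ent = np.zeros((h, w), np.float32)
    pad = win // 2
    qp = np.pad(q, pad, mode="edge")
    for y in range(h):                              # naive O(HW*win^2) loop,
        for x in range(w):                          # kept simple for clarity
            patch = qp[y:y + win, x:x + win]
            p = np.bincount(patch.ravel(), minlength=bins) / patch.size
            p = p[p > 0]
            ent[y, x] = -np.sum(p * np.log2(p))     # Shannon entropy (bits)
    ent /= np.log2(bins)                            # normalize to [0, 1]
    return np.stack([edges, ent], axis=0)           # 2 x H x W channel stack
```

High-entropy regions correspond to visually disordered texture (rubble, collapsed facades), while the edge channel preserves intact structural lines, which is why the two cues complement each other for damaged/normal discrimination.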
Results: The empirical findings show that the proposed method outperforms current approaches in detecting buildings and damaged structures from multiple perspectives, achieving an overall bounding-box mean Average Precision (bbox_mAP) of 50.33% and a segmentation mean Average Precision (seg_mAP) of 46.69%. At an Intersection over Union (IoU) threshold of 50%, bbox_mAP50 and seg_mAP50 reach 83.10% and 81.91%, respectively, a significant improvement in detection accuracy. Case studies were conducted in Yingxiu Town, Wenchuan County, Sichuan Province, and at the Sichuan Disaster Prevention and Reduction Technology Experiment Service Base. High-point long-distance monitoring cameras captured the image data, instance segmentation of damaged/normal buildings produced the detection results, and the detected buildings were then matched to a three-dimensional geographic scene using the proposed geometric-constraint matching method based on vertical/horizontal field-of-view segmentation, precisely mapping the two-dimensional detections onto the three-dimensional geographic scene model.

Conclusions: The proposed method not only improves the detection of damaged buildings in high-point multi-view imagery, but also matches the detections geometrically and accurately to the 3D geographic scene, providing technical support for emergency rescue command at disaster sites.
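As an illustration of the field-of-view-based matching step, the sketch below interpolates a detection's pixel position across the camera's horizontal and vertical fields of view to obtain a viewing azimuth and elevation, then intersects the resulting ray with the ground to estimate the target's geographic position. It assumes an undistorted pinhole-like camera, and it substitutes a flat-ground intersection for ray casting against the full 3-D scene model; all function names and parameters are illustrative.

```python
import math

def pixel_to_angles(px, py, img_w, img_h, heading_deg, pitch_deg,
                    hfov_deg, vfov_deg):
    """Map a detection's pixel centre to a viewing azimuth/elevation by
    linearly subdividing the camera's horizontal/vertical field of view.
    (A real implementation would use the calibrated intrinsic matrix.)"""
    az = heading_deg + (px / img_w - 0.5) * hfov_deg
    el = pitch_deg - (py / img_h - 0.5) * vfov_deg   # image y grows downward
    return az, el

def ground_intersection(cam_lat, cam_lon, cam_h, az_deg, el_deg, ground_h=0.0):
    """Intersect the viewing ray with a horizontal ground plane, standing in
    for ray casting against the textured 3-D model."""
    drop = cam_h - ground_h
    if el_deg >= 0 or drop <= 0:
        return None                                   # ray never hits the ground
    dist = drop / math.tan(math.radians(-el_deg))     # horizontal range (m)
    R = 6_371_000.0                                   # mean Earth radius (m)
    dlat = dist * math.cos(math.radians(az_deg)) / R
    dlon = dist * math.sin(math.radians(az_deg)) / (R * math.cos(math.radians(cam_lat)))
    return cam_lat + math.degrees(dlat), cam_lon + math.degrees(dlon)

# Example: a detection centred at pixel (960, 700) in a 1920x1080 frame from a
# gimbal camera 150 m above ground, heading 120 deg, pitch -8 deg, 60x34 deg FOV.
az, el = pixel_to_angles(960, 700, 1920, 1080, 120.0, -8.0, 60.0, 34.0)
print(ground_intersection(31.06, 103.48, 150.0, az, el))
```

In the full pipeline described above, the ground-plane intersection would be replaced by ray cutting against the three-dimensional geographic scene model, which accounts for terrain relief and occlusion when mapping each detection to its real-world position.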