SHI Yongxin, ZHOU Weixun, SHAO Zhenfeng. Multi-view Remote Sensing Image Scene Classification by Fusing Multi-scale Attention[J]. Geomatics and Information Science of Wuhan University, 2024, 49(3): 366-375. DOI: 10.13203/j.whugis20220737

Multi-view Remote Sensing Image Scene Classification by Fusing Multi-scale Attention

  • Objectives Remote sensing scene classification opens new possibilities for applying high-resolution images, but effectively recognizing scenes from high-resolution remote sensing images remains an important challenge. Existing scene classification methods use remote sensing images from only one viewpoint, which cannot accurately express the semantic information of complex high-resolution remote sensing images and makes it difficult to improve classification accuracy further.
    Methods To solve this problem, a multi-view scene classification method for remote sensing images is proposed. First, aerial and ground images are organized into positive and negative image pairs and divided into training, validation, and test datasets. Second, a convolutional neural network that fuses multi-scale attention is constructed; its feature fusion module produces attention-fused features with strong representation ability, integrating different feature information and enabling multi-scale feature learning. Third, the trained multi-scale attention network is used to extract features from the aerial image and the ground image, respectively. Finally, the fused features are classified into scenes using a support vector machine. To demonstrate the performance of the proposed multi-scale attention network, we conduct experiments on two publicly available benchmark datasets: AiRound and CV-BrCT.
    Results The proposed method achieves a highest accuracy of 93.13% on the AiRound dataset and 85.18% on the CV-BrCT dataset, improving on the accuracy of single-view scene classification.
    Conclusions The results demonstrate that the complementary information provided by multi-view images can further improve the performance of remote sensing scene classification.
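
The step in Methods where features from the aerial view and the ground view are combined before classification can be illustrated with a minimal sketch. The abstract does not state how the two feature vectors are fused, so the L2 normalization and concatenation used here are assumptions, and the function names `l2_normalize` and `fuse_views` and the toy feature values are hypothetical. In the described pipeline the per-view features would come from the trained multi-scale attention network, and the fused vector would then be passed to a support vector machine.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_views(aerial_feat, ground_feat):
    """Fuse two per-view feature vectors into one.

    Assumption: fusion is done by normalizing each view and
    concatenating; the source abstract does not specify the operator.
    """
    return l2_normalize(aerial_feat) + l2_normalize(ground_feat)


# Hypothetical toy features standing in for network outputs.
aerial = [3.0, 4.0]
ground = [0.0, 5.0, 12.0]
fused = fuse_views(aerial, ground)
```

Normalizing each view first keeps one view from dominating the fused vector simply because its features have a larger magnitude, which is one common reason to prefer this over raw concatenation.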