Multi-view Remote Sensing Image Scene Classification by Fusing Multi-scale Attention

  • Abstract: To address the weak feature representation of existing scene classification methods and the difficulty of further improving the accuracy of single-view remote sensing image classification, a multi-view remote sensing image scene classification method that fuses multi-scale attention is proposed. First, aerial and ground images are constructed into positive and negative image pairs and divided into training, validation, and test sets. Second, a convolutional neural network that fuses multi-scale attention is built and trained; its feature fusion module produces attention-fused features with stronger representation ability, realizing multi-scale feature learning. Then, the trained multi-scale attention network extracts features from the aerial and ground images separately, and the features are fused. Finally, a support vector machine performs scene classification on the fused features. Experimental results show that, compared with existing methods, the proposed method achieves higher classification accuracy on both public datasets and improves on single-view scene classification, demonstrating that the complementary information provided by multiple views can further raise the accuracy of remote sensing scene classification.
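The abstract does not spell out the attention block itself; as a point of reference, below is a minimal PyTorch sketch of one plausible multi-scale attention module, assuming parallel convolution branches with different kernel sizes followed by a channel-attention gate. The class name `MultiScaleAttention`, the kernel sizes (1, 3, 5), and the reduction ratio are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of a multi-scale attention block (illustrative, not the
# authors' exact design): parallel branches at several receptive fields are
# summed, re-weighted by channel attention, and fused back into the input.
import torch
import torch.nn as nn


class MultiScaleAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Parallel convolutions with different kernel sizes capture multi-scale context.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        # Squeeze-and-excitation style channel attention over the fused branches.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = sum(branch(x) for branch in self.branches)  # fuse scales
        weights = self.attention(multi_scale)                     # channel weights in (0, 1)
        return x + multi_scale * weights                          # residual attention fusion


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)           # stand-in for an intermediate feature map
    print(MultiScaleAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```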

     

    Abstract:
    Objectives Remote sensing scene classification opens new possibilities for the application of high-resolution imagery, yet effectively recognizing scenes in high-resolution remote sensing images remains an important challenge. Existing scene classification methods use remote sensing images from only one viewpoint, which cannot accurately express the semantic information of complex high-resolution remote sensing images, so classification accuracy is difficult to improve further.
    Methods To address this problem, a multi-view scene classification method for remote sensing images is proposed. First, aerial and ground images are constructed into positive and negative image pairs and divided into training, validation, and test datasets. Second, a convolutional neural network that fuses multi-scale attention is constructed; its feature fusion module produces attention-fused features with stronger representation ability, integrating different feature information and realizing multi-scale feature learning. Third, the trained multi-scale attention network extracts features from the aerial and ground images, respectively, and the two sets of features are fused. Finally, a support vector machine classifies scenes based on the fused features (a minimal sketch of this fusion and classification stage follows the abstract). To demonstrate the performance of the proposed multi-scale attention network, we conduct experiments on two publicly available benchmark datasets: AiRound and CV-BrCT.
    Results The proposed method achieves remarkable performance, with the highest accuracy of 93.13% on the AiRound dataset and 85.18% on the CV-BrCT dataset, improving on single-view scene classification.
    Conclusions The results demonstrate that the complementary information provided by multi-view images can further improve the performance of remote sensing scene classification.
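As a concrete illustration of the last two steps, the sketch below concatenates per-view feature vectors and trains a support vector machine on the fused result. The random placeholder arrays, the RBF kernel, and the regularization constant are assumptions standing in for the extracted CNN features and the settings used in the paper.

```python
# A minimal sketch of multi-view feature fusion followed by SVM classification,
# assuming per-view features have already been extracted by the trained network.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, feat_dim, n_classes = 200, 512, 11

# Placeholder arrays standing in for the aerial-view and ground-view CNN features.
aerial_feats = rng.standard_normal((n_samples, feat_dim))
ground_feats = rng.standard_normal((n_samples, feat_dim))
labels = rng.integers(0, n_classes, size=n_samples)

# Fuse the two views by concatenating their feature vectors.
fused = np.concatenate([aerial_feats, ground_feats], axis=1)

# Train an SVM scene classifier on the fused multi-view features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```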

     
