Semantic Segmentation of High-Resolution Remote Sensing Images Based on Improved FuseNet Combined with Atrous Convolution

  • Abstract: To address the segmentation of multimodal, multiscale high-resolution remote sensing images, a FuseNet variant network combined with atrous convolution is proposed for semantic segmentation of common land-cover classes. First, the FuseNet variant fuses the elevation information contained in digital surface model (DSM) images with the color information of red-green-blue (RGB) images; second, atrous convolutions are applied in both the encoder and the decoder to enlarge the receptive field of the convolution kernels; finally, the remote sensing image is classified pixel by pixel to produce the semantic segmentation result. Experimental results show that the proposed algorithm achieves mF1 scores of 91.6% and 90.4% on the Potsdam and Vaihingen datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS), outperforming existing mainstream algorithms.


    Abstract:
      Objectives  With the development and popularization of deep learning theory, deep neural networks are widely used in image analysis and interpretation. High-resolution remote sensing images carry a large amount of information, complex data, and rich feature information, yet most current semantic segmentation networks are designed for natural images rather than for these characteristics. As a result, they cannot effectively extract the detailed features of ground objects in remote sensing images, and their segmentation accuracy needs to be improved.
      Methods  We propose the improved FuseNet with atrous convolution-convolutional neural network (IFA-CNN). First, we use the improved FuseNet to fuse the elevation information of digital surface model (DSM) images with the color information of red-green-blue (RGB) images, and we propose a multimodal data fusion scheme to address the poor fusion between the RGB branch and the DSM branch. Second, multiscale features are captured by flexibly adjusting the receptive field with atrous convolution, and a decoder that enlarges the feature maps is formed through deconvolution and upsampling. Finally, a Softmax classifier produces the semantic segmentation results.
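
A minimal PyTorch sketch of one encoder stage in the spirit of this pipeline is shown below. The channel sizes, kernel sizes, and dilation rate are illustrative assumptions, and since the paper's V-Fusion unit is not detailed in this abstract, the sketch falls back to FuseNet's element-wise addition as a stand-in for the fusion step.

```python
import torch
import torch.nn as nn

class AtrousEncoderStage(nn.Module):
    """Two-branch encoder stage: DSM features are fused into the RGB branch,
    then an atrous (dilated) convolution enlarges the receptive field.
    Hyperparameters are assumptions, not the paper's exact configuration."""
    def __init__(self, rgb_in, dsm_in, out_ch, dilation=2):
        super().__init__()
        self.rgb_conv = nn.Sequential(
            nn.Conv2d(rgb_in, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.dsm_conv = nn.Sequential(
            nn.Conv2d(dsm_in, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Atrous convolution: padding = dilation keeps the spatial size while
        # the effective receptive field grows with the dilation rate.
        self.atrous = nn.Conv2d(out_ch, out_ch, 3,
                                padding=dilation, dilation=dilation)

    def forward(self, rgb, dsm):
        rgb = self.rgb_conv(rgb)
        dsm = self.dsm_conv(dsm)
        # FuseNet-style element-wise fusion (stand-in for the V-Fusion unit).
        fused = rgb + dsm
        return self.atrous(fused), dsm

# Example: a 256x256 RGB tile with its single-channel DSM.
stage = AtrousEncoderStage(rgb_in=3, dsm_in=1, out_ch=64)
fused, dsm_feat = stage(torch.randn(1, 3, 256, 256),
                        torch.randn(1, 1, 256, 256))
```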
      Results  Compared with relevant algorithms, IFA-CNN effectively alleviates edge burrs and refines blurred boundaries in the segmented images, segments larger objects such as buildings and trees more accurately, effectively reduces missed segmentation, and segments shadow-covered areas almost perfectly. The mF1 scores achieved when our model is applied to the open ISPRS (International Society for Photogrammetry and Remote Sensing) Potsdam and Vaihingen datasets are 91.6% and 90.4%, respectively, exceeding those of relevant algorithms by a considerable margin.
      Conclusions  (1) The virtual fusion (V-Fusion) unit of the proposed multimodal data fusion strategy yields more accurate segmentation than the fusion unit used in the FuseNet network. (2) The encoder-decoder structure effectively improves the segmentation accuracy of small target features, so the loss of detailed information is reduced. (3) While IFA-CNN carries out multimodal data fusion, the atrous convolution expands the receptive field accordingly to extract multiscale information.
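
The decoder side described above (deconvolution, upsampling, then per-pixel Softmax classification) can be sketched as follows. The class count (6, matching the ISPRS land-cover categories) and channel widths are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DecoderHead(nn.Module):
    """Decoder stage: a transposed convolution (deconvolution) restores spatial
    resolution, and a 1x1 convolution plus Softmax gives per-pixel class scores."""
    def __init__(self, in_ch=64, num_classes=6):
        super().__init__()
        # Transposed convolution with stride 2 doubles the feature-map size.
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, kernel_size=2, stride=2)
        self.classify = nn.Conv2d(in_ch // 2, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.up(x)
        logits = self.classify(x)
        # Softmax over the class dimension yields the segmentation probabilities.
        return torch.softmax(logits, dim=1)

# Example: a 64-channel feature map at half the resolution of a 256x256 tile.
head = DecoderHead()
probs = head(torch.randn(1, 64, 128, 128))   # -> shape (1, 6, 256, 256)
```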
