Abstract:
Objectives With the development and popularization of deep learning theory, deep neural networks are widely used in image analysis and interpretation. High-resolution remote sensing images contain a large amount of information, complex data, and rich feature information. Most current semantic segmentation networks were designed for natural images rather than for these characteristics, so they cannot effectively extract the detailed features of ground objects in remote sensing images, and their segmentation accuracy needs to be improved.
Methods We propose IFA-CNN, an improved FuseNet with atrous convolution-convolutional neural network. First, the improved FuseNet fuses the elevation information of digital surface model (DSM) images with the color information of red-green-blue (RGB) images. At the same time, we propose a multimodal data fusion scheme to address the poor fusion of the RGB branch and the DSM branch. Second, atrous convolution flexibly adjusts the receptive field to capture multiscale features. Deconvolution and upsampling then form a decoder that enlarges the feature maps. Finally, a Softmax classifier produces the semantic segmentation results.
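As a minimal illustration of how atrous convolution adjusts the receptive field, the sketch below applies a 1-D dilated kernel in plain Python. It is not the IFA-CNN implementation: the function names, the toy signal, and the 1-D, no-padding setting are all assumptions chosen for brevity. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input samples, so the receptive field grows while the number of weights stays the same.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D atrous convolution in cross-correlation form."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap reads the input `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


# Hypothetical toy input, used only to show the widening receptive field.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
for d in (1, 2, 3):
    print(d, effective_kernel_size(len(kernel), d), atrous_conv1d(signal, kernel, d))
```

With the same three weights, dilation rates 1, 2, and 3 cover 3, 5, and 7 input samples, which is the sense in which several rates used together capture multiscale context.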
Results Compared with relevant algorithms, IFA-CNN effectively reduces edge burrs and thinned boundaries in segmented images and segments larger objects such as buildings and trees more accurately. It also effectively reduces missegmentation, and its segmentation of shadow-covered areas is close to perfect. On the open International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Vaihingen datasets, our model achieves mF1 scores of 91.6% and 90.4%, respectively, exceeding relevant algorithms by a considerable margin.
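The abstract does not state how the mF1 score is computed. The sketch below assumes the common definition, the unweighted mean of the per-class F1 scores derived from a pixel-level confusion matrix. The function names and the 3-class counts are hypothetical and are not results from this work.

```python
def per_class_f1(confusion):
    """F1 per class from a square confusion matrix (rows: ground truth, columns: prediction)."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted as c, labeled otherwise
        fn = sum(confusion[c]) - tp                       # labeled c, predicted otherwise
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(confusion):
    """Assumed mF1: unweighted average of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Hypothetical pixel counts for three classes (illustrative only).
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
print(per_class_f1(confusion), mean_f1(confusion))
```

Because every class contributes equally under this definition, a rare class that is segmented poorly lowers the score as much as a frequent one.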
Conclusions (1) The virtual fusion (V-Fusion) unit used by the multimodal data fusion strategy yields more accurate segmentation than the fusion unit used by the FuseNet network. (2) The encoder-decoder structure effectively improves the segmentation accuracy of small target features, so the loss of detailed information can be reduced. (3) While IFA-CNN carries out multimodal data fusion, atrous convolution expands the receptive field to extract multiscale information.