Objectives The scale and distribution of buildings are key indicators of a region's economic and social development, so extracting buildings from remote sensing images is an important research task. Existing neural network methods still fall short in the completeness of extracted buildings and the accuracy of building edges. To address these problems, this paper proposes a multi-level feature fusion network (MFFNet) for high-resolution images.
Methods First, we use edge detection operators to improve the network's ability to recognize building boundaries. Second, we use a multi-path convolution fusion module to extract building features along multiple dimensions, and introduce a large-receptive-field convolution module to overcome the limitation that feature extraction is constrained by the size of the receptive field. After the extracted features are fused, a convolutional attention module compresses them, and pyramid pooling further mines global features, achieving high-precision building extraction.
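The abstract does not name the specific edge detection operator used; as a minimal sketch of the boundary-enhancement step, the following assumes the common Sobel operator and computes a gradient magnitude map whose strong responses mark candidate building edges.

```python
import numpy as np

# Sobel kernels for horizontal and vertical gradients (assumed operator;
# the paper may use a different edge detector).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def conv2d_valid(image, kernel):
    """Plain 2-D 'valid' cross-correlation used to apply the kernels."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def sobel_edge_magnitude(image):
    """Gradient magnitude map; large values mark candidate building boundaries."""
    gx = conv2d_valid(image, SOBEL_X)
    gy = conv2d_valid(image, SOBEL_Y)
    return np.hypot(gx, gy)
```

In a network such as MFFNet, an edge map like this would typically be fed in as an auxiliary input or used to supervise a boundary branch; the exact integration is a design detail of the paper.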
Results The current mainstream UNet, pyramid scene parsing network (PSPNet), multi attending path neural network (MAPNet) and multiscale-feature fusion deep neural network with dilated convolution (MDNNet) serve as comparison methods, and the Wuhan University Aerial Image Dataset, Satellite Dataset II (East Asia) and Inria Aerial Image Dataset serve as experimental data. Compared with these four methods, MFFNet improves intersection over union, precision, recall, F1-score and mean average precision by 1.53%, 2.65%, 2.41%, 3.32% and 1.19% on average, achieving better results.
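The evaluation metrics listed above follow standard definitions for binary segmentation; the sketch below computes them from predicted and ground-truth building masks. Averaging details (per-image vs. per-dataset) and the mean average precision computation in the paper may differ.

```python
import numpy as np


def segmentation_metrics(pred, target):
    """IoU, precision, recall and F1 for binary building masks (values 0/1)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # building pixels correctly found
    fp = np.logical_and(pred, ~target).sum()   # background predicted as building
    fn = np.logical_and(~pred, target).sum()   # building pixels missed
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}
```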
Conclusions MFFNet not only accurately captures building detail features but also strengthens the extraction and use of global features. It performs better on large buildings and on buildings in complex environments.