Multi-level Feature Fusion Network for Building Extraction from High-Resolution Imagery

LI Xinghua; BAI Xuechen; LI Zhengjun; ZUO Zhiyong

doi:10.13203/j.whugis20210506

High-Resolution Image Building Extraction Based on Multi-level Feature Fusion Network

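Abstract

The scale and distribution of buildings are key indicators of a region's economic and social development, so building extraction from remote sensing imagery is of considerable importance. Existing neural network methods still fall short in the completeness of the extracted buildings and the accuracy of their edges. This paper therefore proposes a multi-level feature fusion network (MFFNet) for high-resolution remote sensing imagery. First, edge detection operators are used to improve the network's ability to recognize building boundaries; a multi-path convolution fusion module extracts building features along multiple dimensions, and a large receptive field convolution module is introduced to overcome the limit that receptive field size places on feature extraction. The extracted features are then fused, compressed by a convolutional attention module, and passed through pyramid pooling to further mine global features, which yields high-precision building extraction. MFFNet is compared with the mainstream UNet, PSPNet (pyramid scene parsing network), MAPNet (multi attending path neural network) and MDNNet (multiscale-feature fusion deep neural networks with dilated convolution) methods, using the sub-meter Wuhan University Aerial Image Dataset, Satellite Dataset II (East Asia) and the Inria Aerial Image Dataset as test data. The results show that the proposed method extracts more complete buildings with more accurate boundaries.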

Abstract:
Objectives The scale and distribution of buildings are key indicators of the economic and social development of a region, so extracting buildings from remote sensing images is of practical significance. Existing neural network methods still fall short in the completeness of building extraction and the accuracy of building edges. To address these problems, this paper proposes a multi-level feature fusion network (MFFNet) for high-resolution images.
Methods First, we use edge detection operators to improve the network's ability to recognize building boundaries. Second, we use a multi-path convolution fusion module to extract building features from multiple dimensions, and introduce a large receptive field convolution module to overcome the limitation that the size of the receptive field imposes on feature extraction. After the extracted features are fused, a convolutional attention module compresses them, and pyramid pooling further mines global features, so as to achieve high-precision extraction of buildings.
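To illustrate the first step, the sketch below computes an edge map with the Sobel operator on a toy image tile. The abstract does not name the edge detection operator or say how its output enters the network, so the choice of Sobel, the function name `sobel_magnitude`, and the toy tile are all assumptions made for illustration only.

```python
# Sobel kernels for horizontal and vertical intensity gradients.
# Assumption: Sobel is used only as a representative edge detection operator.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2-D grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Hypothetical 5x5 tile: dark ground on the left, a bright roof on the right.
tile = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_magnitude(tile)
```

In this toy tile the response is nonzero only in the columns next to the ground-to-roof transition and zero inside the uniform roof area, which is the kind of boundary cue an edge operator can supply to a segmentation network.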
Results MFFNet is compared with four mainstream methods: UNet, pyramid scene parsing network (PSPNet), multi attending path neural network (MAPNet) and multiscale-feature fusion deep neural networks with dilated convolution (MDNNet). The Wuhan University Aerial Image Dataset, Satellite Dataset II (East Asia) and the Inria Aerial Image Dataset are used as experimental data for testing. Compared with the other four methods, MFFNet improves intersection over union, precision, recall, F1-score and mean average precision by 1.53%, 2.65%, 2.41%, 3.32% and 1.19% on average, respectively, and achieves better extraction results.
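For readers unfamiliar with the reported metrics, the following minimal sketch computes intersection over union, precision, recall and F1-score for a binary building mask using their standard pixel-wise definitions. The function name `binary_segmentation_metrics` and the tiny masks are hypothetical, and the paper's exact evaluation protocol (including how mean average precision is computed) is not given in the abstract and is not reproduced here.

```python
def binary_segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary masks given as lists of rows of 0/1."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as building
            elif t and not p:
                fn += 1          # building pixel missed

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical 2x3 masks: 3 true positives, 1 false positive, 1 false negative.
truth = [[1, 1, 0], [1, 1, 0]]
pred = [[1, 1, 1], [1, 0, 0]]
metrics = binary_segmentation_metrics(pred, truth)
```

With these masks the precision, recall and F1-score are each 0.75 and the intersection over union is 0.6, showing that IoU penalizes the same errors more heavily than F1.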
Conclusions MFFNet not only captures the detailed features of buildings accurately, but also strengthens the extraction and use of global features. It extracts large buildings and buildings in complex environments more effectively.
