Volume 47 Issue 8
Aug.  2022
LI Xinghua, BAI Xuechen, LI Zhengjun, ZUO Zhiyong. High-Resolution Image Building Extraction Based on Multi-level Feature Fusion Network[J]. Geomatics and Information Science of Wuhan University, 2022, 47(8): 1236-1244. doi: 10.13203/j.whugis20210506
The National Natural Science Foundation of China 42171302

LI Xinghua, PhD, associate professor, specializes in multi-temporal remote sensing. E-mail: lixinghua5540@whu.edu.cn

• Corresponding author: ZUO Zhiying, PhD, senior engineer. E-mail: azuo19850524@163.com
• Publish Date: 2022-08-05
•   Objectives  The scale and distribution of buildings are key indicators of the economic and social development of a region, so extracting buildings from remote sensing images is of great significance. Existing neural network methods still fall short in the completeness of extracted buildings and the accuracy of building edges. To address these problems, this paper proposes a multi-level feature fusion network (MFFNet) for high-resolution images.  Methods  First, we use edge detection operators to improve the network's ability to recognize building boundaries. Second, we use a multi-path convolution fusion module to extract building features from multiple dimensions, and introduce a large receptive field convolution module so that feature extraction is no longer limited by the size of the receptive field. After the extracted features are fused, a convolutional attention module compresses them, and pyramid pooling further mines global features, so as to achieve high-precision building extraction.  Results  The mainstream UNet, pyramid scene parsing network (PSPNet), multi attending path neural network (MAPNet) and multiscale-feature fusion deep neural networks with dilated convolution (MDNNet) are used as comparison methods, and the Wuhan University Aerial Image Dataset, Satellite Dataset II (East Asia) and Inria Aerial Image Dataset are used as experimental data. Compared with the other four methods, MFFNet improves intersection over union, precision, recall, F1-score and mean average precision by 1.53%, 2.65%, 2.41%, 3.32% and 1.19% on average, respectively.  Conclusions  MFFNet not only accurately captures the detailed features of buildings but also strengthens the extraction and use of global features. It extracts large buildings and buildings in complex environments more effectively.
[1] Ye Min, Wang Bin, Wang Siyuan, et al. Extracting Floor Area Ratio of the Classified Buildings from Very High Resolution Satellite Image Using Multiple Features[J]. Geomatics and Information Science of Wuhan University, 2019, 44(11): 1674-1684 (in Chinese)
[2] Lü Fenghua, Shu Ning, Gong Yan, et al. Regular Building Extraction from High Resolution Image Based on Multilevel-Features[J]. Geomatics and Information Science of Wuhan University, 2017, 42(5): 656-660 (in Chinese)
[3] Gao Xianjun, Zheng Xuedong, Shen Dajiang, et al. Automatic Building Extraction Based on Shadow Analysis from High Resolution Images in Suburb Areas[J]. Geomatics and Information Science of Wuhan University, 2017, 42(10): 1350-1357 (in Chinese)
[4] Lin Xiangguo, Zhang Jixian. Object-Based Morphological Building Index for Building Extraction from High Resolution Remote Sensing Imagery[J]. Acta Geodaetica et Cartographica Sinica, 2017, 46(6): 724-733 (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-CHXB201706009.htm
[5] Shu Guodong, Liu Chuanjie, Wang Lu. Extraction Algorithm Study of Urban Flat-Topped Buildings Based on Airborne LiDAR Point Cloud[J]. Modern Surveying and Mapping, 2019, 42(1): 21-23 (in Chinese)
[6] Zeng Qihong, Mao Jianhua, Li Xianhua, et al. Building Roof Boundary Extraction from LiDAR Point Cloud[J]. Geomatics and Information Science of Wuhan University, 2009, 34(4): 383-386 (in Chinese) http://ch.whu.edu.cn/article/id/1216
[7] Mnih V, Hinton G. Machine Learning for Aerial Image Labeling[D]. Toronto: University of Toronto, 2013
[8] He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016
[9] Paisitkriangkrai S, Sherrah J, Janney P, et al. Effective Semantic Pixel Labelling with Convolutional Networks and Conditional Random Fields[C]//IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 2015
[10] Maggiori E, Tarabalka Y, Charpiat G, et al. High-Resolution Aerial Image Labeling with Convolutional Neural Network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(12): 7092-7103
[11] Long J, Shelhamer E, Darrell T. Fully Convolutional Networks for Semantic Segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 2015
[12] Kipf T N, Welling M. Semi-supervised Classification with Graph Convolutional Networks[J]. arXiv, 2016, arXiv: 1609.02907
[13] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495
[14] Li X H, He M Z, Li H F, et al. A Combined Loss-Based Multiscale Fully Convolutional Network for High-Resolution Remote Sensing Image Change Detection[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5
[15] Zhu Q, Liao C, Hu H, et al. MAPNet: Multiple Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(7): 6169-6181
[16] Ferrari V, Hebert M, Sminchisescu C, et al. Computer Vision[C]//The 15th European Conference, Munich, Germany, 2018
[17] Ji S P, Wei S Q, Lu M. Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(1): 574-586
[18] Maggiori E, Tarabalka Y, Charpiat G, et al. Can Semantic Labeling Methods Generalize to any City? The Inria Aerial Image Labeling Benchmark[C]//IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, TX, USA, 2017
[19] Navab N, Hornegger J, Wells W M, et al. Medical Image Computing and Computer-Assisted Intervention[C]//The 18th International Conference, Munich, Germany, 2015
[20] Zhao H S, Shi J P, Qi X J, et al. Pyramid Scene Parsing Network[C]//IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017
[21] Xu Shengjun, Ouyang Puyan, Guo Xueyuan, et al. Building Segmentation of Remote Sensing Images Based on Multiscale-Feature Fusion Model[J]. Computer Measurement & Control, 2020, 28(7): 214-219 (in Chinese)
1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
2. CCCC Second Highway Consultants Co. Ltd, Wuhan 430056, China
3. Norla Institute of Technical Physics, Chengdu 610041, China

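The abstract reports gains in intersection over union, precision, recall and F1-score without stating how they are computed. The sketch below uses the standard pixel-wise definitions for a binary building mask, which are assumed here rather than taken from the paper. The function name `building_metrics` and the toy masks are illustrative only, and mean average precision is left out because its exact protocol is not described.

```python
def building_metrics(pred, truth):
    """Pixel-wise metrics for binary masks given as equally sized
    2-D lists of 0/1 values, where 1 marks a building pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # building predicted and present
            elif p and not t:
                fp += 1          # building predicted but absent
            elif t and not p:
                fn += 1          # building present but missed

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
scores = building_metrics(pred, truth)
print(scores["iou"])  # 0.5
```

Because intersection over union also counts every false positive and false negative in its denominator, it is never higher than F1 on the same prediction, which is worth keeping in mind when comparing the reported improvements.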

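Automatic building extraction is of great value for urban economic development, land use analysis, and spatial layout planning. In recent years, high-resolution spaceborne and airborne imaging technologies have matured, making it increasingly feasible to determine building locations and delineate building extents from imagery with high precision. However, because building materials and shapes vary, the internal features of buildings in high-resolution images are complex and highly variable, which makes complete extraction difficult. In addition, the surroundings of buildings are complex, and interference from other man-made objects easily causes misclassification. Accurate building extraction therefore still faces great challenges.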

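Research on extracting buildings from remote sensing images began early. Traditional methods fall into two main categories: pixel-based methods and object-oriented methods. The former focus on a single pixel or a few neighboring pixels and classify them by analyzing features such as the spectrum and texture of ground objects; Bayesian classification[1] and support vector machines (SVM)[2] are widely used examples. The latter take the ground-object patch as the smallest unit of analysis and extract buildings using patch features such as shape, texture, and topological relationships. For example, reference [3] extracts buildings using the spatial relationship between shadows and buildings, with good results, and reference [4] proposes a morphological building index that yields more complete extraction. However, in images with high spatial resolution, the internal features of buildings are more spatially fragmented, intra-class differences increase, and inter-class features become mixed, so traditional methods struggle to extract buildings under varying illumination and imaging conditions. With the development of light detection and ranging (LiDAR) technology, obtaining ground-object information from point clouds has also become an important approach. The building outlines extracted from point clouds in references [5-6] are complete and accurate, but LiDAR data are costly and difficult to acquire, which makes them hard to apply to building extraction over large areas.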

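In recent years, deep learning has developed rapidly and has achieved good results in building extraction, mainly through convolutional neural networks (CNN), fully convolutional networks (FCN), and their improved variants. Reference [7] was the first to apply CNN to building extraction, and improved methods have emerged continuously since then. Reference [8] proposed the ResNet architecture, which eased the degradation and gradient problems that hinder the training of deep CNN and opened the way to deeper networks. Reference [9] used conditional random fields (CRF) in post-processing to refine building edges, improving boundary extraction. Reference [10] improved CNN for dense semantic labeling and designed a semantic segmentation framework that adapts to features at different resolutions. Reference [11] proposed FCN at the Conference on Computer Vision and Pattern Recognition (CVPR), opening a new path for semantic segmentation. Reference [12] proposed graph convolutional networks (GCN), which extend convolution to topological graphs so that their spatial features can be used for semantic segmentation. SegNet[13], a deep convolutional network composed of an encoder, a decoder, and a pixel-wise classifier, achieved considerable gains in both efficiency and accuracy.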

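Among the many neural network methods, using CNN to extract buildings remains one of the mainstream approaches. In previous studies, CNN-based building extraction has mainly relied on an encoder-decoder structure, in which the encoder stage extracts image features and the decoder stage restores image details. However, shallow features are insufficient for extracting small buildings and make it difficult to delineate building boundaries accurately; in other words, features are used inefficiently. The multi-scale network idea[14] greatly improves the utilization of image features. The multi attending path neural network (MAPNet) proposed in reference [15] addresses this problem well, but because it uses a single receptive field size, its results are still affected by the rich internal details of large buildings: the network attends too much to local features and struggles to perceive features globally, so extracted large buildings contain holes and show poor continuity and completeness. In addition, in the multi-path fusion stage, MAPNet fuses features across all paths at once; this overly wide fusion span dilutes the features extracted within each path and hampers accurate building recognition.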
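To make the receptive field limitation concrete, the sketch below computes how many input pixels one output unit of a convolution stack can see, using the standard recurrence for stacked convolutions. The layer configurations are hypothetical and are not taken from MAPNet or MFFNet. Dilation appears only as one common way to enlarge the receptive field; the text above does not state which mechanism the authors use.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    kernel: int        # kernel size along one spatial axis
    stride: int = 1
    dilation: int = 1


def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of a single
    output unit after applying the given layers in order."""
    rf = 1     # an input pixel sees only itself
    jump = 1   # distance in input pixels between adjacent units
    for layer in layers:
        rf += (layer.kernel - 1) * layer.dilation * jump
        jump *= layer.stride
    return rf


# Hypothetical stacks: four plain 3x3 layers versus four 3x3 layers
# whose dilation doubles at each step.
plain = [ConvLayer(3) for _ in range(4)]
dilated = [ConvLayer(3, dilation=d) for d in (1, 2, 4, 8)]

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 31
```

With the same number of layers and weights, the dilated stack covers 31 pixels per axis instead of 9. A building roof wider than the receptive field can never be seen as a whole by a single unit, which is consistent with the holes described above for large buildings.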

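To solve these problems, this paper designs a multi-level feature fusion network (MFFNet) that uses a multi-path convolution fusion module and a large receptive field feature perception module to improve the accuracy of building extraction from high-resolution remote sensing images.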

