Remote Sensing Image Retrieval Using Pre-trained Convolutional Neural Networks Based on ImageNet
Abstract: High-resolution remote sensing images have complex content and abundant detail. Traditional shallow features struggle to describe such images, which tends to leave a large semantic gap in retrieval. This paper applies four different convolutional neural networks (CNNs) pre-trained on the large-scale ImageNet dataset to remote sensing image retrieval. The outputs of different layers of each network are first extracted as high-level features; these features are then normalized with Gaussian normalization, and Euclidean distance is used as the similarity measure for retrieval. A series of experiments on the UC-Merced and WHU-RS datasets shows that, among the high-level features of the four CNNs, the CNN-M features give the best retrieval performance. Compared with two shallow features, the bag of visual words and global morphological texture descriptors, the high-level features raise the mean average precision (mAP) by 15.7%-25.6% and lower the average normalized modified retrieval rank (ANMRR) by 17%-22.1%. CNNs pre-trained on ImageNet are therefore an effective approach to high-resolution remote sensing image retrieval.
Keywords: remote sensing image; retrieval; convolutional neural networks; pre-trained
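The retrieval steps described in the abstract (high-level features, Gaussian normalization, Euclidean ranking) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code. The feature vectors are toy numbers standing in for CNN layer outputs such as fc6, and the 3-sigma form of Gaussian normalization is an assumption, since the abstract does not give the exact formula. The names `gaussian_normalize`, `euclidean`, and `retrieve` are hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_normalize(features):
    """Normalize each feature dimension with the 3-sigma rule (assumed variant)."""
    columns = list(zip(*features))                      # one tuple per dimension
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]    # guard constant dimensions
    normalized = []
    for vec in features:
        row = []
        for x, mu, sigma in zip(vec, mus, sigmas):
            z = (x - mu) / (3.0 * sigma)                # most values land in [-1, 1]
            row.append(min(1.0, max(0.0, (z + 1.0) / 2.0)))  # shift to [0, 1], clip
        normalized.append(row)
    return normalized


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_index, features):
    """Return database indices sorted by ascending distance to the query."""
    query = features[query_index]
    others = [i for i in range(len(features)) if i != query_index]
    return sorted(others, key=lambda i: euclidean(query, features[i]))


# Toy 2-D "features": items 0 and 1 are similar, items 2 and 3 are similar.
features = [[0.0, 10.0], [0.1, 11.0], [5.0, 50.0], [5.2, 49.0]]
normalized = gaussian_normalize(features)
ranking = retrieve(0, normalized)
print(ranking[0])  # index of the nearest neighbour of item 0
```

Normalizing before computing distances keeps dimensions with large numeric ranges from dominating the Euclidean distance, which is presumably why the normalization step precedes the similarity measure.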
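Table 1 Different CNN Architectures (conv entries: number of filters × kernel height × kernel width)

| CNN-Alex | CNN-M | CNN-16 | CNN-19 |
|---|---|---|---|
| conv1 96×11×11 | conv1 96×7×7 | conv1-1 64×3×3 | conv1-1 64×3×3 |
| | | conv1-2 64×3×3 | conv1-2 64×3×3 |
| pool1 | pool1 | pool1 | pool1 |
| conv2 256×5×5 | conv2 256×5×5 | conv2-1 128×3×3 | conv2-1 128×3×3 |
| | | conv2-2 128×3×3 | conv2-2 128×3×3 |
| pool2 | pool2 | pool2 | pool2 |
| conv3 384×3×3 | conv3 512×3×3 | conv3-1 256×3×3 | conv3-1 256×3×3 |
| | | conv3-2 256×3×3 | conv3-2 256×3×3 |
| | | conv3-3 256×3×3 | conv3-3 256×3×3 |
| | | | conv3-4 256×3×3 |
| - | - | pool3 | pool3 |
| conv4 384×3×3 | conv4 512×3×3 | conv4-1 512×3×3 | conv4-1 512×3×3 |
| | | conv4-2 512×3×3 | conv4-2 512×3×3 |
| | | conv4-3 512×3×3 | conv4-3 512×3×3 |
| | | | conv4-4 512×3×3 |
| - | - | pool4 | pool4 |
| conv5 256×3×3 | conv5 512×3×3 | conv5-1 512×3×3 | conv5-1 512×3×3 |
| | | conv5-2 512×3×3 | conv5-2 512×3×3 |
| | | conv5-3 512×3×3 | conv5-3 512×3×3 |
| | | | conv5-4 512×3×3 |
| pool5 | pool5 | pool5 | pool5 |
| fc6 4096 | fc6 4096 | fc6 4096 | fc6 4096 |
| fc7 4096 | fc7 4096 | fc7 4096 | fc7 4096 |
| fc8 1000 | fc8 1000 | fc8 1000 | fc8 1000 |
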
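Table 2 mAP (%) of Different Features on the UC-Merced Dataset

| Category | pool5 | fc6 | fc7 |
|---|---|---|---|
| CNN-Alex | 45.9 | 52.4 | 49.3 |
| CNN-M | 50.6 | 55.8 | 54.9 |
| CNN-16 | 53.6 | 55.3 | 53.3 |
| CNN-19 | 52.3 | 54.6 | 52.0 |
| BoVW [6] | 30.2 | | |

BoVW [6] is a single shallow feature, so it has one mAP value rather than one per layer.

Table 3 mAP (%) of Different Features on the WHU-RS Dataset

| Category | pool5 | fc6 | fc7 |
|---|---|---|---|
| CNN-Alex | 55.1 | 62.3 | 62.2 |
| CNN-M | 59.2 | 65.6 | 64.6 |
| CNN-16 | 58.1 | 64.5 | 63.3 |
| CNN-19 | 56.6 | 62.5 | 60.8 |
| BoVW [6] | 38.9 | | |

BoVW [6] is a single shallow feature, so it has one mAP value rather than one per layer.
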
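For readers unfamiliar with the two metrics reported here, the sketch below shows one common way to compute mAP and ANMRR from ranked retrieval results. It is an illustration under stated assumptions, not the evaluation code behind the tables: each query is represented by a hypothetical list of booleans marking which ranked results are relevant, every ranked list is assumed to contain all relevant items for its query, and ANMRR follows the MPEG-7 style definition, which may differ in detail from the variant used in the paper.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance[i] is True if result i+1 is relevant."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)   # precision at each relevant hit
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked_relevance):
    """mAP: average of per-query AP values (higher is better)."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps)


def nmrr(ranked_relevance, max_ground_truth):
    """Normalized modified retrieval rank for one query (MPEG-7 style, assumed)."""
    ng = sum(ranked_relevance)               # ground-truth size for this query
    if ng == 0:
        raise ValueError("each query needs at least one relevant item")
    k = min(4 * ng, 2 * max_ground_truth)    # cut-off rank K(q)
    ranks = [rank if rank <= k else 1.25 * k  # penalty for items ranked past K
             for rank, relevant in enumerate(ranked_relevance, start=1)
             if relevant]
    avr = sum(ranks) / ng
    return (avr - 0.5 * (1 + ng)) / (1.25 * k - 0.5 * (1 + ng))


def anmrr(all_ranked_relevance):
    """ANMRR: average NMRR over all queries (lower is better)."""
    gtm = max(sum(r) for r in all_ranked_relevance)
    values = [nmrr(r, gtm) for r in all_ranked_relevance]
    return sum(values) / len(values)


# Two toy queries: the first ranks both relevant items on top, the second does not.
queries = [[True, True, False, False], [False, True, False, True]]
map_score = mean_average_precision(queries)
anmrr_score = anmrr(queries)
print(map_score, anmrr_score)
```

The two metrics move in opposite directions: a better ranking raises mAP and lowers ANMRR, which matches how the abstract reports the gains over the shallow features.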
[1] Aptoula E. Remote Sensing Image Retrieval with Global Morphological Texture Descriptors[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(5):3023-3034 doi: 10.1109/TGRS.2013.2268736
[2] Bretschneider T, Cavet R, Kao O. Retrieval of Remotely Sensed Imagery Using Spectral Information Content[C]. The 22nd IEEE International Conference of Geoscience and Remote Sensing Symposium, Toronto, Canada, 2002 http://ieeexplore.ieee.org/xpl/abstractKeywords.jsp?tp=&arnumber=1026510
[3] Scott G, Klaric M, Davis C, et al. Entropy-Balanced Bitmap Tree for Shape-based Object Retrieval from Large-Scale Satellite Imagery Databases[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(5):1603-1616 doi: 10.1109/TGRS.2010.2088404
[4] Demir B, Bruzzone L. A Novel Active Learning Method in Relevance Feedback for Content-based Remote Sensing Image Retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(9):2323-2334 https://www.researchgate.net/publication/271426283_An_effective_active_learning_method_for_interactive_content-based_retrieval_in_remote_sensing_images
[5] Liu T, Zhang L, Li P, et al. Remotely Sensed Image Retrieval Based on Region-Level Semantic Mining[J]. EURASIP Journal on Image and Video Processing, 2012, 4(1):1-11 doi: 10.1186/1687-5281-2012-4
[6] Yang Y, Newsam S. Geographic Image Retrieval Using Local Invariant Features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(2):818-832 doi: 10.1109/TGRS.2012.2205158
[7] Yang Jin, Liu Jianbo, Dai Qin. An Improved Remote Sensing Image Retrieval Method Based on Bag of Word Framework[J]. Geomatics and Information Science of Wuhan University, 2014, 39(9):1109-1113 http://ch.whu.edu.cn/CN/abstract/abstract3080.shtml
[8] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[C]. The 26th Conference on Neural Information Processing Systems, Nevada, US, 2012 http://dl.acm.org/citation.cfm?id=3065386
[9] Zeiler M D, Fergus R. Visualizing and Understanding Convolutional Networks[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014
[10] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015 http://arxiv.org/abs/1409.1556
[11] Donahue J, Jia Y, Vinyals O, et al. Decaf: A Deep Convolutional Activation Feature for Generic Visual Recognition[C]. The 31st International Conference on Machine Learning, Beijing, China, 2014 http://dl.acm.org/citation.cfm?id=3044879
[12] Oquab M, Bottou L, Laptev I, et al. Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks[C]. The 27th IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014 http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6909618
[13] Chatfield K, Simonyan K, Vedaldi A, et al. Return of the Devil in the Details: Delving Deep into Convolutional Networks[C]. The 25th British Machine Vision Conference, Nottingham, England, 2014 http://www.oalib.com/paper/4045769
[14] Penatti O A B, Nogueira K, Santos J A D. Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains?[C]. The IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, 2015 doi: 10.1109/CVPRW.2015.7301382
[15] Hu F, Xia G S, Hu J, et al. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery[J]. Remote Sensing, 2015, 7(11):14680-14707 doi: 10.3390/rs71114680
[16] Ng J, Yang F, Davis L. Exploiting Local Features from Deep Networks for Image Retrieval[C]. The IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, 2015 http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=7301272
[17] Babenko A, Slesarev A, Chigorin A, et al. Neural Codes for Image Retrieval[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014
[18] Vedaldi A, Lenc K. MatConvNet: Convolutional Neural Networks for MATLAB[C]. The 23rd ACM International Conference on Multimedia, Brisbane, Australia, 2015 doi: 10.1145/2733373.2807412
[19] Yang Y, Newsam S. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification[C]. The 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, US, 2010 http://dl.acm.org/citation.cfm?id=1869829
[20] Xia G S, Yang W, Delon J, et al. Structural High-Resolution Satellite Image Indexing. In Proceedings of the ISPRS TC Ⅶ Symposium Part A: 100 Years ISPRS-Advancing Remote Sensing Science[C]. ISPRS TC Ⅶ Symposium-100 Years ISPRS 38, Vienna, Austria, 2010