Object Detection in Remote Sensing Images by Utilizing Synthetic Data

  • Abstract: Object detection in remote sensing images is widely applied, but acquiring training data is constrained by weather, policy, and other factors, and annotation is costly, whereas synthetic data produced by computer rendering is fast and cheap to generate; synthetic data is therefore applied to training remote sensing object detection models. First, a synthetic data collection system that automatically captures images and their annotations is built on Grand Theft Auto V (GTA5), enabling rapid construction of a large-scale synthetic sample set for object detection in remote sensing images. Then, a cycle-consistent generative adversarial network is constructed to transfer the style of the synthetic images to that of real remote sensing images. Finally, the effectiveness of the proposed method is evaluated on the real datasets UCAS-AOD and NWPU VHR-10. The results show that style transfer reduces the domain discrepancy between synthetic and real data, and that pre-training with the transferred synthetic sample set improves detection accuracy, with the average precision of Faster R-CNN and YOLOv8 improving by up to 27.9% and 18.5%, respectively. The proposed method reduces the cost of constructing training sets for object detection in remote sensing images and offers a valuable solution for scenarios lacking real annotated data.

     

    Abstract:
    Objectives Object detection in remote sensing images is widely used, but annotating training sets is expensive and image acquisition is constrained by weather conditions and policies. In contrast, synthetic data generated by computer rendering is fast and cheap to produce. This paper therefore proposes a method for object detection in remote sensing images that utilizes synthetic data.
    Methods First, a synthetic data collection system is developed based on Grand Theft Auto V (GTA5) to automatically obtain images and their annotations, and a large-scale synthetic dataset named GTA5-Vehicle, containing 29 657 vehicle instances, is constructed for object detection in remote sensing images. Then, the style of the synthetic images is transferred to that of real images by a cycle-consistent adversarial network while preserving their content (a minimal sketch of this step appears after the abstract). Finally, the effectiveness of the proposed method is evaluated on the real datasets UCAS-AOD and NWPU VHR-10. To validate the generalization capability of the proposed method, both the faster region-based convolutional neural network (Faster R-CNN) and YOLOv8 models are used in the object detection experiments.
    Results The results demonstrate that style transfer reduces the domain discrepancy between synthetic and real data, and that pre-training on the transferred synthetic dataset improves detection accuracy. Specifically, in the absence of real annotated data, applying style transfer to the Faster R-CNN model yields an average precision improvement of 8.7% across both real datasets. Trained on the transferred synthetic dataset alone, the YOLOv8 model also achieves remarkable average precision, reaching 80.9% and 66.5% on the respective datasets with the intersection over union set to 0.5. Moreover, when a limited amount of real annotated data is available, pre-training on the transferred synthetic dataset enhances detection accuracy, with average precision improvements of up to 27.9% and 18.5% for Faster R-CNN and YOLOv8, respectively (a sketch of this pre-train/fine-tune protocol appears after the abstract).
    Conclusions The proposed method not only reduces the cost of constructing training sets for object detection in remote sensing images, but also offers a valuable solution for settings with limited real annotated data. The synthetic data and code are available at: https://lsq210.github.io/GTAVDataCollection/.
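
A minimal sketch of the cycle-consistent style-transfer step referenced in the Methods above, assuming PyTorch; the module and variable names (G_s2r, G_r2s, D_real, D_syn) are illustrative placeholders, not the paper's released code:

import torch
import torch.nn as nn

adv_loss = nn.MSELoss()  # least-squares GAN objective
cyc_loss = nn.L1Loss()   # cycle-consistency term preserves image content

def generator_step(G_s2r, G_r2s, D_real, D_syn, syn, real, lambda_cyc=10.0):
    """One generator update mapping synthetic images to real style and back.

    syn, real: batches of synthetic and real remote sensing images (N, C, H, W).
    """
    fake_real = G_s2r(syn)   # synthetic image restyled as "real"
    fake_syn = G_r2s(real)   # real image restyled as "synthetic"

    # Adversarial terms: each translated image should fool the
    # discriminator of its target domain.
    loss_adv = (adv_loss(D_real(fake_real), torch.ones_like(D_real(fake_real)))
                + adv_loss(D_syn(fake_syn), torch.ones_like(D_syn(fake_syn))))

    # Cycle terms: translating to the other domain and back must
    # reconstruct the input, so object layout is preserved and the
    # GTA5-generated bounding boxes remain valid for the restyled images.
    loss_cyc = cyc_loss(G_r2s(fake_real), syn) + cyc_loss(G_s2r(fake_syn), real)

    return loss_adv + lambda_cyc * loss_cyc

Because the generators are trained under this content-preserving cycle constraint, the synthetic annotations can be reused unchanged after transfer, which is what makes the restyled set usable for pre-training a detector.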
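A hedged sketch of the pre-train/fine-tune protocol referenced in the Results, using the Ultralytics YOLOv8 API; the dataset YAML file names and hyperparameters below are illustrative assumptions, and the actual configuration is in the paper's released code:

from ultralytics import YOLO

# 1) Pre-train on the style-transferred synthetic set (GTA5-Vehicle).
#    (epochs/imgsz values here are illustrative, not the paper's settings)
model = YOLO("yolov8n.pt")
model.train(data="gta5_vehicle_transferred.yaml", epochs=100, imgsz=640)

# 2) Fine-tune on the limited real annotated data (e.g. a UCAS-AOD subset).
model.train(data="ucas_aod_subset.yaml", epochs=50, imgsz=640)

# 3) Evaluate average precision at IoU 0.5 on the real validation split.
metrics = model.val(data="ucas_aod.yaml")
print(metrics.box.map50)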

     
