Abstract:
Objectives Object detection in remote sensing images is widely used, but labeling training sets is costly and image acquisition is constrained by weather conditions and policies. Synthetic data generated by computer rendering, by contrast, can be produced quickly and at low cost. We therefore propose a method for object detection in remote sensing images that utilizes synthetic data.
Methods First, we develop a synthetic data collection system based on GTA5 (Grand Theft Auto V) that automatically obtains images and their annotations. Using this system, we construct a large-scale synthetic dataset named GTA5-Vehicle, which includes 29,657 vehicle instances for remote sensing image object detection. Second, we construct a cycle-consistent adversarial network to transfer the synthetic images to the style of real images while preserving their content. Finally, we evaluate the effectiveness of the proposed approach on two real datasets, UCAS-AOD and NWPU VHR-10. To validate the generalization capability of the approach, object detection experiments are conducted with both Faster RCNN and YOLOv8 models.
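The cycle-consistency constraint behind the style-transfer step can be illustrated with a minimal sketch. It assumes the commonly used L1 formulation of the cycle-consistency loss; the "generators" below are hypothetical toy functions on flattened images, not the networks trained in this work, and all function names are illustrative.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in: a flattened image with values in [0, 1]
Generator = Callable[[Image], Image]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    synthetic: Image,
    real: Image,
    g_syn2real: Generator,  # hypothetical synthetic -> real-style mapping
    g_real2syn: Generator,  # hypothetical real -> synthetic-style mapping
) -> float:
    """Sum of the forward and backward reconstruction errors.

    A low value means that translating an image to the other domain and
    back recovers the original, which is what encourages the translation
    to preserve content while changing style.
    """
    forward = l1_distance(g_real2syn(g_syn2real(synthetic)), synthetic)
    backward = l1_distance(g_syn2real(g_real2syn(real)), real)
    return forward + backward


# Toy "style" change: shift brightness up or down by a constant.
def brighten(img: Image) -> Image:
    return [p + 0.1 for p in img]


def darken(img: Image) -> Image:
    return [p - 0.1 for p in img]


def identity(img: Image) -> Image:
    return list(img)


syn = [0.2, 0.4, 0.6]
real = [0.3, 0.5, 0.7]

# Exact inverses reconstruct the input, so the loss is essentially zero.
loss_consistent = cycle_consistency_loss(syn, real, brighten, darken)
# A backward mapping that does not undo the forward one is penalized.
loss_inconsistent = cycle_consistency_loss(syn, real, brighten, identity)
```

In an actual cycle-consistent adversarial network this term is combined with adversarial losses for each domain; the sketch isolates only the reconstruction part.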
Results The results demonstrate that style transfer reduces the domain discrepancy between synthetic and real data, and that pre-training on the transferred synthetic dataset improves detection accuracy. Specifically, when no real annotated data are available, applying style transfer improves the average precision of the Faster RCNN model by 8.7% across both real datasets. Training the YOLOv8 model on the transferred synthetic dataset yields average precision scores of 80.9% and 66.5% on the respective datasets at an IoU threshold of 0.5. Moreover, when a limited amount of real annotated data is available, pre-training on the transferred synthetic dataset improves detection accuracy, with maximum average precision improvements of 27.9% and 18.5% for Faster RCNN and YOLOv8, respectively.
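For readers unfamiliar with the IoU threshold used in the reported scores, the following is a minimal sketch of how a predicted box is typically judged against a ground-truth box. The `(x1, y1, x2, y2)` box layout, the function names, and the use of `>=` at the threshold are assumptions made for illustration; evaluation toolkits differ in such details.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as correct if its IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


gt_box = (0.0, 0.0, 10.0, 10.0)
close_pred = (2.0, 0.0, 12.0, 10.0)  # overlap 80, union 120
loose_pred = (5.0, 0.0, 15.0, 10.0)  # overlap 50, union 150

close_ok = is_true_positive(close_pred, gt_box)
loose_ok = is_true_positive(loose_pred, gt_box)
```

Average precision then summarizes precision over recall levels using these per-detection decisions, so a stricter threshold generally lowers the score.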
Conclusions The proposed method not only reduces the cost associated with constructing the training set for object detection in remote sensing images but also offers a valuable solution for settings with limited real annotated data. The synthetic data and code are available at: https://lsq210.github.io/GTAVDataCollection/ .