Objectives Object detection in remote sensing images is widely used, but labeling training sets is costly and image acquisition is limited by weather conditions and policies. In contrast, synthetic data generated by computer rendering can be produced quickly and at low cost. Therefore, this paper proposes a method for object detection in remote sensing images that utilizes synthetic data.
Methods First, a synthetic data collection system is developed based on Grand Theft Auto V (GTA5) to automatically obtain images and their annotations. With this system, we construct a large-scale synthetic dataset named GTA5-Vehicle, which includes 29 657 vehicle instances for object detection in remote sensing images. Then, a cycle-consistent adversarial network is constructed to transfer the style of the synthetic images to that of real images while preserving their content. Finally, the effectiveness of the proposed method is evaluated on two real datasets, UCAS-AOD and NWPU VHR-10. To validate the generalization capability of the proposed method, both the faster region-based convolutional neural network (Faster R-CNN) and YOLOv8 models are used in the object detection experiments.
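For orientation, the objective commonly used to train a cycle-consistent adversarial network can be written as below. This is the standard formulation and is only an assumption about this work: the symbols (generators G and F, discriminators D_X and D_Y, weight λ) do not appear in the abstract, and the paper's actual losses may differ. Here X denotes the synthetic image domain and Y the real image domain.

```latex
% Cycle-consistency term: an image translated to the other domain and back
% should match the original, which is what preserves content.
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim X}\left[ \lVert F(G(x)) - x \rVert_1 \right]
  + \mathbb{E}_{y \sim Y}\left[ \lVert G(F(y)) - y \rVert_1 \right]

% Full objective: two adversarial terms plus the weighted cycle term.
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
  + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

In this formulation G maps synthetic images to the real style, F maps back, and the adversarial terms push the translated images toward the appearance of the target domain.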
Results The results demonstrate that style transfer reduces the domain discrepancy between synthetic and real data, and that pre-training on the transferred synthetic dataset improves detection accuracy. Specifically, in the absence of real annotated data, applying style transfer with the Faster R-CNN model yields an average precision improvement of 8.7% across both real datasets. The YOLOv8 model trained on the transferred synthetic dataset also achieves high average precision, reaching 80.9% and 66.5% on the respective datasets at an intersection over union (IoU) threshold of 0.5. Moreover, when a limited amount of real annotated data is available, pre-training on the transferred synthetic dataset enhances detection accuracy, with maximum average precision improvements of 27.9% and 18.5% for Faster R-CNN and YOLOv8, respectively.
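The intersection over union criterion behind the reported average precision can be illustrated with a minimal sketch. It assumes axis-aligned boxes in (x_min, y_min, x_max, y_max) form; the function names `iou` and `is_true_positive` are hypothetical and are not taken from the released code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection counts as correct when its IoU with the ground truth
    reaches the threshold (0.5 in the results above)."""
    return iou(pred_box, gt_box) >= threshold
```

For example, a predicted box shifted by 2 pixels against a 10 by 10 ground-truth box overlaps 80 of 120 union pixels, so it passes the 0.5 threshold, whereas a shift of 5 pixels gives an IoU of one third and does not.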
Conclusions The proposed method not only reduces the cost associated with constructing the training set for object detection in remote sensing images, but also offers a valuable solution for settings with limited real annotated data. The synthetic data and code are available at:
https://lsq210.github.io/GTAVDataCollection/.