Abstract:
In recent years, significant advances in large language models and visual foundation models have drawn scholars' attention to the potential of general artificial intelligence technology in remote sensing, propelling a new research paradigm for large models in remote sensing information processing. Large remote sensing models, also known as pre-trained remote sensing foundation models, refer to a methodology that trains large-scale deep learning models on vast amounts of unlabeled remote sensing imagery. The goal is to extract universal feature representations from remote sensing images, thereby enhancing the performance, efficiency, and versatility of remote sensing image analysis tasks. Research on large remote sensing models involves three key factors: pre-training datasets, model parameters, and pre-training techniques. Among them, pre-training datasets and model parameters can be flexibly scaled up as data and computational resources grow, while pre-training techniques are critical for improving the performance of large remote sensing models. This review focuses on the pre-training techniques of large remote sensing models and systematically analyzes existing supervised single-modal, unsupervised single-modal, and visual-text joint multimodal pre-trained large remote sensing models. The conclusion offers prospects for large remote sensing models in terms of integrating domain knowledge and physical constraints, enhancing data generalization, expanding application scenarios, and reducing data costs.