Abstract:
Objectives: The rapid advancement of remote sensing Earth observation technology has driven explosive growth in data volume, placing stringent demands on data organization and processing efficiency. More suitable data models and greater computational power are needed to support the organization, processing, and analysis of massive remote sensing datasets. The Discrete Global Grid System (DGGS), a multi-resolution, discrete, hierarchical structure formed by recursively partitioning the Earth's surface, offers a promising solution. It enables a data association model that uses spatial location as the primary key, providing a paradigm for the unified organization and integrated processing of Earth observation data. Meanwhile, domestic supercomputers, as high-performance computing facilities with fully independent intellectual property rights, have distinctive architectures that deliver substantial computational capability, creating favorable conditions for the efficient processing of massive remote sensing data. This study aims to bring the two together by implementing and parallelizing remote sensing data organization based on the Rhombic Triacontahedron Leeuwen Equal-area Aperture 4 Hexagonal Discrete Global Grid System (RTLEA4HDGGS) on a domestic supercomputing platform.
Methods: We designed and implemented a parallel processing framework for organizing remote sensing data within the RTLEA4HDGGS on a domestic supercomputing platform. First, we formulated the overall strategy for grid-based remote sensing data organization on the platform and analyzed the efficiency bottlenecks and computational hotspots of the core algorithms. Next, to match the platform's hardware and heterogeneous programming model, we restructured these hotspots for parallel execution and adapted the code accordingly. This involved optimizing data structures, refining parallel granularity, and managing data transfer between the host (CPU) and the accelerators efficiently, with the goal of large-scale, grid-based processing of remote sensing imagery. Finally, we validated the migrated and optimized algorithms and assessed their parallel performance on the supercomputer's computing nodes, in two scenarios: a single Deep Computing Unit (DCU) and multiple DCU accelerator cards.
Results: Experimental results demonstrate significant performance gains after parallel restructuring. The resampling algorithm for remote sensing images achieved a speedup of 50 to 200 times, and the theoretical speedup limit for a single DCU was approximately 590 times. Using multiple DCUs further reduced the algorithm's execution time, improving its suitability for high-resolution and large-scale application scenarios. Data transfer between the CPU and the DCU is an overhead that must be accounted for when parallelizing the algorithm. However, as the processing level increases, this overhead shrinks relative to the total computation time on a single CPU. Consequently, the overall speedup rises gradually at higher processing levels, highlighting the advantage of the parallelized approach for complex, multi-scale tasks.
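The transfer-overhead argument in the Results can be made concrete with a simple timing model. This is only a sketch under assumed notation, not a model taken from the study: the symbols $T_{\mathrm{CPU}}(n)$ (single-CPU time at processing level $n$), $T_{\mathrm{DCU}}(n)$ (accelerator compute time), and $T_{\mathrm{xfer}}(n)$ (CPU-DCU transfer time) are introduced here for illustration, and reading the theoretical limit as the transfer-free speedup is likewise an assumption.

```latex
% Assumed notation (not from the original study):
%   T_CPU(n)  : single-CPU execution time at processing level n
%   T_DCU(n)  : DCU compute time at level n
%   T_xfer(n) : CPU<->DCU data transfer time at level n
\begin{align}
  S(n) &= \frac{T_{\mathrm{CPU}}(n)}{T_{\mathrm{DCU}}(n) + T_{\mathrm{xfer}}(n)}, \\
  S_{\max}(n) &= \frac{T_{\mathrm{CPU}}(n)}{T_{\mathrm{DCU}}(n)}
    \quad \text{(transfer ignored)}, \\
  S(n) &= \left( \frac{1}{S_{\max}(n)}
    + \frac{T_{\mathrm{xfer}}(n)}{T_{\mathrm{CPU}}(n)} \right)^{-1}.
\end{align}
```

Under this model the overall speedup approaches $S_{\max}$ as the ratio $T_{\mathrm{xfer}}/T_{\mathrm{CPU}}$ falls, which is consistent with the reported trend of higher speedup at higher processing levels. As a purely hypothetical illustration, with $S_{\max} \approx 590$ and a transfer share of $T_{\mathrm{xfer}}/T_{\mathrm{CPU}} = 0.01$, the model gives $S \approx 1/(0.0017 + 0.01) \approx 86$.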
Conclusions: The successful parallel implementation of the RTLEA4HDGGS-based data organization method on the domestic supercomputing platform demonstrates its efficiency and scalability. The performance advantages are especially pronounced for large-scale and complex data processing tasks. This research promotes the deep integration of advanced remote sensing data management frameworks with autonomous, controllable domestic high-performance computing infrastructure. It provides a technical pathway for using national computing resources to address the challenges posed by massive Earth observation data, thereby supporting applications such as environmental monitoring, resource management, and climate studies.