Abstract:
Objectives The time-varying gravity field models derived from Gravity Recovery and Climate Experiment (GRACE) satellite gravimetry offer only limited spatial resolution for retrieving terrestrial water storage anomaly (TWSA), which restricts their application in studies of regional water cycles and climate change. Current machine learning downscaling methods effectively improve the spatial resolution of GRACE TWSA data, but three issues need further exploration: the reasonable selection of predictive factors, the impact of those factors on the performance of machine learning models, and the accurate evaluation of downscaling results.
Methods A hydrological model downscaling method and three machine learning downscaling methods, namely random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN), were adopted to downscale GRACE-derived TWSA data over the Yangtze River Basin from a spatial resolution of 1°×1° to 0.25°×0.25° and 0.1°×0.1°. A closed-loop simulation experiment using TWSA data from the Global Land Data Assimilation System (GLDAS) hydrological model was first conducted to evaluate the performance of the different downscaling methods. Subsequently, the GRACE-derived TWSA data were downscaled, and the performance and results of the different downscaling methods were comprehensively evaluated against measured water level data.
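The closed-loop idea described above can be sketched in a few lines: a known fine-resolution field is aggregated to the coarse grid, a model is fitted at the coarse scale, and its fine-scale prediction is compared with the known field. The sketch below is only an illustration under stated assumptions. A one-predictor least-squares line stands in for the RF, XGBoost, and ANN models, the data are synthetic rather than GLDAS fields, and all function names (`block_mean`, `closed_loop`, `make_synthetic`) are hypothetical.

```python
import math
import random
from statistics import mean


def block_mean(grid, factor):
    """Aggregate a fine 2-D grid (list of rows) to a coarse grid by block averaging."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            mean(
                grid[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def flatten(grid):
    return [value for row in grid for value in row]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for RF/XGBoost/ANN)."""
    mx, my = mean(x), mean(y)
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum(
        (u - mx) ** 2 for u in x
    )
    return my - slope * mx, slope


def rmse(a, b):
    return math.sqrt(mean((u - v) ** 2 for u, v in zip(a, b)))


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def make_synthetic(size=16, seed=0):
    """Synthetic fine-scale 'truth' and one predictor (assumed, not real data)."""
    rng = random.Random(seed)
    predictor = [[rng.uniform(-10, 10) for _ in range(size)] for _ in range(size)]
    truth = [[2.0 * v + 1.0 + rng.gauss(0, 0.1) for v in row] for row in predictor]
    return truth, predictor


def closed_loop(fine_truth, fine_predictor, factor=4):
    """Train at the coarse scale, predict at the fine scale, score against truth."""
    coarse_truth = block_mean(fine_truth, factor)          # e.g. 0.25 deg -> 1 deg
    coarse_predictor = block_mean(fine_predictor, factor)
    intercept, slope = fit_line(flatten(coarse_predictor), flatten(coarse_truth))
    fine_estimate = [[intercept + slope * v for v in row] for row in fine_predictor]
    truth_flat, est_flat = flatten(fine_truth), flatten(fine_estimate)
    return fine_estimate, rmse(est_flat, truth_flat), pearson(est_flat, truth_flat)


if __name__ == "__main__":
    truth, predictor = make_synthetic()
    _, error, corr = closed_loop(truth, predictor)
    print(f"RMSE = {error:.3f}, correlation = {corr:.4f}")
```

A factor of 4 corresponds to going from 1° to 0.25°; a factor of 10 would correspond to 0.1°. Because the fine-scale truth is known in advance, the scores measure how well a method recovers sub-grid detail, which is not possible when only the coarse GRACE field is available.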
Results The performance of machine-learning-based downscaling is sensitive to the number of predictive factors. As the number of predictors increases, overall downscaling performance improves, although the magnitude of improvement varies among models. Partial least squares regression analysis indicates that the six most important variables, namely normalized difference vegetation index (NDVI), soil moisture, precipitation, temperature, runoff, and U-wind, are sufficient to achieve robust performance, while additional predictors yield only marginal gains. Closed-loop simulation experiments further reveal differences among models: compared with the RF and XGBoost models, the ANN model shows relatively lower performance metrics but produces the most favorable downscaled spatial patterns. When evaluated against water level observations in the Yangtze River Basin, TWSA remains consistent with observed water levels both before and after downscaling. In particular, the RF-based results show a marked improvement in correlation, with all coefficients exceeding 0.7. Using Poyang Lake as a representative case, the long-term trends of the machine learning downscaling results were compared with those of the three predictors with the highest variable importance in projection scores: NDVI, soil moisture, and precipitation. The downscaled outputs exhibit long-term trends highly consistent with those of NDVI and soil moisture, and their spatial variations closely correspond to the extent of the water-covered area of Poyang Lake. In contrast, precipitation does not display a similar trend pattern.
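For readers unfamiliar with variable importance in projection (VIP) scores, the following is a minimal sketch of how they can be computed from a single-response partial least squares fit using the NIPALS algorithm. It is not the authors' implementation. The function name `pls1_vip`, the number of components, and the synthetic predictors are assumptions made for illustration only.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS fit (NIPALS with deflation)."""
    n, p = len(X), len(X[0])
    X = standardize(X)
    y_mean = sum(y) / n
    y = [v - y_mean for v in y]
    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction in predictor space most covariant with y.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores, X loadings, and y loading for this component.
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loadings = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        X = [[X[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # sum of squares of y explained
    total = sum(explained)
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic predictors; only the first two influence the response.
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    y = [3.0 * row[0] + 0.5 * row[1] + rng.gauss(0, 0.1) for row in X]
    for name, score in zip(["x0", "x1", "x2"], pls1_vip(X, y, 2)):
        print(name, round(score, 3))
```

By construction the squared VIP scores average to 1 across predictors, so a score above 1 is commonly read as above-average importance. Ranking predictors by this score is one way to arrive at a short list such as the six variables named above.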
Conclusions RF and XGBoost are the best-performing downscaling methods, while the ANN and hydrological model downscaling methods perform poorly. The results of hydrological model downscaling depend on the correlation between the hydrological model and GRACE data. In contrast, machine learning downscaling methods can better integrate the changing characteristics of different auxiliary data such as hydrology, meteorology, and vegetation (especially the important predictive factors), enabling better recovery of detailed TWSA signals in the watershed.