Open Access Article

Satellite-Based Reconstruction of Atmospheric CO₂ Concentration over China Using a Hybrid CNN and Spatiotemporal Kriging Model

Yiying Hua, Xuesheng Zhao, Wenbin Sun and Qiwen Sun

College of Geoscience and Surveying Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China

Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(13), 2433; https://doi.org/10.3390/rs16132433

Submission received: 9 May 2024 / Revised: 30 June 2024 / Accepted: 1 July 2024 / Published: 2 July 2024

(This article belongs to the Special Issue Satellite-Based Climate Change and Sustainability Studies)

Abstract

Although atmospheric CO₂ concentrations collected by satellites play a crucial role in understanding global greenhouse gases, the sparse geographic distribution greatly affects their widespread application. In this paper, a hybrid CNN and spatiotemporal Kriging (CNN-STK) model is proposed to generate a monthly spatiotemporal continuous XCO2 dataset over China at 0.25° grid-scale from 2015 to 2020, utilizing OCO-2 XCO2 and geographic covariates. The validations against observation samples, CAMS XCO2 and TCCON measurements indicate the CNN-STK model is effective, robust, and reliable with high accuracy (validation set metrics: R² = 0.936, RMSE = 1.3 ppm, MAE = 0.946 ppm; compared with TCCON: R² = 0.954, RMSE = 0.898 ppm and MAE = 0.741 ppm). The accuracy of CNN-STK XCO2 exhibits spatial inhomogeneity, with higher accuracy in northern China during spring, autumn, and winter and lower accuracy in northeast China during summer. XCO2 in low-value-clustering areas is notably influenced by biological activities. Moreover, relatively high uncertainties are observed in the Qinghai-Tibet Plateau and Sichuan Basin. This study innovatively integrates deep learning with the geostatistical method, providing a stable and cost-effective approach for other countries and regions to obtain regional scales of atmospheric CO₂ concentrations, thereby supporting policy formulation and actions to address climate change.

Keywords: XCO2; CNN model; spatiotemporal Kriging; TCCON; spatial inhomogeneity; greenhouse gases

1. Introduction

Nowadays, global climate change is profoundly affecting both natural and human systems [1,2]. In 2023, the sixth Synthesis Report of the Intergovernmental Panel on Climate Change (IPCC) stated that human activities are undoubtedly contributing to global warming through the emissions of greenhouse gases (GHGs) [3]. Carbon dioxide (CO₂) has the highest concentration and the broadest sources, and it exhibits pronounced absorption of solar visible light and of infrared radiation from Earth [4]. These properties, together with its close association with biological activities, including human activities, establish it as the predominant GHG [5].

There are limited ways to determine the CO₂ concentrations in the atmosphere. Ground-based observation networks, such as the Network for the Detection of Atmospheric Composition Change (NDACC) and the Total Carbon Column Observing Network (TCCON), can obtain high-precision, ground-based measurements of GHGs [6,7]. However, the high construction and operating costs of these networks and their strict site-selection requirements have led to an uneven global distribution of stations, making it difficult to obtain a continuous spatial distribution of CO₂ concentration over large regions. Furthermore, atmospheric physicochemical models (such as CESM, WRF-Chem, and CTMs) can be used to simulate and predict the transport, diffusion, and concentration distribution of various components, including atmospheric CO₂. However, such models involve many physical and chemical processes and require large amounts of computational resources and data support, which leads to inherent uncertainties [8,9].

Remote sensing data possess the advantages of consistency, stability, and objectivity. Envisat-1, launched by the European Space Agency in 2002, carried the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) sensor for atmospheric mapping and was the world's first satellite able to measure atmospheric CO₂. This mission confirmed the feasibility of space-based detection of near-surface CO₂ concentrations in the near-infrared spectral range [10]. Since then, the GHG satellite GOSAT, as well as carbon observation satellites such as OCO-2, OCO-3, and TanSat, have acquired the column-averaged dry air mole fraction of CO₂ (XCO2) by detecting short-wave infrared radiation reflected from the Earth’s surface and thermal infrared radiation emitted by the surface and atmosphere [11]. Bias correction of these raw XCO2 retrievals, including the removal of footprint bias, the removal of XCO2 errors correlated with specific variables, and the determination of global offsets based on the “truth proxy” (TCCON), allows users to obtain the best estimated XCO2 L2 products directly. Thus, satellite observations have become one of the most reliable means currently available for capturing changes in global and regional atmospheric CO₂ concentration [12,13]. This remote sensing method does not rely on numerical model simulations and can provide independent atmospheric CO₂ concentrations from space, which can address the sparsity problem of the GHG monitoring network. However, due to factors such as clouds and aerosols and limitations in inversion algorithms, valid XCO2 retrievals are temporally and spatially discontinuous, which significantly constrains the widespread application of satellite observations. Figure S1 in the Supplementary Materials illustrates the XCO2 footprint density of the OCO-2 satellite over mainland China at daily, weekly, monthly, and yearly time scales, showing that the OCO-2 XCO2 data have noticeable gaps at all of these temporal scales. Therefore, filling the spatial and temporal gaps in XCO2 data to obtain high-precision, high-resolution, full-coverage maps is significant for enhancing the application potential of XCO2 satellite data and for understanding the spatiotemporal distributions and patterns of atmospheric CO₂ on a large scale.

Various geostatistical methods can enhance the spatiotemporal coverage of XCO2 data. For example, traditional ordinary Kriging [14,15], sliding window Kriging [16], empirical Bayesian Kriging [17], precision-weighted Kriging [18], and fixed-rank Kriging [19], among others, have proved effective in obtaining large-scale spatial distributions of XCO2. The spatiotemporal Kriging algorithm, an extension from the spatial to the spatiotemporal domain, accounts for the temporal trend and spatial correlation within the data simultaneously, thereby improving interpolation accuracy [20]. However, the resolution of the results obtained by such interpolation methods is limited, especially when the satellite data are very sparse, making it challenging to obtain full-coverage results with high temporal and spatial resolution. Moreover, these methods may inadvertently smooth the spatial characteristics of XCO2, which should not be ignored in applied research such as atmospheric pollution source studies [21]. In addition to the various Kriging methods, researchers have used Bayesian Maximum Entropy (BME) [22] and High-Accuracy Surface Modeling (HASM) [23] to attain precise XCO2 distributions, but these methods have high computational complexity and require substantial computational power when handling large datasets.

In recent years, with the development of artificial intelligence technology and the advent of the big data era, data-driven machine learning methods have rapidly evolved in the field of remote sensing. Many scholars have already explored the potential of machine learning methods for the XCO2 gap-filling problem [24,25,26]. Using multi-source remote sensing datasets (such as elevation, nighttime light, vegetation indices, meteorology, and land use) and socioeconomic data (such as population density) as independent variables, they combined XCO2 retrievals with existing XCO2 datasets simulated by atmospheric physicochemical models (such as Carbon Tracker XCO2 and CAMS XCO2) to construct machine learning models, including Random Forest (RF), Extreme Random Forest, XGBoost, LightGBM, and CatBoost, and to obtain full-coverage XCO2 datasets for different study areas. The experiments indicate that machine learning models, especially the RF model, perform satisfactorily in XCO2 data interpolation. Furthermore, some researchers have explored deep learning methods. Li [27] constructed a Deep Neural Network (DNN) model based on OCO-2 XCO2, CAMS XCO2, and auxiliary datasets, such as meteorology datasets and vegetation indices, to generate a long-term daily full-coverage XCO2 dataset with high accuracy (R² = 0.866) compared to TCCON measurements. Zhang and Liu [28] developed a Convolutional Neural Network (CNN) model based on XCO2 data from the SCIAMACHY sensor, GOSAT satellite, OCO-2 satellite, and Carbon Tracker dataset, along with multi-source remote sensing datasets as auxiliary variables. This model produced a long-term monthly XCO2 dataset for China, and accuracy validation results demonstrate its strong generalization ability and good agreement with TCCON observations, with an average bias of −0.60 ppm, MAE of 0.95 ppm, and RMSE of 1.18 ppm.

Machine learning models can automatically capture complex surface features and patterns, enabling high-precision parameter extraction and spatiotemporal prediction for quantitative problems that traditional mechanistic approaches cannot describe precisely. Although machine learning models suffer from poor interpretability, they can provide full-coverage XCO2 datasets that are highly consistent with ground-based measurements, showcasing significant practical utility. Deep learning models, however, typically require a substantial number of training samples, and obtaining a sufficient quantity of high-quality labeled training data is a challenge.

Among the existing methods for interpolating XCO2 data, geostatistical methods can fully utilize the spatial and temporal correlation of geographic data and have significant advantages in revealing the spatial trends and patterns of geographic phenomena, while machine learning methods can mine various features of auxiliary data, efficiently process large-scale data, and obtain high-precision estimation results. Combining different models allows the strengths of each type of model to be used together, thereby mitigating the bias inherent in individual models and reducing the volatility of predictive results [29,30]. However, most current studies rely on a single method to fill gaps in the XCO2 dataset.

Therefore, to enhance the support that existing discrete XCO2 observations provide for studies of spatiotemporal trends in atmospheric CO₂ concentrations, surface carbon emission monitoring, and regional-scale carbon cycling from a space-based perspective, this study constructed a novel hybrid model combining a deep learning model (CNN) and a geostatistical method (spatiotemporal Kriging), termed CNN-STK, to derive full-coverage regional atmospheric CO₂ column concentration data from existing satellite observations. First, the CNN model was constructed by combining a variety of geographic covariates and the OCO-2 XCO2 data in order to extract the deterministic trends in the XCO2 data. Second, the residuals of the CNN model were interpolated based on the spatiotemporal Kriging method to optimize the residual distribution [31]. Third, the predicted trends of the CNN model were combined with the residual distributions interpolated by the spatiotemporal Kriging to generate a high-precision monthly XCO2 dataset at 0.25° grid scale in China from 2015 to 2020. Finally, the hybrid CNN-STK model was assessed for accuracy and reliability using multiple validation methods. The objectives and novelties of this study are as follows:

(1): Introducing a novel method to transform discrete satellite observations into continuous spatiotemporal datasets.
(2): Effectively integrating different types of models to optimize inherent biases of individual models.
(3): Producing an independent, high-precision atmospheric CO₂ dataset to enhance understanding of the carbon cycle and climate change.

The paper is organized as follows: Section 2 describes the datasets used in the study and the research methodology. Section 3 shows the experimental results and model evaluations; Section 4 discusses the spatial inhomogeneity, seasonal fluctuations, and uncertainties of CNN-STK XCO2 accuracy, as well as the advantages and limitations of this study.

2. Materials and Methods

2.1. Datasets

2.1.1. OCO-2 Dataset

Orbiting Carbon Observatory-2 (OCO-2) is a satellite launched by NASA dedicated to monitoring the distribution and variability of atmospheric CO₂ concentrations on Earth. Its primary product is XCO2 data, which represents the average concentration of CO₂ in the dry air column extending from the Earth’s surface to the top of the atmosphere, measured in parts per million (ppm). The OCO-2 spacecraft collects high-resolution spectra of reflected sunlight in the A-band of molecular oxygen centered at 765 nm and in the CO₂ bands centered at 1610 and 2060 nm. OCO-2 XCO2 data have a spatial resolution of 2.25 km × 1.29 km, a 16-day revisit period, and a satellite observing footprint of approximately 10 km. Although the OCO-2 satellite can acquire nearly a million observations per day, only about 10% of these measurements are valid due to cloud cover and optically thick aerosols that impede observations of the entire atmospheric column [32]. Aerosol was the largest source of error for OCO-2 XCO2, followed by spectroscopy and calibration. The total terrestrial error due to all error sources is ~1.5–3.5 ppm [33].

In this study, we collected OCO-2 XCO2 data (OCO2_L2_Lite_FP V10r) from EARTHDATA (https://www.earthdata.nasa.gov/ [last access: 18 March 2023]) from 2015 to 2020. Then, we filtered high-quality data with “xco2_quality_flag = 0” and removed outliers for each month according to the 3σ criterion. Considering the uneven spatial distribution of XCO2 data, which may introduce scale biases to the model results, we calculated the average values of XCO2 data within each 0.25° grid in China and generated the monthly 0.25° grid-scale XCO2 data in China for the period from 2015 to 2020, with a total of 211,116 data points.
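The preprocessing chain described above (quality filtering, monthly 3σ outlier removal, and averaging within 0.25° cells) can be sketched as follows. This is a minimal illustration, not the authors' code: the column names `time`, `latitude`, `longitude`, and `xco2` and the function name `grid_monthly_xco2` are assumptions, and the 3σ rule is applied here to all soundings of a calendar month pooled together.

```python
import numpy as np
import pandas as pd


def grid_monthly_xco2(df: pd.DataFrame, cell: float = 0.25) -> pd.DataFrame:
    """Filter soundings, drop monthly 3-sigma outliers, and average per grid cell.

    `df` is assumed to hold one row per sounding with the (hypothetical) columns
    time, latitude, longitude, xco2, and xco2_quality_flag.
    """
    # Keep only soundings flagged as good quality.
    good = df[df["xco2_quality_flag"] == 0].copy()
    good["month"] = good["time"].dt.to_period("M")

    # 3-sigma criterion, evaluated separately within each month.
    grouped = good.groupby("month")["xco2"]
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    # A month with a single sounding has no defined std; keep that sounding.
    keep = ((good["xco2"] - mean).abs() <= 3 * std) | std.isna()

    # Index of the grid cell that contains each remaining sounding.
    good = good[keep].assign(
        lat_idx=lambda d: np.floor(d["latitude"] / cell).astype(int),
        lon_idx=lambda d: np.floor(d["longitude"] / cell).astype(int),
    )

    out = good.groupby(["month", "lat_idx", "lon_idx"], as_index=False)["xco2"].mean()
    # Report cell centres instead of integer indices.
    out["lat"] = (out["lat_idx"] + 0.5) * cell
    out["lon"] = (out["lon_idx"] + 0.5) * cell
    return out[["month", "lat", "lon", "xco2"]]
```

Averaging within cells gives every populated cell the same weight regardless of how many soundings fall inside it, which is the scale-bias concern raised in the text.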

2.1.2. Reanalysis Datasets

(1) ERA5 dataset: ERA5 is the fifth-generation atmospheric reanalysis dataset released by the European Centre for Medium-Range Weather Forecasts (ECMWF). It provides daily and monthly estimates of atmospheric, terrestrial, and oceanic climate variables on a global scale based on data assimilation techniques. The ERA5 data used here have a spatial resolution of 0.25° × 0.25° and a monthly temporal resolution. They are accessible on the Climate Data Store (https://cds.climate.copernicus.eu/ [last access: 26 June 2023]). This study collected fourteen variables from 2015 to 2020. Please refer to Table S1 in the Supplementary Materials for a detailed description.

(2) EGG4 dataset: CAMS global greenhouse gas reanalysis (EGG4) is a reanalysis dataset focusing on CO₂ and methane released by ECMWF. By simulating the CO₂ fluxes from terrestrial vegetation, it captures the variability of CO₂ at various temporal scales from daily to interannual. The EGG4 data used here have a spatial resolution of 0.75° × 0.75° and a monthly temporal resolution. They are available on the Atmosphere Data Store (https://ads.atmosphere.copernicus.eu/ [last access: 26 June 2023]). In this study, a total of eight variables were collected from 2015 to 2020. Among them, the TCCO2 variable, which characterizes the CO₂ column concentration, was used to validate the model’s accuracy, and the remaining variables were used to construct the model. We downscaled the EGG4 data to a resolution of 0.25° × 0.25° by spatial interpolation. For more detailed descriptions, please refer to Table S2 in the Supplementary Materials.
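The text does not state which spatial interpolation method was used to bring the 0.75° fields to 0.25°. As one plausible sketch, assuming bilinear interpolation on a regular latitude/longitude grid with ascending coordinates, SciPy's `RegularGridInterpolator` can refine a field by a factor of three; the function name `refine_grid` is hypothetical.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def refine_grid(field, lat, lon, factor=3):
    """Bilinearly interpolate a 2-D field onto a grid `factor` times finer.

    `lat` and `lon` are assumed to be strictly ascending, evenly spaced
    1-D coordinate arrays; `field` has shape (len(lat), len(lon)).
    """
    interp = RegularGridInterpolator((lat, lon), field, method="linear")
    # The fine grid spans the same extent, so no extrapolation is needed.
    fine_lat = np.linspace(lat[0], lat[-1], (len(lat) - 1) * factor + 1)
    fine_lon = np.linspace(lon[0], lon[-1], (len(lon) - 1) * factor + 1)
    mesh = np.stack(np.meshgrid(fine_lat, fine_lon, indexing="ij"), axis=-1)
    return fine_lat, fine_lon, interp(mesh)
```

With `factor=3`, a 0.75° spacing becomes 0.25°. Interpolation of this kind adds no information beyond the coarse field; it only places the covariate on the model grid.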

2.1.3. Other Geographical Covariates

(1) NPP-VIIRS dataset: This dataset provides high-resolution global nighttime light data. It is jointly maintained and published by NASA and the NOAA National Centers for Environmental Information (NCEI). The dataset has a spatial resolution of 0.004° × 0.004° and can be obtained as monthly composite products from the Earth Observation Group (EOG) website (https://eogdata.mines.edu/products/vnl/ [last access: 26 June 2023]).

(2) DEM: SRTM DEM 90 m data were collected from the Resource and Environment Science and Data Center (https://www.resdc.cn/ [last access: 2 April 2021]).

(3) Longitude/Latitude: Previous studies have demonstrated that the XCO2 distribution has obvious latitudinal differences [34], so we extracted the latitude and longitude of each data point as auxiliary variables in this study.

The above datasets were upscaled to 0.25° × 0.25° by resampling to standardize the spatial resolution of all variables.

2.1.4. TCCON Ground-Based Network

TCCON is a network of ground-based Fourier Transform Spectrometers recording direct solar spectra in the near-infrared spectral region. From these spectra, accurate and precise column-averaged abundances of CO₂, CH₄, N₂O, HF, CO, H₂O, and HDO are retrieved and reported. The TCCON data have become the primary ground-based data source for systematic calibration of atmospheric CO₂ column concentration data acquired by GHG observing satellites and can provide high-precision CO₂ measurements to validate the accuracy of remote sensing products [35]. We collected the GGG2020 dataset of the Hefei site and Xianghe site (Table 1), which are located in the study area, from the TCCON data archive website (https://tccondata.org/ (accessed on 9 November 2023)).

2.1.5. Mapping-XCO2 Dataset

The global 1° land mapping XCO2 dataset (Mapping-XCO2) was derived from satellite XCO2 retrievals of GOSAT and OCO-2 spanning the period from April 2009 to December 2020. The dataset adjusted and integrated XCO2 retrievals from GOSAT and OCO-2 and then used the spatiotemporal Kriging method to interpolate XCO2 values in data-sparse regions, resulting in a comprehensive global terrestrial XCO2 dataset [36]. The spatial resolution is 1° × 1° and the temporal resolution is 3 days or 1 month. Compared with TCCON observations, the Mapping-XCO2 dataset has an overall bias of 0.01 ppm and a standard deviation of the difference of 1.22 ppm, indicating high quality and reliability. We collected monthly Mapping-XCO2 data from January 2015 to December 2020 and extracted the portion covering China.

In general, in our experiment, the target variable is OCO-2 XCO2, while the auxiliary variables comprise fourteen meteorological features from ERA5 dataset (blh, u10, v10, si10, msl, t2m, e, skt, ssr, sp, tco3, tcw, tp, totalx), seven flux features from EGG4 dataset (aco2gpp, aco2nee, aco2rec, fco2gpp, fco2nee, fco2rec, tcch4), and four additional geographic covariates (nighttime light dataset, DEM, latitude and longitude), totaling 25. The tcco2 variable, representing CO₂ column concentrations from the EGG4 dataset, along with ground-based XCO2 observations from the TCCON network and the Mapping-XCO2 dataset, were used to assess the accuracy and uncertainty of the experimentally generated dataset across multiple dimensions.

2.2. Methods

The CNN-STK model is a hybrid model combining a CNN model with the spatiotemporal Kriging method, which is similar in essence to regression Kriging [37], i.e., decomposing the non-stationary regionalized variables into two components: a deterministic trend and stochastic residuals [38] (p. 8), as expressed in Equation (1). To the best of our knowledge, this integrated method is new and has never been reported. Implementing the CNN-STK method involves three main steps. Firstly, a CNN model is constructed based on XCO2 observations and geographic covariates to extract the deterministic spatial trend. Secondly, the spatiotemporal Kriging method is employed to handle the stochastic residuals, involving the computation of its empirical semi-variogram, construction of the covariance functions, and prediction of unsampled points. Thirdly, the final interpolated results of the CNN-STK model are obtained by combining the trend values extracted by the CNN model with the residual terms estimated by the spatiotemporal Kriging. The flowchart of this study is shown in Figure 1.

$$\hat{Z}(s, t) = \hat{M}(s, t) + \hat{R}(s, t) \tag{1}$$

where $\hat{Z}(s, t)$ denotes the XCO2 estimate at the unknown point at position $s$ and time $t$; $\hat{M}(s, t)$ represents the trend component simulated by the CNN model at the same unknown point; and $\hat{R}(s, t)$ is the residual term at the same unknown point obtained by the spatiotemporal Kriging interpolation.

2.2.1. CNN Model

Compared to traditional machine learning models, the CNN model can autonomously extract and learn features and correlations within images, thus eliminating the need for manual feature engineering to filter out weakly correlated variables. We used the approach proposed by Zhang and Liu [28] to construct the training and validation datasets for the CNN model. Initially, we defined 9 × 9 as the length and width of the sample, the number of auxiliary variables as the depth (i.e., 25), and each XCO2 observation point as the sample center to create high-dimensional arrays with dimensions (25, 9, 9), where the label of each array is the XCO2 value corresponding to the sample center. In total, there are 211,116 samples. Subsequently, 80% of the samples were randomly selected as the training dataset, while the remaining 20% served as the validation dataset. We normalized the training dataset to remove the magnitude differences between the feature variables. Through several experiments, we constructed a three-layer neural network structure, as depicted in Figure 2. There are two convolutional layers and one fully connected layer in the network, and we added an activation layer after each convolutional layer. Additionally, the MSELoss function was used to monitor the model loss. We retained the optimal model through adjustments to the trainable parameters and ultimately used it to predict all points in the study area. The model was constructed using Python based on the PyTorch framework.
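The sample construction and network described above might look roughly like the following PyTorch sketch. Only the input shape (25, 9, 9), the layer count (two convolutional layers with activations, one fully connected layer), and the MSE loss come from the text. The channel widths, 3 × 3 kernels, ReLU activation, edge padding at the domain border, and the names `extract_patch`, `XCO2Net`, and `train_step` are assumptions made for illustration; Figure 2 of the paper defines the actual architecture.

```python
import numpy as np
import torch
from torch import nn


def extract_patch(stack: np.ndarray, row: int, col: int, size: int = 9) -> np.ndarray:
    """Cut a (channels, size, size) window centred on grid cell (row, col)."""
    half = size // 2
    # Assumption: replicate edge values so border cells still get a full window.
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="edge")
    return padded[:, row:row + size, col:col + size]


class XCO2Net(nn.Module):
    """Two convolutional layers plus one fully connected layer (illustrative sizes)."""

    def __init__(self, channels: int = 25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3),  # 9x9 -> 7x7
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3),        # 7x7 -> 5x5
            nn.ReLU(),
        )
        self.head = nn.Linear(64 * 5 * 5, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One XCO2 value per sample; shape (batch,).
        return self.head(self.features(x).flatten(1)).squeeze(1)


def train_step(model, optimiser, x, y) -> float:
    """Run one gradient step with the MSE loss and return the loss value."""
    optimiser.zero_grad()
    loss = nn.MSELoss()(model(x), y)
    loss.backward()
    optimiser.step()
    return loss.item()
```

The model output at every grid cell plays the role of the trend term in Equation (1); the residuals at observed cells are what the Kriging step subsequently interpolates.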

2.2.2. Spatiotemporal Kriging

Spatiotemporal Kriging is a common method for mapping spatiotemporally discrete points to a continuous field. The process of calculating weight coefficients and estimating weighted averages is essentially the same as in spatial Kriging interpolation, with the computational dimension extended from space to space-time. Suppose $\{Z(s, t), (s, t) \in D \times T \subseteq R^{d+1}\}$ is a space-time random field, where $s = (s_{1}, s_{2}, \dots, s_{d})$ (generally $d \leq 3$) is the space coordinate and $t \in T$ is the time coordinate. The interpolation formula of the spatiotemporal Kriging is as follows [38] (p. 39):

$$\hat{Z}(s, t)_{0} = \sum_{i=1}^{n} \lambda_{i} Z(s, t)_{i} \tag{2}$$

where $\hat{Z}(s, t)_{0}$ is the predicted value at the target space-time point $(s, t)_{0}$; $Z(s, t)_{i}$ denotes the $i$th observation; $n$ is the number of observations; and $\lambda_{i}$ are the weight coefficients, chosen as the optimal set that minimizes the variance of the difference between the estimate and the true value at the point $(s, t)_{0}$, i.e., $\min_{\lambda_{i}} \mathrm{Var}\left(\hat{Z}(s, t)_{0} - Z(s, t)_{0}\right)$, while simultaneously satisfying the condition for unbiased estimation (Equation (3)) [38] (pp. 31–32):

$$E\left(\hat{Z}(s, t)_{0} - Z(s, t)_{0}\right) = 0 \tag{3}$$
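To make Equations (2) and (3) concrete, the sketch below solves the ordinary Kriging system for the weights at a single target point with NumPy. It is an illustration only: the study used the "gstat" package in R, and the metric exponential variogram, its parameters, and the function names here are assumptions chosen to keep the example self-contained.

```python
import numpy as np


def metric_variogram(hs, ht, sill=1.0, range_=5.0, anis=2.0):
    """Exponential variogram on a combined space-time distance (illustrative).

    `anis` converts a time lag into an equivalent spatial distance.
    """
    h = np.sqrt(hs ** 2 + (anis * ht) ** 2)
    return sill * (1.0 - np.exp(-h / range_))


def st_kriging_predict(coords, times, values, s0, t0, variogram=metric_variogram):
    """Ordinary Kriging prediction at (s0, t0) from n space-time observations."""
    n = len(values)
    hs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    ht = np.abs(times[:, None] - times[None, :])

    # Variogram matrix bordered by the unbiasedness constraint sum(weights) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(hs, ht)
    A[n, n] = 0.0

    # Right-hand side: variogram between each observation and the target point.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - s0, axis=1), np.abs(times - t0))

    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ values), weights
```

The bordered row and column enforce that the weights sum to one, which is how the unbiasedness condition of Equation (3) enters the linear system. With a variogram that is zero at zero lag, predicting at an observed point returns the observation itself.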

For the residuals $\hat{R}(s, t)$ obtained in this study, we need to test their stationarity before constructing the spatiotemporal Kriging model. After the residuals passed the test, we first calculated the spatiotemporal empirical semi-variogram and then fitted a theoretical spatiotemporal variogram model. In this step, we chose the spatiotemporal product-sum model as the theoretical variogram model to fit the spatiotemporal variation structure of the data [18,36]; it is obtained by transforming known purely spatial and purely temporal covariance functions through addition and multiplication, mixing, and integration. The covariance function and variogram function are as follows (Equations (4) and (5), [38], pp. 25–27):

$$C(h_{s}, h_{t}) = k_{1} C_{s}(h_{s}) C_{t}(h_{t}) + k_{2} C_{s}(h_{s}) + k_{3} C_{t}(h_{t}) \tag{4}$$

$$\gamma(h_{s}, h_{t}) = \left(k_{1} C_{t}(0) + k_{2}\right) \gamma_{s}(h_{s}) + \left(k_{1} C_{s}(0) + k_{3}\right) \gamma_{t}(h_{t}) - k_{1} \gamma_{s}(h_{s}) \gamma_{t}(h_{t}) \tag{5}$$

In Equations (4) and (5), $h_{s}$ and $h_{t}$ are the spatial and temporal lags; $C_{s}$ and $C_{t}$ are the purely spatial and purely temporal covariance functions; $k_{1}$, $k_{2}$, and $k_{3}$ are model coefficients; $\gamma(h_{s}, h_{t})$, $\gamma_{s}(h_{s})$, and $\gamma_{t}(h_{t})$ are the corresponding spatiotemporal, spatial, and temporal variogram functions, respectively; and $C(0, 0)$, $C_{s}(0)$, and $C_{t}(0)$ are the corresponding sill values, respectively. Generally, the sill is taken as the maximum value at which the sample spatiotemporal variogram stabilizes. This study implemented spatiotemporal Kriging interpolation based on the “gstat” package in the R programming language.
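The relationship between Equations (4) and (5) can be checked numerically. The sketch below assumes exponential marginal variograms and arbitrary illustrative parameter values; it is not the gstat parameterization, and the function names are hypothetical.

```python
import numpy as np


def exp_variogram(h, sill, range_):
    """Exponential marginal variogram without nugget (assumed form)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_))


def product_sum_variogram(hs, ht, k1, k2, k3, sill_s, range_s, sill_t, range_t):
    """Equation (5): product-sum variogram from spatial and temporal marginals."""
    gs = exp_variogram(hs, sill_s, range_s)
    gt = exp_variogram(ht, sill_t, range_t)
    # sill_s and sill_t play the roles of C_s(0) and C_t(0).
    return (k1 * sill_t + k2) * gs + (k1 * sill_s + k3) * gt - k1 * gs * gt


def product_sum_covariance(hs, ht, k1, k2, k3, sill_s, range_s, sill_t, range_t):
    """Equation (4): product-sum covariance, using C(h) = C(0) - gamma(h)."""
    cs = sill_s - exp_variogram(hs, sill_s, range_s)
    ct = sill_t - exp_variogram(ht, sill_t, range_t)
    return k1 * cs * ct + k2 * cs + k3 * ct
```

For any lag pair, `product_sum_variogram` equals `product_sum_covariance(0, 0, ...)` minus `product_sum_covariance(hs, ht, ...)`, which confirms that Equation (5) is the variogram form of Equation (4). At large lags the variogram approaches the global sill $C(0, 0) = k_{1} C_{s}(0) C_{t}(0) + k_{2} C_{s}(0) + k_{3} C_{t}(0)$.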

Finally, the constructed spatiotemporal model was used to interpolate the residuals. The trend term was added to the residual term so as to obtain the final XCO2 estimates. For a more comprehensive understanding, please refer to the previous works [39,40].

2.2.3. Validation Methods

In order to assess the accuracy of the CNN-STK model, the common metrics, including the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), were computed using the independent validation dataset. The formulas of the metrics are as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{6}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}{n}} \tag{7}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |\hat{y}_{i} - y_{i}|}{n} \tag{8}$$

where $n$ represents the number of sample points; $y_{i}$ is the observation at the $i$th point; $\hat{y}_{i}$ is the estimate at the $i$th point; and $\bar{y}$ is the mean of the observations. The larger the R² and the lower the RMSE and MAE, the higher the accuracy of the model estimations.
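Equations (6) to (8) translate directly into a few lines of NumPy. The helper name `regression_metrics` is hypothetical, and the function simply restates the formulas.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE as defined in Equations (6) to (8)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_true
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mae": float(np.mean(np.abs(resid))),
    }
```

RMSE and MAE carry the units of the data (ppm for XCO2), whereas R² is dimensionless, so the three metrics together describe both the absolute error and the share of variance explained.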

In addition, we used the same variables and samples to construct a CNN model, an RF model, and an RF-STK model (a hybrid method combining RF and spatiotemporal Kriging with the same strategy as CNN-STK) for comparison. The accuracies of these models were evaluated using the same metrics, enabling a comparative assessment of the effectiveness of the CNN-STK model. Then, we compared the CNN-STK interpolation results with a model simulation dataset (CAMS XCO2) and ground-based measurements (TCCON XCO2) to assess the reliability of the model predictions.

Finally, we calculated the relative errors (REs) (Equation (9)) between the CNN-STK estimations and the observations to discuss the spatial heterogeneity of accuracy. The smaller the RE, the higher the precision of the interpolation.

$$\mathrm{RE} = \left|\frac{\hat{y}_{i} - y_{i}}{y_{i}}\right| \times 100\% \tag{9}$$
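As a worked illustration with hypothetical values (not taken from the study), an estimate of 411.2 ppm against an observation of 410.0 ppm gives:

$$\mathrm{RE} = \left|\frac{411.2 - 410.0}{410.0}\right| \times 100\% \approx 0.29\%$$

Because XCO2 values are around 400 ppm, an absolute error of about 1 ppm corresponds to an RE of roughly a quarter of a percent.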

3. Experimental Result and Accuracy Evaluation

3.1. Experimental Results

In this experiment, a CNN model was initially constructed using geographic covariates and OCO-2 XCO2 data to extract deterministic trends. The stationarity test conducted on the residuals showed that the CNN model captures the primary trends and relationships within the data, with the residuals exhibiting stationarity at the 1% significance level, which meets the prerequisite for Kriging interpolation.

Subsequently, the sample variogram and the theoretical variogram model were calculated for the residuals. The quality of the sample variogram depends on the number and distribution of available sample points. A sufficient number of well-distributed sample points keeps lag distances as small as possible and yields more reliable results. We aggregated discrete OCO-2 XCO2 retrievals into 0.25° grids at a monthly time scale, which greatly reduced data gaps, but over 60% of the grid positions still have fewer than 10% valid values within the 72-month study period. Therefore, we attempted to fit residual semi-variogram models using all grid positions and subsets with at least 12, 24, 36, or 48 valid values as sample points. When using all positions or the subset with at least 12 valid values, the fitted semi-variogram curve did not converge to a stable value over space and time at certain distances. In other words, when the distances between spatiotemporal sample points are excessively large, the points must be screened to reduce uncertainties and obtain stable spatial and temporal variation scales. After several experimental comparisons, we selected grid points with at least one-third (24 time units) of valid values as sample points for calculating and modeling the variogram. The least squares method was used for parameter estimation, and the product-sum model was used to fit the empirical spatiotemporal variogram. The resulting model achieved relatively stable spatiotemporal sill values within certain time lags and spatial distances. Detailed descriptions of the stationarity test of residuals and the process and results of constructing the theoretical spatiotemporal variogram are given in Supplementary Materials Sections S1 and S2.
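The screening rule described above (keep grid positions with at least 24 valid months out of 72) can be sketched with NumPy. The array layout `(n_months, n_lat, n_lon)` with NaN marking missing residuals and the function name `select_sample_cells` are assumptions for illustration.

```python
import numpy as np


def select_sample_cells(cube: np.ndarray, min_valid: int = 24) -> np.ndarray:
    """Return a boolean (n_lat, n_lon) mask of cells usable for variogram fitting.

    `cube` holds monthly residuals with shape (n_months, n_lat, n_lon);
    missing values are assumed to be encoded as NaN.
    """
    valid_counts = np.isfinite(cube).sum(axis=0)
    return valid_counts >= min_valid
```

Raising `min_valid` leaves fewer but more complete time series; the text reports that thresholds of 24 or more valid months gave variogram fits that converge to a stable sill.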

Finally, the trend values estimated by the CNN model were combined with the residuals optimized through the spatiotemporal Kriging method to obtain the spatiotemporal distribution of monthly XCO2 for the study area from 2015 to 2020 at a resolution of 0.25° × 0.25°, as shown in Figure 3.

3.2. Evaluation of Model Performance

The accuracy of the CNN-STK model was first evaluated based on a completely random independent validation dataset consisting of 51,054 samples. The model exhibited satisfactory performance, achieving an R² of 0.936, an RMSE of 1.300 ppm, and an MAE of 0.946 ppm, as illustrated in Figure 4a. Then, to validate the efficacy of this model, we compared it against a single CNN model, an RF model, and an RF-STK model, all trained and evaluated using the same data sets (Figure 4b–d). The accuracies of the comparative models are inferior to that of the CNN-STK model, suggesting that the CNN-STK model is better at accurately reconstructing full-coverage XCO2 data. Meanwhile, both the CNN-STK and RF-STK models demonstrated higher accuracies compared to their respective non-STK counterparts, indicating that the spatiotemporal Kriging method effectively optimizes the error distribution by considering the spatiotemporal autocorrelation of residuals, thereby improving prediction accuracy. In particular, the RF-STK model demonstrates a more pronounced improvement in accuracy through the spatiotemporal Kriging method. The CNN model can probably comprehensively learn the spatial relationships and features from sample points and surrounding auxiliary data, while, in contrast, the RF model faces challenges in capturing spatial relationships between data points. Consequently, the spatiotemporal Kriging method exhibits a notable application advantage in addressing the residuals of the RF model.

Due to the latitudinal and seasonal variations in XCO2 data, we systematically categorized all data points based on latitudes and seasons, respectively. Then, the accuracy of each category was validated to assess the overall stability of the model. As illustrated in Table 2, the model exhibits higher accuracy in the low latitude region (I) compared to the mid-latitude regions (II–V), as evidenced by higher R² and lower RMSE and MAE. The accuracy in summer (VII) is slightly lower than that in other seasons, probably attributable to the hot and rainy climate during the summer in the study area, resulting in fewer valid data points acquired by satellites. Nevertheless, there is no strong fluctuation in the accuracies across different latitude zones and seasons, indicating the robustness of the CNN-STK model, with an average R² of 0.942 across latitudes and 0.931 across seasons.

3.3. Validation with Model Simulation

The CAMS EGG4 dataset provides globally continuous spatiotemporal XCO2 data obtained by data assimilation and model simulations. It serves as a valuable supplementary dataset for understanding atmospheric CO₂ concentrations in regions where precise ground-based measurements and satellite XCO2 observations are not available. According to the official validation report [41] (p. 6), CAMS XCO2 showed very good agreement across all sites (±1%) compared to TCCON, with a monthly mean difference of ±4 ppm. There is a seasonal pattern of deviations, with maximum deviations up to 10 ppm in summer.

Comparing the 2015–2020 XCO2 values reconstructed in this paper with the CAMS XCO2 data shows a consistent temporal trend between the CNN-STK XCO2 and CAMS XCO2 (Figure 5). From 2015 to 2019, the two datasets show close agreement, with an average deviation of approximately 0.18 ppm. Relative to the CAMS XCO2, the CNN-STK model underestimates in all summers but generally overestimates in the other seasons. After June 2019, the CNN-STK XCO2 estimates are consistently lower than CAMS XCO2, although the trend aligns with previous years. Furthermore, Spearman’s rank correlation coefficient (ρ) between the two datasets reveals positive correlations in summer and negative correlations in autumn, though both correlations are weak. Nonparametric tests likewise indicate significant differences between the two datasets (Section S3 in the Supplementary Materials). The uncertainty of the CNN-STK XCO2 dataset relative to CAMS XCO2 is discussed in Section 4.2.1.
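As an illustration of the rank correlation used above, Spearman’s ρ can be computed as the Pearson correlation of the ranks of the two series, with tied values receiving the mean of their ranks. The sketch below is a generic implementation under that standard definition; the function names are hypothetical, and it is not the code used in this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Applied season by season to the monthly means of the two datasets, the sign of ρ indicates whether they rise and fall together within that season.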

3.4. Validation with TCCON Measurements

To ensure the spatial consistency of ground-based data and XCO2 estimates, we defined circular geographical regions centered on TCCON sites with diameters of 1°, 3°, and 5° as the validation areas (Figure S5). The averages of XCO2 estimates for each period within different validation regions were calculated. Meanwhile, the mean values for each period of the TCCON sites were obtained by first calculating the daily means and then calculating the monthly mean values. The standard deviations of the measurements relative to the monthly means at Hefei and Xianghe sites are in the ranges of 0.26–2.64 ppm and 0.67–2.9 ppm, respectively. The comparison between the two datasets was conducted to validate the accuracy of the interpolated data, as illustrated in Figure 6.
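The two-step averaging of TCCON measurements described above (daily means first, then monthly means) can be sketched as follows. Averaging daily means rather than raw measurements prevents days with many retrievals from dominating the monthly value. The function name and the `(datetime, value)` record layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def monthly_means_from_measurements(records):
    """Compute monthly means as the mean of daily means.

    records: iterable of (datetime, xco2_ppm) pairs (hypothetical layout).
    Returns a dict mapping (year, month) to the monthly mean in ppm.
    """
    daily = defaultdict(list)
    for timestamp, value in records:
        daily[timestamp.date()].append(value)

    monthly = defaultdict(list)
    for day, values in daily.items():
        monthly[(day.year, day.month)].append(sum(values) / len(values))

    return {month: sum(v) / len(v) for month, v in monthly.items()}


# Illustrative values only: two measurements on one day, one on the next.
example = [
    (datetime(2019, 1, 1, 3, 0), 410.0),
    (datetime(2019, 1, 1, 5, 0), 412.0),
    (datetime(2019, 1, 2, 4, 0), 414.0),
]
print(monthly_means_from_measurements(example))
```

In this example the daily means are 411.0 and 414.0 ppm, so the monthly mean is 412.5 ppm, whereas a plain average of the three measurements would give 412.0 ppm.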

The validation results for the three geographic ranges at each site show that the model-estimated XCO2 data are highly consistent with the TCCON data. The validation accuracy at the Hefei site ranges from 0.961 to 0.968 for R², 0.75 to 0.833 ppm for RMSE, and 0.628 to 0.688 ppm for MAE, which is higher than that at the Xianghe site. Taking the 3° region as an example (Figure 6d), the biases between model-estimated XCO2 and TCCON XCO2 reveal a general underestimation at the Xianghe site. On the one hand, this discrepancy may be attributed to the close proximity of the Xianghe site to Beijing, China, which is strongly affected by human activities. On the other hand, the 3° and 5° regions of the Xianghe site include a portion of the maritime area for which corresponding XCO2 estimates are unavailable. As a result, the omission of XCO2 values over the marine airspace near the Xianghe site from the calculation of regional average concentrations may have contributed to the relatively large biases at the Xianghe site. In general, the experimental results fit well with the TCCON data, with an overall R² of 0.954, RMSE of 0.898 ppm, and MAE of 0.741 ppm.

4. Discussion

4.1. Spatial Inhomogeneity of Accuracy

Given the large spatial heterogeneity of urban structure, climatic conditions, vegetation, and other characteristics in the study area and the uneven distribution of satellite observation data across different seasons and regions, there may be corresponding differences in the accuracy of CNN-STK XCO2 data. We separately calculated the REs between the interpolations and observations for each season and plotted their spatial distributions (Figure 7) for specific evaluation.

In spring, autumn, and winter, a similar pattern of error distribution is observed, with higher error points clustering in southern China, particularly in the eastern margin of the Qinghai-Tibet Plateau (highlighted by the blue box in Figure 7a), which marks the transition zone between two climatic regions (Figure 7d). Lower error points are concentrated in the green box area, situated in temperate arid/semi-arid regions and the western part of the Qinghai-Tibet Plateau. Based on the climatic zoning map, it is evident that southern China, heavily influenced by monsoons, has significantly fewer effective XCO2 data points collected by carbon observation satellites compared to northern regions. Especially in the subtropical humid region of the middle and lower reaches of the Yangtze River Plain, noticeable data gaps occur in spring, autumn, and winter, greatly impacting data reconstruction accuracy in this area.

Summer exhibits a different distribution pattern. Figure 7b illustrates that high-error points are clustered in northeastern China, which boasts the largest natural forest area in China, accounting for approximately 37% of the national total forest area. Although summer there is brief, abundant precipitation and high temperatures foster vigorous vegetation activity, making the region an important forest carbon sink. Consequently, the data quality here is slightly poorer in summer than in other seasons, potentially because of vegetation photosynthesis-respiration dynamics. Additionally, apart from the northern marginal region of China, data scarcity prevails across most regions during this season, hindering the assessment of error value clustering.

We further conducted statistical analysis on the data accuracy within areas of high and low XCO2 aggregation for each season, as shown in Figure 8, where hot spots indicate regions of high-value aggregation and cold spots represent areas of low-value aggregation. Details of the XCO2 aggregation method and distribution maps are provided in the Supplementary Materials Section S4. At the 95% confidence level, all four seasons exhibit significant clustering characteristics (Table S6). Figure S6 shows distinct patterns of hot and cold spots across different seasons in China. Specifically, in spring, autumn, and winter, the spatial distribution of hot and cold spots is generally similar, with hot spots widely distributed in the central, eastern, and northeastern regions, whereas cold spots are mainly concentrated in the western regions. During summer, by contrast, hot spots are predominantly located in southern and western China, while cold spots are evident in the northeastern region.
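The global clustering statistic reported in Table S6, Global Moran’s I, can be illustrated with the generic sketch below. It uses the standard definition I = (n / S0) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where S0 is the sum of all weights. The binary rook-adjacency weights and the function names are assumptions made for illustration; the spatial weights actually used are described in Supplementary Section S4 and are not reproduced here.

```python
def rook_weights(n_rows, n_cols):
    """Binary weights for a regular grid: 1 for cells sharing an edge."""
    n = n_rows * n_cols
    weights = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    weights[i][rr * n_cols + cc] = 1.0
    return weights


def morans_i(values, weights):
    """Global Moran's I for values (row-major order) and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# Illustrative 1 x 4 transects: a smooth gradient and an alternating pattern.
w = rook_weights(1, 4)
print(morans_i([1.0, 2.0, 3.0, 4.0], w))
print(morans_i([1.0, 5.0, 1.0, 5.0], w))
```

Positive values indicate that similar XCO2 values cluster in space, and negative values indicate that neighbouring cells tend to differ. Significance testing, which the study reports, is not covered by this sketch.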

The various clustering regions demonstrate relatively high accuracies across different seasons. The accuracies of cold spots are generally lower than those of hot spots and areas without significant clustering, with average R² of 0.91, 0.93, and 0.93, respectively. The lowest accuracy is observed in cold-spot regions during summer, with an R² of 0.89.

In general, the CNN-STK model we constructed can generate high-precision spatiotemporal continuous datasets based on sparse satellite data. Because satellite observations are relatively uniformly distributed during spring and autumn, the reconstructed XCO2 data for these seasons are of high quality. Moreover, the accuracy is higher in the northern regions than in the southern regions. However, during summer, particularly in the northeastern region where low-value aggregation occurs, accuracy is lower, possibly due to significant disturbances from biosphere activities. Future research could consider vegetation characteristics and conduct separate modeling analyses for this region.

4.2. Uncertainty Analysis

Assessing the uncertainty of a dataset is essential to understanding its reliability and limitations, as well as aiding other researchers in comprehension and validation. The preceding analyses have shown marked seasonal characteristics in the distribution of XCO2 data and its errors. Therefore, in this section, we independently analyze the uncertainty of the CNN-STK XCO2 dataset for different seasons using both the CAMS XCO2 dataset and the open-source 1° Mapping-XCO2 product. Furthermore, the RF feature importance algorithm is used to explore the key impact factors on XCO2 data.

4.2.1. Comparison with CAMS XCO2

We computed the mean and standard deviation of the absolute differences between CNN-STK XCO2 and CAMS XCO2 for each season to quantify the uncertainty of CNN-STK XCO2. The uncertainties for spring, summer, autumn, and winter are 0.54 (±0.46) ppm, 1.40 (±1.09) ppm, 0.63 (±0.55) ppm, and 0.61 (±0.51) ppm, respectively. Uncertainty is significantly higher in summer than in other seasons. The spatial distribution of uncertainties for each season is shown in Figure 9.
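The uncertainty measure defined above, the mean and standard deviation of the absolute differences between two co-located datasets, can be written compactly as below. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import statistics


def absolute_difference_uncertainty(estimates, reference):
    """Return (mean, standard deviation) of |estimate - reference| in ppm.

    Assumption: population standard deviation (statistics.pstdev).
    """
    if len(estimates) != len(reference) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(e - r) for e, r in zip(estimates, reference)]
    return statistics.fmean(diffs), statistics.pstdev(diffs)


# Illustrative values only, not data from this study.
mean_diff, std_diff = absolute_difference_uncertainty(
    [410.0, 411.0, 412.0, 413.0], [410.5, 410.0, 412.0, 414.5]
)
print(f"{mean_diff:.2f} (±{std_diff:.2f}) ppm")
```

Computing this per season over all grid cells yields values in the same "mean (±standard deviation)" form reported above.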

The CNN-STK XCO2 uncertainties calculated from CAMS XCO2 differ significantly in their geographical distribution. Overestimation is more prevalent in western China than in the central and eastern regions. Statistically, CNN-STK XCO2 exhibits more positive deviations in spring and winter, covering approximately 66% and 81% of mainland China, respectively. In these seasons, negative deviations are primarily clustered in south-central China. Conversely, negative deviations dominate in summer and autumn, covering approximately 88% and 61%, respectively, with positive deviations concentrated in the western regions. Moreover, we highlighted several key regions with large uncertainties. Firstly, the Qinghai-Tibet Plateau region in southwest China (the magenta boxes in Figure 9a–c), which has a complex topography and climate, shows the maximum overestimations in spring, autumn, and winter. Secondly, the Sichuan Basin in the south-central region (the bright blue boxes in Figure 9a–c), which is characterized by topographic occlusion, experienced maximum underestimations during spring, autumn, and winter. Thirdly, the northeastern region exhibits significant negative anomalies during summer, consistent with the biosphere disturbance error mentioned in Section 4.1.

4.2.2. Comparison with Mapping-XCO2

We further assessed the possible uncertainties of the CNN-STK XCO2 dataset by comparing it with the existing open-source XCO2 product, Mapping-XCO2. Initially, the CNN-STK XCO2 data were upscaled to 1° × 1° using mean aggregation to unify the spatial resolution. Subsequently, the uncertainties between CNN-STK XCO2 and Mapping-XCO2 for each season were quantified using the same method as in Section 4.2.1. The results reveal that the uncertainties of CNN-STK XCO2 are 0.21 (±0.19) ppm in spring, 0.25 (±0.20) ppm in summer, 0.20 (±0.18) ppm in autumn, and 0.25 (±0.21) ppm in winter, indicating better agreement with Mapping-XCO2 than with CAMS XCO2. No significant seasonal bias in the uncertainties was identified. The spatial distribution of uncertainties for each season is illustrated in Figure 10.
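The mean-aggregation step can be illustrated as a block average: each 1° cell is the mean of the 4 × 4 block of 0.25° cells it contains. The sketch below assumes that the fine grid is aligned with the 1° cell boundaries, that missing cells are marked with `None` and skipped, and that cells are averaged without area weighting; none of these details is stated in the text, and the function name is hypothetical.

```python
def upscale_block_mean(grid, factor=4):
    """Aggregate a 2D grid (list of rows) by averaging factor x factor blocks.

    Missing cells (None) are ignored; a block with no valid cells yields None.
    Assumption: unweighted mean, with the grid aligned to coarse-cell boundaries.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")

    coarse = []
    for r0 in range(0, n_rows, factor):
        coarse_row = []
        for c0 in range(0, n_cols, factor):
            cells = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
                if grid[r][c] is not None
            ]
            coarse_row.append(sum(cells) / len(cells) if cells else None)
        coarse.append(coarse_row)
    return coarse


# Illustrative 1° x 2° tile at 0.25° resolution (4 rows x 8 columns).
fine = [[410.0] * 4 + [412.0] * 4 for _ in range(4)]
fine[0][4] = None  # one missing cell, e.g. outside the study area
print(upscale_block_mean(fine))
```

Coastal and island cells, where only part of a block contains valid data, are averaged over fewer fine cells, so their aggregated values are less well constrained.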

The uncertainty of CNN-STK XCO2 does not exhibit particularly distinct spatial clustering characteristics, but there is a general underestimation observed, encompassing approximately 62% of mainland China. From Figure 10, we note that the Qinghai-Tibet Plateau (the bright blue boxes in Figure 10a,c and the magenta box in Figure 10b) and the Sichuan Basin (the magenta box in Figure 10c,d) are still the focal areas of uncertainty. Hainan Province shows the largest overestimations in spring, probably because the upscaling of the CNN-STK XCO2 data enhances the uncertainties in this region.

The uncertainty analysis indicates that the CNN-STK XCO2 dataset is comparable to the model simulation dataset and the open-source dataset released by scholars. It is expected that datasets produced from different sources and methods could be integrated in future studies to obtain a more comprehensive and reliable XCO2 dataset.

4.2.3. Feature Importance Evaluation

Feature importance analysis elucidates the degree to which each variable impacts the dependent variable. Using the Random Forest feature importance algorithm, we systematically evaluated and ranked all features to find the impact factors of XCO2 and their weights, presenting the outcomes in Figure 11. Detailed definitions of each variable can be found in Tables S1 and S2 in the Supplementary Materials. A higher feature importance value indicates a greater influence on XCO2. Notably, the nighttime light data (NTL) emerges as the most influential variable, followed by CH₄ column-mean molar fraction (tcch4), total column ozone (tco3), total column water (tcw), and surface solar radiation (ssr), all significantly contributing to XCO2. To enhance the interpolation accuracy of the XCO2 dataset, we propose choosing the independent variables with higher accuracy and reliability, particularly those with substantial weight. Moreover, given China’s pronounced spatial heterogeneity, regional variations in optimal independent variables and modeling approaches should be considered. Zoning the study area based on topographic or climatic factors could thus refine the models and improve dataset accuracy.

4.3. Advantages and Limitations

In recent years, carbon observation satellite data has made significant contributions to the global-scale monitoring of atmospheric CO₂ concentrations, but its spatial and temporal discontinuities have become the main factor restricting its wide application. In response to this situation, some scholars have explored different spatiotemporal reconstruction methods for XCO2 and obtained improved accuracy [20,27,42]. Compared with the machine learning models constructed by Wang [24], Wu [25], and He [26], with R² ranging from 0.61 to 0.91, the CNN model utilized in this study performs better. Deep learning models typically outperform machine learning models in tasks with large amounts of training samples and can automatically learn features without the need for feature engineering before modeling. This study has a considerable sample size and thus meets the prerequisites for exploring the application of deep learning models. Moreover, we integrated the CNN model with the spatiotemporal Kriging method, aiming to leverage the advantages of these two types of models, i.e., after getting the predictions from the CNN model, we further refined the model errors using the spatiotemporal Kriging method, which led to a more accurate interpolation result. To the best of our knowledge, integrating a deep learning model with a geostatistical method in this way has not been reported previously. Compared to single deep learning models constructed by Li [27] and Zhang [28] (accuracy validations based on TCCON are: R² = 0.87; RMSE = 0.90 ppm and MAE = 0.74 ppm, respectively), this study also has higher accuracy, which not only indicates that the model constructed in this paper is reliable, but also demonstrates that the strategy of optimizing the residuals using the spatiotemporal Kriging method is effective.
It is worth mentioning that all of their experiments introduced existing XCO2 datasets from model simulations as a priori knowledge or constraints in modeling, which may cause unnecessary biases. In contrast, we only used XCO2 satellite data for modeling and interpolation and then validated it with model simulations, which effectively avoids this problem and ensures the model results are independent. In addition, although the experiment used the strategy of randomly selecting 20% of the sample points as an independent validation dataset to evaluate the model accuracy, we used different randomly selected training and validation datasets several times in constructing the CNN and spatiotemporal Kriging models, and all of them obtained stable accuracy results.
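The residual-correction strategy discussed in this section, in which a trend model is used for prediction and its residuals are then interpolated and added back, can be outlined as in the following sketch. It is not the CNN-STK implementation: inverse distance weighting (IDW) is substituted here for spatiotemporal Kriging purely to keep the example short, whereas Kriging would derive the weights from a fitted spatiotemporal variogram. Treating time as a third coordinate on the same scale as space is a further simplifying assumption, and all names are hypothetical.

```python
import math


def idw_residual(target, samples, power=2.0):
    """Interpolate a residual at target = (x, y, t) by inverse distance weighting.

    samples: list of ((x, y, t), residual) pairs, where each residual is
    observation minus trend-model prediction at a sampled location.
    Stand-in for spatiotemporal Kriging (assumption, see lead-in).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for coords, residual in samples:
        distance = math.dist(target, coords)
        if distance == 0.0:
            return residual  # exact interpolation at sampled locations
        weight = 1.0 / distance ** power
        weighted_sum += weight * residual
        weight_total += weight
    return weighted_sum / weight_total


def hybrid_estimate(trend_prediction, target, residual_samples):
    """Final estimate = trend-model prediction + interpolated residual."""
    return trend_prediction + idw_residual(target, residual_samples)


# Illustrative residuals (ppm) at two sampled locations.
samples = [((0.0, 0.0, 0.0), 1.0), ((2.0, 0.0, 0.0), -1.0)]
print(hybrid_estimate(410.0, (0.5, 0.0, 0.0), samples))
```

The structure makes the role of the second stage explicit: if the trend model leaves spatially or temporally correlated errors, the interpolated residual corrects them, and if the residuals carry no such structure, the correction averages toward zero.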

However, the CNN-STK model proposed in this study also has some limitations. First, the density of valid data points significantly affects interpolation precision when using the spatiotemporal Kriging method. Future research aiming for finer-scale analysis should consider integrating more satellite data sources, such as GOSAT, OCO-3, and TanSat, to enhance data density and improve the reliability of geostatistical methods. Second, we directly used the cropping method for CNN model samples provided by the previous study, wherein the surrounding imagery of each satellite observation point was cropped into an independent sample. However, the appropriate sample size should be determined through corresponding comparative experiments. Specifically, larger samples may provide richer features and thus improve model accuracy, but they may also degrade it because of spatiotemporal heterogeneity within the sample window.

Furthermore, current studies often use XCO2 as auxiliary data for modeling anthropogenic carbon emissions of interest or computing enhancement and anomaly indicators to characterize the spatial distribution of anthropogenic carbon emissions. Exploration of its potential practical significance remains insufficient. The XCO2 dataset constructed in this study exhibits spatial consistency with datasets of fossil fuel CO₂ emissions such as EDGAR and ODIAC (Figure 12). Hence, XCO2 data may possess a certain potential for characterizing patterns and trends of anthropogenic carbon emissions, yet current research inadequately explores this aspect (it is important to note that XCO2 data are not equivalent to CO₂ emissions). In future work, we will delve into in-depth spatiotemporal pattern recognition of XCO2 data.

5. Conclusions

This study introduces a novel approach to transforming discrete satellite observations into continuous spatiotemporal datasets. Specifically, we developed a CNN-STK model to leverage the strengths of deep learning and geostatistical methods, reconstructing a monthly spatiotemporal XCO2 dataset of China at 0.25° grid-scale from 2015 to 2020 based on single-satellite retrievals. It provides a new workflow to obtain comprehensive, objective, and reliable regional estimates of atmospheric CO₂ concentrations, which is particularly beneficial for countries and regions lacking effective terrestrial CO₂ observations.

We conducted multiple experiments to assess the CNN-STK XCO2 dataset. Firstly, we evaluated the CNN-STK model using an independent validation dataset and comparative experiments, demonstrating its high precision with an R² of 0.936, an RMSE of 1.3 ppm, and an MAE of 0.946 ppm. The comparative experiments also showed that integrating the CNN with spatiotemporal Kriging is effective. Secondly, we categorized interpolation results by latitude and season, revealing minimal fluctuations in precision, which indicates the robustness of the model. Finally, we compared the CNN-STK XCO2 dataset with the model-simulated CAMS XCO2 dataset and TCCON ground-based observations. The results show coherent trends between CNN-STK XCO2 and CAMS XCO2, and the validation against TCCON XCO2 yields an R² of 0.954, an RMSE of 0.898 ppm, and an MAE of 0.741 ppm, underscoring the reliability of the CNN-STK dataset.

We further discussed the spatial distribution of the accuracy of the CNN-STK XCO2 dataset across different seasons and hot-spot/cold-spot regions, along with its uncertainties and influencing factors. The error distributions in spring, autumn, and winter exhibit similarities. The northern region of China has relatively more XCO2 observations with uniform distribution, resulting in generally higher accuracy compared to the southern region. High errors and uncertainties are concentrated in the Qinghai-Tibet Plateau, Sichuan Basin, and the northeastern region. Summer is the period of highest uncertainty for the CNN-STK XCO2 dataset. Additionally, the accuracies of cold-spot regions are lower than those of hot-spot areas and regions with no significant clustering characteristics, indicating potentially large biogenic disturbances, such as forest carbon sink absorption, in areas where low XCO2 values cluster. Furthermore, we found that nighttime light data exerted the greatest influence on CNN-STK XCO2, followed by CH₄ column-mean molar fraction, total column ozone, total column water, and surface solar radiation.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs16132433/s1: Figure S1. The XCO2 data acquired by OCO-2 at daily, weekly, monthly, and yearly time scales; Figure S2. Histogram of residual frequency distribution of CNN model; Figure S3. The spatial empirical variogram function and temporal empirical variogram function of the residuals (red points), and their corresponding fitted variogram function models (blue lines); Figure S4. The empirical spatiotemporal variogram functions for residuals and its fitted variogram function model; Figure S5. Locations of Hefei site and Xianghe site (a). Circular geographical regions centered on each site, with diameters of 1°, 3°, and 5°. The land cover data was obtained from GlobeLand30 dataset (http://globallandcover.com/ (accessed on 9 November 2023)), and the blank area in the diagrams is the oceanic area (b); Figure S6. Spatial distribution of hot and cold spots of XCO2 in different seasons; Table S1. ERA5 variables collected in this study; Table S2. EGG4 variables collected in this study; Table S3. ADF test results; Table S4. The optimal parameters of the theoretical spatiotemporal variogram function model; Table S5. Non-parametric statistical test results for CNN-STK XCO2 and CAMS XCO2; Table S6. Global Moran’s I of different seasons [43,44,45,46,47].

Author Contributions

Conceptualization, Y.H. and X.Z.; methodology, Y.H. and Q.S.; software, Y.H.; validation, W.S. and Q.S.; formal analysis, Y.H.; investigation, Y.H.; resources, X.Z.; data curation, Q.S.; writing—original draft preparation, Y.H.; writing—review and editing, X.Z. and W.S.; visualization, Q.S.; supervision, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the key program of the National Natural Science Foundation of China, grant number 41930650; the general program of the National Natural Science Foundation of China, grant number 42371412; and the general program of the National Natural Science Foundation of China, grant number 42271435.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary Materials. The new dataset generated has been provided on Zenodo (DOI 10.5281/zenodo.10406012).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xu, Z.; Chau, S.N.; Chen, X.; Zhang, J.; Li, Y.; Dietz, T.; Wang, J.; Winkler, J.A.; Fan, F.; Huang, B.; et al. Assessing Progress towards Sustainable Development over Space and Time. Nature 2020, 577, 74–78. [Google Scholar] [CrossRef] [PubMed]
Vicedo-Cabrera, A.M.; Scovronick, N.; Sera, F.; Royé, D.; Schneider, R.; Tobias, A.; Astrom, C.; Guo, Y.; Honda, Y.; Hondula, D.M.; et al. The Burden of Heat-Related Mortality Attributable to Recent Human-Induced Climate Change. Nat. Clim. Change 2021, 11, 492–500. [Google Scholar] [CrossRef] [PubMed]
Shukla, P.R.; Skea, J.; Slade, R.; Fradera, R.; Pathak, M.; Khourdajie, A.A.; Belkacemi, M.; Diemen, R.; Hasija, A.; Lisboa, G.; et al. Climate Change 2022—Mitigation of Climate Change: Working Group III Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, 1st ed.; Intergovernmental Panel on Climate Change (IPCC), Ed.; Cambridge University Press: Cambridge, UK, 2023; ISBN 978-1-00-915792-6. [Google Scholar]
Gillett, N.P.; Kirchmeier-Young, M.; Ribes, A.; Shiogama, H.; Hegerl, G.C.; Knutti, R.; Gastineau, G.; John, J.G.; Li, L.; Nazarenko, L.; et al. Constraining Human Contributions to Observed Warming since the Pre-Industrial Period. Nat. Clim. Change 2021, 11, 207–212. [Google Scholar] [CrossRef]
Kazancoglu, Y.; Ozbiltekin-Pala, M.; Ozkan-Ozen, Y.D. Prediction and Evaluation of Greenhouse Gas Emissions for Sustainable Road Transport within Europe. Sustain. Cities Soc. 2021, 70, 102924. [Google Scholar] [CrossRef]
Wunch, D.; Wennberg, P.O.; Osterman, G.; Fisher, B.; Naylor, B.; Roehl, C.M.; O’Dell, C.; Mandrake, L.; Viatte, C.; Kiel, M.; et al. Comparisons of the Orbiting Carbon Observatory-2 (OCO-2) XCO₂ Measurements with TCCON. Atmos. Meas. Tech. 2017, 10, 2209–2238. [Google Scholar] [CrossRef]
De Mazière, M.; Thompson, A.M.; Kurylo, M.J.; Wild, J.D.; Bernhard, G.; Blumenstock, T.; Braathen, G.O.; Hannigan, J.W.; Lambert, J.-C.; Leblanc, T.; et al. The Network for the Detection of Atmospheric Composition Change (NDACC): History, Status and Perspectives. Atmos. Chem. Phys. 2018, 18, 4935–4964. [Google Scholar] [CrossRef]
Wu, C.-Y.; Zhang, X.-Y.; Guo, L.-F.; Zhong, J.-T.; Wang, D.-Y.; Miao, C.-H.; Gao, X.; Zhang, X.-L. An Inversion Model Based on GEOS-Chem for Estimating Global and China’s Terrestrial Carbon Fluxes in 2019. Adv. Clim. Change Res. 2023, 14, 49–61. [Google Scholar] [CrossRef]
Schuh, A.E.; Jacobson, A.R.; Basu, S.; Weir, B.; Baker, D.; Bowman, K.; Chevallier, F.; Crowell, S.; Davis, K.J.; Deng, F.; et al. Quantifying the Impact of Atmospheric Transport Uncertainty on CO₂ Surface Flux Estimates. Glob. Biogeochem. Cycles 2019, 33, 484–500. [Google Scholar] [CrossRef] [PubMed]
Xi, W.; Xingying, Z.; Liyang, Z.; Ling, G.; Lin, T. Interpreting Seasonal Changes of Low-Tropospheric CO₂ over China Based on SCIAMACHY Observations during 2003–2011. Atmos. Environ. 2015, 103, 180–187. [Google Scholar] [CrossRef]
Bie, N.; Lei, L.; Zeng, Z.; Cai, B.; Yang, S.; He, Z.; Wu, C.; Nassar, R. Regional Uncertainty of GOSAT XCO₂ Retrievals in China: Quantification and Attribution. Atmos. Meas. Tech. 2018, 11, 1251–1272. [Google Scholar] [CrossRef]
Zhang, L.L.; Yue, T.X.; Wilson, J.P.; Zhao, N.; Zhao, Y.P.; Du, Z.P.; Liu, Y. A Comparison of Satellite Observations with the XCO₂ Surface Obtained by Fusing TCCON Measurements and GEOS-Chem Model Outputs. Sci. Total Environ. 2017, 601–602, 1575–1590. [Google Scholar] [CrossRef]
Zhou, M.; Ni, Q.; Cai, Z.; Langerock, B.; Nan, W.; Yang, Y.; Che, K.; Yang, D.; Wang, T.; Liu, Y.; et al. CO₂ in Beijing and Xianghe Observed by Ground-Based FTIR Column Measurements and Validation to OCO-2/3 Satellite Observations. Remote Sens. 2022, 14, 3769. [Google Scholar] [CrossRef]
Falahatkar, S.; Mousavi, S.M.; Farajzadeh, M. Spatial and Temporal Distribution of Carbon Dioxide Gas Using GOSAT Data over IRAN. Environ. Monit. Assess. 2017, 189, 627. [Google Scholar] [CrossRef] [PubMed]
Bezyk, Y.; Sówka, I.; Górka, M.; Blachowski, J. GIS-Based Approach to Spatio-Temporal Interpolation of Atmospheric CO₂ Concentrations in Limited Monitoring Dataset. Atmosphere 2021, 12, 384. [Google Scholar] [CrossRef]
Tadić, J.M.; Qiu, X.; Miller, S.; Michalak, A.M. Spatio-Temporal Approach to Moving Window Block Kriging of Satellite Data v1.0. Geosci. Model Dev. 2017, 10, 709–720. [Google Scholar] [CrossRef]
Ma, X.; Zhang, H.; Han, G.; Mao, F.; Xu, H.; Shi, T.; Hu, H.; Sun, T.; Gong, W. A Regional Spatiotemporal Downscaling Method for CO₂ Columns. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8084–8093. [Google Scholar] [CrossRef]
He, Z.; Lei, L.; Zhang, Y.; Sheng, M.; Wu, C.; Li, L.; Zeng, Z.-C.; Welp, L.R. Spatio-Temporal Mapping of Multi-Satellite Observed Column Atmospheric CO₂ Using Precision-Weighted Kriging Method. Remote Sens. 2020, 12, 576. [Google Scholar] [CrossRef]
Zammit-Mangion, A.; Cressie, N.; Shumack, C. On Statistical Approaches to Generate Level 3 Products from Satellite Remote Sensing Retrievals. Remote Sens. 2018, 10, 155. [Google Scholar] [CrossRef]
Zeng, Z.-C.; Lei, L.; Strong, K.; Jones, D.B.A.; Guo, L.; Liu, M.; Deng, F.; Deutscher, N.M.; Dubey, M.K.; Griffith, D.W.T.; et al. Global Land Mapping of Satellite-Observed CO₂ Total Columns Using Spatio-Temporal Geostatistics. Int. J. Digit. Earth 2017, 10, 426–456. [Google Scholar] [CrossRef]
Van Zoest, V.; Osei, F.B.; Hoek, G.; Stein, A. Spatio-Temporal Regression Kriging for Modelling Urban NO₂ Concentrations. Int. J. Geogr. Inf. Sci. 2020, 34, 851–865. [Google Scholar] [CrossRef]
Gao, Z.; Jiang, Y.; He, J.; Wu, J. Spatiotemporal Variation Analysis of Global XCO₂ Concentration during 2010–2020 Based on DINEOF-BME Framework and Wavelet Function. Sci. Total Environ. 2023, 892, 164750. [Google Scholar] [CrossRef]
Liu, Y.; Yue, T.; Zhang, L.; Zhao, N.; Zhao, M.; Liu, Y. Simulation and Analysis of XCO₂ in North China Based on High Accuracy Surface Modeling. Environ. Sci. Pollut. Res. 2018, 25, 27378–27392. [Google Scholar] [CrossRef] [PubMed]
Wang, W.; He, J.; Feng, H.; Jin, Z. High-Coverage Reconstruction of XCO₂ Using Multisource Satellite Remote Sensing Data in Beijing–Tianjin–Hebei Region. Int. J. Environ. Res. Public Health 2022, 19, 10853. [Google Scholar] [CrossRef] [PubMed]
Wu, C.; Ju, Y.; Yang, S.; Zhang, Z.; Chen, Y. Reconstructing Annual XCO₂ at a 1 km × 1 km Spatial Resolution across China from 2012 to 2019 Based on a Spatial CatBoost Method. Environ. Res. 2023, 236, 116866. [Google Scholar] [CrossRef]
He, S.; Yuan, Y.; Wang, Z.; Luo, L.; Zhang, Z.; Dong, H.; Zhang, C. Machine Learning Model-Based Estimation of XCO₂ with High Spatiotemporal Resolution in China. Atmosphere 2023, 14, 436. [Google Scholar] [CrossRef]
Li, T.; Wu, J.; Wang, T. Generating Daily High-Resolution and Full-Coverage XCO₂ across China from 2015 to 2020 Based on OCO-2 and CAMS Data. Sci. Total Environ. 2023, 893, 164921. [Google Scholar] [CrossRef] [PubMed]
Zhang, M.; Liu, G. Mapping Contiguous XCO₂ by Machine Learning and Analyzing the Spatio-Temporal Variation in China from 2003 to 2019. Sci. Total Environ. 2023, 858, 159588. [Google Scholar] [CrossRef]
Liu, D.; Di, B.; Luo, Y.; Deng, X.; Zhang, H.; Yang, F.; Grieneisen, M.L.; Zhan, Y. Estimating Ground-Level CO Concentrations across China Based on the National Monitoring Network and MOPITT: Potentially Overlooked CO Hotspots in the Tibetan Plateau. Atmos. Chem. Phys. 2019, 19, 12413–12430. [Google Scholar] [CrossRef]
Zhan, Y.; Luo, Y.; Deng, X.; Zhang, K.; Zhang, M.; Grieneisen, M.L.; Di, B. Satellite-Based Estimates of Daily NO₂ Exposure in China Using Hybrid Random Forest and Spatiotemporal Kriging Model. Environ. Sci. Technol. 2018, 52, 4180–4189. [Google Scholar] [CrossRef]
Shao, Y.; Ma, Z.; Wang, J.; Bi, J. Estimating Daily Ground-Level PM2.5 in China with Random-Forest-Based Spatiotemporal Kriging. Sci. Total Environ. 2020, 740, 139761. [Google Scholar] [CrossRef] [PubMed]
Osterman, G.; O’Dell, C.; Eldering, A.; Fisher, B.; Crisp, D.; Cheng, C.; Frankenberg, C.; Lambert, A.; Gunson, M.; Mandrake, L.; et al. Orbiting Carbon Observatory-2 & 3 (OCO-2 & OCO-3) Data Product User’s Guide, Operational Level 2 Data Versions 10 and Lite File Version 10 and VEarly. 2020. Available online: https://docserver.gesdisc.eosdis.nasa.gov/public/project/OCO/OCO2_OCO3_B10_DUG.pdf (accessed on 18 March 2023).
Connor, B.; Bösch, H.; McDuffie, J.; Taylor, T.; Fu, D.; Frankenberg, C.; O’Dell, C.; Payne, V.H.; Gunson, M.; Pollock, R.; et al. Quantification of Uncertainties in OCO-2 Measurements of XCO₂ Simulations and Linear Error Analysis. Atmos. Meas. Tech. 2016, 9, 5227–5238. [Google Scholar] [CrossRef]
Jacobs, N.; Simpson, W.R.; Graham, K.A.; Holmes, C.; Hase, F.; Blumenstock, T.; Tu, Q.; Frey, M.; Dubey, M.K.; Parker, H.A.; et al. Spatial Distributions of XCO₂ Seasonal Cycle Amplitude and Phase over Northern High-Latitude Regions. Atmos. Chem. Phys. 2021, 21, 16661–16687.
Yang, Y.; Zhou, M.; Langerock, B.; Sha, M.K.; Hermans, C.; Wang, T.; Ji, D.; Vigouroux, C.; Kumps, N.; Wang, G.; et al. New Ground-Based Fourier-Transform near-Infrared Solar Absorption Measurements of XCO₂, XCH₄ and XCO at Xianghe, China. Earth Syst. Sci. Data 2020, 12, 1679–1696.
Zeng, Z.; Lei, L.; Hou, S.; Ru, F.; Guan, X.; Zhang, B. A Regional Gap-Filling Method Based on Spatiotemporal Variogram Model of CO₂ Columns. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3594–3603.
Sun, X.-L.; Yang, Q.; Wang, H.-L.; Wu, Y.-J. Can Regression Determination, Nugget-to-Sill Ratio and Sampling Spacing Determine Relative Performance of Regression Kriging over Ordinary Kriging? CATENA 2019, 181, 104092.
Luo, X. Spatiotemporal Stochastic Models for Earth Science and Engineering Applications. Ph.D. Thesis, McGill University, Montreal, QC, Canada, 1998.
Yang, J.; Hu, M. Filling the Missing Data Gaps of Daily MODIS AOD Using Spatiotemporal Interpolation. Sci. Total Environ. 2018, 633, 677–683.
Hu, H.; Hu, Z.; Zhong, K.; Xu, J.; Zhang, F.; Zhao, Y.; Wu, P. Satellite-Based High-Resolution Mapping of Ground-Level PM2.5 Concentrations over East China Using a Spatiotemporal Regression Kriging Model. Sci. Total Environ. 2019, 672, 479–490.
Ramonet, M.; Langerock, B.; Warneke, T.; Eskes, H.J. Validation Report of the CAMS Greenhouse Gas Global Re-Analysis, Years 2003–2020, Copernicus Atmosphere Monitoring Service (CAMS) Report. 2021. Available online: https://atmosphere.copernicus.eu/sites/default/files/2021-04/CAMS84_2018SC3_D5.1.2-2020.pdf (accessed on 26 June 2023).
Zhang, L.; Yue, T.; Wilson, J.; Wang, D.; Zhao, N.; Liu, Y.; Liu, D.; Du, Z.; Wang, Y.; Lin, C.; et al. Modelling of XCO₂ Surfaces Based on Flight Tests of TanSat Instruments. Sensors 2016, 16, 1818.
Worden, K.; Iakovidis, I.; Cross, E.J. New Results for the ADF Statistic in Nonstationary Signal Analysis with a View towards Structural Health Monitoring. Mech. Syst. Signal Process. 2021, 146, 106979.
Gianfreda, A.; Maranzano, P.; Parisio, L.; Pelagatti, M. Testing for Integration and Cointegration When Time Series Are Observed with Noise. Econ. Model. 2023, 125, 106352.
Varouchakis, E.A.; Hristopulos, D.T. Comparison of Spatiotemporal Variogram Functions Based on a Sparse Dataset of Groundwater Level Variations. Spat. Stat. 2019, 34, 100245.
Sukkuea, A.; Heednacram, A. Prediction on Spatial Elevation Using Improved Kriging Algorithms: An Application in Environmental Management. Expert Syst. Appl. 2022, 207, 117971.
Getis, A.; Ord, J.K. The Analysis of Spatial Association by Use of Distance Statistics. Geogr. Anal. 1992, 24, 189–206.

Figure 1. Flow chart of this study.

Figure 2. CNN structure.

Figure 3. Spatiotemporal maps of monthly XCO2 over China at 0.25° grid-scale from 2015 to 2020. (The monthly maps from January 2015 to December 2020 are arranged sequentially, from left to right within each row and from top to bottom across rows.)

Figure 4. Accuracy validation results of CNN-STK model and comparison models. (a) CNN-STK model; (b) CNN model; (c) RF-STK model; (d) RF model.

Figure 5. Comparisons between CNN-STK XCO2 and CAMS XCO2.

Figure 6. Comparisons between CNN-STK XCO2 and TCCON XCO2. (a–c) Validation of the average CNN-STK XCO2 within circular geographic regions centered on the TCCON sites, with diameters of 1°, 3°, and 5°, respectively, against measurements from the Hefei and Xianghe sites of the TCCON network. (d) Line plots of CNN-STK XCO2 and TCCON measurements for the 3° diameter region, accompanied by a bar chart of their biases.

Figure 7. Spatial distributions of relative errors (REs) of XCO2 for each season. In (a,b), blue boxes mark the high-RE areas and green boxes mark the low-RE areas. In (d): ① Temperate semi-humid zone. ② Temperate semi-arid zone. ③ Temperate arid zone. ④ Warm temperate semi-humid zone. ⑤ Highland temperate semi-arid zone (Qinghai-Tibet Plateau). ⑥ Northern subtropical humid zone (the middle and lower reaches of Yangzi River Plain). ⑦ Marginal tropical humid zone.

Figure 8. The accuracies of XCO2 in different seasons and different clustering regions.

Figure 9. Spatial distributions of the uncertainty between CNN-STK XCO2 and CAMS XCO2 for each season. The maximum underestimations of CNN-STK XCO2 relative to CAMS XCO2 appear in bright blue boxes, while the maximum overestimations appear in magenta boxes.

Figure 10. Spatial distributions of the difference between CNN-STK XCO2 and Mapping-XCO2 for each season. The maximum underestimations of CNN-STK XCO2 relative to Mapping-XCO2 appear in bright blue boxes, while the maximum overestimations appear in magenta boxes.

Figure 11. Feature importance ranking for XCO2.

Figure 12. Spatial distributions of CNN-STK XCO2, ODIAC CO₂, and EDGAR CO₂ in 2015.

Table 1. Information of TCCON sites selected in this study.

Site	Latitude	Longitude	Start Date	End Date
Hefei	31.9°N	117.17°E	2015-11-02	2020-12-31
Xianghe	39.8°N	116.96°E	2018-06-14	2022-05-31

Table 2. Accuracy validation results of CNN-STK model by latitude and by season.

	Latitude (°N)	R²	RMSE (ppm)	MAE (ppm)
I	[Min *, 20]	0.963	0.903	0.670
II	(20, 30]	0.944	1.133	0.822
III	(30, 40]	0.932	1.295	0.953
IV	(40, 50]	0.937	1.303	0.964
V	[50, Max *]	0.934	1.487	1.101
	Season	R²	RMSE (ppm)	MAE (ppm)
VI	Spring (Mar, Apr, May)	0.939	1.129	0.824
VII	Summer (Jun, Jul, Aug)	0.920	1.540	1.169
VIII	Autumn (Sep, Oct, Nov)	0.932	1.235	0.911
IX	Winter (Dec, Jan, Feb)	0.933	1.170	0.845

* “Min” and “Max” refer to the minimum and maximum latitude of China.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
