Enhancing 3D Planetary Atmosphere Simulations with a Surrogate Radiative Transfer Model

Tara P. A. Tahseen,1 João M. Mendonça,2 Kai Hou Yip1 and Ingo P. Waldmann1
1Department of Physics and Astronomy, University College London, Gower Street, WC1E 6BT London, United Kingdom
2Department of Space Research and Space Technology, Technical University of Denmark, Elektrovej 328, 2800 Kgs. Lyngby, Denmark
E-mail: tara.tahseen.22@ucl.ac.uk (TPAT)
(Accepted XXX. Received YYY; in original form ZZZ)
Abstract

This work introduces an approach to enhancing the computational efficiency of 3D atmospheric simulations by integrating a machine-learned surrogate model into the OASIS global circulation model (GCM). Traditional GCMs, which are based on repeatedly numerically integrating physical equations governing atmospheric processes across a series of time-steps, are time-intensive, leading to compromises in spatial and temporal resolution of simulations. This research improves upon this limitation, enabling higher resolution simulations within practical timeframes. Speeding up 3D simulations holds significant implications in multiple domains. Firstly, it facilitates the integration of 3D models into exoplanet inference pipelines, allowing for robust characterisation of exoplanets from a previously unseen wealth of data anticipated from JWST and post-JWST instruments. Secondly, acceleration of 3D models will enable higher resolution atmospheric simulations of Earth and Solar System planets, enabling more detailed insights into their atmospheric physics and chemistry. Our method replaces the radiative transfer module in OASIS with a recurrent neural network-based model trained on simulation inputs and outputs. Radiative transfer is typically one of the slowest components of a GCM, thus providing the largest scope for overall model speed-up. The surrogate model was trained and tested on the specific test case of the Venusian atmosphere, to benchmark the utility of this approach in the case of non-terrestrial atmospheres. This approach yields promising results, with the surrogate-integrated GCM demonstrating above 99.0% accuracy and 101 factor GPU speed-up of the entire simulation compared to using the matched original GCM under Venus-like conditions.

keywords:
planets and satellites: atmospheres – radiative transfer
pubyear: 2024pagerange: Enhancing 3D Planetary Atmosphere Simulations with a Surrogate Radiative Transfer ModelD

1 Introduction

3D atmospheric models, commonly known as global circulation models (GCMs) are key tools for studying the climates of solar system planets including the Earth, and are becoming increasingly important in the characterisation of exoplanet atmospheres. GCMs consist of multiple components which individually model different atmospheric processes: each component numerically solves equations governing an atmospheric process across elements of a grid and across many time-steps until simulation convergence criteria are reached. Due to the large number of numerical integrations involved in GCM simulations, producing a simulated atmospheric state for a given input set of planetary and stellar parameters is incredibly computationally expensive and time-consuming.

GCMs are applied in exoplanet science in the forward modelling component of Bayesian atmospheric retrieval pipelines. To retrieve posterior distributions on planetary and stellar parameters from a single transit spectrum, the Bayesian retrieval framework requires of the order of tens of thousands of forward simulations corresponding to different samples of input parameter space. With current state-of-the-art 3D modelling techniques, the compute resources and time needed to produce the required number of 3D simulations prohibits statistically-rigorous inference of data from the James Webb Space Telescope (JWST, Gardner et al., 2006) and further next-generation observational instruments yet to come, namely the Ariel Space Telescope (Tinetti et al., 2021). Acceleration without compromising the accuracy of GCMs is thus one clear method of facilitating inference using JWST and post-JWST data in exoplanet science.

Reducing GCM simulation time would also incur benefits in climate science of Earth and other solar system planets. Increased speed of computation would enable simulations to be run at greater resolution, and/or with more physics included, thus enabling more realistic climate simulations to be achieved.

The past few years have yielded work in both exoplanet science and Earth climate science to accelerate 3D models. Much of this work in exoplanet science involves extending 1D models with extra parameters to reflect certain 3D atmospheric variations deemed necessary to account for 3D effects in phase-curve data (Changeat & Al-Refaie, 2020; Irwin et al., 2020; Chubb & Min, 2022; Himes et al., 2023; Nixon & Madhusudhan, 2022; Feng et al., 2020). This approach is prone to introducing biases to the simulated atmospheric states produced, and a fully 3D model parametrisation is essential to ensuring that simulations are robust against both known and unknown biases resulting from model oversimplifications Pluriel et al. (2020). Earth climate science, benefiting from a wealth of high-resolution observational measurements, has had a different set of methods employed, namely machine-learned surrogate models trained on such observations (Yao et al., 2023; Ukkonen, 2022). Surrogate models have not been broadly explored outside of Earth climate science, but demonstrate the potential of speed-up without requiring an oversimplified model parametrisation (Yao et al., 2023; Ukkonen, 2022).

Of the components within GCMs, radiative transfer is often the slowest or least-resolved process. In the OASIS GCM (Mendonca et al., 2014), the radiative transfer component contributes to 60-70% of the total simulation runtime for massive and complex atmospheres such as Venus (Mendonça & Buchhave, 2020). There is thus key scope to substantially improve GCM computational efficiency by targeting the radiative transfer component specifically.

To assess the effectiveness of our new approach, we are testing it on Venus’ atmospheric conditions. Simulating the Venus atmosphere in 3D is very computationally intensive due to the need for a complex radiative transfer model to accurately represent the energy balance in the atmosphere (e.g. Eymet et al., 2009; Lee & Richardson, 2012; Mendonca et al., 2015). Venus has a substantial amount of CO2, which generates a strong greenhouse effect (Sagan, 1962) and is covered by highly reflective sulfuric acid clouds, obscuring the planet’s surface. Only about 2.5%percent\%% of the incoming solar radiation reaches the surface (e.g. Tomasko et al., 1980; Mendonca et al., 2015). The radiation model also requires fine spectral resolution to capture the spectral windows impacting energy exchange between the deep atmosphere and the upper layers above the clouds. Additionally, due to the high thermal inertia of the massive CO2 atmosphere, the models need to be integrated over a long period to reach a statistically steady state (Mendonça & Read, 2016). Using a surrogate model to represent the radiative transfer is key to enhancing the performance of 3D simulations whilst enabling a more realistic depiction of radiative processes at minimal cost. Therefore, Venus presents a significant challenge and serves as a benchmark for the complexities our new modelling approach may encounter in future applications.

2 Data & Codes

2.1 OASIS

OASIS is a planetary climate model composed of different coupled modules representing physical and chemical processes within planetary atmospheres (Mendonça & Buchhave, 2020). The mathematics and assumptions of the radiative transfer component of OASIS are detailed in Mendonca et al. (2015).

The equations in OASIS are discretized over concentric icosahedral spatial grids (Mendonça et al., 2016; Deitrick et al., 2020). For the 3D simulations in this study, we set the grid to approximately 2 degrees in the horizontal and 49 vertical layers. This grid configuration results in a total of 10242 columns covering the entire model domain. Our simulations started from the converged state obtained in Mendonça & Buchhave (2020) and were integrated for 5 Venus solar days (approximately 117 Earth days) using a time step of 15 seconds. The model configuration is similar to the simulations in Mendonça & Buchhave (2020), and in the following sections, we describe the main physical modules relevant to this work.

2.1.1 OASIS-RT

The radiative transfer module of OASIS (henceforth referred to as OASIS-RT) models the interaction of radiation with gas and cloud species in the atmosphere in a two-stream manner. Radiation from two different sources are modelled separately: these are solar radiation (0.15.5μm0.15.5𝜇m0.1-5.5\ \mu\text{m}0.1 - 5.5 italic_μ m) and thermal radiation (1.7260μm1.7260𝜇m1.7-260\ \mu\text{m}1.7 - 260 italic_μ m). The solar radiation code utilises the δ𝛿\deltaitalic_δ-Eddington approximation (Joseph et al., 1976) in combination with an adding-layer method (Liu & Weng, 2006; Mendonca et al., 2015), whilst the thermal code considers absorptivity and emissivity (Mendonca et al., 2015). The radiation scheme uses the k𝑘kitalic_k-distribution method to represent the gas absorption cross sections, which are integrated over 353 spectral bands and 20 Gaussian points (Mendonça & Buchhave, 2020).

In the OASIS code used for this project, the spatial distribution and radiative properties of the clouds, which are composed of sulphuric acid and water, were taken from Crisp (1986) and Mendonca et al. (2015). The main cloud deck is located between roughly 4565km4565km45-65\,\text{km}45 - 65 km altitude with layers of sub-micron particles below and above (Knollenberg & Hunten, 1980). More details on the cloud properties can be found in Mendonca et al. (2015).

The radiative transfer code involves two steps of computation for each timestep; these are:

  1. 1.

    Computing the gas optics: computing the optical properties of layer boundaries from the thermodynamic variables.

  2. 2.

    Computing the flow of radiation: computing the upward and downward-welling fluxes at each layer boundary, from the incident flux at the top of the column in combination with the optical properties at each layer boundary.

The model uses a k𝑘kitalic_k-distribution table (Lacis & Oinas, 1991) to calculate the optical properties of each layer across the wavelength bands using pre-computed wavelength-dependent absorption coefficients for CO2subscriptCO2\text{CO}_{2}CO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, SO2subscriptSO2\text{SO}_{2}SO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and H2OsubscriptH2O\text{H}_{2}\text{O}H start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT O. These coefficients are combined with a continuum absorption, mostly from CO2subscriptCO2\text{CO}_{2}CO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTCO2subscriptCO2\text{CO}_{2}CO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT collisions, and Rayleigh scattering from CO2subscriptCO2\text{CO}_{2}CO start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and N2subscriptN2\text{N}_{2}N start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (Mendonca et al., 2015).

The radiative transfer model takes inputs of the density ρ𝜌\rhoitalic_ρ, pressure p𝑝pitalic_p, temperature T𝑇Titalic_T and chemical composition per grid element as outputted from the dynamical core THOR111The model component that governs the resolved 3D fluid flow evolution. (Mendonça et al., 2016). Heating rates are then calculated per grid element from fluxes outputted from the radiative transfer model, and these heating rates update the temperature profile of the atmosphere, which serves as input back into the dynamical core. Flux is used to compute the heating rate per layer according to equation 1.

dTdt=1ρcpdFnetdz𝑑𝑇𝑑𝑡1𝜌subscript𝑐𝑝𝑑superscript𝐹𝑛𝑒𝑡𝑑𝑧\frac{dT}{dt}=\frac{1}{\rho c_{p}}\frac{dF^{net}}{dz}divide start_ARG italic_d italic_T end_ARG start_ARG italic_d italic_t end_ARG = divide start_ARG 1 end_ARG start_ARG italic_ρ italic_c start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG divide start_ARG italic_d italic_F start_POSTSUPERSCRIPT italic_n italic_e italic_t end_POSTSUPERSCRIPT end_ARG start_ARG italic_d italic_z end_ARG (1)

where dFnet𝑑superscript𝐹𝑛𝑒𝑡dF^{net}italic_d italic_F start_POSTSUPERSCRIPT italic_n italic_e italic_t end_POSTSUPERSCRIPT is the spectral-integrated net radiative flux (Wm2Wsuperscriptm2\text{W}\,\text{m}^{-2}W m start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT), ρ𝜌\rhoitalic_ρ is the atmospheric density (kgm3kgsuperscriptm3\text{kg}\,\text{m}^{-3}kg m start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT), and cpsubscript𝑐𝑝c_{p}italic_c start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT is the specific heat capacity at constant pressure (900 Jkg1K1Jsuperscriptkg1superscriptK1\text{J}\,\text{kg}^{-1}\,\text{K}^{-1}J kg start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT K start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT).

In order to improve the computational efficiency of 3D simulations with radiative transfer, Venus GCMs traditionally do not update the radiative fluxes from the solar and thermal schemes at every time step (Lebonnois et al., 2016; Mendonça & Read, 2016; Mendonça & Buchhave, 2020). In the 3D simulations with explicit radiative transfer used in this study, the fluxes calculated from the solar radiation scheme were updated every 2880 steps, and the thermal radiation fluxes were updated every 320 steps. These values are adjusted by the model user and are specific to Venus’s simulations. Larger values could potentially cause model instabilities. Although this approach introduces some inaccuracies in the heating/cooling rates, the robustness of the simulation is not compromised because the composition of the atmosphere and clouds remains constant over time. With the efficiency of the new surrogate model described in the next section, we are now able to update radiative fluxes at every time step. To enhance the stability of the model with the new surrogate model, we also apply the three-step Adams-Bashforth method to the heating rate calculated from the surrogate radiative fluxes.

2.2 Data

The data used for this project is 3D data simulated by OASIS corresponding to input parameters of Venus (Mendonça & Buchhave, 2020). The data consists of over 1,000 snapshots, each taken 12 hours apart, covering a period of 500 Earth days. The data covers the entire icosahedral grid with dimensions 10,242 columns ×\times× 49 layers.

At the sample level (per atmospheric column), the data comprises the following quantities: pressure p𝑝pitalic_p, temperature T𝑇Titalic_T, and gas density ρ𝜌\rhoitalic_ρ, all defined per layer; upwelling short-wave flux FSW,superscriptFSW\text{F}^{\text{SW},\uparrow}F start_POSTSUPERSCRIPT SW , ↑ end_POSTSUPERSCRIPT, downwelling short-wave flux FSW,superscriptFSW\text{F}^{\text{SW},\downarrow}F start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT, upwelling long-wave flux FLW,superscriptFLW\text{F}^{\text{LW},\uparrow}F start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT, and downwelling long-wave flux FLW,superscriptFLW\text{F}^{\text{LW},\downarrow}F start_POSTSUPERSCRIPT LW , ↓ end_POSTSUPERSCRIPT, all defined per layer boundary; and cosine of the solar zenith angle μ𝜇\muitalic_μ, short-wave surface albedo αSWsubscript𝛼SW\alpha_{\text{SW}}italic_α start_POSTSUBSCRIPT SW end_POSTSUBSCRIPT, long-wave surface albedo αSWsubscript𝛼SW\alpha_{\text{SW}}italic_α start_POSTSUBSCRIPT SW end_POSTSUBSCRIPT, and surface temperature T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, all defined per column.

Figures illustrating the distribution of samples within the dataset as a function of cosμ𝜇\cos\muroman_cos italic_μ and surface temperature are contained within appendix section C.

3 Methods

3.1 Surrogate Modelling

Surrogate models, or emulators, are approximate mathematical models which model outcomes of interest, whereby the emulator mechanism does not necessarily reflect the physical mechanism which produces the outcomes. Surrogate models are useful in cases where the physical mechanism producing such outcomes is not well-understood, or in cases where modelling the physical mechanism is excessively computational demanding; in the case of the latter, the aim is to produce a surrogate model which is computationally efficient compared to the physical model whilst maintaining accuracy. Machine learning provides a framework by which surrogate models can be produced: deep neural networks adhere to the Universal Approximation Theorem, and can (in theory) model any arbitrarily complex non-linear relationship, thus providing a function space whereby a suitable surrogate function is almost certain to exist (Ian Goodfellow and Yoshua Bengio and Aaron Courville, 2016).

Research into surrogate modelling of exoplanetary atmospheres is relatively nascent (Himes et al., 2022, 2023; Unlu et al., 2023). The use of surrogate models within Earth Climate Science, though still new, is much more established and well-explored (Yao et al., 2023; Mukkavilli et al., 2023; Ukkonen, 2022). The development of surrogate models for atmospheric models of exoplanets can thus be informed by the development of such surrogate models for the parameter space of Earth.

An example where surrogate modelling has already proved valuable within 3D modelling of exoplanetary atmospheres is work by Schneider et al. (2024), who utilised DeepSets to achieve fast and accurate mixing of correlated-k𝑘kitalic_k opacities. Their work exemplifies how machine learning can be leveraged to successfully speed up a single process of a GCM whilst retaining accuracy, thus establishing a basis for further exploration into the integration of surrogates within GCM frameworks.

In this work, two surrogate models were produced for 3D modelling of Venus using OASIS: one surrogate model to emulate the short-wave radiative transfer schema, and one to emulate the long-wave radiative transfer schema (see 2.1.1 for details on the computations for the two radiation schemas).

3.2 Data Preprocessing

Data were preprocessed separately for the long-wave and short-wave regimes. Each model took input of two types of data: variables defined at the grid-element level, referred to as vector variables of dimension (ncolumns,)=(49,)(n_{\text{columns}},)=(49,)( italic_n start_POSTSUBSCRIPT columns end_POSTSUBSCRIPT , ) = ( 49 , ), and variables defined at the column-level, referred to as scalar variables.

The short-wave surrogate model took scalar inputs of: surface temperature, T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT; gas density of the lowest-altitude layer, ρSsubscript𝜌𝑆\rho_{S}italic_ρ start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT; pressure of the lowest-altitude layer, pSsubscript𝑝𝑆p_{S}italic_p start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT; cosine of the solar zenith angle μ𝜇\muitalic_μ; and short-wave surface albedo αSWsubscript𝛼𝑆𝑊\alpha_{SW}italic_α start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT. The long-wave surrogate model took scalar inputs of: surface temperature, T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT; gas density of the lowest-altitude layer, ρSsubscript𝜌𝑆\rho_{S}italic_ρ start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT; pressure of the lowest-altitude layer, pSsubscript𝑝𝑆p_{S}italic_p start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT; and long-wave surface albedo αLWsubscript𝛼𝐿𝑊\alpha_{LW}italic_α start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT.

Both models took input of the same vector variables, which were as follows: temperature, T𝑇Titalic_T; pressure p𝑝pitalic_p; and gas density ρ𝜌\rhoitalic_ρ. Vector variables were scaled as:

xi,j=loge(xi,j)loge(xi,0)subscript𝑥𝑖𝑗subscript𝑒subscript𝑥𝑖𝑗subscript𝑒subscript𝑥𝑖0x_{i,j}=\frac{\log_{e}(x_{i,j})}{\log_{e}(x_{i,0})}$$italic_x start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = divide start_ARG roman_log start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ) end_ARG start_ARG roman_log start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT ) end_ARG (2)

for x{T,p}𝑥𝑇𝑝x\in\{T,p\}italic_x ∈ { italic_T , italic_p }, and

xi,j=(xi,jxi,0)0.25subscript𝑥𝑖𝑗superscriptsubscript𝑥𝑖𝑗subscript𝑥𝑖00.25x_{i,j}=\left(\frac{x_{i,j}}{x_{i,0}}\right)^{0.25}italic_x start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT = ( divide start_ARG italic_x start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_x start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT end_ARG ) start_POSTSUPERSCRIPT 0.25 end_POSTSUPERSCRIPT (3)

for x=ρ𝑥𝜌x=\rhoitalic_x = italic_ρ, for the i𝑖iitalic_ith column and j𝑗jitalic_jth atmospheric level.

Scalar variables were then re-scaled as

xiscaled=ximiniStrainximaxiStrainximiniStrainxisuperscriptsubscript𝑥𝑖scaledsubscript𝑥𝑖subscript𝑖subscript𝑆trainsubscript𝑥𝑖subscript𝑖subscript𝑆trainsubscript𝑥𝑖subscript𝑖subscript𝑆trainsubscript𝑥𝑖x_{i}^{\text{scaled}}=\frac{x_{i}-\min\limits_{i\in S_{\text{train}}}{x_{i}}}{% \max\limits_{i\in S_{\text{train}}}{x_{i}}-\min\limits_{i\in S_{\text{train}}}% {x_{i}}}italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT scaled end_POSTSUPERSCRIPT = divide start_ARG italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - roman_min start_POSTSUBSCRIPT italic_i ∈ italic_S start_POSTSUBSCRIPT train end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG roman_max start_POSTSUBSCRIPT italic_i ∈ italic_S start_POSTSUBSCRIPT train end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - roman_min start_POSTSUBSCRIPT italic_i ∈ italic_S start_POSTSUBSCRIPT train end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG (4)

for x{T0,p0,ρ0}𝑥subscript𝑇0subscript𝑝0subscript𝜌0x\in\{T_{0},p_{0},\rho_{0}\}italic_x ∈ { italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT } where and Strainsubscript𝑆trainS_{\text{train}}italic_S start_POSTSUBSCRIPT train end_POSTSUBSCRIPT is the training set.

Targets were scaled as follows:

ujLW=yjLWALWsubscriptsuperscript𝑢LW𝑗subscriptsuperscript𝑦LW𝑗superscript𝐴LWu^{\text{LW}}_{j}=\frac{y^{\text{LW}}_{j}}{A^{\text{LW}}}italic_u start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG italic_y start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG start_ARG italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT end_ARG (5)

for yj{FjLW,,FjLW,}subscript𝑦𝑗superscriptsubscript𝐹𝑗LWsuperscriptsubscript𝐹𝑗LWy_{j}\in\{F_{j}^{\text{LW},\uparrow},F_{j}^{\text{LW},\downarrow}\}italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ { italic_F start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT , italic_F start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT LW , ↓ end_POSTSUPERSCRIPT } and

ujSW=yjSWBSWsubscriptsuperscript𝑢SW𝑗superscriptsubscript𝑦𝑗SWsuperscript𝐵SWu^{\text{SW}}_{j}=\frac{y_{j}^{\text{SW}}}{B^{\text{SW}}}italic_u start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = divide start_ARG italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT end_ARG start_ARG italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT end_ARG (6)

for yj{FjSW,,FjSW,}subscript𝑦𝑗superscriptsubscript𝐹𝑗SWsuperscriptsubscript𝐹𝑗SWy_{j}\in\{F_{j}^{\text{SW},\uparrow},F_{j}^{\text{SW},\downarrow}\}italic_y start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ { italic_F start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT SW , ↑ end_POSTSUPERSCRIPT , italic_F start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT }, for the j𝑗jitalic_jth altitude level (j[0,49]𝑗049j\in[0,49]italic_j ∈ [ 0 , 49 ] where 0 indexes the ground level and 49 indexes the top level of the atmospheric column), where ALWsuperscript𝐴LWA^{\text{LW}}italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT and BSWsuperscript𝐵SWB^{\text{SW}}italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT are scaling factors linear in T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and μ𝜇\muitalic_μ respectively, and fitted from the data:

ALW(T0)=a1T0+a2superscript𝐴LWsubscript𝑇0subscript𝑎1subscript𝑇0subscript𝑎2A^{\text{LW}}(T_{0})=a_{1}\cdot T_{0}+a_{2}italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT ( italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (7)
BSW(μ)=b1μ+b2superscript𝐵SW𝜇subscript𝑏1𝜇subscript𝑏2B^{\text{SW}}(\mu)=b_{1}\cdot\mu+b_{2}italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT ( italic_μ ) = italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_μ + italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (8)

where (a1,a2)subscript𝑎1subscript𝑎2(a_{1},a_{2})( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) are constants fitted using y0LW,subscriptsuperscript𝑦LW0y^{\text{LW},\uparrow}_{0}italic_y start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and (b1,b2)subscript𝑏1subscript𝑏2(b_{1},b_{2})( italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) are fitted using y49SW,subscriptsuperscript𝑦SW49y^{\text{SW},\downarrow}_{49}italic_y start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 49 end_POSTSUBSCRIPT across all columns of the training set Strainsubscript𝑆trainS_{\text{train}}italic_S start_POSTSUBSCRIPT train end_POSTSUBSCRIPT. Values of (a1,a2,b1,b2)subscript𝑎1subscript𝑎2subscript𝑏1subscript𝑏2(a_{1},a_{2},b_{1},b_{2})( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) are retained for model prediction post-processing. Residuals between the targets y0LW,subscriptsuperscript𝑦LW0y^{\text{LW},\uparrow}_{0}italic_y start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, y49SW,subscriptsuperscript𝑦SW49y^{\text{SW},\downarrow}_{49}italic_y start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 49 end_POSTSUBSCRIPT, and the respective scaling factors approximating the value of these targets, are displayed in figure 1. These simple linear scaling methods were chosen for data pre-processing in this work instead of using exact computations of y0LW,subscriptsuperscript𝑦LW0y^{\text{LW},\uparrow}_{0}italic_y start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and y49SW,subscriptsuperscript𝑦SW49y^{\text{SW},\downarrow}_{49}italic_y start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 49 end_POSTSUBSCRIPT as the latter computations are much more involved, and the more simple computations produce results with an acceptably small marginal difference in the values of the fluxes.

Columns across all epochs were shuffled and split into train, test and validation datasets, in the ratio 70:15:15:7015:1570:15:1570 : 15 : 15.

Refer to caption
Figure 1: The figures above display scatter plots illustrating the residuals between scaling factors ALW(T0)superscript𝐴LWsubscript𝑇0A^{\text{LW}}(T_{0})italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT ( italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and BSW(μ)superscript𝐵SW𝜇B^{\text{SW}}(\mu)italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT ( italic_μ ) and their respective targets y0LW,subscriptsuperscript𝑦LW0y^{\text{LW},\uparrow}_{0}italic_y start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and y49SW,subscriptsuperscript𝑦SW49y^{\text{SW},\downarrow}_{49}italic_y start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 49 end_POSTSUBSCRIPT, plotted over one epoch of 10,242 test columns covering the entire icosahedral grid. Left: This panel shows the residuals between the scaling factors BSW(μi)superscript𝐵SWsubscript𝜇𝑖B^{\text{SW}}(\mu_{i})italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) used for the short-wave targets of a given column i𝑖iitalic_i and the downward-welling short-wave flux at the top of the column Fi,49SW,subscriptsuperscriptFSW𝑖49\text{F}^{\text{SW},\downarrow}_{i,49}F start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 49 end_POSTSUBSCRIPT. BSW(μi)superscript𝐵SWsubscript𝜇𝑖B^{\text{SW}}(\mu_{i})italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT ( italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ), a linear function of the cosine of the solar zenith angle μisubscript𝜇𝑖\mu_{i}italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, approximates Fi,49SW,subscriptsuperscriptFSW𝑖49\text{F}^{\text{SW},\downarrow}_{i,49}F start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 49 end_POSTSUBSCRIPT. Right: This panel displays the residuals between the scaling factors ALW(Ti,0)superscript𝐴LWsubscript𝑇𝑖0A^{\text{LW}}(T_{i,0})italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT ( italic_T start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT ) used for the long-wave targets of a given column i𝑖iitalic_i and the upward-welling long-wave flux at the ground level Fi,0LW,subscriptsuperscriptFLW𝑖0\text{F}^{\text{LW},\uparrow}_{i,0}F start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT. Here, ALW(Ti,0)superscript𝐴LWsubscript𝑇𝑖0A^{\text{LW}}(T_{i,0})italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT ( italic_T start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT ) approximates Fi,0LW,subscriptsuperscriptFLW𝑖0\text{F}^{\text{LW},\uparrow}_{i,0}F start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT as a linear function of surface temperature Ti,0subscript𝑇𝑖0T_{i,0}italic_T start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT. Both: In both the long-wave and short-wave cases, the preferred quantities Fi,0LW,subscriptsuperscriptFLW𝑖0\text{F}^{\text{LW},\uparrow}_{i,0}F start_POSTSUPERSCRIPT LW , ↑ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT and Fi,49SW,subscriptsuperscriptFSW𝑖49\text{F}^{\text{SW},\downarrow}_{i,49}F start_POSTSUPERSCRIPT SW , ↓ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , 49 end_POSTSUBSCRIPT to use for scaling flux profiles across atmospheric levels, involve complex calculations. The figures demonstrate that simple linear functions ALW(T0)superscript𝐴LWsubscript𝑇0A^{\text{LW}}(T_{0})italic_A start_POSTSUPERSCRIPT LW end_POSTSUPERSCRIPT ( italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and BSW(μ)superscript𝐵SW𝜇B^{\text{SW}}(\mu)italic_B start_POSTSUPERSCRIPT SW end_POSTSUPERSCRIPT ( italic_μ ) yield close approximations with low residuals, making them suitable scaling factors instead.

3.3 Model Architecture

Model architecture was chosen to be based on recurrent neural networks (RNNs), as RNNs structurally incorporate the spatial dependence of the training data, which fits naturally in this scenario. RNN layers were implemented in the form of gated recurrent units (GRUs). The specific architecture of the benchmark was chosen to be that used by Ukkonen (2022) (illustrated in figure 2), which utilised a bi-directional RNN-based architecture to create a two-stream radiative transfer emulator for Earth, trained using observational data. Ukkonen’s model performed with 0.5%absentpercent0.5\leq 0.5\%≤ 0.5 % mean absolute error for the upwelling and downwelling fluxes on the test-set (Ukkonen, 2022), suggesting its potential efficacy for developing surrogates trained on analogous simulated data. The models used in this work were constructed and trained using TensorFlow version 2.12.0 (Abadi et al., 2016), and converted into ONNX format for integration within OASIS. Input data to the surrogate model is detailed in the table 1, and surrogate model parameters are detailed in section A.

Refer to caption
Figure 2: Figure illustrating the architecture of the learned surrogate models used in this work. Surrogate models for both the long-wave and short-wave schemas took scaled vector inputs Xj=(p~j,T~j,ρ~j)subscript𝑋𝑗subscript~𝑝𝑗subscript~𝑇𝑗subscript~𝜌𝑗\vec{X}_{j}=(\tilde{p}_{j},\tilde{T}_{j},\tilde{\rho}_{j})over→ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = ( over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) for the j𝑗jitalic_jth layer where p~j,T~j,ρ~jsubscript~𝑝𝑗subscript~𝑇𝑗subscript~𝜌𝑗\tilde{p}_{j},\tilde{T}_{j},\tilde{\rho}_{j}over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are scaled pressure, scaled temperature and scaled density of the j𝑗jitalic_jth layer of the input atmospheric column, respectively (see 3.2 for details on scaling). The surrogate models took input of scalar inputs β𝛽\vec{\beta}over→ start_ARG italic_β end_ARG, where βSW=(μ,p0,T0,ρ0,αSW)subscript𝛽𝑆𝑊𝜇subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝑆𝑊\vec{\beta}_{SW}=(\mu,p_{0},T_{0},\rho_{0},\alpha_{SW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT = ( italic_μ , italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT ) for the short-wave surrogate model, and βLW=(p0,T0,ρ0,αLW)subscript𝛽𝐿𝑊subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝐿𝑊\vec{\beta}_{LW}=(p_{0},T_{0},\rho_{0},\alpha_{LW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT = ( italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT ) for the long-wave surrogate model, where μ𝜇\muitalic_μ and α𝛼\alphaitalic_α denote the cosine of the solar zenith angle and surface albedo, respectively. Vector inputs are passed through a gated recurrent unit (GRU) layer starting from the top atmospheric layer (layer 48) and moving downwards to the ground layer (layer 0). Scalar inputs are concatenated with the output of the downwards GRU, and inputted into a dense layer, which then serves as the first input to an upward-moving GRU layer. The outputs of the two GRU layers are then concatenated and passed through a dense layer, which produces the model outputs of shape (nlevels,noutputs)=(50,2)subscript𝑛levelssubscript𝑛outputs502(n_{\text{levels}},n_{\text{outputs}})=(50,2)( italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT , italic_n start_POSTSUBSCRIPT outputs end_POSTSUBSCRIPT ) = ( 50 , 2 ), corresponding to scaled upwards and downwards flux for each atmospheric level. ReLU (Agarap, 2019) activation functions were used for both GRU layers and Dense1subscriptDense1\text{Dense}_{1}Dense start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, whilst a sigmoid activation function was used for the DenseoutsubscriptDenseout\text{Dense}_{\text{out}}Dense start_POSTSUBSCRIPT out end_POSTSUBSCRIPT layer.
Surrogate Schema Short-wave Long-wave
Vector Inputs Xi=(p~i,T~i,ρ~i)subscript𝑋𝑖subscript~𝑝𝑖subscript~𝑇𝑖subscript~𝜌𝑖\vec{X}_{i}=(\tilde{p}_{i},\tilde{T}_{i},\tilde{\rho}_{i})over→ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) Xi=(p~i,T~i,ρ~i)subscript𝑋𝑖subscript~𝑝𝑖subscript~𝑇𝑖subscript~𝜌𝑖\vec{X}_{i}=(\tilde{p}_{i},\tilde{T}_{i},\tilde{\rho}_{i})over→ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )
Scalar Inputs βSW=(μ,p0,T0,ρ0,αSW)subscript𝛽𝑆𝑊𝜇subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝑆𝑊\vec{\beta}_{SW}=(\mu,p_{0},T_{0},\rho_{0},\alpha_{SW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT = ( italic_μ , italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT ) βLW=(p0,T0,ρ0,αLW)subscript𝛽𝐿𝑊subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝐿𝑊\vec{\beta}_{LW}=(p_{0},T_{0},\rho_{0},\alpha_{LW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT = ( italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT )
Table 1: Surrogate model inputs: Surrogate models for both the long-wave and short-wave schemas took scaled vector inputs Xi=(p~i,T~i,ρ~i)subscript𝑋𝑖subscript~𝑝𝑖subscript~𝑇𝑖subscript~𝜌𝑖\vec{X}_{i}=(\tilde{p}_{i},\tilde{T}_{i},\tilde{\rho}_{i})over→ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) for the i𝑖iitalic_ith layer where p~i,T~i,ρ~isubscript~𝑝𝑖subscript~𝑇𝑖subscript~𝜌𝑖\tilde{p}_{i},\tilde{T}_{i},\tilde{\rho}_{i}over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over~ start_ARG italic_ρ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are scaled pressure, scaled temperature and scaled density of the i𝑖iitalic_ith layer of the input atmospheric column, respectively (see 3.2 for details on scaling). The surrogate models took input of scalar inputs β𝛽\vec{\beta}over→ start_ARG italic_β end_ARG, where βSW=(μ,p0,T0,ρ0,αSW)subscript𝛽𝑆𝑊𝜇subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝑆𝑊\vec{\beta}_{SW}=(\mu,p_{0},T_{0},\rho_{0},\alpha_{SW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT = ( italic_μ , italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_S italic_W end_POSTSUBSCRIPT ) for the short-wave surrogate model, and βLW=(p0,T0,ρ0,αLW)subscript𝛽𝐿𝑊subscript𝑝0subscript𝑇0subscript𝜌0subscript𝛼𝐿𝑊\vec{\beta}_{LW}=(p_{0},T_{0},\rho_{0},\alpha_{LW})over→ start_ARG italic_β end_ARG start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT = ( italic_p start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ρ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_α start_POSTSUBSCRIPT italic_L italic_W end_POSTSUBSCRIPT ) for the long-wave surrogate model, where μ𝜇\muitalic_μ and α𝛼\alphaitalic_α represent the cosine of the solar zenith angle and surface albedo, respectively.

3.4 Model Training

Both models were trained in a supervised, end-to-end fashion. An Adam (Kingma & Ba, 2017) optimiser was used in combination with a cyclical learning rate. Models were trained using TensorFlow on an NVIDIA A100 GPU.

3.4.1 Loss Function

The loss function was constructed as a combination of the mean percentage error between the predictions and targets, per output. The mean error (ME) for the i𝑖iitalic_ith test column and k𝑘kitalic_kth target variable is defined as follows:

MEi,k=j=0nlevels1|y^i,j,ky~i,j,k|nlevelssubscriptME𝑖𝑘superscriptsubscript𝑗0subscript𝑛levels1subscript^𝑦𝑖𝑗𝑘subscript~𝑦𝑖𝑗𝑘subscript𝑛levels\text{ME}_{i,k}=\sum_{j=0}^{n_{\text{levels}}-1}\frac{\left|\hat{y}_{i,j,k}-% \tilde{y}_{i,j,k}\right|}{n_{\text{levels}}}ME start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT divide start_ARG | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | end_ARG start_ARG italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT end_ARG (9)

where y^i,j,ksubscript^𝑦𝑖𝑗𝑘\hat{y}_{i,j,k}over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT is the target for the i𝑖iitalic_ith test column, j𝑗jitalic_jth atmospheric level, and k𝑘kitalic_kth target variable, where k=0𝑘0k=0italic_k = 0 corresponds to down-welling flux and k=1𝑘1k=1italic_k = 1 corresponds to up-welling flux, and nlevelssubscript𝑛levelsn_{\text{levels}}italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT is the number of atmospheric levels (nlevels=50subscript𝑛levels50n_{\text{levels}}=50italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT = 50 is this work). Normalisation factors were defined as

normi,k=j=0nlevels1|y^i,j,k|nlevelssubscriptnorm𝑖𝑘superscriptsubscript𝑗0subscript𝑛levels1subscript^𝑦𝑖𝑗𝑘subscript𝑛levels\text{norm}_{i,k}=\frac{\sum_{j=0}^{n_{\text{levels}}-1}\left|\hat{y}_{i,j,k}% \right|}{n_{\text{levels}}}norm start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = divide start_ARG ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | end_ARG start_ARG italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT end_ARG (10)

such that mean percentage error (MPE) of the i𝑖iitalic_ith test column and k𝑘kitalic_kth target variable can be expressed as

MPEi,k=MEi,knormi,ksubscriptMPE𝑖𝑘subscriptME𝑖𝑘subscriptnorm𝑖𝑘\text{MPE}_{i,k}=\frac{\text{ME}_{i,k}}{\text{norm}_{i,k}}MPE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = divide start_ARG ME start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT end_ARG start_ARG norm start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT end_ARG (11)

The loss function per sample was then defined as

lossi=12kMPEi,ksubscriptloss𝑖12subscript𝑘subscriptMPE𝑖𝑘\text{loss}_{i}=\frac{1}{2}\sum_{k}\text{MPE}_{i,k}loss start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT MPE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (12)

with the total loss defined as the sum over all test samples:

loss=12ikMPEi,kloss12subscript𝑖subscript𝑘subscriptMPE𝑖𝑘\text{loss}=\frac{1}{2}\sum_{i}\sum_{k}\text{MPE}_{i,k}loss = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT MPE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (13)

3.4.2 Hyperparameter Tuning

Multiple models were trained corresponding to different hyperparameter values. Number of neurons of all RNN layers was varied across the range of values [16,32,64,128]163264128[16,32,64,128][ 16 , 32 , 64 , 128 ] for both surrogate models. The best candidate models were chosen as having 128 neurons per RNN layer for the short-wave surrogate model, and 32 neurons per RNN layer for the long-wave surrogate model.

3.5 Performance Analysis

Below, we detail the metrics used to analyse the performance of the surrogate models on the test set. In the results (section 4), different aggregations of absolute error (equation 15 for raw model outputs and equation 14 for postprocessed model outputs) are used to investigate the performance of both the long-wave and short-wave surrogate models.

Absolute ErrorAEi,j,k=|y^i,j,ky~i,j,k|Absolute ErrorsubscriptAE𝑖𝑗𝑘subscript^𝑦𝑖𝑗𝑘subscript~𝑦𝑖𝑗𝑘\text{Absolute Error}\equiv\text{AE}_{i,j,k}=\left|\hat{y}_{i,j,k}-\tilde{y}_{% i,j,k}\right|Absolute Error ≡ AE start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT = | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | (14)

where y^i,j,ksubscript^𝑦𝑖𝑗𝑘\hat{y}_{i,j,k}over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT are the post-processed model predictions, and y~i,j,ksubscript~𝑦𝑖𝑗𝑘\tilde{y}_{i,j,k}over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT are the unscaled target variables.

AE^i,j,k=|u^i,j,ku~i,j,k|subscript^AE𝑖𝑗𝑘subscript^𝑢𝑖𝑗𝑘subscript~𝑢𝑖𝑗𝑘\widehat{\text{AE}}_{i,j,k}=\left|\hat{u}_{i,j,k}-\tilde{u}_{i,j,k}\right|over^ start_ARG AE end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT = | over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | (15)

where u^i,j,ksubscript^𝑢𝑖𝑗𝑘\hat{u}_{i,j,k}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT are the raw model predictions, and u~i,j,ksubscript~𝑢𝑖𝑗𝑘\tilde{u}_{i,j,k}over~ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT are target variables which have been scaled to lie in the interval [0,1]01[0,1][ 0 , 1 ] using the scaling methods detailed in section 3.2.

Column-aggregated error quantities are defined as follows:

CAEi,k=j=0nlevels1|y^i,j,ky~i,j,k|subscriptCAE𝑖𝑘superscriptsubscript𝑗0subscript𝑛levels1subscript^𝑦𝑖𝑗𝑘subscript~𝑦𝑖𝑗𝑘\text{CAE}_{i,k}=\sum_{j=0}^{n_{\text{levels}}-1}\left|\hat{y}_{i,j,k}-\tilde{% y}_{i,j,k}\right|CAE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | (16)
CAE^i,k=j=0nlevels1|u^i,j,ku~i,j,k|subscript^CAE𝑖𝑘superscriptsubscript𝑗0subscript𝑛levels1subscript^𝑢𝑖𝑗𝑘subscript~𝑢𝑖𝑗𝑘\widehat{\text{CAE}}_{i,k}=\sum_{j=0}^{n_{\text{levels}}-1}\left|\hat{u}_{i,j,% k}-\tilde{u}_{i,j,k}\right|over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT | over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | (17)

where nlevelssubscript𝑛levelsn_{\text{levels}}italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT is the number of atmospheric levels.

Error quantities averaged across samples per altitude level are defined as follows:

MAEj,k=i=0N1|y^i,j,ky~i,j,k|NsubscriptMAE𝑗𝑘superscriptsubscript𝑖0𝑁1subscript^𝑦𝑖𝑗𝑘subscript~𝑦𝑖𝑗𝑘𝑁\text{MAE}_{j,k}=\frac{\sum_{i=0}^{N-1}\left|\hat{y}_{i,j,k}-\tilde{y}_{i,j,k}% \right|}{N}MAE start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT = divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | end_ARG start_ARG italic_N end_ARG (18)
MAE^j,k=i=0N1|u^i,j,ku~i,j,k|Nsubscript^MAE𝑗𝑘superscriptsubscript𝑖0𝑁1subscript^𝑢𝑖𝑗𝑘subscript~𝑢𝑖𝑗𝑘𝑁\widehat{\text{MAE}}_{j,k}=\frac{\sum_{i=0}^{N-1}\left|\hat{u}_{i,j,k}-\tilde{% u}_{i,j,k}\right|}{N}over^ start_ARG MAE end_ARG start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT = divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT | over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT - over~ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT | end_ARG start_ARG italic_N end_ARG (19)

where N𝑁Nitalic_N is the number of test samples.

Mean flux for the k𝑘kitalic_kth target variable is defined as:

Mean FluxkF¯k=i=0N1j=0nlevels1y~i,j,kNsubscriptMean Flux𝑘subscript¯𝐹𝑘superscriptsubscript𝑖0𝑁1superscriptsubscript𝑗0subscript𝑛levels1subscript~𝑦𝑖𝑗𝑘𝑁\text{Mean Flux}_{k}\equiv\bar{F}_{k}=\frac{\sum_{i=0}^{N-1}\sum_{j=0}^{n_{% \text{levels}}-1}{\tilde{y}_{i,j,k}}}{N}Mean Flux start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≡ over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG ∑ start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_i , italic_j , italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_N end_ARG (20)

and the mean absolute error for the k𝑘kitalic_kth target variable aggregated across all altitude levels is calculated as

MAEk=j=0nlevels1MAEj,ksubscriptMAE𝑘superscriptsubscript𝑗0subscript𝑛levels1subscriptMAE𝑗𝑘\text{MAE}_{k}=\sum_{j=0}^{n_{\text{levels}}-1}{\text{MAE}_{j,k}}MAE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_j = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT levels end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT MAE start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT (21)

4 Results & Discussion

Regime Stream Mean Flux F¯ksubscript¯F𝑘\bar{\textbf{F}}_{k}over¯ start_ARG F end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (Wm2Wsuperscriptm2\text{W}\,\text{m}^{-2}W m start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT) MAEksubscriptMAE𝑘\textbf{MAE}_{k}MAE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (Wm2Wsuperscriptm2\text{W}\,\text{m}^{-2}W m start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT) MAEksubscriptMAE𝑘\textbf{MAE}_{k}MAE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / F¯ksubscript¯F𝑘\bar{\textbf{F}}_{k}over¯ start_ARG F end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
Long-wave Upwelling 4211.0 18.8 0.45 %
Downwelling 4139.3 16.9 0.41 %
Short-wave Upwelling 577.9 6.4 1.11 %
Downwelling 707.8 7.7 1.09 %
Table 2: The above table summarises the mean absolute error (MAEksubscriptMAE𝑘\text{MAE}_{k}MAE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, defined in equation 21) across the four target variables, and relative to the mean values F¯ksubscript¯𝐹𝑘\bar{F}_{k}over¯ start_ARG italic_F end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of these four variables (as defined in equation 20), across the test set.
Refer to caption
Refer to caption
Refer to caption
Figure 3: Plots A and B display the MAE^j,ksubscript^MAE𝑗𝑘\widehat{\text{MAE}}_{j,k}over^ start_ARG MAE end_ARG start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT (defined in equation 19) for test samples which have column-aggregated absolute error (CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT, defined in equation 17) in the [5,95]595[5,95][ 5 , 95 ] percentile interval (that is, test samples with errors falling in the smallest 5% and largest 5% of the test set have been discarded from this aggregate statistic, in order to better illustrate typical model performance). Plots D and E display the MAEj,ksubscriptMAE𝑗𝑘\text{MAE}_{j,k}MAE start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT (defined in equation 18) of surrogate model predictions across the two short-wave target variables, for test samples which lie within one angular degree of the substellar point; and plots G and H display the MAEj,ksubscriptMAE𝑗𝑘\text{MAE}_{j,k}MAE start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT of surrogate model predictions across the two short-wave target variables, for test samples which lie within 0.1 angular degree of the day-night terminator. Plot C displays the scaled target short-wave flux profiles averaged across the test samples with CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT within the chosen percentile interval. Plot F displays the target short-wave flux profiles averaged across samples which lie within one angular degree of the substellar point; and plot I displays displays the target short-wave flux profiles averaged across samples which lie within 1.1 angular degree of the terminator. The shaded regions in all plots represent the interval [max(0,θjσj),θj+σj]0subscript𝜃𝑗subscript𝜎𝑗subscript𝜃𝑗subscript𝜎𝑗[\max{(0,\theta_{j}-\sigma_{j})},\,\,\theta_{j}+\sigma_{j}][ roman_max ( 0 , italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] where θjsubscript𝜃𝑗\theta_{j}italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and σjsubscript𝜎𝑗\sigma_{j}italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT denote the mean and standard deviation of the plotted quantities for the j𝑗jitalic_jth altitude level.
Refer to caption
Refer to caption
Figure 4: Plots A and B display the MAE^j,ksubscript^MAE𝑗𝑘\widehat{\text{MAE}}_{j,k}over^ start_ARG MAE end_ARG start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT (defined in equation 19) and plots D and E display the MAEj,ksubscriptMAE𝑗𝑘\text{MAE}_{j,k}MAE start_POSTSUBSCRIPT italic_j , italic_k end_POSTSUBSCRIPT (defined in equation 18) of surrogate model predictions across the two long-wave target variables, for test samples which have column-aggregated absolute error (CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT, CAEi,ksubscriptCAE𝑖𝑘\text{CAE}_{i,k}CAE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT) in the [5,95]595[5,95][ 5 , 95 ] percentile interval (that is, test samples with errors falling in the smallest 5% and largest 5% of the test set have been discarded from this aggregate statistic, in order to better illustrate typical model performance). Plot C displays the scaled target long-wave flux profiles averaged across the test samples with CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT within the chosen percentile interval. Plot F displays the target flux long-wave profiles averaged across the test samples with CAEi,ksubscriptCAE𝑖𝑘{\text{CAE}}_{i,k}CAE start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT within the chosen percentile interval (not necessarily the same subset displayed in plot C). The shaded regions in all plots represent the interval [max(0,θjσj),θj+σj]0subscript𝜃𝑗subscript𝜎𝑗subscript𝜃𝑗subscript𝜎𝑗[\max{(0,\theta_{j}-\sigma_{j})},\,\,\theta_{j}+\sigma_{j}][ roman_max ( 0 , italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) , italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT + italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] where θjsubscript𝜃𝑗\theta_{j}italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and σjsubscript𝜎𝑗\sigma_{j}italic_σ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT denote the mean and standard deviation of the plotted quantities for the j𝑗jitalic_jth altitude level.
Refer to caption
Refer to caption
Refer to caption
Figure 5: The above plots display the variation in CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (as defined in equation 17) across the long-wave and short-wave schemas, as functions of the cosine of the solar zenith angle (cosμ𝜇\cos\muroman_cos italic_μ) and of surface temperature T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (for the long-wave schema only). The extent of the plots on the y-axis covers the full range of the error quantity on the y-axis of each plot, with white spaces corresponding to values falling in the lowest value bin.
Top row: The above figure displays the error distribution of test samples per cosμ𝜇\cos\muroman_cos italic_μ bin, across both short-wave target variables. Test samples have been binned into 20 bins of cosμ𝜇\cos\muroman_cos italic_μ, into 20 bins of CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT. Bins have been normalised to percentages by the total number of test columns in each cosμ𝜇\cos\muroman_cos italic_μ bin.
Middle & bottom rows: The above figure displays the error distribution of test samples per cosμ𝜇\cos\muroman_cos italic_μ bin, across both long-wave target variables as a function of cosμ𝜇\cos\muroman_cos italic_μ (middle row) and T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (bottom row). cosμ𝜇\cos\muroman_cos italic_μ, T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, and CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT have been divided linearly into 20 bins; the plots display the proportion of test samples falling into each error bin relative to total samples in a given cosμ𝜇\cos\muroman_cos italic_μ or T0subscript𝑇0T_{0}italic_T start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT bin.
Refer to caption
Refer to caption
Figure 6: Left: The above figure displays the simulated atmospheric temperature profile (in units of K) of Venus averaged over the final Venus solar day of the simulation. The simulation was run using OASIS and OASIS-RT for 5 Venus solar days (each Venus solar day is equivalent to approximately 117 Earth days). Right: The above figure displays the percentage difference in simulated atmospheric temperature profiles of Venus averaged across the final Venus day of the simulation, for two simulations using 1. OASIS-RT and 2. the surrogate models presented in this work, to model the radiative transfer. Both: In both plots, the main cloud deck extends from approximately 102mbarsuperscript102mbar10^{2}\,\text{mbar}10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT mbar to 2×103mbar2superscript103mbar2\times 10^{3}\,\text{mbar}2 × 10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT mbar.

4.1 Model Performance on Test Set

Table 2 summarises the mean absolute error (MAE, equation 21) across the four target variables, and relative to the mean values of these four variables, across the test set. These MAE values are in line with those achieved using similar surrogate modelling methods for radiative transfer within the Earth’s atmosphere, with Ukkonen (2022) quoting MAE for short-wave fluxes of around 1% or less. For both the long-wave and short-wave regimes, the MAE is higher for the upwelling fluxes as compared to downwelling fluxes: this is expected as physical computation of the upwelling flux depends on the computation of the downwelling flux, thus meaning this is a more complicated mapping to emulate. The percentage errors for the short-wave targets are a factor of 2-3 greater than those for the long-wave targets: this is to be expected as the magnitude of the short-wave targets are smaller, and the supervised learning task set for the short-wave model in this work is a more complex mapping from inputs to outputs as compared to that for the long-wave model.

In the following subsections, we visualise and interrogate the variation of model prediction errors with altitude and with scalar variables: cosμ𝜇\cos\muroman_cos italic_μ for both long-wave and short-wave model predictions, and surface temperature for long-wave model predictions only. For completeness, further plots of error as a function of the remaining input scalar variables (surface temperature, surface pressure, surface gas density) are contained in appendix section D.

4.1.1 Short-Wave Surrogate Model

A. Variation of error vs. altitude
Figure 3 displays the average absolute error of predictions at different altitude levels across test samples. Below around 65km65km65\,\text{km}65 km, the average error increases roughly in proportion to the average target fluxes. Above this altitude, the average error decreases for both target variables, even though the average values of these variables continue to rise with altitude. This change in the trend of the average error occurs around the top of the cloud deck.

Potential explanations as to why there is lower error in predicting targets above 65km65km65\,\text{km}65 km are as follows: above the top of the cloud deck, there is less complexity in mapping from input variables to output fluxes, and so this can be naively assumed to be an easier task to learn; there is also tighter variance in the target variables within the test set above this altitude threshold.

Plots A and B of figure 3 display mean errors of less than 3% across all altitude levels, averaged across test samples in the [5, 95] percentile interval of column-aggregated errors. This accuracy falls within an acceptably small margin of error.
B. Variation of error vs. cosμ𝜇\cos\muroman_cos italic_μ
Figure 5 displays the distribution in test errors as a function of cosμ𝜇\cos\muroman_cos italic_μ of the test column. Data in these plots have been binned to more simply display the spread of errors for a given cosμ𝜇\cos\muroman_cos italic_μ interval. For values of cosμ𝜇\cos\muroman_cos italic_μ close to 1, there is a narrower distribution of error, with test samples being predicted reliably more accurately as compared to test samples corresponding to lower values of cosμ𝜇\cos\muroman_cos italic_μ.

This variation in test error distribution as a function of cosμ𝜇\cos\muroman_cos italic_μ may be attributable to the approximations used during target preprocessing when training the model (see section 3.2); if indeed this is the case, then there is scope to mitigate against this by refining data preprocessing methods when producing future iterations of this surrogate model. This error variation with cosμ𝜇\cos\muroman_cos italic_μ may otherwise be due to the model not exactly capturing the complexities in how cosμ𝜇\cos\muroman_cos italic_μ is used in mapping inputs to outputs, which may be an acceptable and necessary trade-off in using a surrogate model for the purposes of model speed-up.

4.1.2 Long-Wave Surrogate Model

A. Variation of error vs. altitude
Figure 4 displays the absolute error of predictions averaged across altitude levels across test samples. A similar trend can be seen in these plots as compared to the trends described in 4.1.1: average magnitude of error roughly follows the same trend as average magnitude of the target variables, except for between 4070km4070km40-70\,\text{km}40 - 70 km altitude (roughly in the interval of the main cloud deck) whereby the error rises significantly at the top of the cloud deck and decreases going deeper into the cloud deck, for both target variables.

Below the cloud deck, it can be seen from plots A and B that the mean error in target predictions is less than 5% of mean target magnitude, for the given percentile interval of test samples. Above the cloud deck, the long-wave fluxes tend to zero, so though the percentage errors increase for increasing altitude, these accuracies still fall within the reasonable margin of error for the model.
B. Variation of error vs. cosμ𝜇\cos\muroman_cos italic_μ/surface temperature
Variations in test error versus surface temperature (displayed in figure 5), appear to follow the same trend as compared to residuals in the scaling factor used in preprocessing (displayed in figure 1). This is promising as it may indicate that scaling residuals are the limiting factor in model accuracy, and these scaling residuals were initially deemed as acceptably small for the purpose of this work. Considering variation in test error versus cosine of the solar zenith angle (cosμ𝜇\cos\muroman_cos italic_μ) (also displayed in figure 5), there is not a discernible trend in the error variation across cosμ𝜇\cos\muroman_cos italic_μ, though it appears that the error distribution is most narrow for cosμ𝜇\cos\muroman_cos italic_μ approaching 1.

4.2 Model Performance in Simulation

To evaluate the performance of our new surrogate model on 3D simulations of Venus’s atmosphere, we have run two OASIS simulations: one using OASIS-RT for the radiative transfer scheme and the other using the new surrogate model presented in this work. In order to efficiently run the 3D simulation with OASIS-RT, as detailed in section 2.1.1, we updated the radiative fluxes for solar radiation every 2880 steps, and the thermal radiation fluxes were updated every 320 steps. Both simulations run for 5 Venus solar days, each of which is approximately 117 Earth days. Figure 6 displays the temperature of the simulation using OASIS-RT averaged over the last simulated Venus day in the left-hand plot, and the percentage difference in this quantity between the two simulations in the right-hand plot. Beneath the bottom of the cloud deck, the percentage difference between the time-averaged temperature profiles produced by the two simulations is below 1.5%; in the interval of the cloud deck, below 2.7%; and above the cloud deck, below 4.0%. These deviations in the final temperature profiles fall within the range of uncertainties in the measurements. This is reasonable due to two main factors: firstly, there will be inherent discrepancies between simulated and real temperatures arising from assumptions made in the physical model, and secondly, deviations will also naturally arise between the physical model and real temperature profiles due to the spatial resolution of the simulation. Also, we expected differences to arise due to the frequency at which the radiative fluxes are updated. In the case of the surrogate model, it is possible to update them at every physical time step.

4.3 Speed-up

Our new model with the surrogate-integrated model, integrated for 5 Venus days has been shown in figure 6 to produce temperature profiles with an acceptably small deviation from those produced in the same number of epochs using the original model. These two models described in the previous section had similar performances. However, when we compare OASIS simulations with OASIS-RT, where the fluxes are updated every timestep, and the OASIS simulations with the new surrogate model, we find a 101-factor speed-up for integration of only 1000 timesteps. Additionally, a large amount of global memory is saved on the GPU graphics card in the simulations with the new surrogate model since it does not require storing opacity cross-sections from the gas absorption or clouds. With the potential to update the radiative fluxes on every physical timestep, our new approach allows for the inclusion of dynamical cloud feedback at a small computational cost. This addresses one of the main limitations of current 3D Venus atmospheric models.

4.4 Limitations

The surrogate models produced in this work do not take explicit input of the structure of cloud and gas absorber constituents of the atmosphere being modelled, except for the input of gas density, ρi,jsubscript𝜌𝑖𝑗\rho_{i,j}italic_ρ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT. This means that the learning objective for surrogate models in this work is to approximate the mapping from input thermodynamic column variables to output columnar flux profiles, conditioned on a specific cloud and absorber structure. Consequently, this means that the models produced in this work are only applicable for planets corresponding to the planetary parameters and cloud and absorber structure specific to Venus. This is a limitation in terms of generalisability of the surrogate-integrated GCM to other types of atmosphere.

5 Conclusions

This work introduces a surrogate model approach to replacing numerical simulations of short-wave and long-wave computations in a two-stream radiative transfer model, aimed at accelerating the Global Circulation Model (GCM), OASIS. The results show a significant GCM speed-up by a factor of 101 GPU performance, with surrogate models for both long-wave and short-wave regimes achieving test set accuracies of approximately 1%. Additionally, this approach replicates the temperature profile of the original Venus simulations averaged across a Venus solar day with differences of 4% after 5 Venus solar days of simulation.

This work is significant in that it enables:

  1. 1.

    100×\sim 100\times∼ 100 × faster simulations of planets with massive atmospheres that require complex radiative transfer schemes, such as the Venus atmosphere.

  2. 2.

    Longer simulations with a much higher spatial resolution (10×\sim 10\times∼ 10 ×).

  3. 3.

    Improved representation of the temperature evolution of short-term physical phenomena in the atmosphere. These can be atmospheric waves with timescales shorter than the period at which the radiative fluxes are updated in the simulation. In the case of our Venus simulations, we can measure the temperature change of atmospheric waves with timescales <12absent12<12< 12 hours.

  4. 4.

    A model free of model tuning to optimize performance, such as the frequency of how the radiative fluxes are updated.

  5. 5.

    The inclusion in 3D simulations of cloud dynamical feedback or higher-order, more complex radiative schemes with a small extra computational cost.

This achievement of faster and/or higher spatial resolution atmospheric simulations will facilitate better insight into the nature of the atmosphere of Venus, as well as bench-marking the utility and applicability of such modelling techniques for use in exoplanet science.

Acknowledgements

We thank Dr Ahmed Al-Refaie, Nikita Pond and Max Hart for helpful conversations on integrating TensorFlow models within a C/C++ framework.

The authors acknowledge the use of the High Performance Computing facilities of University College London to carry out this work, specifically the Hypatia cluster. This work used computing equipment funded by the Research Capital Investment Fund (RCIF) provided by UKRI, and partially funded by the UCL Cosmoparticle Initiative. This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement n 758892/ExoAI), and from the Science and Technology Facilities Council (STFC; grant n ST/W00254X/1 and grant n ST/W50788X/1).

Data Availability

The code for this project will be made publicly available at https://github.com/ttahseen/oasis-rt-surrogate on acceptance of this paper.

References

Appendix A Model Parameters

The tables below display the number of model parameters per layer of the two surrogate models produced in this work.

A.1 Short-Wave Surrogate Model

Layer Output Shape Number of Parameters
Main Inputs [(n𝑛nitalic_n, 49, 3)] 0
Auxiliary Inputs [(n𝑛nitalic_n, 5)] 0
GRUsubscriptGRU\text{GRU}_{\downarrow}GRU start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT [(n𝑛nitalic_n, 49, 128), (n𝑛nitalic_n, 128)] 51,072
Dense1subscriptDense1\text{Dense}_{1}Dense start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT [(n𝑛nitalic_n, 128)] 17,152
GRUsubscriptGRU\text{GRU}_{\uparrow}GRU start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT [(n𝑛nitalic_n, 50, 128)] 99,072
DenseoutsubscriptDenseout\text{Dense}_{\text{out}}Dense start_POSTSUBSCRIPT out end_POSTSUBSCRIPT [(n𝑛nitalic_n, 50, 2)] 514
Total number of model parameters: 167,810
Table 3: This table displays the number of parameters across model layers for the short-wave surrogate model, as well as the output shapes of each layer. n𝑛nitalic_n denotes the number of atmospheric columns passed as input to the model; layers correspond to those displayed in figure 2.

A.2 Long-Wave Surrogate Model

Layer Output Shape Number of Parameters
Main Inputs [(n𝑛nitalic_n, 49, 3)] 0
Auxiliary Inputs [(n𝑛nitalic_n, 4)] 0
GRUsubscriptGRU\text{GRU}_{\downarrow}GRU start_POSTSUBSCRIPT ↓ end_POSTSUBSCRIPT [(n𝑛nitalic_n, 49, 32), (n𝑛nitalic_n, 32)] 3,552
Dense1subscriptDense1\text{Dense}_{1}Dense start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT [(n𝑛nitalic_n, 32)] 1,184
GRUsubscriptGRU\text{GRU}_{\uparrow}GRU start_POSTSUBSCRIPT ↑ end_POSTSUBSCRIPT [(n𝑛nitalic_n, 50, 32)] 6,336
DenseoutsubscriptDenseout\text{Dense}_{\text{out}}Dense start_POSTSUBSCRIPT out end_POSTSUBSCRIPT [(n𝑛nitalic_n, 50, 2)] 130
Total number of model parameters: 11,202
Table 4: This table displays the number of parameters across model layers for the long-wave surrogate model, as well as the output shapes of each layer. n𝑛nitalic_n denotes the number of atmospheric columns passed as input to the model; layers correspond to those displayed in figure 2.

Appendix B Further aggregate statistics of model test set performance

Refer to caption
Figure 7: The above plots display the mean absolute error of the long-wave surrogate model predictions. Plot C displays the target scaled flux profiles averaged across the test set. Plot F displays the target flux profiles averaged across the test set. Plots A and B display the mean absolute error of the scaled predictions and targets across the test set. Plots D and E display the mean absolute error of the unscaled predictions and targets across the test set.
Refer to caption
Figure 8: The above plots display the mean absolute error of the short-wave surrogate model predictions. Plot C displays the target scaled flux profiles averaged across the test set. Plot F displays the target flux profiles averaged across the test set. Plots A and B display the mean absolute error of the scaled predictions and targets across the test set. Plots D and E display the mean absolute error of the unscaled predictions and targets across the test set.

Appendix C Distribution of Samples

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 9: Top row: The figure displays the distributions of surface temperature (left tile) and cosine of the solar zenith angle, cosμ𝜇\cos\muroman_cos italic_μ (right tile) across the dataset used for this work.
Bottom row: The figure displays the distributions of surface temperature across samples falling within the dayside (left tile) and nightside (right tile).
Refer to caption
Figure 10: The figure displays the co-variation of surface temperature and cosμ𝜇\cos\muroman_cos italic_μ across dayside samples within the dataset.

Appendix D Distribution of Errors with Scalar Parameters

Refer to caption
Refer to caption
Refer to caption
Figure 11: The figure displays variation in CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (as defined in equation 17) of short-wave model predictions as a function of surface temperature (top row), surface pressure (middle row), and surface gas density (bottom row). The extent of the plots on the y-axis covers the full range of the error quantity on the y-axis of each plot, with white spaces corresponding to values falling in the lowest value bin. Bins have been normalised to percentages by the total number of test columns in each x-axis bin.
Refer to caption
Refer to caption
Figure 12: The figure displays variation in CAE^i,ksubscript^CAE𝑖𝑘\widehat{\text{CAE}}_{i,k}over^ start_ARG CAE end_ARG start_POSTSUBSCRIPT italic_i , italic_k end_POSTSUBSCRIPT (as defined in equation 17) of long-wave model predictions as a function of surface pressure (top row), and surface gas density (bottom row). The extent of the plots on the y-axis covers the full range of the error quantity on the y-axis of each plot, with white spaces corresponding to values falling in the lowest value bin. Bins have been normalised to percentages by the total number of test columns in each x-axis bin.