Scattering transforms on the sphere, application to large scale structure modelling

L. Mousset1    E. Allys1    M. A. Price2    J. Aumont3    J.M. Delouis4    L. Montier3    J. D. McEwen2 1Laboratoire de Physique de l’École Normale Supérieure, ENS, Univ. PSL, CNRS, Sorbonne Univ., Univ. Paris Cité, 75005 Paris, France
2Mullard Space Science Laboratory, UCL, Holmbury St Mary, Dorking, Surrey RH5 6NT, UK
3Institut de Recherches en Astrophysique et Planétologie, Univ. de Toulouse, CNRS, CNES, UPS
4Laboratoire d’Océanographie Physique et Spatiale, Univ. Brest, CNRS, Ifremer, IRD, Brest, France
Abstract

Scattering transforms are a new type of summary statistics recently developed for the study of highly non-Gaussian processes, which have been shown to be very promising for astrophysical studies. In particular, they allow one to build generative models of complex non-linear fields from a limited amount of data. In the context of upcoming cosmological surveys, the extension of these tools to spherical data is necessary. We develop scattering transforms on the sphere and focus on the construction of maximum-entropy generative models of astrophysical fields. The quality of the generative models, both statistically and visually, is very satisfying, which therefore open up a wide range of new applications for future cosmological studies.

1 Introduction

Scattering transforms (ST) are a recently developed class of summary statistics for the study of non-Gaussian processes [1]. They are inspired from neural networks but do not require any training step to be computed. Introduced recently in astrophysics [2][3], ST have since demonstrated their ability to characterize highly non-Gaussian processes, for instance for parameter estimation and classification tasks.

Another feature of ST is that they allow one to build very efficient generative models of physical fields, in a maximum entropy framework [4]. This allows one to sample new approximate realisations of a given process relying only on its ST statistics, that can be estimated even from a small amount of data, sometime even a single example image [3][5][6][7].

While these promising ST generative models have mainly been developed for 2D planar data, the adaptation of these tools to spherical data is necessary for cosmological analysis, especially for the next generation of full sky surveys such as LiteBIRD for the cosmic microwave background polarization, or Rubin-LSST and Euclid for study of the large scale structures of the Universe. The extension of ST to spherical data however raises some difficulties: the definition of a directional convolution with oriented filters [8][9], as well as the transposition of the planar translations which appear in certain ST representations. In this paper, we propose an adaptation of state-of-the-art ST to spherical fields. As a first step, we restrict ourselves to homogeneous fields with properties that do not depend on the position on the sphere. This naturally leads us to cosmological fields. In this proceeding, we only show the results for a weak lensing field from the Large Scale Structures (LSS) of the Universe using the CosmoGrid data set [10]. For this field, we construct and validate a ST generative model from one single example image. Generative models built from other fields, such as the thermal Sunyaev-Zeldovitch effect are presented in Mousset et al. (2024) [11].

2 Scattering covariance on the sphere

ST refer to a family of summary statistics which includes in particular the Wavelet Scattering Transforms (WST) [2] or the Wavelet Phase Harmonics (WPH) [3]. In this work, we have considered the scattering covariances (SC), or scattering spectra [5]. We chose the SC, because they only rely on convolutions, and not on translations as the Wavelet Phase Harmonics [3], which are difficult to univocally define on spherical maps.

SC statistics are computed from wavelet transforms, which are obtained by convolving an initial map with a set of wavelet filters, where each filter extracts the local information at a particular scale (labelled by j𝑗jitalic_j) and orientation (labelled by γ𝛾\gammaitalic_γ). Wavelet filters need to be localized both in pixel and harmonic space.

To compute the wavelet transforms, various convolutions on the sphere can be considered. In this work we follow the standard directional convolution formalism presented in, e.g., McEwen et al. (2015) [8]. The directional convolution IΨj𝐼superscriptΨ𝑗I\star\Psi^{j}italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT of a field I𝐼Iitalic_I with a wavelet ΨjsuperscriptΨ𝑗\Psi^{j}roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT consists in applying a rotation by a set ρ=(α,β,γ)𝜌𝛼𝛽𝛾\rho=(\alpha,\beta,\gamma)italic_ρ = ( italic_α , italic_β , italic_γ ) of Euler angles of the wavelet ΨjsuperscriptΨ𝑗\Psi^{j}roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT initially located at the north pole, before computing an inner product between the wavelet and the field I𝐼Iitalic_I:

(IΨj)(ρ)=ΩI(𝝎)[RρΨj(𝝎)]dΩ,𝐼superscriptΨ𝑗𝜌subscriptΩ𝐼𝝎superscriptdelimited-[]subscript𝑅𝜌superscriptΨ𝑗𝝎dΩ(I\star\Psi^{j})(\rho)=\int_{\Omega}I(\bm{\omega})[R_{\rho}\Psi^{j}(\bm{\omega% })]^{*}\text{d}\Omega,( italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) ( italic_ρ ) = ∫ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT italic_I ( bold_italic_ω ) [ italic_R start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ( bold_italic_ω ) ] start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT d roman_Ω , (1)

where Rρsubscript𝑅𝜌R_{\rho}italic_R start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT is the rotation by Euler angles ρ𝜌\rhoitalic_ρ, and * stands for complex conjugation. From (IΨj)(ρ)𝐼superscriptΨ𝑗𝜌(I\star\Psi^{j})(\rho)( italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) ( italic_ρ ), we can identify (β,α𝛽𝛼\beta,\alphaitalic_β , italic_α) with the spherical coordinates 𝝎=(θ,φ)𝝎𝜃𝜑\bm{\omega}=(\theta,\varphi)bold_italic_ω = ( italic_θ , italic_φ ) and γ𝛾\gammaitalic_γ to the orientation which is probed in the convolution. In this way, we obtain oriented wavelet coefficients, that we describe with the following shorthand notation

(IΨj,γ)(𝝎)(IΨj)(α=φ,β=θ,γ).𝐼superscriptΨ𝑗𝛾𝝎𝐼superscriptΨ𝑗formulae-sequence𝛼𝜑𝛽𝜃𝛾(I\star\Psi^{j,\gamma})(\bm{\omega})\equiv(I\star\Psi^{j})(\alpha=\varphi,% \beta=\theta,\gamma).( italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_j , italic_γ end_POSTSUPERSCRIPT ) ( bold_italic_ω ) ≡ ( italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ) ( italic_α = italic_φ , italic_β = italic_θ , italic_γ ) . (2)

SC statistics characterize the power and sparsity at each scale, as well as interaction between different scales. They are built from successive applications of wavelet transforms and modulus operators, followed by average and covariance computations [5]. We consider two coefficients at a single oriented scale λ1=(j1,γ1)subscript𝜆1subscript𝑗1subscript𝛾1\lambda_{1}=(j_{1},\gamma_{1})italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = ( italic_j start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_γ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ):

S1λ1=|IΨλ1|,S2λ1=|IΨλ1|2,formulae-sequencesuperscriptsubscript𝑆1subscript𝜆1delimited-⟨⟩𝐼superscriptΨsubscript𝜆1superscriptsubscript𝑆2subscript𝜆1delimited-⟨⟩superscript𝐼superscriptΨsubscript𝜆12S_{1}^{\lambda_{1}}=\langle|I\star\Psi^{\lambda_{1}}|\rangle\,,\quad S_{2}^{% \lambda_{1}}=\langle|I\star\Psi^{\lambda_{1}}|^{2}\rangle\,,italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = ⟨ | italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | ⟩ , italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = ⟨ | italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⟩ , (3)

and two coefficients that characterize the couplings between two and three oriented scales:

S3λ1,λ2=Cov[IΨλ1,|IΨλ2|Ψλ1],S4λ1,λ2,λ3=Cov[|IΨλ3|Ψλ1,|IΨλ2|Ψλ1],formulae-sequencesuperscriptsubscript𝑆3subscript𝜆1subscript𝜆2Cov𝐼superscriptΨsubscript𝜆1𝐼superscriptΨsubscript𝜆2superscriptΨsubscript𝜆1superscriptsubscript𝑆4subscript𝜆1subscript𝜆2subscript𝜆3Cov𝐼superscriptΨsubscript𝜆3superscriptΨsubscript𝜆1𝐼superscriptΨsubscript𝜆2superscriptΨsubscript𝜆1S_{3}^{\lambda_{1},\lambda_{2}}=\text{Cov}\left[I\star\Psi^{\lambda_{1}},|I% \star\Psi^{\lambda_{2}}|\star\Psi^{\lambda_{1}}\right],\quad S_{4}^{\lambda_{1% },\lambda_{2},\lambda_{3}}=\text{Cov}\left[|I\star\Psi^{\lambda_{3}}|\star\Psi% ^{\lambda_{1}},|I\star\Psi^{\lambda_{2}}|\star\Psi^{\lambda_{1}}\right],italic_S start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = Cov [ italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT , | italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ] , italic_S start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = Cov [ | italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT , | italic_I ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | ⋆ roman_Ψ start_POSTSUPERSCRIPT italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ] , (4)

where delimited-⟨⟩\langle\cdot\rangle⟨ ⋅ ⟩ corresponds to the mean over the sphere, and where covariances are defined as Cov[XY]=XYXYCovdelimited-[]𝑋𝑌delimited-⟨⟩𝑋superscript𝑌delimited-⟨⟩𝑋delimited-⟨⟩superscript𝑌\text{Cov}[XY]=\langle XY^{*}\rangle-\langle X\rangle\langle Y^{*}\rangleCov [ italic_X italic_Y ] = ⟨ italic_X italic_Y start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ⟩ - ⟨ italic_X ⟩ ⟨ italic_Y start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ⟩ for two complex fields X𝑋Xitalic_X and Y𝑌Yitalic_Y.

3 Generative model of the large scale structures

3.1 Maximum entropy generative model

We build generative models under scattering covariance constraints. These are maximum entropy microcanonical models, which are approximately sampled by gradient descent [4]. Such models are constructed from statistics ΦΦ\Phiroman_Φ estimated from a target field xtsubscript𝑥𝑡x_{t}italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT; in this paper the target field is a single full-sky map. The associated microcanonical set ΩεsubscriptΩ𝜀\Omega_{\varepsilon}roman_Ω start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT of width ε𝜀\varepsilonitalic_ε is

Ωε={x:Φ(x)Φ(xt)2<ε},subscriptΩ𝜀conditional-set𝑥superscriptnormΦ𝑥Φsubscript𝑥𝑡2𝜀\Omega_{\varepsilon}=\left\{x:||\Phi(x)-\Phi(x_{t})||^{2}<\varepsilon\right\},roman_Ω start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT = { italic_x : | | roman_Φ ( italic_x ) - roman_Φ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT < italic_ε } , (5)

where ||.||||.||| | . | | is the Euclidean norm. The microcanonical maximum entropy model is the model of maximal entropy defined over ΩεsubscriptΩ𝜀\Omega_{\varepsilon}roman_Ω start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT, which has an uniform distribution over this set.

In this paper, we approximate this sampling with a gradient descent approach, which consists in transporting a higher entropy Gaussian white noise distribution into a distribution supported in ΩεsubscriptΩ𝜀\Omega_{\varepsilon}roman_Ω start_POSTSUBSCRIPT italic_ε end_POSTSUBSCRIPT. In practice, each new sample is obtained by first drawing a white noise realization, and then performing a gradient descent in pixel space using a loss

(x)=Φ(x)Φ(xt)2.𝑥superscriptnormΦ𝑥Φsubscript𝑥𝑡2\mathcal{L}(x)=||\Phi(x)-\Phi(x_{t})||^{2}.caligraphic_L ( italic_x ) = | | roman_Φ ( italic_x ) - roman_Φ ( italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (6)

The typical width ε𝜀\varepsilonitalic_ε of the microcanonical ensemble is then fixed by the number of iterations used in the gradient descent. In practice we find that 100similar-toabsent100\sim 100∼ 100 iterations is typically sufficient, in which case an image at nside=128nside128\texttt{nside}=128nside = 128 may be generated in 4similar-toabsent4\sim 4∼ 4 seconds.

In our case, the summary statistics Φ(x)Φ𝑥\Phi(x)roman_Φ ( italic_x ) that we consider are the mean over pixels xdelimited-⟨⟩𝑥\langle x\rangle⟨ italic_x ⟩, its variance Var(x)Var𝑥\text{Var}(x)Var ( italic_x ) and the SC statistics.

3.2 Validation of the generative models

We compare the generated fields with the target one. We first do a visual comparison of the maps to asses the quality of the spatial texture reproduction. We then compute various standard statistics, such as the probability density function (PDF) and the angular power spectrum, in order to quantitatively evaluate our generative model. As we have drawn 50 samples of the microcanonical ensemble, we compute the mean and the standard deviation of these statistics over those 50 realisations. We also propose a comparison with samples from a Gaussian model built from the power spectrum of the target field. This allows us to quantify the contribution of SC statistics compared to purely Gaussian statistics.

In Fig. 1 we show the full-sky maps. We plot the logarithm of the maps in order to better see the textures. The generated map appears to be visually very similar to the target one, which clearly shows that the SC statistics capture an important part of the non-Gaussian texture of the field. On the contrary, the structures are not reproduced in the Gaussian realisation.

Refer to caption
Figure 1: The original target field, a generated map and a Gaussian realisation. We plot the logarithm of the field and color bars are identical for all maps.

The PDF is shown in Fig. 2, both with a linear (left) and a logarithmic (middle) y-axis scaling in order to better exhibit the tails of the distributions on several orders of magnitude. The target field is shown in blue, the generated ones in red and the Gaussian realisations in yellow. The comparison with the Gaussian PDF allows us to better see the non symmetric shape, which is characteristic of non-Gaussian features. As we can see, the PDF is well reproduced up to five orders of magnitude.

The angular power spectrum is shown in Fig. 2 (right) and it is well reproduced over all scales. However, small oscillations around the target can be seen in the generated power spectra. These are residual features related to the frequency bands of the wavelets, which illustrate the trade-off between the quality of reproduction we want to achieve and the number of filters we use; i.e. the computational efficiency of our generative model. In Mousset et al. (2024) [11], another validation using the Minkowski functionals, which are standard non-Gaussian statistics, is made, showing that they are also very well reproduced.

Refer to caption
Refer to caption
Refer to caption
Figure 2: PDF, both with a linear (left) and a logarithmic (middle) y-axis scaling, and angular power spectrum for the LSS field. The target is shown in blue, the generated fields in red, and the Gaussian realisations in yellow. We plot the mean (solid line) and the standard deviation (shadow envelope) over 50 realisations.

4 Conclusions

The main result of this work is the extension of state-of-the-art ST for generative modelling to spherical fields. We have worked with the last generation of ST statistics, named scattering covariances, which were previously introduced for one dimension and two dimensions planar fields. They have the advantage of relying only on successive wavelet transforms and modulus, as well as on covariances, and do not require any translations. We have also used state-of-the-art directional convolutions on the sphere [12], computed in spherical harmonic space.

These developments allow us to build generative models of full sky spherical fields without the need for large training datasets. In fact, our method holds even in the limit of a single data realisation. In this proceeding, we have shown the result for the LSS weak lensing field for which they performed extremely well. In fact, the performance of those generative models were validated quantitatively on different fields and the diversity in terms of structures between the maps shows the impressive ability of SC to comprehensively characterize very different non-Gaussian textures [11].

This work introduces a new powerful innovative approach for spherical data, and it opens interesting perspectives for astrophysical applications. In particular, we plan to use it for the study and the modeling of CMB astrophysical foregrounds. The first goal will be to have a tool to produce multiple realisations of the different astrophysical components. Then, ST could play a role in component separation, relying both on recently developed ST-based statistical component separations approaches, e.g., Auclair et al. (2024) [13], as well as investigating how classical component separation methods could benefit from ST, using the non-Gaussianities as an additional lever arm to disentangle different components.

References

References

  • [1] J. Bruna and S. Mallat, IEEE Trans. Pattern Anal. Mach. Intell. 35, 1872 (2013)
  • [2] E. Allys et al, A&A 629, A115 (2019)
  • [3] E. Allys et al, Phys. Rev. D 102, 103506 (2020)
  • [4] J. Bruna and S. Mallat, Mathematical Statistics and Learning 1, 257 (2019)
  • [5] S. Cheng et al, PNAS Nexus 3, 103 (2024)
  • [6] B. Régaldo-Saint Blancard et al, The Astrophysical Journal 943, 9 (2023)
  • [7] M. A. Price et al, The Open Journal of Astrophysics 6, 35 (2023)
  • [8] J. D. McEwen et al, IEEE Trans. Sig. Proc. , arXiv:1509.06749 (2015)
  • [9] J. D. McEwen, C. Durastanti and Y. Wiaux, Applied Comput. Harm. Anal. 44, 59 (2018)
  • [10] T. Kacprzak et al, JCAP 2023, 050 (2023)
  • [11] L. Mousset et al, submitted to A&A , (2024)
  • [12] M. A. Price and J. D. McEwen, Journal of Computational Physics 510, 113109 (2024)
  • [13] C. Auclair et al, A&A 681, A1 (2024)