ABSTRACT

Galaxy clustering measurements are a key probe of the matter density field in the Universe. With the era of precision cosmology upon us, surveys rely on precise measurements of the clustering signal for meaningful cosmological analysis. However, the presence of systematic contaminants can bias the observed galaxy number density, and thereby bias the galaxy two-point statistics. As the statistical uncertainties get smaller, correcting for these systematic contaminants becomes increasingly important for unbiased cosmological analysis. We present and validate a new method for understanding and mitigating both additive and multiplicative systematics in galaxy clustering measurements (two-point function) by joint inference of contaminants in the galaxy overdensity field (one-point function) using a maximum-likelihood estimator (MLE). We test this methodology with Kilo-Degree Survey-like mock galaxy catalogues and synthetic systematic template maps. We estimate the cosmological impact of such mitigation by quantifying uncertainties and possible biases in the inferred relationship between the observed and the true galaxy clustering signal. Our method robustly corrects the clustering signal to the sub-per cent level and reduces numerous additive and multiplicative systematics from |$1.5 \sigma$| to less than |$0.1\sigma$| for the scenarios we tested. In addition, we provide an empirical approach to identifying the functional form (additive, multiplicative, or other) by which specific systematics contaminate the galaxy number density. Even though this approach is tested and geared towards systematics contaminating the galaxy number density, the methods can be extended to systematics mitigation for other two-point correlation measurements.

1 INTRODUCTION

Over the past few decades, groundbreaking observations and theoretical advances have allowed us to construct a remarkably detailed picture of the properties and evolution of the Universe (e.g. Weinberg et al. 2013). Early-Universe observations from the cosmic microwave background (e.g. Planck Collaboration VI 2020) and distance measurements from Type Ia supernovae (Riess et al. 1998; Perlmutter et al. 1999; Filippenko 2005) already provide strong constraints on the currently accepted cosmological model, Lambda cold dark matter (|$\Lambda$|CDM). To further test this model and its predictions across the full span of cosmic time and with a range of observables, photometric galaxy surveys have emerged as a cornerstone of modern cosmological analyses. Recent large-scale structure (LSS) surveys such as the Dark Energy Survey (DES; The Dark Energy Survey Collaboration 2005), Hyper Suprime-Cam Survey (HSC; Aihara et al. 2018a), and Kilo-Degree Survey (KiDS; de Jong et al. 2013) have shown strong cosmological constraining power. Upcoming and newly begun surveys such as the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST; Ivezić et al. 2019), Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2022), Euclid (Euclid Collaboration et al. 2022), and the Nancy Grace Roman Space Telescope High Latitude Survey (Spergel et al. 2015) will enable even more precise tests of the |$\Lambda$|CDM model in the coming decade. However, with the increasing survey size enabling greater statistical precision, the challenge of identifying and mitigating systematic contamination to provide robust and accurate constraints will be more important than ever.

Galaxy clustering measurements play an important role in cosmological analyses, as they provide information about the large-scale distribution of matter of the Universe, which carries information about the cosmic expansion history. The relation between the galaxy and matter density field, commonly known as galaxy bias, is a key determinant in clustering (Desjacques, Jeong & Schmidt 2018) that connects the observed galaxy clustering to the underlying matter field at all physical scales. Alongside weak gravitational lensing, galaxy clustering represents one of the three observables used in the current state-of-the-art analysis method for cosmological inference from imaging surveys (3|$\times$|2pt analysis; e.g. Krause et al. 2021; Abbott et al. 2022). Clustering measurements depend on the galaxy (over-)density field and have a very high signal-to-noise ratio in large-area surveys, so potential systematic effects can disproportionately affect the observed galaxy overdensity field and have an outsized impact on cosmological constraints (Elvin-Poole et al. 2018). Consequently, generating tools for understanding and mitigating the impact of systematics on the galaxy overdensity field is paramount in the effort to produce robust cosmological constraints from LSS measurements.

Systematic effects, in the context of imaging surveys, encompass a wide array of observational and instrumental effects (Sevilla-Noarbe et al. 2021) that can systematically perturb the observed galaxy overdensity field, and thereby bias the measured galaxy two-point statistics. These effects can arise from survey-dependent factors such as atmospheric and observing conditions (e.g. seeing, airmass) and detector effects, or survey-independent (astrophysical) factors such as stellar density or Galactic extinction. In addition, systematics-induced modulation of galaxy redshifts across the sky can also bias our measurements of angular clustering (Baleato Lizancos & White 2023). Accounting for all contaminating systematics can often be difficult, resulting in many possible systematic contaminants to consider (e.g. Rodríguez-Monroy et al. 2022). Moreover, for survey-independent systematics, cross-correlation of different surveys will not help detect and remove their impact, motivating a comprehensive and robust approach to their mitigation.

Multiple methods have been developed for mitigating galaxy overdensity systematics. We discuss these methods in more detail in Section 2.3. However, despite significant progress in mitigating galaxy overdensity systematics, challenges remain. The DES Y3 clustering results (Rodríguez-Monroy et al. 2022) are a notable illustrative example. Comparison between results from the colour-selected redMaGiC sample (Rozo et al. 2016) and the flux-limited MAGLIM (Porredon et al. 2021) galaxy samples, designed for 3|$\times$|2pt analysis, revealed unanticipated inconsistencies. This sparked great interest in the area of systematics treatment, even though extensive validation determined that the cause of the inconsistencies was most likely systematics-dependent sample selection. One pressing difficulty lies in the empirical determination of the form of the systematics contamination to the overdensity field, particularly when multiple systematics effects combine in complex ways. While simulations offer valuable insights, developing empirical methods for modelling and learning about the functional form of systematics contamination is an important path forward for the field.

In this study, we use a new and more generalized modelling approach for systematics contamination of the galaxy overdensity field. Our approach is empirical: we provide a generalized functional form for the contamination, and a method for constraining its parameters using the real survey data. We apply this method to jointly infer and mitigate contaminants in the clustering signal and validate its performance using KiDS-like mock galaxy catalogues (Harnois-Deraps et al. 2018) with synthetic systematics incorporated into them. Our goal is to provide a method for empirically characterizing and mitigating more complex systematic contamination.

The structure of the paper is as follows. In Section 2, we provide background on methods for mitigating galaxy overdensity systematics and the connection to cosmology. In Section 3, we outline our new systematics characterization and mitigation method. Section 4 details the mock extragalactic galaxy catalogue used for testing our method, along with approaches used in its analysis. In Section 5, we present the results of applying our method under a variety of conditions, building up to a multisystematic case that emulates the real complexity of survey data. Section 6 discusses the implications of our results and possible applications to real data. Finally, Section 7 summarizes our findings and presents the challenges and future work.

2 BACKGROUND

Here, we outline the background for this work, starting with the importance of clustering in cosmology and how systematics can bias cosmological analysis. We then introduce the galaxy overdensity formalism and contamination model in Section 2.2 and discuss various existing methods to deal with systematic contamination in Sections 2.3 and 2.4.

2.1 Connection to cosmology

The matter distribution of the Universe is of fundamental interest in cosmology as it carries important information about the evolution and structure of the Universe over time. It is therefore used for testing theoretical models for cosmic acceleration and understanding the |$\Lambda$|CDM cosmological model. Current cosmological analyses, such as the 3|$\times$|2pt analysis (Krause et al. 2021; Abbott et al. 2022), use the combined information from clustering, shear, and galaxy–galaxy lensing to extract cosmological information (e.g. matter energy density, amplitude of matter fluctuations) in a Bayesian inference framework. Combining these probes is advantageous because the probes are complementary and combining them enables us to marginalize over observational and theoretical uncertainties associated with single probes. We use galaxies to probe the underlying matter density field, but marginalizing over the galaxy–matter relationship requires modelling both linear and non-linear behaviour, with the linear relationship that holds on large scales described as |$\delta _{{\rm g}}= b \delta _m \rightarrow w_{{\rm gg}} = b^2 w_{\rm mm}$|⁠, where |$\delta _{{\rm g}/m}$| represents the galaxy/matter overdensity field, w the correlation function and b the linear galaxy bias.

The galaxy bias (for a review, see Desjacques et al. 2018) represents the relation between the galaxy and matter distribution and depends on the mass of the galaxy’s host dark matter halo, along with other properties such as luminosity and redshift. Galaxy bias parameters are fitted during the inference process and are crucial in connecting the galaxy clustering signal to the matter clustering. If a simple linear bias model is used, clustering is then sensitive to |$b^2$| while galaxy–galaxy lensing is sensitive to b (⁠|$w_{{\rm g}\kappa } = b w_{\rm m\kappa }$|⁠, where |$\kappa$| represents the lensing convergence), so we can eliminate b by using both probes. Therefore, biased measurements of the galaxy angular correlation function will bias the cosmological inference analysis. It is explicitly shown in Elvin-Poole et al. (2018) that ignoring systematic contamination to the galaxy clustering signal can cause overestimation of the galaxy bias parameters and underestimation of |$\Omega _{\rm m}$|⁠, for example. For this reason, mitigation of systematic contamination in galaxy clustering plays a crucial role in producing an unbiased cosmological analysis.

2.2 Galaxy overdensity and contamination formalism

In order to create a reliable contamination model for galaxy overdensity systematics, it is useful to understand what kind of contaminants can be present and how they can affect the galaxy number density. For example, spurious galaxy detections due to stars mistakenly identified as galaxies can have a purely additive effect on the observed galaxy number density: to lowest order, this effect should depend on the stellar density but not on the galaxy density. Spatial variations in the point spread function (PSF) size can modulate the selection function (detection probability) of galaxies, for example, due to modifications in the measured signal-to-noise ratio or modulation in the degree of unrecognized blending between galaxies. This effect results in a multiplicative effect on the galaxy number density, as its magnitude depends on the local density of galaxies. An example illustration of additive and multiplicative effects on observed images is depicted in Fig. 1. We emphasize that we have only mentioned examples here, and that subtleties in the image processing can give rise to numerous other additive or multiplicative systematics in the galaxy overdensity field.

Figure 1.

Schematic illustrating examples of additive and multiplicative systematics on the galaxy density field. Yellow stars and blue ellipses represent stars and galaxies, respectively. The left panel shows the true galaxy and star locations, while the right panel represents the measured locations. The regions where the measured and true galaxy distributions differ are circled on both panels, with the colour of the circle indicating the type of systematic that is generated. An example of a multiplicative systematic is the unrecognized blending of two or more distinct galaxies in the observed distribution, resulting in a multiplicative effect to the observed number density. An example of an additive systematic is when stars are misidentified as galaxies in the measured distribution. This illustration shows how multiplicative systematics depend on the galaxy density field while additive ones are independent of the galaxy distribution.

With these examples in mind, we start by defining notation that describes how contaminants modulate the observed galaxy overdensity field. We model the observed galaxy number density, |$\hat{n}_{\rm g}\!(\pmb {\theta })$|⁠, in a given direction |$\pmb {\theta }$| on the sky, as the true galaxy number density |$n_{\rm g}\!(\pmb {\theta })$| modulated by some systematic contamination function |$f_{\text{syst}}(\pmb {\theta })$| and an additive term |$f^{\text{add}}_{\text{syst}}(\pmb {\theta })$|⁠:

$$\begin{eqnarray} \hat{n}_{\rm g}\!(\pmb {\theta }) = n_{\rm g}\!(\pmb {\theta }) (1 + f_{\text{syst}}(\pmb {\theta })) + f^{\text{add}}_{\text{syst}}(\pmb {\theta }). \end{eqnarray}$$
(1)

Since we are interested in the clustering signal quantified as a two-point correlation function of the galaxy overdensity field, we must write equation (1) in terms of the galaxy overdensity field. We write the true number density in the usual way |$n_{\rm g}\!(\pmb {\theta }) = \bar{n}_{\rm g} (1+ \delta _{\rm g}\!(\pmb {\theta }))$|⁠, where |$\delta _{\rm g}$| represents the overdensity of some galaxy density field and bars over quantities indicate ensemble averages.

If we assume1 that |$\bar{n}_{\rm g} = \bar{\hat{n}}_{\rm g}$|, equation (1) becomes

$$\begin{eqnarray} \hat{\delta }_{\rm g}(\pmb {\theta }) = \delta _{\rm g}(\pmb {\theta }) (1 + f_{\text{syst}}(\pmb {\theta })) + f_{\text{syst}}(\pmb {\theta }) + \frac{f^{\text{add}}_{\text{syst}}(\pmb {\theta })}{\bar{n}_{\rm g}} \end{eqnarray}$$
(2)

after dividing by |$\bar{n}_{\rm g}$| and subtracting 1. Here, too, quantities without versus with hats represent the true versus observed field. We notice two additive terms and one multiplicative bias term. The additive terms |$f_{\text{syst}}$| and |$f^{\text{add}}_{\text{syst}}/\bar{n}_{\rm g}$| can be combined into a single term that describes the additive systematics contamination to the overdensity field. Note that the combined additive terms may differ from the multiplicative term – they are not, by definition, identical.

In practice, we can obtain direct estimates of the overdensities and of observable quantities that can produce systematics by pixelizing the observed survey footprint. That is, all observed galaxies can be binned into equal area pixels from which these quantities can be calculated.
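As a minimal sketch of this pixelization step (assuming a catalogue with RA and Dec in degrees and the HEALPix resolution NSIDE = 512 used later in this work; the function name is illustrative, and in practice the footprint would be defined from a survey mask or random catalogue rather than from occupied pixels):

import numpy as np
import healpy as hp

def overdensity_map(ra_deg, dec_deg, nside=512):
    """Bin galaxy positions into HEALPix pixels and return the overdensity map."""
    npix = hp.nside2npix(nside)
    # Convert (RA, Dec) in degrees to HEALPix pixel indices.
    pix = hp.ang2pix(nside, ra_deg, dec_deg, lonlat=True)
    counts = np.bincount(pix, minlength=npix).astype(float)
    # Simplified footprint: any pixel containing at least one galaxy.
    mask = counts > 0
    nbar = counts[mask].mean()
    delta = np.zeros(npix)
    delta[mask] = counts[mask] / nbar - 1.0
    return delta, mask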

2.3 Methods for systematics correction

Many methods have been developed to address the challenging problem of correcting for systematic contamination in clustering, in response to the need for robust tools to deal with contamination of potentially unknown form. For reference, Weaverdyck & Huterer (2021) provide an extensive analysis and a review of multiple decontamination methods, along with their own method using ElasticNet. For consistency, we stick closely to the formulations and notation used in Weaverdyck & Huterer (2021). Here, we provide a summary of some commonly used methods, with a goal of motivating our own.

2.3.1 Mode projection and template subtraction

Mode projection was introduced by Rybicki & Press (1992) and is applied to directly correct the galaxy overdensity power spectra (⁠|$C_{\ell }$|⁠) for the presence of systematic contaminants by assigning infinite variance to the systematic templates, hence modifying the signal covariance matrix. However, this method suffers from the difficulty of needing to invert the modified covariance matrix. Elsner, Leistedt & Peiris (2016) proposed an adaptation of this method that avoided this challenge. For a single contaminant template t and some contamination parameter |$\alpha$|⁠, the contamination model assumes that the additive terms in equation (2) can be written as |$f_{\text{syst}}\!(\pmb {\theta }) + f^{\text{add}}_{\text{syst} }\!(\pmb {\theta })/\bar{n}_{\rm g}\!(\pmb {\theta }) = \alpha t (\pmb {\theta })$|⁠. The mode projection formalism neglects the multiplicative term in equation (2) under the assumption that |$f_{\text{syst}} \delta _{\rm g} \ll 1$|⁠, so the overdensity equation becomes

$$\begin{eqnarray} \hat{\delta }_{\rm g}(\pmb {\theta }) = \delta _{\rm g}(\pmb {\theta }) + \alpha t(\pmb {\theta }). \end{eqnarray}$$
(3)

The contamination parameter is estimated using the covariance of the template map and the galaxy field and the variance of the template map itself,

$$\begin{eqnarray} \alpha = \frac{\hat{\sigma }_{\text{tg}}^2}{\sigma _{\text{tt}}^2}, \end{eqnarray}$$
(4)

where |$\sigma _{\text{tg}}^2$| here represents the covariance of the t and |$\hat{\delta }_{\rm g}$| maps over the whole footprint. In the full-sky case, the covariance is given by

$$\begin{eqnarray} \sigma _{\text{tg}}^2 = \frac{1}{4\pi }\sum _{\ell = 0}^{\infty } (2\ell + 1) \hat{C}_{\ell }^{\text{tg}}. \end{eqnarray}$$
(5)

We can write the same expression for |$\sigma _{\text{tt}}^2$| by replacing the cross-power spectrum with the auto power spectrum of the template maps. Note that in practice the covariances in equation (4) can be calculated at the pixel level directly rather than by measuring the power spectra. The method can be straightforwardly extended to the case of multiple systematics, as well as multiple galaxy maps in the case of a tomographic analysis. Using this method, a cleaned version of the overdensity field can be obtained from equation (3), and the contamination-corrected galaxy power spectrum can be estimated from the resulting cleaned map. Note that calculating the power spectrum of the cleaned map will result in a biased estimator, and Elsner et al. (2016) detail how to properly debias the estimated power spectrum of the systematics-corrected map. This method is currently implemented in the open-source galaxy clustering code NaMaster (Alonso et al. 2019). It is also worth noting that pseudo-|$C_{\ell }$| mode projection and ordinary least squares regression at the pixel level have been shown to be equivalent methods (Weaverdyck & Huterer 2021).
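As an illustration (not the NaMaster implementation), the pixel-level version of equations (3) and (4) for a single template can be sketched as follows, assuming delta_g and t are arrays of observed overdensity and template values over the unmasked pixels:

import numpy as np

def mode_project_single(delta_g, t):
    """Pixel-level mode projection for one template: equations (3)-(4)."""
    t_c = t - t.mean()
    d_c = delta_g - delta_g.mean()
    alpha = np.sum(t_c * d_c) / np.sum(t_c * t_c)   # alpha = cov(t, delta_g) / var(t)
    delta_clean = delta_g - alpha * t               # invert equation (3)
    # The power spectrum of delta_clean is still a biased estimator and must be
    # debiased following Elsner et al. (2016).
    return alpha, delta_clean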

Template subtraction (Ross et al. 2011; Ho et al. 2012) takes a similar approach to mode projection, keeping the same model contamination of equation (3). However, the contamination parameters are calculated for each mode as

$$\begin{eqnarray} \alpha _{\ell } = \frac{\hat{C}^\text{tg}_{\ell }}{C^\text{tt}_{\ell }}, \end{eqnarray}$$
(6)

and then applied directly to the observed power spectra to obtain a biased estimate of the power spectra:

$$\begin{eqnarray} C^\text{gg}_{\ell } = \hat{C}^\text{gg}_{\ell }-\alpha _{\ell }^2 C^\text{tt}_{\ell }. \end{eqnarray}$$
(7)

Elsner et al. (2015) detail how to properly debias the estimated power spectra in both the full-sky and partial sky limit.

2.3.2 Iterative regression and ElasticNet

A more recent method developed by Elvin-Poole et al. (2018) uses a simple linear regression approach to iteratively clean the overdensity maps. For a systematic labelled i, with template |$t_i$|, pixels are placed in evenly spaced bins based on the value of the template within that pixel. Using all of the evenly spaced bins (indexed j), regression parameters |$\lbrace a_i, b_i\rbrace$| for each systematic i are fit from the equation:

$$\begin{eqnarray} \frac{\langle \hat{n}_{\rm g} \rangle _j }{\bar{\hat{n}}_{g}} = a_i \langle t_i \rangle _j + b_i. \end{eqnarray}$$
(8)

Mock catalogues are used to estimate the covariance matrix of the binned quantities needed for fitting. This method is applied iteratively by identifying the systematics that more highly correlate with the density field and cleaning the map of the more dominant systematics first. It does this by assigning weights to all galaxies based on their pixel template value: |$w = 1/(a_i t + b_i)$|⁠. It then iteratively proceeds to the next most dominant contaminant performing the same process, which in practice is equivalent to applying a product of weights to the density field. The process stops once a desired significance level (for the defined null tests) is achieved. A final set of weights (the product of each iteration) is then assigned to each galaxy when calculating the corrected estimate of the two-point correlation function.
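A schematic, single-template sketch of this binned fit and weight assignment is given below, assuming per-pixel arrays of observed counts and template values; the actual implementation fits with a mock-based covariance and iterates over the most significant templates, which we omit here:

import numpy as np

def template_weights(n_obs, t, nbins=10):
    """Fit <n_obs>/nbar against binned template values (equation 8) and return weights 1/(a*t + b)."""
    nbar = n_obs.mean()
    edges = np.linspace(t.min(), t.max(), nbins + 1)
    centres, ratios = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (t >= lo) & (t < hi)
        if sel.sum() == 0:
            continue
        centres.append(t[sel].mean())
        ratios.append(n_obs[sel].mean() / nbar)
    a, b = np.polyfit(centres, ratios, 1)   # unweighted linear fit for illustration
    return 1.0 / (a * t + b)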

Finally, Weaverdyck & Huterer (2021) introduced the use of ElasticNet, a linear regression model with L1 and L2 regularization, as a mitigation method. The method uses an additive contamination model to estimate the contamination parameters for each template initially. Then, an additional map-level correction for the templates assumed to be multiplicative is done to account for multiplicative systematics. They also derive the impact of multiplicative systematics on the overdensity covariance and incorporate it into the likelihood. The likelihood for fitting the templates uses a contamination model from equation (2) with |$f_{\text{syst}}^{\text{add}} = 0$| and |$f_{\text{syst}} \!(\pmb {\theta }) = \pmb {\alpha }\boldsymbol {\mathsf {T}} \!(\pmb {\theta })$|⁠, where |$\pmb {\alpha }$| consists of a vector of contamination parameters and |$\boldsymbol {\mathsf {T}}$| is a matrix containing the systematics templates for every pixel. The contamination parameters are fit for by minimizing the loss function

$$\begin{eqnarray} \text{Loss} = \frac{1}{2N_{\text{pix}}}||\hat{\delta }_{\rm g}\!(\pmb {\theta })-\pmb {\alpha }\boldsymbol {\mathsf {T}} \!(\pmb {\theta })||^2 + \lambda _1||\pmb {\alpha }||_1 + \frac{\lambda _2 }{2}||\pmb {\alpha }||_2^2 , \end{eqnarray}$$
(9)

where |$\hat{\delta }_{\rm g}$| represents the observed galaxy overdensity, and |$||\cdot ||_1$| and |$||\cdot ||_2$| are the L1-norm and L2-norm, respectively. These two norms are meant to incentivize sparsity in the estimated parameters (L1-norm) and address possible correlations (L2-norm) between template maps. The correction to the galaxy power spectra then follows a similar procedure to that of mode projection described earlier. However, as previously mentioned, the cleaned maps are estimated by taking into account the multiplicative bias for the templates assumed to have this effect, which was neglected in previous methods. The result is a cleaned version of the overdensity map, which we call here |$\hat{\delta }^{\text{clean}}_{\rm g}$|⁠:

$$\begin{eqnarray} \hat{\delta }^{\text{clean}}_{\rm g} \!(\pmb {\theta }) = \frac{\hat{\delta }_{\rm g} \!(\pmb {\theta }) -f_{\text{syst}} \!(\pmb {\theta })}{1+ f_{\text{syst}}\!(\pmb {\theta })} . \end{eqnarray}$$
(10)

When correcting for the impact of systematics on the angular correlation function, a set of weights for each pixel can be produced in a similar manner to iterative regression, where the weight map is given by |$\pmb {w}_T = (1+\pmb {\alpha }\boldsymbol {\mathsf {T}})^{-1}$|⁠, and |$w_{T,j}$| is the weight of pixel j. It is important to note that this method does not need mock catalogues, and finds the regularization parameters through a form of cross-validation and partition of the maps.
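The loss in equation (9) matches the objective minimized by, for example, scikit-learn's ElasticNet, with lambda_1 = alpha_sklearn * l1_ratio and lambda_2 = alpha_sklearn * (1 - l1_ratio) (where alpha_sklearn is that implementation's regularization strength, not the contamination parameter above). A hedged sketch under these assumptions, omitting the cross-validation used to set the regularization strengths and applying the map-level correction of equation (10) to the full template combination for brevity:

import numpy as np
from sklearn.linear_model import ElasticNet

def fit_contamination(delta_obs, T, lam1=1e-3, lam2=1e-3):
    """Fit contamination parameters by minimizing equation (9).

    delta_obs: (n_pix,) observed overdensity; T: (n_pix, n_sys) template overdensities.
    """
    alpha_sk = lam1 + lam2
    l1_ratio = lam1 / (lam1 + lam2)
    model = ElasticNet(alpha=alpha_sk, l1_ratio=l1_ratio, fit_intercept=False)
    model.fit(T, delta_obs)
    f_syst = T @ model.coef_
    # Map-level correction of equation (10); in the original method this step is
    # applied only to templates assumed to act multiplicatively.
    delta_clean = (delta_obs - f_syst) / (1.0 + f_syst)
    return model.coef_, delta_clean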

While ElasticNet represents an advance in correcting for both additive and multiplicative systematics in the galaxy overdensity, it relies on a priori information regarding which templates are multiplicative without a means for empirically choosing between the two. This becomes a special case of equation (2), which shows that if there are additive and multiplicative systematics in the number density, the result is different additive and multiplicative systematics in the overdensity. Therefore, incorrect a priori choices of which templates are multiplicative can lead to biases in the corrected overdensity map.

2.3.3 Non-linear methods

There are a variety of non-linear machine-learning methods (e.g. Rezaie et al. 2020; Johnston et al. 2021) that attempt to correct for systematic contamination in galaxy clustering. For example, Rezaie et al. (2020) use a contamination model similar to equation (2), but instead of assuming a specific linear form for the contamination, use a fully connected feed-forward neural network to solve the regression problem. The learned contamination function can then be applied to weight each pixel and correct the power spectra for systematics contamination. Johnston et al. (2021) use self-organizing maps (Kohonen 1990) to mimic the systematic-contaminated maps and subtract the spurious modes from the clustering signal.

2.4 Summary of methods

The correction techniques discussed in the previous subsection have achieved notable success in reducing the impact of systematic contamination on clustering measurements and have been applied to a wide range of survey data and simulations. Most of these methods are built on several key assumptions, such as the ability to treat systematics contamination as fully additive, or fully multiplicative, rather than a combination of both. While methods with these assumptions have often been sufficient for existing data sets, it is by no means clear that they will be sufficient for data sets with smaller statistical uncertainties. For example, neglecting multiplicative terms (that is, |$\delta _{\rm g}$| terms in equation 2) in the correction, or assuming they are the same as the additive term, could result in uncorrected biases (Shafer & Huterer 2015; Weaverdyck & Huterer 2021) in cases where those terms exist and differ from the additive systematic term.

In addition, understanding the existence and level of both the additive and multiplicative systematics terms in the contamination model can potentially help us understand empirically how systematics contaminate the observed galaxy number density. This knowledge could be used to identify areas for improvements in image-processing algorithms.

These challenges highlight the need for further advances in correction techniques and modelling that can address the more general model for systematics in galaxy overdensities shown in equation (2). This motivates us to develop a generalized and flexible approach that can accurately model and remove systematic contamination in clustering while providing some understanding of the functional dependence on |$\delta _{\rm g}$|⁠. This provides a level of interpretability to the results that non-linear ML methods, such as neural networks, often lack.

3 METHODS

This section describes the method introduced in this work. First, we introduce the specific contamination model used to quantify and correct for systematic contamination in the one- and two-point functions. The likelihood function chosen to characterize the galaxy overdensity field is defined with the inclusion of possible contamination, along with statistical methods to compare different modelling choices. Next, we highlight the impact of noise and how to remove biases it introduces in the correction of two-point correlation functions. Finally, we describe possible selection effects when dealing with real data and mocks.

3.1 Systematics contamination model

We will start from equation (2) and initially consider the case of one systematic, described by some template t. In our formulation of the systematics model, we choose to define the systematic functions in terms of template overdensities |$\delta _t \!(\pmb {\theta })$|⁠. Similar to other works, we choose |$f_{\text{syst}} \!(\pmb {\theta }) = \alpha \delta _t \!(\pmb {\theta })$|⁠, but now consider |$f^{\text{add}}_{\text{syst}} \!(\pmb {\theta }) = \beta \delta _t \!(\pmb {\theta })$|⁠. So equation (2) becomes as follows:

$$\begin{eqnarray} \hat{\delta }_{\rm g} \!(\pmb {\theta }) = \delta _{\rm g} \!(\pmb {\theta }) + a \delta _t \!(\pmb {\theta }) + b \delta _t \!(\pmb {\theta }) \delta _{\rm g} \!(\pmb {\theta }), \end{eqnarray}$$
(11)

where a and b are the contamination parameters, absorbing the previous constants: |$a = \alpha + {\beta }/{\bar{n}_{\rm g}}$| and |$b = \alpha$|⁠. This allows us to construct a nested model, for which the cases of purely additive (⁠|$b= 0$|⁠) and purely multiplicative (⁠|$a=b$|⁠) number density contamination are special cases. Since a given systematic template might have both additive and multiplicative effects on the number density, generalization beyond these simple models is physically motivated and allows us to test our assumptions about what contamination models we use. Also note that we have chosen to consider only terms linear in |$\delta _t$| since we can ‘feature engineer’ through non-linear transforms of one or more base templates such that the dependence is linear (Elvin-Poole et al. 2018; Weaverdyck & Huterer 2021).

This formalism can be generalized for the case of |$N_{\text{sys}}$| systematics maps, for which we would have

$$\begin{eqnarray} \hat{\delta }_{\rm g} \!(\pmb {\theta }) \approx \delta _{\rm g} \!(\pmb {\theta }) \left(1 + \sum _i^{N_{\text{sys}}} b_i \delta _{t_i} \!(\pmb {\theta }) \right) + \sum _i^{N_{\text{sys}}} a_i \delta _{t_i} \!(\pmb {\theta }). \end{eqnarray}$$
(12)

From now on, we will refer to the case where |$\lbrace a,b\rbrace$| are free as the ‘Combined’ model, the form when |$a = b$| as the ‘Multiplicative’ model, and the form when |$b = 0$| as the ‘Additive’ model, i.e.

$$\begin{eqnarray} \hat{\delta }_{\rm g} \!(\pmb {\theta }) = \left\lbrace \begin{array}{@{}l@{\quad }l@{}}\delta _{\rm g} \!(\pmb {\theta }) + \sum _i^{N_{\text{sys}}} a_i \delta _{t_i} \!(\pmb {\theta }), \text{ Additive} \\\delta _{\rm g} \!(\pmb {\theta })(1 + \sum _i^{N_{\text{sys}}} a_i\delta _{t_i} \!(\pmb {\theta })) + \sum _i^{N_{\text{sys}}} a_i\delta _{t_i}\!(\pmb {\theta }), \text{Multiplicative}\\\delta _{\rm g} \!(\pmb {\theta })(1 + \sum _i^{N_{\text{sys}}} b_i\delta _{t_i} \!(\pmb {\theta })) + \sum _i^{N_{\text{sys}}} a_i\delta _{t_i}\!(\pmb {\theta }), \text{Combined} \end{array}\right. \end{eqnarray}$$
(13)

As explained in Section 2.3, most current methods assume either additive or multiplicative contamination models in the galaxy number density. If the template maps are uncorrelated with each other and with the underlying true galaxy overdensity distribution, the observed galaxy two-point function would then be

$$\begin{eqnarray} \hat{w} (\theta) = \langle \hat{\delta }_{\rm g}\hat{\delta }_{\rm g}\rangle = \langle \delta _{\rm g}\delta _{\rm g}\rangle \left[1 + \sum _i^{N_{\text{sys}}} b_i^2\langle \delta _{t_i}\delta _{t_i}\rangle \right] + \sum _i^{N_{\text{sys}}} a_i^2\langle \delta _{t_i}\delta _{t_i}\rangle , \end{eqnarray}$$
(14)

and our estimate for the corrected two-point correlation function would be

$$\begin{eqnarray} w_{\text{corr}} (\theta) = \frac{\hat{w} (\theta)-\sum _i^{N_{\text{sys}}} a_i^2\langle \delta _{t_i}\delta _{t_i}\rangle }{1 + \sum _i^{N_\text{sys}} b_i^2\langle \delta _{t_i}\delta _{t_i}\rangle } . \end{eqnarray}$$
(15)

So in order to correct the clustering signal for systematics, we must estimate the contamination parameters |$\lbrace a_i, b_i \rbrace$| and measure the spatial correlation functions of the zero-mean template overdensities. Note that model misspecification can lead to biased corrections of the correlation function, as we will see in Section 5. Some of our main goals are to introduce a formalism with the flexibility to handle different types of systematics, to robustly correct the two-point function despite the increase in the number of parameters, and to gain an understanding of the functional contamination model empirically from data.
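Given estimates of the contamination parameters and the measured template autocorrelations, the correction of equation (15) is a direct array operation. A minimal sketch, assuming uncorrelated templates and all correlation functions measured on a common set of angular bins (names are illustrative):

import numpy as np

def corrected_wtheta(w_obs, a_sq, b_sq, w_tt):
    """Equation (15): correct the observed w(theta) for additive and multiplicative terms.

    w_obs: (n_theta,) observed correlation function
    a_sq, b_sq: (n_sys,) squared contamination parameters a_i^2, b_i^2
    w_tt: (n_sys, n_theta) template autocorrelation functions
    """
    additive = np.sum(a_sq[:, None] * w_tt, axis=0)
    multiplicative = 1.0 + np.sum(b_sq[:, None] * w_tt, axis=0)
    return (w_obs - additive) / multiplicative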

3.2 Correlated templates

The above formalism assumes systematics templates are uncorrelated, but generalizes easily to correlated systematics. To handle the case of correlated systematics templates, we use the eigenvectors of the template maps’ pixel covariance matrix to create an orthogonal set of template maps, |$\delta _t^{^{\prime }}$|⁠. Equations (14) and (15) can then be rewritten in the rotated frame to mitigate the clustering signal:

$$\begin{eqnarray} w_{\text{corr}}(\theta) = \frac{\hat{w}(\theta)-\sum _i^{N_{\text{sys}}} a_{i}^{^{\prime }2}\langle \delta ^{^{\prime }}_{t_i}\delta ^{^{\prime }}_{t_i}\rangle }{1 + \sum _i^{N_\text{sys}} b_{i}^{^{\prime }2}\langle \delta ^{^{\prime }}_{t_i}\delta ^{^{\prime }}_{t_i}\rangle }, \end{eqnarray}$$
(16)

where |$\lbrace a_{i}^{^{\prime }}, b_{i}^{^{\prime }} \rbrace $| are the contamination parameters in this new space. The contamination parameters in the original template space can be retrieved, if needed, by applying the inverse rotation matrix to these new parameters. Recovering the original parameters is useful for evaluating the parameter estimation and for understanding the functional form of the contamination from each systematic. Details of the derivation for the correlated case are given in Appendix A.
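A sketch of one way to construct such an orthogonal basis, assuming the templates are stored as an (N_sys, N_pix) array; the exact construction we use follows Appendix A, and the names here are illustrative:

import numpy as np

def orthogonalize_templates(delta_t):
    """Rotate correlated templates into an uncorrelated basis via the pixel covariance eigenvectors."""
    cov = np.cov(delta_t)                 # (n_sys, n_sys) covariance of templates over pixels
    eigvals, R = np.linalg.eigh(cov)      # columns of R are the eigenvectors
    delta_t_prime = R.T @ delta_t         # rotated (orthogonal) templates
    return delta_t_prime, R

# Parameters fit in the rotated basis can be taken back to the original template
# basis with the inverse rotation, e.g. a = R @ a_prime.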

3.3 Likelihood function

To estimate the contamination parameters, we work directly with galaxy and systematic overdensity maps. We use healpy (Zonca et al. 2019) and HEALPix2 (Górski et al. 2005) to pixelize the data and create such maps from galaxy catalogues. Hence, the notation |$\delta _{t,j}$| refers to the average value of |$\delta _t$| from all galaxies in pixel j. The size of the pixels depends on the sample: it should be chosen such that pixels have a non-negligible number of galaxies, but are small enough to retain information about clustering on the smallest scales of interest. In this work, we used pixels of NSIDE  = 512, which have an area of |$\sim 47$| arcmin|$^2$|⁠.

We use a maximum-likelihood estimation approach to estimate the parameters of interest.

The simplest approach is to model the probability density as a Gaussian distribution, |$p(\delta _{\rm g}) \sim \mathcal {N}(0, \sigma _{\rm g})$|⁠. The conditional probability distribution |$p({\hat{\delta }_{\rm g}}|\boldsymbol {\mathsf {\delta _t}}, \pmb {a},\pmb {b})$| for uncorrelated pixels is then

$$\begin{eqnarray} {p_{\text{Gauss}}}({\hat{\delta }_{\rm g}}|\boldsymbol {\mathsf {\delta _t}}, \pmb {a},\pmb {b}, \sigma _{\rm g}) \propto \frac{1}{\prod _{j = 1}^N (|1+\pmb {b} \cdot \pmb {\delta }_{t,j}|)}\\ \quad \times \exp \left\lbrace \frac{-1}{2\sigma _{\rm g}^2}\sum _{j=1}^N \left(\frac{\hat{\delta }_{g, j} -\pmb {a} \cdot {\pmb {\delta }_{t,j}} }{1 + \pmb {b} \cdot {\pmb {\delta }_{t,j}} } \right)^2 \right\rbrace , \end{eqnarray}$$
(17)

where |$\boldsymbol {\mathsf {\delta _t}} = \lbrace \pmb {\delta }_{t_1}, ..., \pmb {\delta }_{t_{N_{\text{syst}}}} \rbrace$| and |$\pmb {a^T} = \lbrace a_1, ... , a_{N_{\text{syst}}}\rbrace$|⁠, |$\pmb {b^T} = \lbrace b_1, ... , b_{N_{\text{syst}}} \rbrace$|⁠, |$\pmb {\delta }_{t,j}$| is the jth column of the matrix |$\boldsymbol {\mathsf {\delta _t}}$|⁠, and |$\sigma _{\rm g}$| is the galaxy overdensity pixel-map variance. The summation goes over all N pixels in the map. The rows of the matrix |$\boldsymbol {\mathsf {\delta _t}}$| represent each systematic, while the columns represent the systematic values at a particular pixel. The denominator term is a normalizing factor needed to account for the multiplicative form of rearranging equation (12). In order to parametrize the |$\delta _{\rm g}$| distribution as a likelihood we need to understand what we expect its general form to be. By definition, |$\delta _{\rm g} \ge -1$| and |$\langle \delta _{\rm g} \rangle = 0$|⁠, so the distribution is bounded from below but is allowed to take arbitrarily large positive values. This means our distribution will naturally be skewed. For this reason, we also explore modelling the conditional probability of each pixel as a skewed Gaussian distribution. The resulting loglikelihood is

$$\begin{eqnarray} -\ln {\mathcal {L}}(\hat{\delta }_{\rm g}|\pmb {\delta _t}, \pmb {a},\pmb {b}, \sigma _{\rm g}, \gamma) = \frac{N}{2} \ln \left(2\pi \sigma _{\rm g}^2\right) + \sum _{j = 1}^N \ln (|1+\pmb {b} \cdot \pmb {\delta }_{t,j}|) \\\quad +\frac{1}{2\sigma _{\rm g}^2}\sum _{j=1}^N \left(\delta _{g,j}- \xi \right)^2-\sum _{j=1}^N \ln \left(1 + \text{erf}\left(\frac{\gamma (\delta _{g,j}-\xi) }{\sigma _{\rm g} \sqrt{2}}\right)\right),\\ \end{eqnarray}$$
(18)

where |$\delta _{\rm g}$| is written in terms of |$\lbrace {\hat{\delta }_{\rm g}},\boldsymbol {\mathsf {\delta _t}}, \pmb {a}, \pmb {b} \rbrace$| by rearranging equation (12), |$\gamma$| is the skewness parameter, and |$\xi = - \sigma \frac{\gamma }{\sqrt{1 + \gamma ^2}}\sqrt{\frac{2}{\pi }}$| is the centre of the distribution, in this case defined by assuming that the mean value of |$\hat{\delta }_{\rm g}$| is zero. The second term in the likelihood serves as a normalization factor. Note that this formula assumes pixels in our galaxy overdensity map are uncorrelated, even though we know this is not true. However, we show later in Section 5 that this approximation still enables us to accurately mitigate the systematics in the two-point correlation functions. For the case of the combined systematic model with |$N_\text{sys}$| templates, we fit |$2(N_\text{sys} + 1)$| parameters via the maximum likelihood. Both |$\pmb {a}$| and |$\pmb {b}$| consist of |$N_\text{sys}$| parameters used in mitigation, while |$\sigma$| and |$\gamma$| are nuisance parameters describing the overdensity distribution.
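For concreteness, a direct transcription of equation (18) into code for a single template, under the same uncorrelated-pixel approximation (variable names are ours):

import numpy as np
from scipy.special import erf

def neg_log_like(params, delta_obs, delta_t):
    """Negative log-likelihood of equation (18) for one template."""
    a, b, sigma, gamma = params
    denom = 1.0 + b * delta_t
    # Rearrange equation (12) to recover the implied true overdensity per pixel.
    delta_g = (delta_obs - a * delta_t) / denom
    # Centre of the skewed Gaussian, fixed by requiring <delta_g> = 0.
    xi = -sigma * gamma / np.sqrt(1.0 + gamma ** 2) * np.sqrt(2.0 / np.pi)
    n = delta_obs.size
    nll = 0.5 * n * np.log(2.0 * np.pi * sigma ** 2)
    nll += np.sum(np.log(np.abs(denom)))
    nll += np.sum((delta_g - xi) ** 2) / (2.0 * sigma ** 2)
    nll -= np.sum(np.log(1.0 + erf(gamma * (delta_g - xi) / (sigma * np.sqrt(2.0)))))
    return nll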

3.4 Distinguishing between likelihood models

To summarize, we have created a nested likelihood model described by equation (18) that can reduce to a simple Gaussian in the case of |$\gamma = 0$|⁠, and that reduces to additive or multiplicative systematic terms for particular relationships between |$a_i$| and |$b_i$| for systematic i. This formulation of our model allows us to easily compare models in terms of how well they describe the data using likelihood ratio tests. For shorthand, we represent all model parameters by |$\Theta = \lbrace \pmb {a},\pmb {b}, \sigma , \gamma \rbrace$|⁠. We define this ratio as

$$\begin{eqnarray} \lambda _\text{LR} &=& 2\ln \left(\frac{\max _{ \Theta }\mathcal {L}(\hat{\delta }_{\rm g}|\boldsymbol {\mathsf {\delta _t}},\Theta)}{\max _{\Theta _0 }\mathcal {L}(\hat{\delta }_{\rm g}|\boldsymbol {\mathsf {\delta _t}},\Theta _0)} \right) \\&=& 2 [\ln \mathcal {L}(\hat{\delta }_{\rm g}|\boldsymbol {\mathsf {\delta _t}},\hat{\Theta })-\ln \mathcal {L}(\hat{\delta }_{\rm g}|\boldsymbol {\mathsf {\delta _t}},\hat{\Theta }_0)], \end{eqnarray}$$
(19)

where |$\mathcal {L}$| is the likelihood defined in equation (18), |$\Theta$| and |$\Theta _0$| are the unrestricted and restricted parameter spaces, and the hatted variables represent their best-fitting parameter estimates. So, for example, when we want to test whether the additive systematic model is sufficient to describe systematics due to template |$t_i$|⁠, we can define |$\Theta _0 = \lbrace \Theta : b_i = 0 \rbrace$|⁠. Similarly when we want to test the sufficiency of a Gaussian likelihood to describe the data, we can define |$\Theta _0 = \lbrace \Theta : \gamma = 0 \rbrace$|⁠. In a likelihood ratio test formulated this way, the null hypothesis is that the data can be adequately described by a simpler model, which usually translates to a model with fewer parameters or restrictions. If the null hypothesis is true, the quantity |$\lambda _\text{LR}$| defined earlier is asymptotically |$\chi ^2$| distributed. In order to reject the null hypothesis (thereby preferring a more complex model to describe the data), we need to define some critical value z such that we can reject the null hypothesis if |$\lambda _\text{LR} \gt z$|⁠. This critical value is often taken to be |$z \approx F^{-1}(1-\alpha , r)$|⁠, where F is the cumulative distribution function of a |$\chi ^2$| distribution, |$\alpha$| is the probability of incorrectly rejecting the null hypothesis, and r the number of degrees of freedom. The value of |$\alpha$| is selected based on the desired risk of incorrectly rejecting a simpler model for a more complicated one.
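In practice the test reduces to comparing |$\lambda _\text{LR}$| against a |$\chi ^2$| quantile; a minimal sketch (function and argument names are illustrative):

from scipy.stats import chi2

def prefer_complex_model(lnL_full, lnL_restricted, r, alpha=0.01):
    """Likelihood-ratio test of equation (19): reject the simpler (restricted) model if lambda_LR > z."""
    lambda_lr = 2.0 * (lnL_full - lnL_restricted)
    z = chi2.ppf(1.0 - alpha, df=r)   # critical value F^{-1}(1 - alpha, r)
    return lambda_lr > z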

This test provides an approach to model selection, and hence a general understanding of the validity of the assumptions underlying our models, that takes as default the simpler (null) model, and hence requires strong evidence to reject that model in favour of the more complicated alternative. This reflects our general preference for avoiding overparameterized models. These tests are enabled by the nested structure of the models under consideration. However, ultimately we must draw conclusions about the success of our method based on how well they enable us to correct the two-point correlation functions.

3.5 Noise debiasing

The correction for biases in the clustering signal in equation (16) uses the squares of the estimated contamination parameters (⁠|$a^2, b^2$|⁠). Under the assumption that our systematics model correctly describes the systematics contamination, the estimated parameter |$\hat{a}_i$| can be modelled as the true parameter |$a_i$| plus zero-mean noise: |$\hat{a}_i = a_i + n_i$|⁠. In this case, |$\hat{a}_i$| is an unbiased estimate for |$a_i$|⁠, but its square |$\hat{a}_i^2$| is a biased estimator for |$a_i^2$|⁠:

$$\begin{eqnarray} \langle \hat{a}_i^2 \rangle = \langle (a_i + n_i)^2 \rangle = \langle a_i^2 \rangle + \langle n_i^2 \rangle . \end{eqnarray}$$
(20)

The unbiased squared terms that should be used in the estimated correction in equation (16) are therefore

$$\begin{eqnarray} a_i^2 = \hat{a}_i^2-\text{Var}[\hat{a}_i], \end{eqnarray}$$
(21)

where |$\text{Var}[\hat{a}_i]$| represents the noise variance of the estimated parameter |$\hat{a}_i$|. The same considerations and debiasing procedure apply to the b parameters. In the limit of purely additive systematics, the noise debiasing presented here is in essence equivalent to the debiasing methods developed in mode projection and template subtraction. Note that failure to debias the squared parameters can lead to significant overcorrection of the two-point function. This is shown quantitatively in fig. B1 of Appendix B. This illustrates the necessity of removing the noise bias in the estimate of the squared parameters in order to obtain an unbiased estimate of the clustering signal. This debiasing method differs from that of methods like ElasticNet, which deal with the bias-variance tradeoff by using data-calibrated L1 and L2 regularization, as mentioned in Section 2.3.2; our approach instead balances this tradeoff using a direct de-biasing technique similar to that of mode projection and template subtraction. This approach is straightforward to implement if mock catalogues are available to estimate the variance in the systematics parameters |$\lbrace \hat{a}_i, \hat{b}_i\rbrace$|. In the absence of mock catalogues, the noise can instead be estimated with a bootstrapping approach, which we discuss in Section 6. It is important to note that the parameter noise is influenced by multiple factors: (1) the specific form of the likelihood for and power spectra of |$\delta _{\rm g}$| and |$\delta _t$|, (2) the survey area coverage and pixelization, and (3) the number of parameters being jointly estimated. Different contaminants will, in general, have different noise levels, and being able to empirically estimate the noise in the systematics parameters is crucial.
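When mock catalogues are available, the variance term in equation (21) can be estimated from the scatter of the per-mock parameter fits. A minimal sketch (array and function names are illustrative):

import numpy as np

def debiased_square(a_hat, a_hat_mocks):
    """Equation (21): subtract the noise variance, estimated from mock realizations, from the squared estimate."""
    noise_var = np.var(a_hat_mocks, ddof=1)
    return a_hat ** 2 - noise_var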

3.6 Selection effects

In both mock catalogues and real data, we need to account for selection effects that systematically modulate the observed number density. For example, these effects can be caused by observing conditions and survey design. Different completeness levels across the survey area can bias the selection of observed galaxies. We account for these effects using random catalogues to normalize the observed galaxy number density in every pixel:

$$\begin{eqnarray} \hat{n}_{g}\!(\pmb {\theta }) \rightarrow f_{\text{selec}} \!(\pmb {\theta })\hat{n}_{g}\!(\pmb {\theta }); f_{\text{selec}}\!(\pmb {\theta }) =\frac{N^{\text{tot}}_{g,\text{rand}}}{N^{\text{tot}}_{\rm g} } \frac{1}{\hat{n}_{g,\text{rand}}\!(\pmb {\theta })}, \end{eqnarray}$$
(22)

where |$N^{\text{tot}}_{\rm g}$| is the total number of galaxies in our sample and |$N^{\text{tot}}_{g,\text{rand}}$| is the total number of random galaxies. Their ratio serves as a normalizing factor given that our random catalogues have higher number density than our survey catalogues. In addition, we remove pixels in which our random catalogue has very few galaxies to avoid extreme outliers and division by zero. Specifically, we calculate the maximum number of galaxies in a pixel for our random catalogue, and discard all pixels that contain fewer than 10 per cent of this quantity. In practice this resulted in removing less than 1 per cent of the total area of the mocks used, which are described in Section 4.1. Note that since we carry out this process using the random catalogues, the results simply remove any survey regions that fall into pixels with very little survey coverage.
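A sketch of this normalization and masking, assuming the galaxy and random catalogues have already been binned into counts on the same HEALPix grid (names are illustrative):

import numpy as np

def selection_corrected_counts(n_gal, n_rand, frac_cut=0.1):
    """Apply equation (22) and mask pixels with very low random-catalogue coverage."""
    good = n_rand >= frac_cut * n_rand.max()
    f_selec = np.zeros_like(n_gal, dtype=float)
    # f_selec = (N_rand_tot / N_gal_tot) / n_rand, evaluated only on well-covered pixels.
    f_selec[good] = (n_rand.sum() / n_gal.sum()) / n_rand[good]
    return n_gal * f_selec, good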

4 DATA, SOFTWARE, AND ANALYSIS

This section describes the data and software choices used to test the methodology described in the previous section. We start by describing the mock extragalactic galaxy catalogue used and the process of generating mock systematic template maps. Following this, we explain the software used to fit for the systematics model using the one-point function and estimate the two-point function. Finally, we describe the process of quantifying cosmological impact using the data and tools in this section.

4.1 Mock extragalactic galaxy catalogue

To validate our methodology we use the KiDS-like halo occupation distribution (HOD) N-body simulations (KiDS–HOD) produced by Harnois-Deraps et al. (2018). This set of galaxy mock catalogues is produced from the wider set of SLICS (Scinet LIght Cone Simulations) described in Harnois-Déraps & van Waerbeke (2015). The 1025 N-body simulations used in SLICS are produced by the gravity solver CUBEP|$^3$|M (Harnois-Déraps et al. 2013), where each run follows |$1536^3$| particles inside a grid cube of comoving side length |$L_{\text{box}} = 505 \,\, h^{-1}$| Mpc. The N-body simulation produces a full light cone reconstruction, which requires simulating the non-linear evolution of the particles from a desired redshift, as well as generating halo catalogues. The generated halo catalogues are then used, along with a given HOD model, to assign a certain galaxy population to every host halo. The HOD model used to produce these mocks is described in Smith et al. (2017) and is referred to as the GAMA HOD model in Harnois-Deraps et al. (2018). For the KiDS–HOD galaxy mocks, the HOD model is calibrated to reproduce key properties of the KiDS Survey (Hildebrandt et al. 2016). Galaxies are assigned out to redshift |$z_{\text{spec}} = 2.0$| and contain photometric redshifts and lensing information as well. Note that we implement our method directly on the two-point function (in configuration space), which is why we work directly with galaxy positions rather than looking at contamination in the power spectra. We used a set of 119 publicly available KiDS-HOD mock catalogues3 with total sky coverage of |$100 \,\, \text{deg}^2$|, along with an associated random catalogue. The random catalogue has a density approximately 10 times higher than that of the mocks. We work with only one low-redshift tomographic bin (|$0.2 \lt z_{\text{spec}} \lt 0.4$|) as a proof of concept for the methodology presented in this work. Table 1 summarizes key numbers for the mocks used for this work. The catalogues are pixelized with NSIDE  = 512 (pixel area |$\sim 47\,\, \text{arcmin}^{2}$|), meaning that a typical pixel has |$\sim 40$| galaxies.

Table 1.

KiDS–HOD mocks redshift binning used and galaxy number density information. Since we have 119 realizations and one random catalogue, number and density information on the mock catalogues is given as an average over all realizations.

Mocks Information
z range: |$0.2 \lt z \lt 0.4$|
|$N_{\text{realizations}}$|: 119
|$\overline{N}_{\text{gal}}$|: 303 371
|$\overline{n}_{\text{gal}}$| (arcmin|$^{-2}$|): 0.84
Number of random galaxies: 3 036 000

4.2 Mock systematic maps

In order to develop and test our method in a realistic scenario, it is important to generate systematics template maps that emulate the structure and distribution of systematics encountered in real data that will bias the observed number density. To do so, we use the PSF area size (a proxy for seeing) as our test case, given that this is a well-known observational parameter that can bias the observed galaxy number density depending on how the detection process is carried out, especially for ground-based surveys where regions with worse seeing can exhibit more severe challenges in deblending overlapping galaxy light profiles. We start by measuring the PSF area size autocorrelation function from HSC Public Data Release 1 (Aihara et al. 2018a, b). From a pixelized map of seeing size, with NSIDE  = 512, we can construct a map of the seeing-size overdensity, which is what we use to calculate the autocorrelation function using the estimator described in Section 4.3. We then fit the correlation function to a declining exponential, as it fits the data quite well in the angular scales between 1 and 500 arcmin, which fall within the scales of interest for measuring clustering. With this fit and the use of CAMB (Lewis, Challinor & Lasenby 2000), we go from the correlation function to the power spectrum (|$w^{tt} \rightarrow C_{\ell }^{tt}$|), where t here refers to the seeing size overdensity template we have constructed from the data. We then fit the power spectrum to an exponential function, as it once again provides a good visual fit to the data in the scales we are interested in for clustering (|$40 \lt \ell \lt 1500$|), and find the best fit to be |$C_{\ell }^{tt} \propto \exp [-\ell /6]$|. However, since the scales probed by HSC are much larger than those we measure in the KiDS–HOD mocks, we adjust the scale of the exponential decay to |$1/500$|, to emulate the overall behaviour of the correlation function in the range of scales for this analysis. Note that for this validation test, we are not looking to closely fit the seeing power spectrum, but rather to have an approximate idea of its spatial correlations as a function of scale. We can then use HEALPY (Zonca et al. 2019) to generate full-sky template overdensity maps with NSIDE  = 512, and only use the pixels in the same area as our KiDS catalogues. The above description was for the case of a single systematic. However, we will be interested in generating multiple systematics from different power spectra to introduce a realistic level of variety in the spatial correlations of the systematics. We use five different power spectra to represent five ‘families’ of systematics, where some of the forms are similar to those used in Weaverdyck & Huterer (2021):

  (i) |$C_{\ell }^{tt} \propto e^{-\ell /500}$|

  (ii) |$C_{\ell }^{tt} \propto e^{-(\ell /250)^2}$|

  (iii) |$C_{\ell }^{tt} \propto (\ell +1)^{-2}$|

  (iv) |$C_{\ell }^{tt} \propto (\ell +1)^{-1}$|

  (v) |$C_{\ell }^{tt} \propto (\ell +1)^0$|

Fig. 2 shows the correlation function for synthetic systematic maps generated using the above power spectra, as well as the HSC PSF area size correlation function, calculated from HSC PDR1, for comparison. Note that the full-sky maps generated using these power spectra are cut to match the area of the KiDS-HOD mocks, resulting in smaller available scales than those measured for HSC. We see that the overall shape of the correlation function differs for each systematic family, while still being consistent with the expected behaviour we see from real data. For example, the flattening of the correlation function at smaller scales and exponential decay at larger scales is seen both in the real data and in the synthetic maps. The actual scale at which the correlation function declines differs between the HSC data and the mocks by design, as described above. This figure shows that despite the limited area coverage of the mock catalogues that drove us to use smaller scales than we would in a realistic analysis, the shape of systematic correlations across the scales used is non-trivial and depicts the complexity we can expect for real systematics. Maps can then be generated using a choice from these families, and in practice, all maps are normalized to have equal variance. We can see that for a fixed variance, there will be different patterns in the spatial correlations for these families of power spectra. For example, family (iii) has less power for higher |$\ell$| modes or smaller spatial scales than most of the other families, while family (v) can be identified as white noise.

Figure 2.

Example of the correlation function for systematics generated through synthetic map-making (solid lines) using one of the five systematics families (different colours) described in Section 4.2 and for the PSF area size residuals in HSC (black points). For visualization purposes, the correlation functions are normalized by their value at |$\theta = 1\ \text{arcmin}$|⁠. Note that the solid lines are measured on scales appropriate for the analysis done using KiDS-like mocks, which has considerably less area than HSC, resulting in the elimination of larger scales. As we can see, the overall form of the two-point function from our synthetic maps varies amongst different systematic families, albeit with some similarities between certain families. Comparing the solid lines with the real data, we see that the overall exponential behaviour of the correlation function and flattening at small scales is similar amongst the systematic families and real data, with the obvious difference that the exponential decay occurs at different scales by design.
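As an illustration of the map-generation step described above, a sketch using healpy to draw a template overdensity map from family (i); the footprint mask and target variance are placeholders for the choices described in the text:

import numpy as np
import healpy as hp

nside, lmax = 512, 1500
ell = np.arange(lmax + 1)
cl = np.exp(-ell / 500.0)                          # family (i): C_ell^tt proportional to exp(-ell/500)
template_full = hp.synfast(cl, nside, lmax=lmax)   # Gaussian realization over the full sky
# footprint_mask (boolean HEALPix array for the 100 deg^2 mock area) and target_std
# are placeholders for the footprint cut and fixed-variance normalization:
# template = template_full[footprint_mask]
# template *= target_std / template.std()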

4.3 Measuring galaxy clustering and contamination parameters

We estimate the galaxy two-point correlation function using the Landy & Szalay estimator (Landy & Szalay 1993) implementation in TreeCorr4 (Jarvis, Bernstein & Jain 2004):

$$\begin{eqnarray} w(\theta) = \frac{\rm DD-2DR + RR}{\rm RR}, \end{eqnarray}$$
(23)

where D and R represent the galaxy and random catalogues respectively, and DD, RR, and DR are the number of galaxy pairs. In our specific implementation, the two-point function is calculated in 15 logarithmically spaced bins in angle |$\theta$| between 0.06 and 30 arcmin, a choice made given the small area of our mocks. The pixel size (|$\sim \!7$| arcmin) used to estimate the contamination parameters is much larger than the smallest scales used to measure clustering here. The pixel size was chosen in the context of the scales used for current survey science using galaxy clustering (Rodríguez-Monroy et al. 2022). The reason we include scales much smaller than the pixel size in our analysis is the limited size of the mock catalogues (|$100 \text{ deg}^2$|). Capping our analysis to scales commonly used for measuring clustering in surveys (|$\sim 5{-}200$| arcmin) would have strongly limited the range of usable scales, given that larger scales are greatly affected by low-statistics noise. In addition, the large pixel scale does not cause an issue in our tests since this pixel size was self-consistently used both to construct the template maps and to correct for their contamination. For this reason, we include scales smaller than the pixel scale in our analysis, and comment on the fidelity of the corrections in Section 5.2. Covariance matrices for the estimated two-point functions are calculated using all 119 mocks. That is, the covariance of two angular separation bins is given by

$$\begin{eqnarray} C_{\alpha \beta } = \frac{1}{N_{\text{realizations}}}\sum _k^{N_{\text{realizations}}}(w_k(\theta _{\alpha })-\bar{w}(\theta _{\alpha }))(w_k(\theta _{\beta })-\bar{w}(\theta _{\beta })), \end{eqnarray}$$
(24)

where |$N_{\text{realizations}} = 119$| in our case, the subscripts |$\alpha , \beta$| refer to the angular separation bins |$\theta _{\alpha }$| and |$\theta _{\beta }$|, |$w_k$| is the two-point function of the kth mock catalogue, and |$\bar{w}(\theta _{\alpha })$| is the mean correlation function of bin |$\alpha$| over all mock catalogue realizations.

We estimate the contamination parameters described in equation (12) at the level of the one-point function (i.e. the PDF of |$\delta _{\rm g}$| values) by sampling the parameter space using a Markov chain Monte Carlo (MCMC) algorithm with emcee (Foreman-Mackey et al. 2013) and choosing the parameters corresponding to the maximum-likelihood point. Even though parameters can be obtained more efficiently by using an optimizer (such as gradient descent), we use MCMC to obtain estimates of the parameter noise from single mock realizations (without the need to assume a Gaussian likelihood) and compare them to those from multiple mock realizations. The algorithm needs a user-specified number of walkers to explore the parameter space for a certain number of iterations. The number of walkers should generally be at least twice the number of parameters, while the number of iterations needed to converge may depend on the number of pixels used. For this work, we used 250 walkers and 1500 iterations. We evaluate convergence heuristically by running the MCMC for 5000 iterations on a single mock realization, using the maximum number of parameters considered in this work. We confirm that the mean and variance of each parameter, calculated across all chains, stabilize to the 5 per cent level on average after 1500 iterations. We also confirmed that increasing the number of walkers or iterations did not considerably decrease the integrated autocorrelation time of the MCMC chain, which was approximately 90 for a 52-parameter inference.
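For concreteness, the following minimal Python sketch illustrates the three ingredients of this subsection: the Landy–Szalay measurement with TreeCorr using our binning, the realization covariance of equation (24), and the selection of the maximum-likelihood point from an emcee chain. It is an illustrative sketch rather than our released pipeline; the input catalogues and the log-likelihood function of equation (18) are placeholders to be supplied by the user.

import numpy as np
import treecorr
import emcee

# --- Landy-Szalay estimator, equation (23), with the binning used in the text ---
def landy_szalay_w(ra, dec, ra_rand, dec_rand):
    cfg = dict(min_sep=0.06, max_sep=30.0, nbins=15, sep_units='arcmin')
    gal = treecorr.Catalog(ra=ra, dec=dec, ra_units='deg', dec_units='deg')
    ran = treecorr.Catalog(ra=ra_rand, dec=dec_rand, ra_units='deg', dec_units='deg')
    dd = treecorr.NNCorrelation(**cfg)
    dr = treecorr.NNCorrelation(**cfg)
    rr = treecorr.NNCorrelation(**cfg)
    dd.process(gal)          # DD pair counts
    dr.process(gal, ran)     # DR pair counts
    rr.process(ran)          # RR pair counts
    w, var_w = dd.calculateXi(rr=rr, dr=dr)   # (DD - 2DR + RR) / RR
    return dd.meanr, w

# --- Covariance over mock realizations, equation (24) ---
def realization_covariance(w_all):
    # w_all: shape (N_realizations, N_bins); normalized by N, as in the text
    diff = w_all - w_all.mean(axis=0)
    return diff.T @ diff / w_all.shape[0]

# --- Maximum-likelihood contamination parameters from an emcee chain ---
def max_likelihood_params(log_like, p0, nwalkers=250, nsteps=1500):
    # log_like(theta): user-supplied log-likelihood of the contamination model
    ndim = len(p0)
    start = p0 + 1e-3 * np.random.randn(nwalkers, ndim)
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_like)
    sampler.run_mcmc(start, nsteps)
    flat = sampler.get_chain(flat=True)
    return flat[np.argmax(sampler.get_log_prob(flat=True))]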

4.4 Quantifying cosmological impact

We quantify the possible cosmological impact of residual contamination in the clustering signal by modelling the corrected estimate of the clustering signal as a function of the true signal modulated by an amplitude A:

$$\begin{eqnarray} w_{\text{corr}}(\theta) = A w_\text{true}(\theta) \equiv (1+\Delta A) w_\text{true}(\theta), \end{eqnarray}$$
(25)

where |$\Delta A = A-1$| quantifies the bias in the corrected clustering signal. This can be understood in the context of 3|$\times$|2pt analysis, where clustering and galaxy-galaxy lensing are used to constrain the galaxy bias, as explained in Section 2.1. The most relevant dependence in this context is that of |$\sigma _8$|⁠, a measure of clustering strength, which in the linear regime is related to the galaxy bias, b, and galaxy clustering, |$w_{gg}$|⁠, by the relationship: |$w_{gg}\propto b^2 \sigma _8^2$|⁠. Measuring any biases on the amplitude of |$w_{gg}$| will determine how accurately the |$b\sigma _8$| combination can be measured. In our validation case of 119 mocks, the value of A can be obtained for every individual mock catalogue using the corrected and true estimates of the two-point function by minimizing the |$\chi ^2$|⁠:

$$\begin{eqnarray} \chi ^2 = (w_{\text{corr}}-Aw_{\text{true}})^T \,\, \boldsymbol {\mathsf {C}}_{\text{corr}}^{-1} (w_{\text{corr}}-Aw_{\text{true}}). \end{eqnarray}$$
(26)

We use the full range of scales from the measured clustering (0.06–30 arcmin) in this fitting process. Doing this fit for every mock catalogue, we can compute the average over all realizations, |$\bar{A}$|⁠, to obtain a measure of the bias. The covariance matrix |$\boldsymbol {\mathsf {C}}_{\text{corr}}$| here is taken from all realizations of the corrected clustering signal and is calculated using equation (24). We are interested in the estimated bias on the corrected two-point function, |$\Delta \bar{A} = \bar{A}-1$|⁠, as the success metric to quantify the significance of residual bias in the corrected two-point function. Note that using the realization-specific |$w_{\text{true}}$| and |$w_{\text{corr}}$| to estimate A for every mock catalogue largely removes the impact of cosmic variance on A, reducing the uncertainties in our analysis.
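Since equation (26) is quadratic in A, the minimization has a closed-form solution, |$A = (w_{\text{true}}^T \boldsymbol{\mathsf{C}}_{\text{corr}}^{-1} w_{\text{corr}})/(w_{\text{true}}^T \boldsymbol{\mathsf{C}}_{\text{corr}}^{-1} w_{\text{true}})$|. A minimal sketch of this fit for a single mock realization is given below; averaging the returned |$\Delta A$| over realizations gives |$\Delta \bar{A}$|.

import numpy as np

def delta_A(w_corr, w_true, cov_corr):
    """Analytic minimizer of the chi^2 in equation (26) for a single amplitude A;
    returns Delta A = A - 1 for one mock realization. Inputs are the corrected and
    true w(theta) vectors and the covariance of equation (24)."""
    cinv = np.linalg.inv(cov_corr)
    A = (w_true @ cinv @ w_corr) / (w_true @ cinv @ w_true)
    return A - 1.0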

4.5 Summary: step-by-step procedure

We summarize the previous sections with a step-by-step procedure for generating and analysing the data of a single mock catalogue.

  • Measure the true correlation function of the extragalactic galaxy catalogue described in Section 4.1 and generate pixelized maps with |$\text{NSIDE} = 512$|⁠.

  • Generate |$N_{\text{sys}}$| full-sky template maps using HEALPY from the desired selection of the five systematics families described in Section 4.2. Mask the full-sky map to only cover the same area as the galaxy mock catalogue (see the map-generation sketch at the end of this subsection). Compute the correlation function for all systematics by assigning each galaxy the corresponding template values for the pixel in which the galaxy is located.

  • From the pixelized versions of the galaxy field (⁠|$\delta _{\rm g}$|⁠) and systematics maps (⁠|$\delta _t$|⁠), we generate the contaminated galaxy field (⁠|$\hat{\delta }_{\rm g}$|⁠) using equation (12) from a chosen set of contamination parameters. We then calculate the contaminated clustering signal using equation (14).

  • With the observed galaxy overdensity field and systematics maps as inputs, the contamination parameters are jointly estimated as prescribed in Section 4.3 by using an MCMC algorithm with the chosen likelihood model (Gaussian or skewed Gaussian) and contamination model (additive, multiplicative, or combined).

  • The contaminated clustering signal is corrected using equation (15).

This procedure is followed for all 119 mock realizations, yielding an ensemble of results used to quantify decontamination quality, parameter estimation and noise, and cosmological impact.
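As an illustration of the template-map step referenced above, the following sketch draws one synthetic systematic map from a chosen power spectrum family, masks it to the mock footprint, and applies the equal-variance normalization. The footprint mask is a hypothetical input standing in for the KiDS-like mock coverage.

import healpy as hp
import numpy as np

def make_template_map(cl, footprint, nside=512, target_std=1.0):
    """Draw one synthetic systematic template from a chosen power spectrum family,
    cut it to the mock footprint, and normalize to zero mean and fixed variance.
    'footprint' is a hypothetical boolean HEALPix mask (True inside the mock area)."""
    full_sky = hp.synfast(cl, nside=nside, new=True)     # full-sky realization
    template = np.full(hp.nside2npix(nside), hp.UNSEEN)  # masked elsewhere
    vals = full_sky[footprint]
    template[footprint] = (vals - vals.mean()) / vals.std() * target_std
    return template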

5 RESULTS

In this section, we show the results of applying our method to mock galaxy catalogues. We will first apply our method to mock catalogues without any systematics, to explore differences between our use of a Gaussian or skewed Gaussian distribution for |$\delta _{\rm g}$| (Section 5.1). We will then test our systematics mitigation method in different scenarios varying in complexity to evaluate its performance. The outcome of that exercise, which will be detailed in subsections below, may be summarized briefly as follows:

  • Section 5.2 shows that systematics mitigation performance is especially sensitive to the choice of systematics model.

  • The use of a nested contamination model (here defined as ‘Combined’) allows flexibility in defining a model that can describe the multisystematic case in Section 5.3.

  • Section 5.4 shows that our method can robustly mitigate systematic contamination in the clustering signal to the sub-per cent level, reducing the bias on the two-point function to below |$0.1\sigma$|⁠.

5.1 No systematics

Before investigating systematics mitigation, we first explore the underlying distribution we are trying to model. The likelihood function in equation (18) is based on certain assumptions about the true |$\delta _{\rm g}$| distribution. For the case where we cannot assume |$|\delta _{\rm g}|\ll 1$|, we know that the underlying distribution cannot be Gaussian, as |$\delta _{\rm g}$| is mathematically bounded to be greater than |$-1$| but can take any positive value. As seen in Fig. 3, we use the true |$\delta _{\rm g}$| field from our mock catalogues (filled histogram) and fit its distribution using both a Gaussian and a skewed Gaussian likelihood function. For both likelihoods, we fix any contamination parameters {|$a,b$|} to zero and allow |$\sigma$| to vary, while we fix |$\gamma = 0$| for the Gaussian likelihood and let it vary for the skewed Gaussian. The solid lines in the left-hand panel show that the skewed Gaussian likelihood provides a better fit to the data than the Gaussian likelihood, given the skewed distribution of |$\delta _{\rm g}$| values.

Figure 3. Filled histogram shows the distribution of true galaxy overdensity |$\delta _{\rm g}$| for one systematics-free mock catalogue from the KiDS–HOD galaxy simulation. The solid lines show the results of maximum likelihood estimation fits for the distribution with a skewed Gaussian (green) or Gaussian (purple) likelihood. All contamination parameters are fixed to zero while fitting, while |$\sigma$| is free to vary in both likelihood forms. The parameter |$\gamma = 0$| for the Gaussian likelihood but is free to vary for the skewed Gaussian. The dotted lines show the same fit after removing pixels with |$\delta _{\rm g} \ge 1.6$| as potential outliers.

However, we found that this difference was mostly due to potential outliers, which once removed (shown as the dotted lines in Fig. 3) provided better fits to the distribution and agreement in the estimated variance. Physically, the outliers represent regions of very high galaxy density. The cut |$\delta _{\rm g} \lt 1.6$| was made to test this. Note that in a real cosmological analysis, larger scales will be used and the variance will therefore be smaller.

It is important to highlight that this cut was made in the true |$\delta _{\rm g}$| distribution, which we do not have in real data. Therefore, this cut is purely for illustration purposes and is not applied at any point in the analyses later in this section. However, it is important to point out that the fits with the Gaussian distribution were especially sensitive to these large |$\delta _{\rm g}$| values. As we can see from the right panel of Fig. 3, the outliers contributed a considerable portion of the overall estimated variance despite being a small fraction (⁠|$\sim \! 1~{{\ \rm per\ cent}}$|⁠) of the pixels. In contrast, the skewed Gaussian likelihood seems to be less sensitive to outliers. This consideration is relevant because real data will have outliers for a variety of reasons (e.g. survey window complexity not represented in these mock catalogues). Our results indicate that the skewed Gaussian likelihood may be able to more accurately estimate the true variance of the |$\delta _{\rm g}$| distribution and possibly improve estimates of the systematic contamination parameters modulating the variance, more specifically the b parameter in equation (11).
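As an aside, a comparison of this kind can be reproduced with standard tools; the sketch below fits the pixel |$\delta _{\rm g}$| values with a Gaussian and with SciPy's skew-normal distribution. Note that SciPy's shape parameter is not the same parameterization as the |$\gamma$| used in our likelihood, so this is only an approximate stand-in for the fits shown in Fig. 3.

import numpy as np
from scipy import stats

def fit_overdensity_pdf(delta_g):
    """Fit the pixel delta_g values with a Gaussian and with scipy's skew-normal,
    analogous to the comparison in Fig. 3 (contamination parameters fixed to zero)."""
    mu, sigma = stats.norm.fit(delta_g)            # Gaussian MLE
    a, loc, scale = stats.skewnorm.fit(delta_g)    # skew-normal MLE
    return {'gaussian': (mu, sigma), 'skewnorm': (a, loc, scale)}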

5.2 One systematic

We next test the method for a scenario with a single source of systematic contamination. We consider separately the cases of a single additive and a single multiplicative systematic. The true overdensity map is contaminated using a single systematic template map generated from the power spectrum (i) described in Section 4.2 to produce the observed galaxy overdensity map. The observed two-point function is calculated using equation (23). In each realization, we estimate the contamination parameters using the maximum likelihood estimation approach described in Section 3.3 by fitting for a single systematic map, then correct the two-point function using equation (15). We use different versions of the likelihood model in equation (18) to test different assumptions about the distribution of |$\delta _{\rm g}$| and its contamination. In Fig. 4, we show the fractional difference between the uncorrected, corrected, and true clustering signals to evaluate performance. For context, the systematic correlation function is approximately two orders of magnitude smaller than the signal.

Figure 4. Fractional difference between the corrected and true two-point correlation functions in the presence of a single additive (top row) or multiplicative (bottom row) systematic contaminant. Right and left panels show the same success metric but are separated for visualization clarity. The black dash–dotted line shows the uncorrected two-point function, with contamination of at most 2 per cent. Solid lines show the results after correction using a Gaussian likelihood for the MLE, while dashed lines show the correction using a skewed Gaussian likelihood. The different colours show different contamination models used: additive (grey, left panels), multiplicative (orange, left panels), and combined (blue, right panels). The shaded contours show the |$\pm 1\sigma$| regions for their respective curves based on multiple realizations of mock catalogues. In both rows, we see that using the incorrect contamination model leads to residual biases after the systematics correction, while correction with the combined model results in equivalent performance to using the true contamination model. For both the true and combined model, the residual systematics are below the 0.1 per cent level. There is no significant difference in performance between the skewed and Gaussian likelihoods when using the combined systematics model.

As expected, assuming the wrong model for the systematic contamination can result in large residual biases even after applying our systematics mitigation scheme, while choosing the correct model results in the removal of the systematic to below the 0.1 per cent level. The combined model performs almost identically to choosing the correct model, without substantially increased uncertainties despite the small increase in model complexity (one additional parameter). The similarities in performance between the true and combined model suggest that the combined model is a viable alternative for use in real data. The choice of the likelihood form (Gaussian versus skewed Gaussian) seems to have no significant impact on performance, so choosing the simpler Gaussian model is preferred in this case. However, given the generalized contamination model chosen, we felt it was worth revisiting the question of likelihood choice and testing its impact directly. The marginal differences between likelihoods found here reinforce previous works' conclusions regarding the Gaussian approximation for systematics mitigation in galaxy clustering (Rezaie et al. 2021; Weaverdyck & Huterer 2021).

As described in Section 3.5, we emphasize the importance of debiasing the estimated model parameters prior to estimating the corrected correlation function. We have confirmed that skipping this step results in substantial overcorrection, increasing the size of the residual biases by an order of magnitude in this case. The severity of the degradation will depend on the size of the parameter estimation noise. We also note that the estimated correction is not degraded below the pixel size used to estimate the contamination parameters, confirming our choice to use scales below the pixel size for this validation.

We used likelihood ratio tests to see whether we can identify any model preference from the data, focusing on the choice of combined versus the simpler additive and multiplicative models. We compare the use of both distributions in Table 2 for each systematic model. The columns show in what fraction of all mock catalogue realizations the likelihood ratio test could not reject the correct (and simpler) model. In practice, the inability to reject a simpler model can be used as a form of model selection in favour of that model. In general, this test correctly identifies the contamination model in the single systematic case; when using a skewed Gaussian likelihood, it correctly identified the additive or multiplicative model 100 and 99 per cent of the time, respectively. The Gaussian likelihood model resulted in 97 and 93 per cent accuracy. Here, too, there are small differences between using either likelihood model, but both produce strong results overall. This is a positive sign that likelihood ratio tests work well for simple cases such as the single systematic contamination and can potentially be used for model selection. In practice, we do not know what the true model is, so this tool can be used to compare systematic models. For a single systematic, the fitting can be done using all three contamination models used here (additive, multiplicative, and combined). Since the combined model serves as a nested model, the likelihood ratio for the additive and multiplicative models can be calculated to evaluate if either form is preferred over the other.
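A minimal sketch of such a likelihood ratio test, assuming the usual asymptotic |$\chi^2$| distribution of the test statistic (Wilks' theorem), is given below; the log-likelihood values and the significance level are placeholders.

from scipy import stats

def simpler_model_ok(logL_simple, logL_combined, n_extra_params=1, alpha=0.05):
    """Likelihood ratio test of a nested (additive or multiplicative) model against
    the combined model. Returns True when the simpler model cannot be rejected at
    level alpha, which we read as a preference for the simpler model."""
    lam = 2.0 * (logL_combined - logL_simple)           # test statistic
    return stats.chi2.sf(lam, df=n_extra_params) > alpha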

Table 2. Model comparison using likelihood ratio tests for single systematic cases.

Likelihood form      Per cent correct
                     Additive        Multiplicative
Skewed Gaussian      100 per cent    99 per cent
Gaussian             97 per cent     93 per cent

Notes. The columns show in what fraction of all mock catalogue realizations the likelihood ratio test preferred the correct (and simpler) model over the combined model.

Finally, we note that our mock catalogues include a realistic cosmological clustering signal, meaning the values of galaxy overdensity in each pixel are correlated. Our results suggest that even though our likelihood model for |$\delta _{\rm g}$| omits these correlations, this model mis-specification does not affect the fidelity of the systematic correction. The favourable results for our method with multiple systematics, as shown in subsequent subsections, further support this conclusion.

5.3 Multiple systematics

We now move to the treatment of multiple systematics. When dealing with real survey data (e.g. DES), a multitude of systematic templates (Rodríguez-Monroy et al. 2022) are considered during the mitigation process, motivating our extension to the case of multiple systematics. To test our method with a non-trivial number of uncorrelated contaminants, we generate five independent templates from each of the five systematic families described in Section 4.2, resulting in 25 independent templates.

The contamination model for each template is chosen with a random number generator such that each systematic has an equal chance of being additive or multiplicative, resulting in 10 additive and 15 multiplicative systematics for this analysis. The non-zero contamination parameters are drawn from a Gaussian centred at zero with a standard deviation of 0.15. This particular value of the standard deviation is motivated by real observations of seeing in HSC and chosen to result in an approximate 10 per cent contamination level at larger scales when using the systematic families described in Section 4.2. This level of contamination is reasonable considering the results of Rodríguez-Monroy et al. (2022). Using these choices, we produce the observed galaxy overdensity field for each realization of the mocks using equation (12). We then fit for the contamination parameters jointly for all systematics using the combined, additive, and multiplicative models, as well as the true contamination model for comparison. To clarify, in the additive and multiplicative cases all systematics are assumed to follow that specific model, while in the true contamination model, each systematic follows its true assigned model when fitting.

We compare performance of the different contamination models in Fig. 5. We show only the results using the Gaussian likelihood for the fits, as we found negligible difference when using the skewed Gaussian form. Both panels show that the combined model produces sub-per cent mitigation of the contaminated two-point function (to a very small fraction of the statistical uncertainty) and equivalent performance to using the true contamination model within the statistical uncertainties. On the other hand, using the additive and multiplicative models consistently across all systematics results in biased corrections of the clustering signal (model mis-specification bias). We conclude, therefore, that assuming a single contamination model in the presence of systematics with distinct types of contamination can lead to biased estimates of the correlation function. In this case, the safest choice when the form of systematics is not known is to use the combined model for all of them. We note that the increased degradation at larger scales is due to the small area of the KiDS mocks. This was confirmed using the larger simulated mocks described in Section 6.2, where this degradation is not present at these scales. We also verified that the combined model can still recover the truth and outperforms the simpler models in the case of no contamination (|$a = b = 0$|) for a random selection of templates.

Figure 5. Top panel: fractional difference between the average corrected and true two-point function in the presence of 25 systematic contaminants, 10 of which are additive and 15 multiplicative. The vertical symlog scale shows cases of possible overcorrection/undercorrection. The black dash–dotted line shows the uncorrected two-point function, with contamination reaching a maximum of |$\sim$|10 per cent. Solid lines show the estimated correction using a Gaussian likelihood for the MLE and the combined (blue), additive (grey), multiplicative (orange), or true (pink) contamination model. The shaded contours show the |$1\sigma$| regions of their respective lines. Bottom panel: the difference between the corrected and true two-point function divided by the standard deviation of the true clustering across all realizations for each separation bin. Both panels show that the combined model robustly mitigates contamination to the sub-per cent level with respect to the clustering signal and well within |$1\sigma$| on all scales, and on average is equivalent to using the true contamination model. The performance of the true and combined models is consistent within the statistical uncertainties.

Finally, we consider whether we can learn anything about the form of our contaminants (e.g. are some purely additive or multiplicative?) using our methodology. We avoid using likelihood ratio tests for this, as with multiple contaminants, comparing forms for each one is not as trivial as in the single systematic case. To see if we can learn something about the underlying contamination model we use the estimated parameters directly, i.e. the best-fitting values for |$\lbrace a_i, b_i\rbrace$| from using the combined model. For example, if we estimate the contamination parameters and uncertainties for systematic |$t_i$| to be |$\lbrace \hat{a}_i = 0.5\pm 0.03, \hat{b}_i = 0.01\pm 0.05\rbrace$|, we may infer that the contamination from this particular template is additive, as a and b differ significantly from each other, and b is consistent with 0 within the uncertainties. Fig. 6 shows a corner plot of a representative sample of estimated parameters constructed from the fitting in all 119 mocks for 6 of the 25 total systematics. The blue (additive systematic) and red (multiplicative systematic) vertical and horizontal lines show the true parameter value. We see that on average, the combined model can recover the true parameters within the |$1\sigma$| contours. Therefore, we can use the parameter estimates directly to learn about the forms of different systematics for which we have templates. Since this is done over multiple realizations, we use the parameter estimates for each mock to do this. However, in real data, we could use a bootstrap approach to generate similar corner plots and study the contamination model from the data empirically. Corner plots based on the MCMC samples of a single realization are not equivalent to those produced from multiple realizations, as the full parameter variability is not captured by the MCMC error estimates.
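The following sketch illustrates this kind of parameter-based classification for a single template; the |$2\sigma$| threshold is an illustrative choice, not a prescription from our analysis.

def classify_template(a_hat, b_hat, sigma_a, sigma_b, nsig=2.0):
    """Heuristic reading of the combined-model fit for one template, following the
    example in the text: a parameter consistent with zero within nsig sigma is
    treated as absent."""
    a_on = abs(a_hat) > nsig * sigma_a
    b_on = abs(b_hat) > nsig * sigma_b
    if a_on and not b_on:
        return 'additive'
    if b_on and not a_on:
        return 'multiplicative'
    return 'combined' if (a_on and b_on) else 'unconstrained'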

Figure 6. Each panel shows a density plot of the estimated contamination parameters, |$\lbrace a_i, b_i\rbrace$|, using a Gaussian likelihood, from all 119 mock catalogue realizations. In other words, every point on a particular panel represents the maximum-likelihood estimate from all Markov chain Monte Carlo (MCMC) samples of a single mock catalogue. The contour regions show the |$1\sigma$| (68 per cent) and |$2 \sigma$| (95 per cent) regions. The six systematics shown (out of a total of 25 contaminants) form a representative sample of the different systematic families (shown by the scaling of their power spectra, |$C_{\ell }^{tt}$|) used to generate the systematic maps. The vertical and horizontal solid lines show the true parameters used for contamination, with blue (red) representing additive (multiplicative) systematics. The dashed blue line shows the zero point of the b parameter, where an additive systematic should live in parameter space. We see that across all panels the true parameters fall within |$1\sigma$| regions of the estimated parameters and we can use the inferred parameters to identify additive and multiplicative systematics.

We also note the tighter constraints on the a parameters compared to the b parameters. This discrepancy is due to the difference in constraining the shifts in the mean of the overdensity distribution (determined by the a parameters) versus shifts in the variance (determined by the b parameters). This can be understood statistically, as estimating the variance of the distribution intrinsically depends on estimating the mean. Any noise in estimates of the mean will propagate to estimates of the variance.

As shown here, the introduction of the combined model allows for joint modelling and correction of a set of clustering systematics that have different functional forms. Similarly, by construction it will allow for modelling of systematics that are neither purely additive nor purely multiplicative in overdensity.

5.4 Cosmological impact

To quantify the potential cosmological impact of how our method performs with multiple systematics, we fit the uncorrected and corrected clustering signals (for different modelling choices) to a constant times the true signal, as expressed in equation (25). Following the prescription detailed in Section 4.4, we calculate the estimated bias, |$\Delta \bar{A}$|, for different systematics contamination models in the multiple systematics case described above and report the results in Table 3. Although not shown, the differences between the estimated biases from using a Gaussian or skewed Gaussian likelihood are negligible, leading to a preference for the simpler Gaussian model. The bias-variance trade-off can be evaluated by comparing both columns in this table. As shown, the contaminated signal exhibits a few-per cent bias that is highly statistically significant. Applying a correction using either the true or the combined contamination model reduces the bias to well below |$0.1 \sigma$|, with the caveat that the error on the mean from the combined model is approximately 1.4 times larger than that of the true model (increased variance). As expected, using the true contamination model is the optimal approach when it comes to reducing the bias without much increase in the variance. However, the combined model still dramatically outperforms the other contamination models and produces a bias consistent with zero, indicating that it is a sufficient correction for a mixture of different types of systematic contaminants for cosmological analysis.

Table 3. Comparison of the estimated bias on the two-point function (|$w_{\text{corr}}(\theta) = (1 + \Delta A)w_{\text{true}}(\theta)$|) for different assumed contamination models used when correcting the signal with 25 different systematics, including a mix of additive and multiplicative systematics.

Contamination model    |$\Delta \bar{A} \,\, (\times 10^{-2})$|    |$\sigma _{\bar{A}} \,\, (\times 10^{-2})$|
Uncorrected            4.0        0.0135
True                   –0.0123    0.0631
Combined               –0.0166    0.0861
Multiplicative         0.1123     0.0728
Additive               0.7214     0.0374

Notes. The first and second columns show the mean and error on the mean of A over all mocks. The uncorrected measurement is shown as a baseline for what our method needs to accomplish given the extremely statistically significant bias. Using the true contamination model (the correct choices of which systematics are additive or multiplicative) or the combined model both reduce the bias considerably (to below 0.1σ), while incorrectly applying the additive and multiplicative models to all templates results in a biased correction. The bias-variance trade-off can be observed by comparing both columns. The increased complexity of the combined model results in the highest variance, but in a considerable decrease in the bias when comparing with the additive and multiplicative models. The fitting results shown here use a Gaussian likelihood, but we confirm the differences with the skewed Gaussian form are small and within the errors from each other.

On the other hand, assuming the additive and multiplicative models in the presence of a mixture of different types of systematics produces a significantly biased correction. The correction using the additive model performs worse than the multiplicative model by a factor of |$\sim 4$|. We can see why from Fig. 5, where the multiplicative model does better than the additive at small scales and much worse at large scales. However, due to the smaller error bars at small scales, the fitting of A using equation (26) favours the scales at which the multiplicative model performs better, which explains the lower values of |$\Delta \bar{A}$| for that model. The increased complexity of the combined model results in the highest variance, but in a considerable decrease in the bias when compared with the additive and multiplicative models. Thus, the bias-variance trade-off in this case favours the combined model.

5.5 Effect of correlated systematics

So far we have only shown the performance of this method for uncorrelated systematics. However, we know that in reality template maps are often correlated with each other. To test the performance of this method for the case of correlated templates, we introduce different levels of in-class correlation between systematics templates.5 That is, in the process of map production using HEALPY, we provide off-diagonal elements that describe the correlation of all maps with each other. We only add correlation between maps that are produced from the same power spectrum (what we mean by in-class correlation), where the off-diagonal elements are |$\rho _{\text{corr}} C^{t_it_j}_{\ell }$| if the maps |$t_i$| and |$t_j$| belong to the same class, and 0 otherwise. We consider this scenario for a range of values of |$\rho _{\text{corr}}$|.

In our specific scenario, we are dealing with five systematic classes with five maps each. For our test, every map belonging to the same class has the same level of correlation, |$\rho _{\text{corr}}$|, with each other and 0 with all other maps. We then follow the method described in Section 3.1 to deal with such cases. Fig. 7 shows the average bias in the corrected correlation function amplitude as a function of the in-class correlation. We see that in the presence of correlated template maps, there is no loss in performance. All points are consistent within |$2\sigma$| with an unbiased correction of the two-point function. In addition, the differences between the results obtained with the Gaussian and skewed Gaussian likelihoods are once again not statistically significant.
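For reference, in-class correlated templates of this kind can be realized by mixing a shared full-sky realization with independent ones drawn from the same power spectrum, which yields auto-spectra |$C_\ell^{tt}$| and cross-spectra |$\rho_{\text{corr}} C_\ell^{tt}$| between any pair within the class. The sketch below shows this construction; it is an equivalent alternative to supplying the off-diagonal spectra directly to HEALPY, and assumes |$0 \le \rho_{\text{corr}} \le 1$|.

import healpy as hp
import numpy as np

def in_class_correlated_maps(cl, n_maps=5, rho_corr=0.5, nside=512):
    """Realize one class of templates with pairwise correlation rho_corr by mixing
    a shared realization with independent ones, all drawn from the same spectrum.
    Each map keeps auto-spectrum cl; any pair has cross-spectrum rho_corr * cl."""
    shared = hp.synfast(cl, nside=nside, new=True)
    maps = []
    for _ in range(n_maps):
        indep = hp.synfast(cl, nside=nside, new=True)
        maps.append(np.sqrt(rho_corr) * shared + np.sqrt(1.0 - rho_corr) * indep)
    return np.array(maps)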

Figure 7. The average bias on the corrected clustering signal over all 119 mock catalogue realizations as a function of the correlation coefficient between systematic templates within a given power spectrum class. The solid points show this metric for the corrected two-point function using the combined contamination model and either a Gaussian (black) or skewed Gaussian (green) likelihood in the MLE. The error bars show the error on the mean. This figure shows that the excellent performance of the method remains unaltered even by relatively high levels of correlation between systematic template maps.

6 DISCUSSION

6.1 Comparison with other methods

Our methodology has some features in common with previous work described in Section 2.3, but also some distinct differences. Unlike the methods of mode projection (Rybicki & Press 1992) and template subtraction (Ross et al. 2011), we explicitly account for multiplicative terms in the one-point function describing the observed galaxy overdensity in equation (11).

It is common practice to neglect the third term in this equation and consider only the first-order term. However, we show in this work how neglecting multiplicative bias, when it is present, can lead to non-negligible biases in the corrected correlation functions.

The method presented in this paper also considers distinct parameters contributing to the additive and multiplicative contamination to the galaxy overdensity, allowing for empirical determination of the systematics model, rather than fitting a specific model (e.g. additive or multiplicative). This not only allows for the possibility of multiple sources of contamination for a given template, but also allows us to directly determine the functional form of the systematic contamination. The contamination parameters are also fitted for jointly, rather than independently for every systematic. This differs from iterative methods, which clean contaminants one by one using a form of linear regression. These methods rely on null hypothesis testing during the iterative process to avoid biased mitigation of contaminants. Other methods like ElasticNet incorporate the impact of multiplicative effects by using a two-step process to update the covariance for the assumed multiplicative parameters. In contrast, our method deals with this problem by taking into account all systematics simultaneously in the fitting process and self-consistently incorporating the multiplicative effects into the likelihood. This joint estimation enables the discrimination of additive and multiplicative templates directly.

In our current implementation of the method, we chose to mitigate the systematics directly at the two-point level, which differs from the iterative regression (Elvin-Poole et al. 2018) and ElasticNet methods (Weaverdyck & Huterer 2021) that mitigate at the map level either through map cleaning or pixel weights. In principle, our method allows for mitigation at the map level through cleaning of the observed overdensity map using equation (12). One common feature among corrections at these different levels is that all require a noise debiasing process for the reasons explained in Section 3.5. Regarding the use of pixel weights, the last term in equation (1), when used with the combined model, prevents us from being able to cleanly express the contamination as a weighted correction for each galaxy. We acknowledge that mitigating at the two-point level requires calculation of the autocorrelation function for all systematics of interest, which is not as simple as using pixel weights. However, it is still possible to perform a pixel-weighted correction, similar to that of iterative methods or ElasticNet, if the parameter fits agree with an additive or multiplicative model. In that case, the weights can be calculated for all templates |$t_i$| with contamination parameter |$a_{t_i}$| for any given pixel j as

$$\begin{eqnarray} w_j = \frac{1}{1 + \sum _{t_i}a_{t_i}\delta _{t_i,j}}. \end{eqnarray}$$
(27)
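A minimal sketch of equation (27), assuming the fitted parameters are consistent with a purely additive model, is:

import numpy as np

def pixel_weights(a, delta_t):
    """Equation (27): per-pixel weights for a purely additive contamination model.
    'a' has shape (N_sys,) and 'delta_t' shape (N_sys, N_pix) holding the template
    overdensities per pixel."""
    return 1.0 / (1.0 + np.einsum('i,ij->j', a, delta_t))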

Finally, our method allows for fitting the |$\delta _{\rm g}$| distribution using a skewed Gaussian likelihood, which more closely resembles the real distribution. Even though we do not see any significant difference in systematics mitigation between likelihoods after debiasing the parameters, we still provide an alternative fit to a distribution commonly assumed to be Gaussian for systematics treatment (see examples of non-Gaussianity in Rezaie et al. 2021, 2023). We emphasize the non-Gaussianity of the |$\delta _{\rm g}$| distribution in the fits for the true distribution in Section 5.1. There, the parameter that describes skewness was estimated to be |$\gamma \approx 5.1 \pm 0.2$|, which represents a highly non-Gaussian distribution. We note that the specific level of non-Gaussianity will heavily depend on the pixelation scheme and galaxy density. Despite the high statistical significance with which the non-Gaussianity of the |$\delta _{\rm g}$| distribution is detected, the effectiveness of systematics mitigation for the Gaussian likelihood indicates that for our scenario, model mis-specification of a non-Gaussian distribution as a Gaussian is not a problem when parameter noise is known or can be estimated accurately.

6.2 Practical applications on real data

Although our demonstration of the method used simulated data, the method is easily transferable to real LSS data. One open question is the choice of the pixel scale in relation to the characteristic scale of the systematics. More specifically, what pixelization is sufficient to properly characterize the systematic maps and mitigate contaminants? This work has produced and measured synthetic contamination at the same pixel scale; however, systematic template maps may be produced at finer scales in real data. The impact of pixel scale choices on the corrected clustering signal for systematics maps with varying characteristic scales was not explored here, but should be considered when working with real data.

We have also emphasized that it is important to quantify the noise in the estimated systematics parameters in order to accurately mitigate the contamination. This work used multiple realizations of mock catalogues for this purpose. For surveys that have mock catalogues with realistic clustering, such an approach would transfer over directly. However, not all surveys will have suitable mock catalogues. To apply the method without many realizations of mock catalogues, we propose the use of data-driven methods to estimate the noise in the systematics parameters.

Before applying data-driven methods to estimate the noise in the estimated systematics parameters, it is important to understand exactly what the noise depends on and how to evaluate its impact on systematics mitigation. The noise can depend on many factors, including the sky area coverage, the |$\delta _t$| distributions and power spectrum, and the form of the likelihood used to analyse the |$\hat{\delta }_{\rm g}$| distribution (including the number of parameters being fitted). For example, the top panel of Fig. 8 illustrates how the noise level in the inferred systematics parameters can vary based on the form of the likelihood (blue and orange points represent results with a Gaussian and skewed Gaussian likelihood, respectively) and on the spatial power spectra used to generate the systematic maps, normalized to a fixed variance in |$\delta _t$|⁠. As we can see, the noise can differ greatly depending on the power spectrum used, with some forms increasing the estimated noise variance by almost an order of magnitude. The choice of a Gaussian or skewed Gaussian likelihood does not significantly affect the noise in the a parameters, which control shifts in the mean of the distribution. However, the choice has a substantial impact on the noise in the estimated b parameters, which modulate its variance. Therefore, even though the choice of the likelihood (Gaussian or skewed Gaussian) does not significantly affect the systematics mitigation outcome, it is important to note that the noise estimates differ. This could be relevant in the scenario where the parameter variance cannot be accurately measured and corrected for, in which case it is helpful to adopt a model with smaller variance since the bias correction would be less sensitive to it. However, we found no correlation between the noise variance in the estimated systematics parameter and the true value of the parameter.

Figure 8. Top panel: average parameter estimation variance for each systematics class used to generate contaminating templates. The variance for each estimated parameter is calculated over all 119 realizations of the KiDS–HOD mocks for each class of systematic. Since we use five systematics per class, the points show the average noise from the five parameters corresponding to each systematic contaminant from the mocks. The crosses show the parameter variance taken from the MCMC posteriors for comparison. The different colours show the likelihood form used, either a Gaussian (blue) or a Skewed Gaussian (orange). The error bars are the errors on the mean. The left panel shows the variance on the a parameters (which modify the mean of the observed |$\delta _{\rm g}$| distribution), while the right panel shows the variance for the b parameters (which modify its variance). As we can see, there is little difference between the estimated noise using either likelihood on the a parameters. However, the Skewed Gaussian has lower parameter noise for the b parameters. We attribute this to the Skewed Gaussian’s better recovery of the distribution’s variance when we do not remove outliers. In addition, we see that the variance recovered from the MCMC posteriors vastly underestimates the parameter noise. Bottom panel: the effect of miscalculating the parameter noise on the two-point function correction. Possible parameters are drawn from a Gaussian centred at the true parameter value and with variance represented by the different coloured lines. The correction to the observed correlation function is done using the estimated noise. The x-axis shows the ratio of the estimated noise to the true noise for both the a and b parameters, while the y-axis shows the bias in the corrected two-point function. We see that underestimating (or overestimating) the parameter noise causes the two-point function to be overcorrected (or undercorrected). However, the effect of the bias depends on the size of the noise itself. Having high parameter noise means the analysis is more sensitive to accurately estimating the noise, while the opposite is true at low parameter noise.

In the absence of mock catalogues, we propose the use of the block bootstrap method to estimate the parameter noise. Block bootstrap means that the pixelated maps are divided into approximately equal area patches, where each bootstrap sample represents a version of the maps made from a random selection (with replacement) of the patches. This allows us to create multiple realizations from a single map and estimate the parameter noise one would get from having multiple independent mocks. We wish to test this data-driven approach on larger and more realistic area coverage than our current KiDS mocks. To do so, we use the same mechanism to generate systematic template maps from the set of five systematic classes described in the multiple systematics case in Section 5.3. In order to obtain |$\delta _{\rm g}$| maps with different area coverage, we used version 2.3.0 of CCL6 (Chisari et al. 2019) to generate galaxy–galaxy power spectra for a vanilla |$\Lambda$|CDM cosmological model (|$\Omega_{\rm m} = 0.3, \,\, \Omega_{\rm b} = 0.05, \,\, \Omega_{\Lambda } = 0.7,\,\, \sigma _8 = 0.8, \,\, n_{\rm s} = 0.96$|) with galaxy bias equal to 1 (|$\delta _m = \delta _{\rm g}$|) and produce Gaussian galaxy overdensity maps. This cosmological model is similar (while not identical) to that used in the KiDS mocks, with the exception that our galaxy bias is fixed to 1. We use a Gaussian |$n(z)$| with mean redshift |$\mu _z = 0.3$| and |$\sigma _z = 0.05$| to resemble the redshift range used for the previous validation with the KiDS mocks. We choose four size configurations: 100, 400, 1000, and 2000 deg|$^2$|, to show how this approach works at various sky area coverages. Although not identical, the produced mocks provide a realistic test of the block bootstrap method for estimating parameter noise, and allow us to compare the parameter noise estimation from this bootstrap approach with estimates using multiple realizations. We transformed the Gaussian maps into lognormal maps to more accurately portray a realistic underlying distribution. This is done by taking a Gaussian map G, shifting it by half its variance, |$S = G-0.5\,\text{Var}[G]$|, and transforming the shifted map to a lognormally distributed one: |$L =\exp (S) -1$|. For the tests described here the shift parameter is |$0.5\,\text{Var}[G] = 0.025$|.
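The lognormal transform described above can be sketched as follows; the galaxy power spectrum is assumed to be computed externally (e.g. with CCL), and the map resolution matches the |$\text{NSIDE} = 512$| used elsewhere.

import healpy as hp
import numpy as np

def lognormal_map(cl_gg, nside=512):
    """Gaussian-to-lognormal transform used for the larger-area mocks."""
    g = hp.synfast(cl_gg, nside=nside, new=True)  # Gaussian overdensity map G
    s = g - 0.5 * np.var(g)                       # shifted map S = G - 0.5 Var[G]
    return np.exp(s) - 1.0                        # lognormal overdensity L = exp(S) - 1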

In this specific example, we use 100 bootstrap-resampled data sets (meaning 100 distinct versions of a map from one realization), with 10 block bootstrap patches. We confirm that this number of realizations leads to 5 per cent-level convergence of the parameter noise. The number of resampled data sets and patches should be chosen based on the number of parameters estimated and scales at which correlations between pixels are small. In this particular test, we estimate 51 parameters and correlations become negligible at |$\sim 1.5$|°, which motivates our chosen configuration.
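A minimal sketch of the block bootstrap used here follows; the patch assignment and the fitting routine are hypothetical placeholders standing in for the patch construction and the MLE of Section 4.3.

import numpy as np

def bootstrap_parameter_variance(patch_ids, fit_fn, n_patches=10, n_resamples=100, seed=0):
    """Block bootstrap over sky patches: resample patches with replacement, rerun
    the contamination-parameter fit on each resampled pixel set, and return the
    variance of the estimates. 'patch_ids' labels each unmasked pixel with its patch
    (0..n_patches-1); 'fit_fn(pixels)' is a hypothetical wrapper around the MLE."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_resamples):
        chosen = rng.integers(0, n_patches, size=n_patches)   # patches, with replacement
        pixels = np.concatenate([np.flatnonzero(patch_ids == p) for p in chosen])
        estimates.append(fit_fn(pixels))
    return np.var(np.asarray(estimates), axis=0)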

In Fig. 9, we compare the accuracy of parameter noise estimation by comparing a bootstrap approach on a single mock catalogue realization against our previous approach using multiple realizations of mock catalogues as a function of sky area coverage. As we can see from the figure, the noise is approximately proportional to the inverse of the survey area. The figure also shows that using the block bootstrap on a single data set can be a good alternative to using multiple realizations of mock catalogues.

Figure 9. Comparison of the results of estimating systematics parameter noise using multiple realizations of mock catalogues versus a block bootstrap approach on a single mock catalogue realization. The figure shows the variance in the estimated parameters (scaled by the area) as a function of the sky area coverage. The different line colours represent different systematic classes. The triangle points show the variance from using 100 realizations of mock catalogues, while the circular points show the variance estimated using the block bootstrap on a single realization. The error bars show the error on the mean, and the lines (varying styles) track the results from mocks. We see that the bootstrap approach can effectively estimate the parameter noise for higher sky area coverage and low noise parameters, while it fails for small surveys with high parameter noise.

The bootstrap estimates are quite accurate for larger-area mocks, and for small-area mocks with lower levels of parameter noise. However, the approach fails for small-area surveys (|$\sim$|100 deg|$^2$|) when the parameter noise is high, as is the case for systematics maps with substantial small-scale power.

In addition, the bootstrap estimates can depend on the number of patches chosen. Patches that are too small may fail to capture large-scale spatial correlations in the template maps. Conversely, choosing very few large patches makes it difficult to infer a large number of systematics parameters and their variances. Our results show, however, that it is possible to strike a balance that reliably estimates the systematics parameter noise for many realistic survey scenarios.

Note that we cannot simply take the parameter noise variance directly from a single realization by looking at the posterior generated from the MCMC samples. Because our adopted likelihood model is deliberately simple and neglects correlations in the overdensity between pixels, the variance of the MCMC posterior does not capture sources of uncertainty such as cosmic variance. Using the MCMC posterior width would therefore underestimate the parameter noise, as seen in Fig. 8, and lead to overcorrection of the two-point function.

7 CONCLUSIONS

In this paper, we presented a new method for characterizing and mitigating systematic contaminants in galaxy overdensity fields. We used a generalized and flexible contamination model that allows multiple systematics to contaminate the overdensity field in additive, multiplicative, or ‘combined’ forms. This modelling choice allows us to quantify contamination without assuming that all systematics follow one particular form, and in addition helps us empirically determine the functional dependence of systematics on the galaxy overdensity. Using the generalized model can reduce model biases, producing unbiased estimates of the two-point correlation function in cases where different systematics take different forms captured by our model. The ability to determine the functional form of the contamination may provide insight into its origin, so that it can be reduced or entirely avoided in the future.

As part of our approach, we analyse the one-point function of galaxy overdensities within pixels to determine the systematics contamination model, then use that to correct the two-point functions. In addition, we introduced a skewed Gaussian likelihood function to represent the galaxy overdensity distribution in an attempt to more closely model the true underlying conditional probability function. However, there were no significant differences in the quality of systematics mitigation between the skewed Gaussian and Gaussian likelihood. Therefore, we recommend using the simpler Gaussian likelihood.

The method was tested and validated using KiDS-like mock galaxy catalogues with realistic clustering, and mock systematic template maps. Template maps with different systematic ‘families’ were produced to mimic the level of complexity we can expect to find in real data. Key outcomes of our tests were as follows:

  • Contaminating the true |$\delta _{\rm g}$| with one template map in two different ways (multiplicatively and additively), we found that applying the wrong model when mitigating systematics can result in significant residual biases after correcting the two-point function, while our proposed, more general model produced unbiased corrections with performance comparable to the case where the true contamination model is known. We also showed that simple likelihood ratio tests can successfully identify the correct contamination model. Both the Gaussian and skewed Gaussian likelihood models for the overdensity field result in robust systematics mitigation, leading us to prefer the simpler Gaussian model for correction purposes even though it does not describe the overdensity field particularly well.

  • Multiple systematics: we contaminated the overdensity field with a non-trivial number of contaminants (25) with different power spectra, some of which were additive and some multiplicative. We showed that the combined model once again produced unbiased estimates of the clustering signal, while assuming a single contamination model for all systematics produced biased corrections in this mixed case. In addition, we found that we can accurately determine the functional form of the contamination for the individual systematics.

Finally, we considered the practical application of the methods presented here to real data. The main challenge is to accurately estimate the systematics model parameter noise needed to obtain unbiased corrected clustering signals. Since this work used mock catalogues, the parameter noise could be estimated well from multiple mock catalogue realizations. However, this will pose a challenge when the method is applied to real data, as mock catalogues are not always available and are computationally expensive to produce. To overcome this challenge, we proposed a block bootstrap approach for estimating the parameter noise from a single realization of both the galaxy density field and the systematic maps. We showed that the parameter noise estimated with the block bootstrap is comparable to the estimates from multiple mock catalogue realizations in all cases except small-area surveys (|$\sim$|100 deg|$^2$|) with significant small-scale power in the systematics template maps. This is therefore a promising approach for applying the method to real data in future work.

Applications of this method on current imaging surveys would allow us to directly compare the results with commonly used mitigation techniques on real data. In addition, empirical determination of the functional form of real systematics could help us learn how systematics bias the observed galaxy overdensity field. Recent analyses like the DES Y3 clustering analysis (Rodríguez-Monroy et al. 2022), where concerns emerged regarding LSS systematics, are an interesting and highly relevant context for direct application of the method developed in this work, as a stepping stone towards future surveys.

ACKNOWLEDGEMENTS

FB and RM were supported in part by a grant from the Simons Foundation (Simons Investigator in Astrophysics, Award ID 620789) and in part by the Department of Energy grant DE-SC0010118. SD is supported by U.S. Department of Energy contract DE-SC0019248 and by NSF Award Number 2020295. CS is supported by NSF Award Number 2020295.

Some of the results in this paper have been derived using the healpy and HEALPix packages (Górski et al. 2005; Zonca et al. 2019).

DATA AVAILABILITY

We would like to thank Joachim Harnois-Deraps for making public the SLICS mock data, which can be found at http://slics.roe.ac.uk/. Any additional data and software generated by the authors are available upon request.

Footnotes

1

This amounts to assuming that the spatially varying additive term |$f_\text{syst}^\text{add}$| in equation (1) averages to 0 across the survey. If it did not, then in practice our estimate of |$\bar{n}_{\rm g}$| would be modified.

3

See available SLICS products at: http://cuillin.roe.ac.uk/~jharno/SLICS/

5

The more challenging issue of systematics templates that correlate with the real cosmological density field is deferred to future work.

REFERENCES

Abbott T. M. C. et al., 2022, Phys. Rev. D, 105, 023520
Aihara H. et al., 2018a, Publ. Astron. Soc. Japan, 70, S4
Aihara H. et al., 2018b, Publ. Astron. Soc. Japan, 70, S8
Alonso D., Sanchez J., Slosar A., 2019, MNRAS, 484, 4127
Baleato Lizancos A., White M., 2023, J. Cosmol. Astropart. Phys., 2023, 044
Chisari N. E. et al., 2019, ApJS, 242, 2
DESI Collaboration et al., 2022, AJ, 164, 207
Desjacques V., Jeong D., Schmidt F., 2018, Phys. Rep., 733, 1
Elsner F., Leistedt B., Peiris H. V., 2015, MNRAS, 456, 2095
Elsner F., Leistedt B., Peiris H. V., 2016, MNRAS, 465, 1847
Elvin-Poole J. et al., 2018, Phys. Rev. D, 98, 042006
Euclid Collaboration et al., 2022, A&A, 662, A112
Filippenko A. V., 2005, in Sion E. M., Vennes S., Shipman H. L., eds, White dwarfs: Cosmological and Galactic Probes. Springer, Dordrecht, p. 97
Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, Publ. Astron. Soc. Pac., 125, 306
Górski K. M., Hivon E., Banday A. J., Wandelt B. D., Hansen F. K., Reinecke M., Bartelmann M., 2005, ApJ, 622, 759
Harnois-Déraps J., van Waerbeke L., 2015, MNRAS, 450, 2857
Harnois-Déraps J., Pen U.-L., Iliev I. T., Merz H., Emberson J. D., Desjacques V., 2013, MNRAS, 436, 540
de Jong J. T. A. et al., 2013, The Messenger, 154, 44
Harnois-Deraps J. et al., 2018, MNRAS, 481, 1337
Hildebrandt H. et al., 2016, MNRAS, 465, 1454
Ho S. et al., 2012, ApJ, 761, 14
Ivezić Ž. et al., 2019, ApJ, 873, 111
Jarvis M., Bernstein G., Jain B., 2004, MNRAS, 352, 338
Johnston H. et al., 2021, A&A, 648, A98
Kohonen T., 1990, Proc. IEEE, 78, 1464
Krause E. et al., 2021, preprint
Landy S. D., Szalay A. S., 1993, ApJ, 412, 64
Lewis A., Challinor A., Lasenby A., 2000, ApJ, 538, 473
Perlmutter S. et al., 1999, ApJ, 517, 565
Planck Collaboration VI, 2020, A&A, 641, A6
Porredon A. et al., 2021, Phys. Rev. D, 103, 043503
Rezaie M., Seo H.-J., Ross A. J., Bunescu R. C., 2020, MNRAS, 495, 1613
Rezaie M. et al., 2021, MNRAS, 506, 3439
Rezaie M. et al., 2023, MNRAS
Riess A. G. et al., 1998, AJ, 116, 1009
Rodríguez-Monroy M. et al., 2022, MNRAS, 511, 2665
Ross A. J. et al., 2011, MNRAS, 417, 1350
Rozo E. et al., 2016, MNRAS, 461, 1431
Rybicki G. B., Press W. H., 1992, ApJ, 398, 169
Sevilla-Noarbe I. et al., 2021, ApJS, 254, 24
Shafer D. L., Huterer D., 2015, MNRAS, 447, 2961
Smith A., Cole S., Baugh C., Zheng Z., Angulo R., Norberg P., Zehavi I., 2017, MNRAS, 470, 4646
Spergel D. et al., 2015, preprint
The Dark Energy Survey Collaboration, 2005, preprint (astro-ph/0510346)
Weaverdyck N., Huterer D., 2021, MNRAS, 503, 5061
Weinberg D. H., Mortonson M. J., Eisenstein D. J., Hirata C., Riess A. G., Rozo E., 2013, Phys. Rep., 530, 87
Zonca A., Singer L., Lenz D., Reinecke M., Rosset C., Hivon E., Gorski K., 2019, J. Open Source Softw., 4, 1298

APPENDIX A: CORRELATED SYSTEMATICS

In order to transform our template maps into an orthogonal space, we construct the pixel covariance matrix of our template maps

$$\begin{eqnarray} C_{ij} = \langle \delta _{t_i}\delta _{t_j} \rangle \end{eqnarray}$$
(A1)

We then define the rotation matrix from the eigenvectors of the covariance matrix: |$\boldsymbol{\sf C} = \boldsymbol{\sf R}\boldsymbol{\sf D} \boldsymbol{\sf R}^T$|⁠. We define the uncorrelated systematic maps as

$$\begin{eqnarray} \delta _t^{^{\prime }} = \boldsymbol{\sf R}^T \delta _t \end{eqnarray}$$
(A2)

and the one-point function in equation (12) can be reformulated as

$$\begin{eqnarray} a_1\delta _{t_1} + ... + a_N\delta _{t_N} = a^T\delta _t = a^T \boldsymbol{\sf R} \boldsymbol{\sf R}^T \delta _t = a^{^{\prime }T} \delta _t^{^{\prime }} \end{eqnarray}$$
(A3)

where |$a^{^{\prime }T} = a^T \boldsymbol{\sf R}$|. This means that we can fit for the rotated parameters |$a^{^{\prime }}$| as if the systematics are uncorrelated, and then transform back by applying the rotation matrix: |$a^T = a^{^{\prime }T} \boldsymbol{\sf R}^T$|. We do the same for the b parameters. The corrected estimate of the two-point function is then

$$\begin{eqnarray} w_{\text{corr}} = \frac{\hat{w}-\sum _i^{N_{\text{sys}}} a_{i}^{^{\prime }2}\langle \delta ^{^{\prime }}_{t_i}\delta ^{^{\prime }}_{t_i}\rangle }{1 + \sum _i^{N_{\text{sys}}} b_{i}^{^{\prime }2}\langle \delta ^{^{\prime }}_{t_i}\delta ^{^{\prime }}_{t_i}\rangle } \end{eqnarray}$$
(A4)
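The rotation into the orthogonal basis and back can be written compactly via an eigendecomposition of the template pixel covariance. The following numpy sketch assumes the template maps are stored as an array of shape (N_sys, N_pix); the fit in the rotated basis (not shown) proceeds exactly as in the uncorrelated case.

```python
# Sketch of the decorrelation of equations (A1)-(A3).
import numpy as np

def rotate_templates(delta_t):
    """Rotate correlated template maps into an orthogonal basis (eq. A2)."""
    cov = np.cov(delta_t)               # C_ij = <delta_ti delta_tj> (eq. A1)
    eigvals, rot = np.linalg.eigh(cov)  # C = R D R^T; columns of `rot` are the eigenvectors
    delta_t_rot = rot.T @ delta_t       # delta_t' = R^T delta_t
    return delta_t_rot, rot

def back_rotate(params_rot, rot):
    """Map parameters fitted in the rotated basis back to the original basis."""
    # a'^T = a^T R  implies  a = R a'
    return rot @ params_rot
```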

Alternatively, the observed clustering signal in the original systematics space would be

$$\begin{eqnarray} \hat{w}(\theta) &= w (\theta) \left(1 + \sum _i^{N_{\text{sys}}} b_i^2 \langle \delta _{t_i}\delta _{t_i} \rangle + \sum _i^{N_{\text{sys}}} \sum _{j \ne i}^{N_{\text{sys}}} b_i b_j \langle \delta _{t_i}\delta _{t_j} \rangle \right)\\ & + \sum _i^{N_{\text{sys}}} a_i^2 \langle \delta _{t_i}\delta _{t_i} \rangle + \sum _i^{N_{\text{sys}}} \sum _{j \ne i}^{N_{\text{sys}}} a_i a_j \langle \delta _{t_i}\delta _{t_j} \rangle \end{eqnarray}$$

and the estimated correction becomes

$$\begin{eqnarray} w_{\text{corr}} = \frac{\hat{w}-\sum _i^{N_{\text{sys}}} a_i^2 \langle \delta _{t_i}\delta _{t_i} \rangle -\sum _i^{N_{\text{sys}}} \sum _{j \ne i}^{N_{\text{sys}}} a_i a_j \langle \delta _{t_i}\delta _{t_j} \rangle }{1 + \sum _i^{N_{\text{sys}}} b_i^2 \langle \delta _{t_i}\delta _{t_i} \rangle + \sum _i^{N_{\text{sys}}} \sum _{j \ne i}^{N_{\text{sys}}} b_i b_j \langle \delta _{t_i}\delta _{t_j} \rangle }. \end{eqnarray}$$
(A5)

If working with correlated systematics and choosing not to use an orthogonal set of template maps, mitigation is prescribed by equation (A5), where the unbiased cross-terms are

$$\begin{eqnarray} a_i a_j = \hat{a}_i \hat{a}_j-\text{Cov}[\hat{a}_i, \hat{a}_j]. \end{eqnarray}$$
(A6)
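In code, the correction of equation (A5) with the debiased terms of equation (A6) can be assembled directly from the fitted parameter vectors and their covariance matrices. The sketch below is a minimal, assumed implementation in which xi_t stores the template–template correlation functions |$\langle \delta _{t_i}\delta _{t_j}\rangle (\theta)$| as an array of shape (N_sys, N_sys, N_theta).

```python
# Sketch of the corrected two-point function in the original (correlated)
# template basis, eqs (A5)-(A6). a_hat, b_hat are the fitted parameter
# vectors; cov_a, cov_b are their covariance matrices (e.g. from mocks or
# the block bootstrap).
import numpy as np

def corrected_wtheta(w_hat, a_hat, b_hat, cov_a, cov_b, xi_t):
    # Debiased products, eq. (A6): a_i a_j = a_hat_i a_hat_j - Cov[a_i, a_j];
    # the diagonal debiases the auto terms by the parameter variances.
    aa = np.outer(a_hat, a_hat) - cov_a
    bb = np.outer(b_hat, b_hat) - cov_b
    add_term = np.einsum('ij,ijk->k', aa, xi_t)   # additive contamination
    mult_term = np.einsum('ij,ijk->k', bb, xi_t)  # multiplicative contamination
    return (w_hat - add_term) / (1.0 + mult_term)
```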

APPENDIX B: PARAMETER NOISE AND DEBIASING

We emphasized throughout the paper the importance of debiasing the estimated contamination parameters in order to obtain an unbiased correction of the two-point function. Fig. B1 shows the results of Section 5.3 when this debiasing step is omitted: the corrected clustering signal acquires a per cent-level bias of opposite sign, with a magnitude that is approximately 30 per cent of the original uncorrected systematic. This further demonstrates the importance of the debiasing step in the mitigation method presented in this work.

Figure B1. Fractional difference between the average corrected and true two-point function in the presence of 25 systematic contaminants, 10 of which are additive and 15 multiplicative. The plot shows similar information to that of Fig. 5, but now comparing the correction method with and without parameter debiasing. The plot shows the importance of the debiasing step, as neglecting it produces a biased (and overcorrected) estimate of the corrected two-point function.
