Abstract

We introduce a novel technique, called ‘granulometry’, to characterize and recover the mean size and the size distribution of H ii regions from 21-cm tomography. The technique is easy to implement, but places the previously not very well-defined concept of morphology on a firm mathematical foundation. The size distribution of the cold spots in 21-cm tomography can be used as a direct tracer of the underlying probability distribution of H ii region sizes. We explore the capability of the method using large-scale reionization simulations and mock observational data cubes while considering capabilities of Square Kilometre Array 1 (SKA1) low and a future extension to SKA2. We show that the technique allows the recovery of the H ii region size distribution with a moderate signal-to-noise ratio from wide-field imaging (SNR ≲ 3), for which the statistical uncertainty is sample variance dominated. We address the observational requirements on the angular resolution, the field of view, and the thermal noise limit for a successful measurement. To achieve a full scientific return from 21-cm tomography and to exploit a synergy with 21-cm power spectra, we suggest an observing strategy using wide-field imaging (several tens of square degrees) by an interferometric mosaicking/multibeam observation with additional intermediate baselines ( ∼ 2-4 km) in an SKA phase 2.

1 INTRODUCTION

The 21-cm signal arising from the spin flip transition in the neutral hydrogen (H i) atoms promises to unearth an unprecedented amount of information about the cosmic dawn (CD) and the epoch of reionization (EoR). It will help us to resolve many outstanding questions regarding this particular phase of the history of our Universe such as: how did reionization progress with time? What was the morphology of H i distribution in the intergalactic medium (IGM) during the different phases of reionization? Which were the major sources that drive reionization? (for reviews see, e.g. Furlanetto, Oh & Briggs 2006; Morales & Wyithe 2010; Pritchard & Loeb 2012).

The first generation of radio interferometers such as the Giant Metrewave Radio Telescope (GMRT)1 (Paciga et al. 2013), Low-Frequency Array (LOFAR)2 (Yatawatta et al. 2013), Murchison Widefield Array (MWA)3 (Bowman et al. 2013), and Precision Array for Probing the Epoch of Reionization (PAPER)4 (Ali et al. 2015) target the detection of this signal through statistical estimators such as the power spectrum and variance. While these statistical estimators ensure an optimum signal-to-noise ratio (SNR) to increase the detection possibility, it will not be possible to characterize the signal completely through these measurements alone, as the signal from the EoR is expected to be highly non-Gaussian in nature (e.g. Ciardi & Madau 2003; Furlanetto, Zaldarriaga & Hernquist 2004b; Bharadwaj & Pandey 2005; Cooray 2005; Iliev et al. 2006; Mellema et al. 2006b; Harker et al. 2009; Mondal et al. 2015; Dixon et al. 2016). As a result, many completely different underlying H i distributions could give rise to the same power spectrum and variance for the signal, which would lead to a large degeneracy and ambiguity in the estimated astrophysical parameters through these statistics. This is why it is essential to quantify and characterize the signal beyond these one- and two-point statistics.

One way to achieve this is through a hierarchy of the higher order correlation functions (such as three- and four-point correlation functions) and their corresponding Fourier analogues (Bharadwaj & Pandey 2005; Pillepich, Porciani & Matarrese 2007; Cooray, Li & Melchiorri 2008; Muñoz, Ali-Haïmoud & Kamionkowski 2015; Mondal et al. 2015; Mondal, Bharadwaj & Majumdar 2017), which can be directly estimated using the observed visibilities in a radio interferometer.

Another alternative to understanding and characterizing the signal more fully is through the images of the 21-cm signal (thus the H i distribution in the IGM) at different stages of reionization. This is more popularly known as ‘21-cm tomography’ (see e.g. Madau, Meiksin & Rees 1997), and it will essentially provide us with a time lapse movie of the state of the H i in a certain portion of the sky during the EoR. As radio interferometry is only sensitive to the fluctuations in the signal, not its absolute value, regions of ionized hydrogen will appear as negative ‘cold spots’ in these images. Thus, studying the properties of cold spots in the 21-cm tomographic data will provide a direct insight into ‘the morphology of reionization’, which is characterized by the shapes, sizes, and spatial distribution of the H ii regions. These studies will help us to directly quantify the ionized volume-filling fraction, the mean size, and the bubble size distribution of the H ii regions.

For this reason, the 21-cm tomographic imaging during the CD and the EoR has been identified as one of the major science goals of the future radio interferometer Square Kilometre Array5 (SKA) (Koopmans et al. 2015; Mellema et al. 2015; Wyithe, Geil & Kim 2015). Even currently operating interferometers such as LOFAR and MWA are expected to produce relatively coarser resolution images at a moderate SNR (Zaroubi et al. 2012; Malloy & Lidz 2013). However, the tomographic data cubes still need to be analysed in a statistical sense in order to derive astrophysical parameters.

The data analysis strategy of 21-cm tomographic images is a relatively unexplored field. Previous works have focused on topological analysis of 21-cm tomography through the Minkowski functionals (Gleser et al. 2006; Lee et al. 2008; Friedrich et al. 2011a; Hong et al. 2014; Yoshiura et al. 2016) or by extracting the size, shape, and location of the H ii region using the matched filter technique on the raw observed visibilities and images (Datta, Bharadwaj & Choudhury 2007; Datta et al. 2008; Datta, Bharadwaj & Choudhury 2009; Majumdar et al. 2011; Datta et al. 2012a, 2016; Majumdar, Bharadwaj & Choudhury 2012; Malloy & Lidz 2013). The latter have explored both targeted (where a very bright high-redshift ionizing source has already been identified through observations at other wavelengths) and blind (where the location of the ionizing source is not known) searches for H ii regions on the observed visiblities or on the image planes. For example, Malloy & Lidz (2013) have demonstrated that by using this matched filtering method, it is also possible to extract the size distribution of the 21-cm cold spots from an observational data set. Some additional work has been done on studying size distributions in simulations of reionizations (Friedrich et al. 2011b; Lin et al. 2016), but the application of these techniques to 21-cm tomography has not yet been explored.

One of the main challenges in recovering the statistical measures of the morphology (e.g. shape or size distribution) of the H ii regions from 21-cm tomography is due to the fact that there is no unique definition of the ‘size or shape of an H ii region’. Several methods and definitions of H ii region size and shape have been used to extract the size distribution from simulated 21-cm data (Iliev et al. 2006; McQuinn et al. 2007; Mesinger & Furlanetto 2007; Zahn et al. 2007; Friedrich et al. 2011a; Majumdar et al. 2014; Lin et al. 2016). So far, there does not seem to be a consensus on how these methods can actually trace the underlying size distribution of the |${\rm H\,{\small {II}}}$| regions. Apart from the ambiguity of the definition of an |${\rm H\,{\small {II}}}$| region, the other major challenge in obtaining the size distribution is to identify the ‘true’ ionized cells from all the cold spots present in the 21-cm tomographic data. As radio interferometers can only capture fluctuations in the signal, the tomographic data will have mean subtracted signal represented by positive and negative pixels in the image plane at a certain redshift. A pixel containing neutral gas but having density below the mean density could appear as a negative pixel. In addition, the mean of the 21-cm signal (which corresponds to the ‘zero’ of the signal in that data volume) will also vary with redshift as reionization progresses from its early to late stages. These two intrinsic features of the observed signal will produce additional confusion in identifying true ionized pixels from 21-cm images.

The main goal of this paper is to introduce a novel granulometric analysis method in the context of the EoR 21-cm tomography for the first time. We focus on measuring the |${\rm H\,{\small {II}}}$| region size distribution from 3D data cube and 2D images, hence, directly characterizing the aspect of the morphology of |${\rm H\,{\small {II}}}$| regions. This granulometric technique is based on a well-defined method in mathematical morphology and stochastic geometry (Serra 1983; Dougherty & Lotufo 2003; Chiu et al. 2013), which provides a well-formulated definition and the characterization of ‘size’ of the objects based on the idea of sieving (Matheron 1975). We further investigate the possibility and requirement to recover the |${\rm H\,{\small {II}}}$| region size distribution in 21-cm tomography from future radio interferometric observations.

This paper is organized as follows. Section 2 describes the 21-cm and reionization models. Section 3 introduces granulometry in the context of our data analysis methodology. Section 4 describes our mock data cube under a simple model for radio interferometry. Section 5 presents the theoretical aspects of the granulometric analysis of 21-cm tomography. Sections 6 and 7 present the observational prospects and requirements for measuring the |${\rm H\,{\small {II}}}$| region size distribution using 3D image cubes and 2D image slices from 21-cm tomography. The implications for baseline design and observing strategy are discussed in Section 8. Finally, our conclusions are summarized in Section 9.

2 EOR 21-cM SIGNAL

The spin flip transition of the hydrogen ground state corresponds to the 21-cm line with ν21 = 1420.4 MHz, whose emission or absorption at redshift z is observed at a frequency 1420.4(1 + z) − 1MHz. The differential brightness temperature of 21-cm line against the cosmic microwave background (CMB) is given by (e.g. Field 1958, 1959; Madau et al. 1997; Pritchard & Loeb 2012)
\begin{equation} \,\delta T_{21}=T_0(z)(1+\delta _{\rm b})x_{\rm {H\,{\small I}}}\left(1-\frac{T_{\rm CMB}(z)}{T_{\rm S}}\right), \end{equation}
(1)
where the pre-factor |$T_0(z)\approx 27{\,\rm mK}(\frac{\Omega _{\rm b} h^2}{0.023})(\frac{0.15}{\Omega _{\rm m}h^2}\frac{1+z}{10})^{1/2}$| depends on cosmological parameters and redshift. For simplicity, we ignore the effect of redshift–space distortions due to the peculiar velocities of matter along the line of sight.
The spin temperature TS characterizes the relative populations of the two spin states. This population is determined by collisions and radiative excitations. Here, we assume that the spin temperature is coupled to the kinetic temperature of gas and that the gas temperature is always above the CMB temperature. These conditions are expected to be valid for most of the reionization period (likely z < 12). This simplifies equation (1) to
\begin{equation} \delta T_{21}=T_0(z)(1+\delta _{\rm b})x_{\rm {H\,{\small I}}}, \end{equation}
(2)
which only depends on the neutral fraction of hydrogen and the baryon density fluctuations. We adopt this so-called full-coupling approximation throughout the paper.

2.1 Models for 21-cm maps from EoR

To demonstrate the use of granulometric analysis in 21-cm tomography, we employ two models to characterize the distribution of ionized regions: (1) a toy model in which we impose a lognormal bubble size distribution and (2) the results from detailed radiative transfer (RT) simulations.

2.1.1 Lognormal bubble model

In this simple approach, the reionization process is modelled as a percolation of spherical |${\rm H\,{\small {II}}}$| regions in a homogeneous IGM density field. This simple model is constructed by randomly distributing spherical ionized bubbles in a simulation box, and does not include any correlation of the bubble distribution with the underlying density field. Motivated by McQuinn et al. (2007) and Friedrich et al. (2011b), the sizes of the radii of Nb bubbles are randomly drawn from a lognormal distribution of the bubble sizes R,
\begin{equation} \frac{{\rm d}P(R)}{{\rm d}R}=\frac{1}{\sqrt{2\pi \sigma _{\rm R}^2}R}\exp \left[-\frac{(\ln R-\ln \bar{R})^2}{2\sigma _{\rm R}^2}\right], \end{equation}
(3)
where Nb, |$\bar{R}$|⁠, and σR are the three parameters of the model. The ionization profile of each bubble is assumed to be a spherical top-hat function |$x_{\rm {H\,{\small {II}}}}(\mathbf {r})=\Pi (\boldsymbol {r}-\boldsymbol {r}_i|R_{i})$| of radius Ri centred at a position ri. Note that when multiple bubbles overlap we take the maximum of all the ionized fractions (=1).

Finally, we use Monte Carlo realizations of the lognormal bubble model in a volume Vbox = (1 h − 1 cGpc)3 with 2563 cells and 3.9 h − 1 cMpc resolution to create a differential brightness temperature map.

2.1.2 Radiative transfer simulation

To demonstrate our new data analysis methodology for 21-cm tomography, we use snapshots from a full RT simulation performed within the PRACE4LOFAR project.6 Reionization is modelled by post-processing an N-body simulation with an RT calculation. The N-body code used is cubep3m (Harnois-Déraps et al. 2013) and the RT code is c2ray (Mellema et al. 2006a). We assume a flat Λcold dark matter cosmology with the Wilkinson Microwave Anisotropy Probe 5 parameters |$h=0.7,\ \Omega _{\rm m}=0.27,\ \Omega _\Lambda =0.73,\ \Omega _{\rm b}=0.044,\ \sigma _8=0.8,\ {\rm and }\,n_{\rm s}=0.96$| (Komatsu et al. 2009), which are also consistent with Planck data (Planck Collaboration XIII 2015).

The simulation we use throughout this paper is similar to the LB1 model in Dixon et al. (2016), where details regarding simulation techniques and methods can be found. However, the volume considered here is a larger 500 h − 1 cMpc on each side for which the N-body simulation was run with 61923 particles with mass resolution 4.05 × 107 M (Dixon et al. in preparation). The minimum dark matter halo mass used in the RT simulation is 1 × 109 M (25 particles). The RT simulation was performed on smoothed and gridded density fields consisting of 3003 cells. The ionizing photons from the sources (galaxies) are assumed to linearly scale with the host halo mass such that the number of ionizing photons released into the IGM is gγ = 1.7 per baryon per 107 yr. This RT simulation of reionization runs from z = 21 to 6.

We use the resulting snapshots of the |${\rm H\,{\small {II}}}$| fraction and the gas density (on a 3003 grid) for the analysis presented in this paper. We select the z = 6.8 snapshot as our fiducial reference redshift unless otherwise stated. At this redshift, the volume-averaged ionized fraction corresponds to 〈xH IIV = 0.40 (see Section 5.3 for a discussion on the dependence of our results on this choice).

3 DATA ANALYSIS METHOD

This section describes basic concepts involved in our data analysis methodology of 21-cm tomography using the granulometric technique. The same analysis method is used both for noiseless and noisy data.

3.1 Granulometry

Granulometry is a technique in mathematical morphology and image analysis that measures a size distribution of objects (Serra 1983; Dougherty & Lotufo 2003). The central idea is based on the concept of sieving (Matheron 1975). This provides a mathematically well-defined measure of the size distribution of a collection of objects, in our case |${\rm H\,{\small {II}}}$| regions and 21-cm cold spots, the latter defined as regions where the differential brightness temperature is less than a certain threshold value. The tool we used for our granulometric analysis is publicly available online.7

3.1.1 Binary images and data cubes

Since we need to define objects, the first step is the creation of a binary field of the quantity of interest (either ionization fraction or 21-cm signal). A binary field of the |${\rm H\,{\small {II}}}$| fraction map, denoted by |$X_{\rm H\,{\small {II}}}(\boldsymbol {r})$|⁠, is defined as:
\begin{eqnarray} X_{\rm H\,{\small{II}}}(\boldsymbol {r})= \left\lbrace \begin{array}{cc}1 &\quad \mbox{if $x_{\rm H\,{\small{II}}}(\boldsymbol {r})\ge x_{\rm H\,{\small{II}}}^\mathrm{th}$},\\ 0 &\quad \mbox{if $x_{\rm H\,{\small{II}}}(\boldsymbol {r})<x_{\rm H\,{\small{II}}}^\mathrm{th}$}. \end{array} \right. \end{eqnarray}
(4)
where |$x_{\rm H\,{\small{II}}}^\mathrm{th}$| is a given ionization threshold. The ionized regions are marked as ones in the binary field. Throughout this paper, we take |$x_{\rm H\,{\small{II}}}^\mathrm{th}=0.5$| to mark the 50 per cent transition between neutral and ionized phase.
Similarly, binary fields of the 21-cm signal, denoted by X21(r), are produced introducing a threshold value for the pixels of the mean subtracted 21-cm signals,
\begin{eqnarray} X_{21}(\boldsymbol {r})= \left\lbrace \begin{array}{cc}1 &\quad \mbox{if $\Delta T_{21}(\boldsymbol {r})<\Delta T_{21}^\mathrm{th}$},\\ 0 &\quad \mbox{if $\Delta T_{21}(\boldsymbol {r})\ge \Delta T_{21}^\mathrm{th}$}, \end{array} \right. \end{eqnarray}
(5)
where ΔT21 = δT21 − 〈δT21〉 is the mean subtracted 21-cm brightness temperature fluctuation. Because we are interested in the size distribution of 21-cm cold spots, we define the pixels below the threshold as ones. Throughout this paper, we take |$\Delta T_{21}^\mathrm{th}=0$| and we call ‘cold spots’ all regions below this value. We discuss the possibility of varying this threshold value in Section 5.3. We also define voids in the density field in the same way as 21-cm cold spots, i.e. a connected pixels below the mean, but using the density field instead of 21-cm signal. Note that r is the Cartesian coordinate converted from the angular and frequency separations in the data cube.
The total volume filled by 21-cm cold spots or |${\rm H\,{\small {II}}}$| regions is then given by (here the subscript indicating the physical quantity is dropped),
\begin{equation} V[X]=\int X(\boldsymbol {r}){\rm d}^3r. \end{equation}
(6)
The volume-filling factors of |${\rm H\,{\small {II}}}$| regions and 21-cm cold spots in a 3D tomographic data cube are then given by QH ii = V[XH ii]/Vbox and Q21 = V[X21]/Vbox, respectively. The filling factor |$Q_{\rm 21}^{\rm 2{\rm D}}$| for a 2D 21-cm image can be defined in an analogous way.

3.1.2 Size distribution measurement

The granulometric analysis technique is applied on the binary field. The basic idea is to probe the field with a so-called structuring element of a certain shape. This (loosely speaking) represents the shape of holes in a sieve. For our analysis, we choose the structuring element to be a sphere of radius R (or a disc of radius R when analysing a 2D image), defined as SR = Π(rr0|R), where r0 is the coordinate of the centre of the structuring element. The mathematical formulation of the concept of sieving corresponds to a morphological opening operation8 denoted by a symbol ∘ (e.g. Dougherty & Lotufo 2003). Hereafter, we refer to the morphological opening operation as sieving. A brief introduction is presented in Appendix A.

The sieving of the binary field X through a ‘hole’ of radius R is thus mathematically expressed as
\begin{equation} X^{\prime }(\boldsymbol {r})=X\circ S_{\rm R}. \end{equation}
(7)
The new binary field X΄(r) represents the parts of the original binary field X that remain after sieving (see Section 5.1 for an example). Structures smaller than the radius of the structuring element are removed from the sieved binary field X΄(r). In astrophysical terms, the sieved ionization fraction field or 21-cm image only contains the |${\rm H\,{\small {II}}}$| regions or cold spots larger than the radius of the structuring element.
This concept of sieving defines the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions in a mathematically well-formulated way. In the granulometric analysis, the cumulative size distribution F(<R), i.e. the fraction of structures whose size is smaller than a radius R, is given by the fraction of volume removed by sieving (e.g. Serra 1983; Dougherty & Lotufo 2003),
\begin{equation} F(<R)=1-\frac{V[X\circ S_{\rm R}]}{V[X]}. \end{equation}
(8)
Granulometry is basically a method to measure the size distribution by counting, using successive sieving operations on a binary field with an increasing size of the structuring element. In other words, granulometry counts the number of objects that ‘fit’ in the structure of interest. For example, a large, irregularly connected |${\rm H\,{\small {II}}}$| region is decomposed into a sum of smaller spherical objects, whereas a large spherical |${\rm H\,{\small {II}}}$| region is counted as one object. The number of spherical objects should also be minimized to describe the original structure.
The (differential) size distribution, dF(<R)/dR, is given by differentiating the cumulative size distribution. For practical applications, we use the above granulometric analysis on a discrete (pixelized) field. A structuring element then has a discrete radius Ri = i × ΔR, where ΔR = L/Npix, L is the comoving size of the data cube or image, Npix is the number of pixels per dimension, and i = 1, 2, …, Npix/2. The differential size distribution is estimated from the discrete cumulative size distribution,
\begin{equation} \frac{{\rm d}F(<R_i)}{{\rm d}R}\equiv \frac{F(<R_{i+1})-F(<R_i)}{\Delta R}. \end{equation}
(9)
Hereafter, dF21(<R)/dR and dFH ii(<R)/dR denote the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions measured from the corresponding binary fields, X21 and XH ii, respectively.

An advantage of the granulometric measure of the size distribution is that it attempts to recover the true underlying probability distribution function of sizes (Serra 1983). In Section 5.1, we verify this property using an explicit example. In the terminology of Lin et al. (2016), the granulometry method is an unbiased estimator of the true size distribution.9 We compare the granulometric method with other size estimators in Section 8.3.

3.1.3 Moments of size distributions and the normalization

The mean size of |${\rm H\,{\small {II}}}$| regions or cold spots is given as the first moment of the size distribution,
\begin{equation} \langle R\rangle =\int _0^\infty R\frac{{\rm d}F(<R)}{{\rm d}R}{\rm d}R. \end{equation}
(10)
The higher order moments of a size distribution can be defined in a similar way to characterize the variance and skewness of the distribution.
In addition, as we will show in Section 5, it is convenient to normalize the size distribution with the volume-filling factor such that
\begin{equation} \frac{{\rm d}Q(<R)}{{\rm d}R}\equiv Q\frac{{\rm d}F(<R)}{{\rm d}R}. \end{equation}
(11)
When the size distribution is normalized to the volume-filling factor, one can interpret this quantity as the fraction of the total volume-filling factor contributed by regions of size R.

3.2 Error estimation

So far, we have only introduced the theory for granulometric analysis. When measuring the |${\rm H\,{\small {II}}}$| region size distributions from observation of 21-cm cold spots, we must also assess the associated error as any measurement comes with uncertainty. The statistical uncertainties of the results come both from the sample variance of the signal and the thermal noise of the instrument. We estimate the error in the measured size distribution of 21-cm cold spots using an ensemble of mock data cubes with noise.

The thermal noise covariance matrix for the cold-spot size distribution is calculated using Nnoise = 100 Monte Carlo realizations of noise (Section 4.4). Each independent noise cube is added to a 21-cm data cube. We then perform the measurement for each of these noise-added mock data cubes. The noise covariance matrix |$C_{ij}^N$| for each pair of pixelized radius bins, Ri and Rj, is given by
\begin{eqnarray} C_{ij}^N&=&\frac{1}{N_{\rm noise}}\sum _{n=1}^{N_{\rm noise}} \left[\left.\frac{{\rm d}Q(<R_i)}{{\rm d}R}\right|_n-\left\langle \frac{{\rm d}Q(<R_i)}{{\rm d}R}\right\rangle \right]\nonumber \\ &&\times \left[\left.\frac{{\rm d}Q(<R_j)}{{\rm d}R}\right|_n-\left\langle \frac{{\rm d}Q(<R_j)}{{\rm d}R}\right\rangle \right], \end{eqnarray}
(12)
where 〈dQ(<Ri)/dR〉 is the average over Nnoise realizations.
The sample variance of the error covariance matrix is calculated using the sub-volume method (Norberg et al. 2009). For the analysis of a full 3D data cube, a mock 21-cm data cube is split into Nsample = 8 equal sub-volumes. We calculate the covariance matrix of each sub-volume, and the total covariance matrix of the full volume is estimated as an average of the matrices after re-scaling by |$N^{-1/2}_{\rm sample}$|⁠. For the analysis of a 2D image, we randomly select Nsample = 100 slices along the random principal axis. We repeatedly perform the measurement using the random sub-samples. The sample variance of the covariance matrix |$C_{ij}^S$| for each pair of pixelized radius bins, Ri and Rj, is given by
\begin{eqnarray} C_{ij}^S&=&\frac{1}{N_{\rm sample}}\sum _{m=1}^{N_{\rm sample}} \left[\left.\frac{{\rm d}Q(<R_i)}{{\rm d}R}\right|_m-\left\langle \frac{{\rm d}Q(<R_i)}{{\rm d}R}\right\rangle \right]\nonumber \\ &&\times \left[\left.\frac{{\rm d}Q(<R_j)}{{\rm d}R}\right|_m-\left\langle \frac{{\rm d}Q(<R_j)}{{\rm d}R}\right\rangle \right]. \end{eqnarray}
(13)

The total error covariance matrix is then given by |$C_{ij}=C_{ij}^N+C_{ij}^S$|⁠. The error is estimated according to the covariance matrix. Note that the sample variance error, which scales with the survey volume as |$\propto V_{\rm survey}^{-1/2}$|⁠, typically dominates. The thermal noise error has a more complicated dependency on the survey volume as it couples with the sample variance error (see Section 7.1), but it is usually a small contribution to the total error budget. We present 1σ error bars in this paper unless otherwise stated.

4 MOCK INTERFEROMETRIC OBSERVATIONS

In this section, we describe how we construct our mock data cubes, which represent the signal as observed with an idealized SKA-like radio interferometer.

4.1 Instrument

We consider EoR experiments with an SKA-like radio interferometer. Our reference array configuration is based on the SKA1-low baseline design documented by Dewdney (2016).10 SKA1-low has a frequency coverage between 50 and 350 MHz, corresponding to the redshift range 3 ≲ z ≲ 27. The assumed design consists of a total of 512 stations. The final configuration is still being developed but here we assume that approximately half of the stations are distributed within a radius of ∼ 1 km; therefore, we assume that the number of core stations is Nst = 256. We assume a complete uv-coverage within a baseline length of 2 km (we refer to this as the maximum baseline length bmax ). The station diameter is taken to be Dst = 35 m. We assume an idealized instrument with an effective area which below the critical frequency νcrit is the same as the geometric area of the station Aeff = π(Dst/2)2, and above it falls off as (νcrit/ν)2 where νcrit = 110 MHz. The system temperature is given by the sum of the receiver noise, Trcvr = 40 K, and the sky temperature |$T_{\rm sky}=60(\nu /300{\,\rm MHz})^{-2.55}{\,\rm K}$|⁠. We refer to this idealized instrument as ‘SKA1-low’. The instrumental parameters are summarized in Table 1.

Table 1.

SKA1-low (and SKA2) instrument parameters.

Configuration
Frequency coverage50–350 MHz
Number of core stations, Nst256
Max. baseline, bmax2 kma
Station diameter, Dst35 m
Effective area, Aeff962 m2
Critical frequency, νcrit110 MHz
System temperature, Tsys|$40{\,\rm K}+60(\nu /300{\,\rm MHz})^{-2.55}{\,\rm K}$|
Angular resolution, θA2.58(ν/200 MHz) − 1 arcmin
Primary beam’s FWHM3.12(ν/200 MHz) − 1 deg
Configuration
Frequency coverage50–350 MHz
Number of core stations, Nst256
Max. baseline, bmax2 kma
Station diameter, Dst35 m
Effective area, Aeff962 m2
Critical frequency, νcrit110 MHz
System temperature, Tsys|$40{\,\rm K}+60(\nu /300{\,\rm MHz})^{-2.55}{\,\rm K}$|
Angular resolution, θA2.58(ν/200 MHz) − 1 arcmin
Primary beam’s FWHM3.12(ν/200 MHz) − 1 deg

aOur model SKA2 increases the maximum baseline to 4 km.

Table 1.

SKA1-low (and SKA2) instrument parameters.

Configuration
Frequency coverage50–350 MHz
Number of core stations, Nst256
Max. baseline, bmax2 kma
Station diameter, Dst35 m
Effective area, Aeff962 m2
Critical frequency, νcrit110 MHz
System temperature, Tsys|$40{\,\rm K}+60(\nu /300{\,\rm MHz})^{-2.55}{\,\rm K}$|
Angular resolution, θA2.58(ν/200 MHz) − 1 arcmin
Primary beam’s FWHM3.12(ν/200 MHz) − 1 deg
Configuration
Frequency coverage50–350 MHz
Number of core stations, Nst256
Max. baseline, bmax2 kma
Station diameter, Dst35 m
Effective area, Aeff962 m2
Critical frequency, νcrit110 MHz
System temperature, Tsys|$40{\,\rm K}+60(\nu /300{\,\rm MHz})^{-2.55}{\,\rm K}$|
Angular resolution, θA2.58(ν/200 MHz) − 1 arcmin
Primary beam’s FWHM3.12(ν/200 MHz) − 1 deg

aOur model SKA2 increases the maximum baseline to 4 km.

In addition, we consider a future SKA2-like instrument extending the core array of SKA1-low to a 2 km radius, and achieving the complete uv-coverage within the maximum baseline of bmax = 4 km. This improves the angular resolution by a factor of 2, but keeps the field of view (FoV) of the single pointing the same as the SKA1-low. We refer to this instrument as ‘SKA2’.11

4.2 21-cm signal

The 21-cm signal is calculated following equation (2) using the results of the RT simulations described in Section 2.1.2. Our fiducial analysis uses the redshift snapshot at z = 6.8 (corresponding to a frequency of 182 MHz), where the simulation box has an extension of ∼ 4.6 deg on a side at this redshift. The spatial resolution is Δr = 1.67 h − 1 cMpc. Each pixel therefore corresponds to an angular size of 0.92 arcmin. We construct a simulated 21-cm data cube using a coeval snapshot from the RT simulation.12 Each 2D slice is one pixel width and it corresponds to a bandwidth B = ν21H(zr/[c(1 + z)2] ≈ 0.15 MHz, where H(z) is the Hubble parameter and c is the speed of light.

4.3 Angular and frequency resolution

The angular resolution of the radio interferometric observation is characterized by the maximum baseline of the array as θA = λ/bmax radians (see Table 1). For SKA2, the angular resolution is increased by a factor of 2. We implement the angular response of the interferometer by convolving with a Gaussian point spread function (PSF), R(θ), with full width at half-maximum (FWHM) corresponding to θA = 2.58(ν/200 MHz) − 1 arcmin.

The frequency resolution is determined by the design of the instrument, and for SKA1-low it is expected to be better than 1 kHz. However, in practice, when analysing the signal a lower frequency resolution is used to increase the SNR. Here, we assume that the data are smoothed in the frequency direction with a Gaussian kernel of exactly the same physical size as the angular PSF. For the chosen redshift, this implies an FWHM of 453 kHz.

4.4 Noise

The point-source sensitivity of an interferometer is given by (e.g. Thompson, Moran & Swenson 2001, equation 6.62)
\begin{equation} \sigma _{\rm S}=\frac{2k_{\rm B}T_{\rm sys}}{\epsilon A_{\rm eff}\sqrt{N_{\rm st}(N_{\rm st}-1)Bt_{\rm int}}}, \end{equation}
(14)
where tint is the integration time of an observation and ε is the efficiency factor described in Section 4.1. For imaging (i.e. 21-cm tomography), we are concerned with the rms brightness temperature sensitivity (in units of K) of an image at angular scale |$\Omega _{\rm A}=(\pi /4\ln 2)\theta _{\rm A}^2$| (Condon & Ransom 2016),
\begin{eqnarray} \sigma _{\rm N}&=&\left(\frac{\sigma _{\rm S}}{\Omega _{\rm A}}\right)\frac{\lambda ^2}{2k_{\rm B}} \nonumber \\ & \approx & 7.85\left(\frac{t_{\rm int}}{1000{\,\rm hr}}\right)^{-1/2}\left(\frac{B}{0.453{\,\rm MHz}}\right)^{-1/2}\left(\frac{\theta _{\rm A}}{2.83^{\prime }}\right)^{-2}\,{\rm mK},\nonumber\\ \end{eqnarray}
(15)
at the observed frequency 182 MHz (z = 6.8) and on the scale of the maximum angular resolution element of the SKA1-low.13 This noise estimate is somewhat optimistic as it assumes full uv-coverage and more detailed calculations suggest noise levels which are between a factor 1.5 and 2 higher.

As explained in Section 3.2, we produce 100 Monte Carlo realizations of the noise cubes. We generate white (Gaussian) noise fields, which have the same spatial (angular) scale and frequency range as the 21-cm data cube (Section 4.2). The rms noise level 〈(ΔTN)2〉 at the scale of the resolution element is then normalized according to equation (15). Because of the assumption of white noise, the noise power spectrum scales as |$\Delta _{\rm N}^2(k)\propto k^3$|⁠.

To quantify the image quality, we define the SNR for a data cube as the ratio of the rms fluctuations between an image cube and a noise cube on the scale of resolution element |${\rm SNR}(\theta _{\rm A})=\sqrt{\langle (\Delta T_{21})^2\rangle /\langle (\Delta T_{N})^2\rangle }$|⁠.

4.5 Foregrounds

We assume that the various foreground signals are perfectly removed from our data cube. Chapman et al. (2015) discussed the effect of different foreground removal techniques on the reconstructed 21-cm images, showing that good quality reconstructed 21-cm data cubes are in principle obtainable. Studying the impact of foreground residuals on the 21-cm tomographic analysis is beyond the scope of this paper.

5 GRANULOMETRIC ANALYSIS

In this section, we first present the results of granulometric analysis of one constructed and one simulated distribution of |${\rm H\,{\small {II}}}$| regions, as well as of the noiseless 21-cm signals associated with the latter. The goal is to understand the physical properties probed by the granulometric analysis and how well the 21-cm cold-spot size distribution traces the underlying size distribution of the |${\rm H\,{\small {II}}}$| regions.

5.1 A proof of concept: lognormal bubble model

As a proof of concept, we apply the granulometric measurement of size distribution of |${\rm H\,{\small {II}}}$| regions to a Monte Carlo realization of the 2D lognormal bubble model from Section 2.1.1. Fig. 1 shows an example of sieving the morphology of |${\rm H\,{\small {II}}}$| regions, represented by the white areas. The original distribution of |${\rm H\,{\small {II}}}$| regions, i.e. the binary image |$X_{\rm H\,{\small{II}}}(\boldsymbol {r})$|⁠, is shown in the left-hand panel. When it is sieved (XSR) with a disc of radius R = 27 h − 1 cMpc (middle panel), the structures smaller than the radius of the disc are removed. A larger disc of radius R = 39 h − 1 cMpc (right-hand panel) sieves most of the structures and only some of the largest |${\rm H\,{\small {II}}}$| regions remain.

Example of sieving for a 2D lognormal bubble model in a 1 h − 1 cGpc box on a side. The red circle shows the radius of the structuring element. The left-hand panel shows the original (unsieved) distribution of ${\rm H\,{\small {II}}}$ regions. The middle and right-hand panels show the distributions obtained by sieving the original image with a disc of radius 27 and 39 h − 1 cMpc, respectively.
Figure 1.

Example of sieving for a 2D lognormal bubble model in a 1 h − 1 cGpc box on a side. The red circle shows the radius of the structuring element. The left-hand panel shows the original (unsieved) distribution of |${\rm H\,{\small {II}}}$| regions. The middle and right-hand panels show the distributions obtained by sieving the original image with a disc of radius 27 and 39 h − 1 cMpc, respectively.

The histogram in Fig. 2 shows the differential size distribution measured with the sieving procedure. The differential size distribution clearly picks up the underlying probability distribution function of |${\rm H\,{\small {II}}}$| region sizes used by the lognormal model (dashed curve). The agreement is not perfect because of overlapping |${\rm H\,{\small {II}}}$| regions, which causes a departure from spherical shapes. Nevertheless, the lognormal model proves that the granulometric measure of the size distribution traces the underlying probability distribution of |${\rm H\,{\small {II}}}$| regions. From this, we conclude that the granulometric analysis is a very useful statistical tool to quantify the size distribution of |${\rm H\,{\small {II}}}$| regions with a mathematically well-motivated framework.

Differential size distributions of ${\rm H\,{\small {II}}}$ regions from granulometric analysis (solid line). The dashed curve shows the input probability distribution function of ${\rm H\,{\small {II}}}$ region sizes in the lognormal model.
Figure 2.

Differential size distributions of |${\rm H\,{\small {II}}}$| regions from granulometric analysis (solid line). The dashed curve shows the input probability distribution function of |${\rm H\,{\small {II}}}$| region sizes in the lognormal model.

5.2 Size distribution of |${\rm H\,{\small {II}}}$| regions: RT simulation

Having demonstrated the potential of granulometry, we next apply the analysis to the RT simulation (Section 2.1.2). Fig. 3 (histogram) shows the differential size distribution of |${\rm H\,{\small {II}}}$| regions measured from the ionization fraction field |$x_{\rm {H\,{\small {II}}}}(\bf {r})$| at z = 6.8. The vertical line shows the mean size of |${\rm H\,{\small {II}}}$| regions. The black curve is an analytic fitting formula (a modified Schechter function),
\begin{equation} R\frac{{\rm d}F(>R)}{{\rm d}R}=\left(\frac{R}{R_0}\right)^\alpha \exp \left[-\left(\frac{R}{R_\ast }\right)^\beta \right], \end{equation}
(16)
where R*, α, and β are free parameters and R0 is determined from the normalization |$\int _0^\infty \frac{{\rm d}F(>R)}{{\rm d}R}{\rm d}R=1$|⁠, which gives an analytic expression R0 = R*[Γ(α/β)/β]1/α, where Γ is the Gamma function. The granulometric measurement of the size distribution of |${\rm H\,{\small {II}}}$| regions is remarkably well described by this modified Schechter function. The position of the peak size, the power-law slope at small sizes, and the exponential cut-off at large sizes are captured perfectly. The best-fitting parameters are R* = 10.0 h − 1 cMpc, α = 0.71, and β = 2.78. The mean radius then has the analytic expression |$\langle R\rangle =R_\ast [\Gamma (\frac{1+\alpha }{\beta })/\Gamma (\frac{\alpha }{\beta })]\simeq 4.12\,h^{-1}\,\rm cMpc$|⁠.14 We have also tested the validity of the modified Schechter function for other redshifts, when the average neutral fraction is different, and find that the size distribution is fit very well by the modified Schechter function over a wide range of redshifts.
Differential size distribution of ${\rm H\,{\small {II}}}$ regions measured from the granulometric analysis of the ionization structure, $x_{\rm {H\,{\small {II}}}}(\bf {r})$, in the RT simulation (red histograms) at z = 6.8 (〈xH II〉V = 0.40). The black curves are the best-fitting modified Schechter functions (see the text). The vertical line indicates the mean radius of the size distribution. Top: linear scale plot. Bottom: log–log scale plot to show the exponential drop of the large size end and the power-law slope of the small size end of the ${\rm H\,{\small {II}}}$ regions.
Figure 3.

Differential size distribution of |${\rm H\,{\small {II}}}$| regions measured from the granulometric analysis of the ionization structure, |$x_{\rm {H\,{\small {II}}}}(\bf {r})$|⁠, in the RT simulation (red histograms) at z = 6.8 (〈xH IIV = 0.40). The black curves are the best-fitting modified Schechter functions (see the text). The vertical line indicates the mean radius of the size distribution. Top: linear scale plot. Bottom: log–log scale plot to show the exponential drop of the large size end and the power-law slope of the small size end of the |${\rm H\,{\small {II}}}$| regions.

There are physical motivations why the |${\rm H\,{\small {II}}}$| region size distribution follows the form of a modified Schechter function. First, suppose that the ultraviolet (1500 Å) luminosity function of galaxies responsible for driving reionization follows the Schechter function |$\phi (L_{1500})\propto (L_{1500}/L^\ast _{1500})^{\alpha _{\rm L}}\exp (-L_{1500}/L^\ast _{1500})$| with a characteristic luminosity |$L^{\ast }_{1500}$| and a faint-end slope αL. The ionizing photon luminosity |$\dot{N}_{\rm ion}$| (in units of s − 1) at <912 Å from a galaxy is given by the product |$\dot{N}_{\rm ion}=f_{\rm esc}\xi _{\rm ion}L_{1500}$| (e.g. Robertson et al. 2013), where fesc is the escape fraction and ξion is the ratio between the 1500 Å luminosity and the intrinsic ionizing photon production rate of a galaxy. Assuming that all galaxies have their own |${\rm H\,{\small {II}}}$| regions during the early pre-overlapping phase, the bubble number density dnb(R)/dR per unit radius of |${\rm H\,{\small {II}}}$| regions is given by |$\frac{{\rm d}n_{\rm b}(R)}{{\rm d}R}{\rm d}R=\phi (L_{1500}){\rm d}L_{1500}$|⁠. By estimating the radius of an |${\rm H\,{\small {II}}}$| region by counting photons (cosmological Strömgren sphere), |$R=[\frac{3\dot{N}_{\rm ion}t_{{\rm G}}}{4\pi \bar{n}_{\rm H}(0)}]^{1/3}\propto L_{1500}^{1/3}$|⁠, where tG is the time interval during which a galaxy is ionizing the IGM and |$\bar{n}_{\rm H}(0)$| is the comoving mean number density of hydrogen atoms. Therefore, the bubble number density scales as |${\rm d}n_{\rm b}(R)/{\rm d}R\propto R^{3\alpha _{\rm L}+2}\exp [-(R/R_\ast )^3]$|⁠, where the characteristic size R* is given by the ionizing properties of early galaxies, |$R_\ast =[\frac{3f_{\rm esc}\xi _{\rm ion}L^\ast _{1500}t_{{\rm G}}}{4\pi \bar{n}_{\rm H}(0)}]^{1/3}$|⁠. Hence, because dF(<R)/dR ∝ dnb(R)/dR, we expect the |${\rm H\,{\small {II}}}$| region size distribution to scale as |$R\,{\rm d}F(<R)/{\rm d}R\propto R^{3(\alpha _{\rm L}+1)}\exp [-(R/R_\ast )^3]$|⁠, as long as there is not too much overlap between bubbles. This is indeed a form of the modified Schechter function (16). A second physical motivation comes from the result of the excursion set formalism for reionization, in which the mass function for the ionized bubbles has a form of the modified Schechter function (Furlanetto, Zaldarriaga & Hernquist 2004a).

This relation between the |${\rm H\,{\small {II}}}$| region size distribution and the properties of ionizing sources allows us to provide physical interpretations for the shape of the size distribution. For smaller |${\rm H\,{\small {II}}}$| region sizes, R < R*, the distribution scales as a power law, |$R\frac{{\rm d}F(>R)}{{\rm d}R}\propto R^\alpha$|⁠. Our RT simulation gives a positive slope α = 0.71, while the above simple expectation gives a negative slope α = −3 for the faint-end slope of αL = −2. The positive value of the small size end of |${\rm H\,{\small {II}}}$| region size distribution can be interpreted as a result of the well-known characteristic of the reionization process, i.e. that the cosmological |${\rm H\,{\small {II}}}$| regions grow by merging many small ones. Consequently, the small size end of |${\rm H\,{\small {II}}}$| regions is redistributed to larger sizes as reionization progresses. For larger |${\rm H\,{\small {II}}}$| region sizes, R > R*, the interpretation is more complicated. The distribution shows an exponential drop off, scaling as ∝exp [( − R/R*)β], which could be (partially) due to the rapid decline in the population of luminous or clustered ionizing sources responsible for producing large |${\rm H\,{\small {II}}}$| regions. For a robust interpretation, a future study of the relation (if there is any) between the |${\rm H\,{\small {II}}}$| region size distribution and the properties and spatial distribution of ionizing sources is required; none the less, one may postulate that the |${\rm H\,{\small {II}}}$| region size distribution will contain the memory of the properties of ionizing sources.

5.3 Relation between |${\rm H\,{\small {II}}}$| regions and cold spots in 21-cm tomography

Observationally, we can only analyse 21-cm tomographic data. In tomographic images, the |${\rm H\,{\small {II}}}$| regions appear as cold spots, i.e. regions devoid of 21-cm emission. As only temperature fluctuations are observable in radio interferometry, those regions appear to have negatives values in the 21-cm images. Thus, intuitively the size distribution of cold spots could be directly translated into that of |${\rm H\,{\small {II}}}$| regions. However, a complication arises from the contribution of the density fluctuations to the 21-cm signal as low-density neutral regions can also give rise to negative values in the images (see equation 2). We would therefore like to answer the following question: how well do the volume-filling factor and the size distribution of 21-cm cold spots trace the volume-filling factor and size distribution of |${\rm H\,{\small {II}}}$| regions?

5.3.1 Volume-filling factors of |${\rm H\,{\small {II}}}$| regions and 21-cm cold spots

The 21-cm cold spots can encode the statistics of voids in matter density, as well as of ionized regions. To interpret the cold spots as a signature of |${\rm H\,{\small {II}}}$| regions, we should understand the nature of the void size distribution.

Fig. 4 shows the volume-filling factors of 21-cm cold spots (black solid line), |${\rm H\,{\small {II}}}$| regions (red dashed), and voids (blue dotted). The filling factors of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions are measured from the simulation box. Similarly, the filling factor of voids is calculated from the negative excursion sets δb(r) < 0 (underdense regions) of the density fluctuation field.

Volume-filling factors of 21-cm cold spots (black solid line), ${\rm H\,{\small {II}}}$ regions (red dashed), and voids (blue dotted) as a function of redshift in the RT simulation.
Figure 4.

Volume-filling factors of 21-cm cold spots (black solid line), |${\rm H\,{\small {II}}}$| regions (red dashed), and voids (blue dotted) as a function of redshift in the RT simulation.

The figure shows that in our simulation the volume-filling factor of 21-cm cold spots traces that of |${\rm H\,{\small {II}}}$| regions well for 6 < z < 7. However, when the filling factor of |${\rm H\,{\small {II}}}$| regions drops below 0.4, the filling factor of 21-cm cold spots starts to deviate from that of |${\rm H\,{\small {II}}}$| regions. At z > 9, it approaches the filling factor of voids, Qvoid.15 We refer to the difficulty in interpreting 21-cm cold spots due to the contributions from both |${\rm H\,{\small {II}}}$| regions and voids as the void–|${\rm H\,{\small {II}}}$| regions confusion.

The transition around a filling factor of 0.4 can be understood as follows. The level of the void contamination is controlled by the probability that the negative excursion of density fluctuation passes below the threshold given by the mean 21-cm signal 〈δT21〉 = T0xH I(1 + δb)〉 ≈ T0(1 − QH ii). Because the 21-cm signal from the neutral parts of the IGM is δT21 = T0(1 + δb), in order for the density fluctuations to appear as cold spots in the mean subtracted (observed) 21-cm signal, i.e. ΔT21 = δT21 − 〈δT21〉 < 0, the density fluctuations must be lower than the threshold value, δb < −QH ii. The rms density fluctuations in our simulation are σb ≈ 0.37[(1 + z)/8]−1. The volume-filling factor of |${\rm H\,{\small {II}}}$| regions QH ii ≈ 0.4 corresponds to this rms density fluctuation level. Below this value of QH ii, we expect a large contamination from the density fluctuations to 21-cm cold spots since many 1σb fluctuations can pass below the threshold level ≈−QH ii. This is the reason why the assumption that the 21-cm cold-spots filling factor is a good tracer of |${\rm H\,{\small {II}}}$| regions breaks down around QH ii ≈ 0.4(≈σb).

However, this is not a fundamental limitation of the granulometric analysis of tomographic data; the apparent confusion is associated with the somewhat arbitrary choice of the threshold value |$\Delta T^{\rm th}_{21}=0$| when creating a binary field. In a future improved granulometric analysis, the use of multiple threshold values should allow us to mitigate the apparent confusion to a large extent. In this introductory paper on granulometry however, we adopt a single threshold value |$\Delta T^{\rm th}_{21}=0$|⁠. In the following, we discuss the factors controlling the void–|${\rm H\,{\small {II}}}$| region confusion to help its mitigation in future work.

5.3.2 Size distributions of |${\rm H\,{\small {II}}}$| regions and 21-cm cold spots

Fig. 5 (top panel) shows the size distributions of 21-cm cold spots dQ21(<R)/dR, |${\rm H\,{\small {II}}}$| regions dQH ii(<R)/dR, and voids dQvoid(<R)/dR measured by the granulometric analysis of the RT simulation at z = 6.8. At this redshift, the void–|${\rm H\,{\small {II}}}$| region confusion is small. The size distribution of voids is measured by applying the same granulometric analysis to the negative excursion sets of the density fluctuation field, δb(r) < 0. The bottom panel shows the difference between the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions. Note that the size distributions shown in Fig. 5 (and hereafter unless otherwise stated) are normalized to the volume-filling factor, i.e. dQ(<R)/dRQdF(<R)/d R.16

Top: size distribution of 21-cm cold spots (black solid histogram), ${\rm H\,{\small {II}}}$ regions (red dashed histogram), and voids (blue dotted histogram) from the granulometric analysis of the RT simulation at z = 6.8 (〈xH II〉V = 0.40). The black and red curves with filled and open circles are the best-fitting modified Schechter functions for the size distributions of 21-cm cold spots and ${\rm H\,{\small {II}}}$ regions, respectively. Bottom: difference between the size distributions of 21-cm cold spots and ${\rm H\,{\small {II}}}$ regions, dQ21(<R)/d ln R − dQH ii(<R)/d ln R (black). The red line is the line of zero difference.
Figure 5.

Top: size distribution of 21-cm cold spots (black solid histogram), |${\rm H\,{\small {II}}}$| regions (red dashed histogram), and voids (blue dotted histogram) from the granulometric analysis of the RT simulation at z = 6.8 (〈xH IIV = 0.40). The black and red curves with filled and open circles are the best-fitting modified Schechter functions for the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions, respectively. Bottom: difference between the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions, dQ21(<R)/d ln R − dQH ii(<R)/d ln R (black). The red line is the line of zero difference.

As suggested by the results on the filling factors in the previous section, the size distribution of 21-cm cold spots traces that of |${\rm H\,{\small {II}}}$| regions very well for QH ii > 0.4. The slight deviation seen at R ≲ 5 h − 1 cMpc occurs because of the contamination by voids. This leads to a slight underestimation of the mean size of |${\rm H\,{\small {II}}}$| regions inferred from the mean size of 21-cm cold spots (by ∼10 per cent). The voids remain a minor contaminant to the overall 21-cm cold-spot size statistics of 21-cm tomography affecting the results only by approximately 10 per cent.

Fig. 6 is the equivalent of Fig. 5 at z = 7.6 (〈xH IIV = 0.16). As expected, we see that the contamination from voids dominates over the signature of |${\rm H\,{\small {II}}}$| regions in the 21-cm cold-spot size distribution. As noted in Section 5.3.1, this is because the underlying filling factor of |${\rm H\,{\small {II}}}$| regions is small during the first half of reionization.

Same as Fig. 5, but at z = 7.6 (〈xH II〉V = 0.16). Note that the void size distribution is virtually identical to the one at z = 6.8 shown in Fig. 5 because the density perturbation evolves very slowly. The void contamination is large at this redshift.
Figure 6.

Same as Fig. 5, but at z = 7.6 (〈xH IIV = 0.16). Note that the void size distribution is virtually identical to the one at z = 6.8 shown in Fig. 5 because the density perturbation evolves very slowly. The void contamination is large at this redshift.

The large difference between the shapes of the size distributions of |${\rm H\,{\small {II}}}$| regions and voids provides a way to avoid the void–|${\rm H\,{\small {II}}}$| region confusion. The void size distribution is always confined within R ≲ 5 h − 1 cMpc. In fact, the size distribution of the excursion sets of a Gaussian random field can be well understood and predicted given a priori knowledge of the matter power spectrum (Bardeen et al. 1986; Bond & Efstathiou 1987; Sheth & van de Weygaert 2004). Therefore, we can test the robustness of the identification of cold spots as |${\rm H\,{\small {II}}}$| regions against the null hypothesis of void size statistics.

Overall, we conclude that the size distribution of 21-cm cold spot traces that of |${\rm H\,{\small {II}}}$| regions very well during the second half of reionization. For the first half of reionization, the interpretation of 21-cm cold spots becomes increasingly difficult because of large contamination from voids for our canonical choice of threshold |$\Delta T_{12}^{\rm th}=0$|⁠. We note that this may be mitigated by choosing lower values for the threshold.

6 RECOVERY OF H ii REGION SIZE DISTRIBUTION: 3D DATA SETS

So far we have only considered the case of a pure simulated 21-cm signal. We now turn our attention to the prospects for recovering the |${\rm H\,{\small {II}}}$| region size distribution from 21-cm tomography using a SKA-like radio interferometer.

The recovery of the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions from real-world radio interferometric observations is subjected to noise, instrumental response, foregrounds, and observing strategy. We therefore ask the question: what are the requirements to recover the |${\rm H\,{\small {II}}}$| region size distribution from 21-cm tomographic data observed with the SKA?

6.1 Effect of the field of view: sample variance error

The first requirement addresses the FoV or sky coverage. Fig. 7 shows the effect of a finite FoV on the observed size distributions of 21-cm cold spots measured from the granulometric analysis of our mock image cube of 4.62 deg2 FoV with frequency width 44 MHz (red dashed lines), and 2.32 deg2 FoV with frequency width 22 MHz (black solid), using the SKA1-low with the limiting case of noise-free data. The frequency width is chosen to match with the size of the FoV so that the comoving lengths of the line-of-sight and perpendicular directions are equal. The error bars correspond to 1σ uncertainty due to the sample variance (Section 3.2). The red and black curves are the best-fitting modified Schechter functions, and the vertical dotted line indicates the FWHM of the angular resolution. Note however that the cubes we have analysed have a large-frequency width which implies that the real data will be subject to the light-cone effect (Barkana & Loeb 2006; Datta et al. 2012b). The results in this section should therefore be interpreted as four (one) pointed observations of an FoV of 4.62 deg2 (2.32 deg2) with a frequency width of approximately 10 MHz, for which the light-cone effect can most likely be neglected (Datta et al. 2014).

Effect of a finite FoV. Top: 21-cm cold-spot size distributions measured from the mock SKA1-low noise-free image cubes of 4.62 deg2 FoV (red dashed histogram) and 2.32 deg2 FoV (black solid histogram) at z = 6.8 (〈xH II〉V = 0.40). The 1σ error due to the sample variance is shown (Section 3.2). The black solid and red dashed curves are the best-fitting modified Schechter functions. The vertical dotted line indicates the angular resolution FWHM = 4 h − 1 cMpc (max. baseline 2 km). Bottom: absolute difference, ${\rm d}Q(>R)/{\rm d}\,{\rm ln}R|_{\rm 2.3^2deg^2}-{\rm d}Q(>R)/{\rm d}\,{\rm ln}R|_{\rm \rm 4.6^2deg^2}$, between the two FoVs.
Figure 7.

Effect of a finite FoV. Top: 21-cm cold-spot size distributions measured from the mock SKA1-low noise-free image cubes of 4.62 deg2 FoV (red dashed histogram) and 2.32 deg2 FoV (black solid histogram) at z = 6.8 (〈xH IIV = 0.40). The 1σ error due to the sample variance is shown (Section 3.2). The black solid and red dashed curves are the best-fitting modified Schechter functions. The vertical dotted line indicates the angular resolution FWHM = 4 h − 1 cMpc (max. baseline 2 km). Bottom: absolute difference, |${\rm d}Q(>R)/{\rm d}\,{\rm ln}R|_{\rm 2.3^2deg^2}-{\rm d}Q(>R)/{\rm d}\,{\rm ln}R|_{\rm \rm 4.6^2deg^2}$|⁠, between the two FoVs.

The figure shows that the smaller FoV (2.32 deg2) does not introduce any significant systematic bias compared to our fiducial FoV of 4.62 deg2. The slight under(over)estimation of the large(small) sizes of 21-cm cold spots remains within 10 per cent–20 per cent at most, which is within the 1σ error due to sample variance. The sample variance error increases for larger 21-cm cold-spot sizes because the number of large |${\rm H\,{\small {II}}}$| regions within an FoV is smaller. Note that below the angular resolution scale the error appears to be negligibly small. This is because the angular smoothing artificially induces any cold spots at that scale to have always the same size, and therefore the FoV-to-FoV variation is suppressed.

No systematic bias will be introduced as long as the FoV is large enough to include the largest scale of |${\rm H\,{\small {II}}}$| regions (the theoretical prediction for the characteristic size is of order of tens of comoving Mpc, e.g. ∼ 0.5 deg for ∼ 50 h − 1 cMpc. See, for example Wyithe & Loeb 2004; Geil et al. 2015; Dixon et al. 2016). Therefore, the single-pointing FoV of SKA1-low ( ∼ 3 deg) is large enough to measure the 21-cm cold-spot size distribution with a reasonable sample variance error when the 3D image cube is used.

6.2 Effect of angular resolution: smoothing bias

The second requirement comes from the finite angular resolution of the instrument, which strongly affects the recovery and interpretation of the 21-cm cold-spot and |${\rm H\,{\small {II}}}$| region size distributions.

Fig. 8 shows the effect of finite angular resolution on the 21-cm cold-spot size distribution at redshift z = 6.5 and 6.8 for a noiseless image cube with a 4.62 deg2 FoV and a frequency depth of 44 MHz, using an SKA1-low (blue squares) and SKA2 (green triangles) configuration. The frequency resolution is set to match the angular resolution in comoving coordinates. The size distribution of the pure simulated 21-cm tomography (red circles) represents the case of an ideal, fully resolved observation. The vertical lines indicate the angular resolutions of SKA1-low (2.8 arcmin, blue dotted) and SKA2 (1.4 arcmin, green dashed).

Top: 21-cm cold-spot size distribution with the angular resolution of SKA1-low (max. baseline 2 km, blue histogram) and SKA2 (max. baseline 4 km, green histogram) at z = 6.5 (left-hand panel) and 6.8 (right-hand panel). The red histogram shows the case of a fully resolved 21-cm signal. The red, blue, and green curves with open circles, triangles, and squares show the best-fitting modified Schechter functions. The vertical lines show the angular resolutions of SKA1-low (2.8 arcmin, blue dotted) and SKA2 (1.4 arcmin, green dashed) in comoving length. Bottom: absolute difference, abs. error = dQ(>R)/d ln R|data − dQ(>R)/d ln R|sim, between the data models and the simulation. The figure illustrates that the smoothing that results from a finite angular resolution merges small into large cold spots.
Figure 8.

Top: 21-cm cold-spot size distribution with the angular resolution of SKA1-low (max. baseline 2 km, blue histogram) and SKA2 (max. baseline 4 km, green histogram) at z = 6.5 (left-hand panel) and 6.8 (right-hand panel). The red histogram shows the case of a fully resolved 21-cm signal. The red, blue, and green curves with open circles, triangles, and squares show the best-fitting modified Schechter functions. The vertical lines show the angular resolutions of SKA1-low (2.8 arcmin, blue dotted) and SKA2 (1.4 arcmin, green dashed) in comoving length. Bottom: absolute difference, abs. error = dQ(>R)/d ln R|data − dQ(>R)/d ln R|sim, between the data models and the simulation. The figure illustrates that the smoothing that results from a finite angular resolution merges small into large cold spots.

The angular resolution introduces a systematic bias in the observed size distribution of 21-cm cold spots. Both at z = 6.5 and 6.8, the shape of the size distributions is systematically shifted towards larger sizes. This is somewhat counterintuitive as an obvious expectation is the suppression of the small end of a size distribution. This systematic bias occurs because the angular smoothing (or Gaussian PSF) actually mixes and merges many 21-cm cold spots (including both |${\rm H\,{\small {II}}}$| regions and voids). As a result, many small cold spots become a large one. We call this important effect the smoothing bias. Since the granulometric analysis is performed on the binary field satisfying ΔT21 < 0, smoothing does not always remove small cold spots. Instead, because of the mixing of cold spots, intrinsically small cold spots re-populate the larger part of a size distribution. Quantitatively, the impact of the smoothing bias is shown in the bottom panels of Fig. 8, where the difference between the simulation and the mock observations is plotted. The smoothing bias is larger for a larger angular smoothing scale. Importantly, the figure shows that the smoothing bias impacts all scales of the size distribution, even when the angular resolution scale is below the scale of interest.

The impact of a finite angular resolution is less severe when the characteristic size or mean size of the underlying (true) 21-cm cold-spot size distribution is much larger than the angular resolution, θAR*/DA(z) [or ≪〈R〉/DA(z)], where DA(z) is the angular diameter distance to redshift z. For example, at 〈xH iiV = 0.40, the angular resolution causes a systematic bias in size distribution by ∼50(20) per cent fractional error for SKA1-low (SKA2), while at 〈xH iiV = 0.63, the bias is only about ∼10 per cent. We note that a chromatic PSF will introduce an extra complication (Vedantham, Udaya Shankar & Subrahmanyan 2012) and possibly a systematic bias. However, addressing this is beyond the scope of this paper.

This analysis suggests that the effect of angular resolution must be taken into account when interpreting the 21-cm cold-spot size distribution. If an unbiased measurement (to the accuracy of order 10 per cent) of the size distribution is required, we should correct the angular resolution bias by a: (i) ‘baseline-design’ approach, where we ensure that the maximum baseline is large enough that the angular resolution bias in the size distribution and the characteristic size of cold spots remains sufficiently small, or a (ii) ‘forward-modelling’ approach, where we create a large suite of models and mock data cubes and directly fit the simulated size distribution in the observation using a Markov Chain Monte Carlo approach. However, as we will show in Section 8, a sufficiently high angular resolution must be achieved for both methods to work.

6.3 Effect of thermal noise: splitting bias

The third requirement is on the thermal noise. Fig. 9, showing a 2D image from the data cube, visually illustrates the effect of noise on the cold spots in 21-cm tomography. The thermal noise (middle panel) acts as an additive noise, and contaminates the underlying 21-cm cold spots (left-hand panel) by distorting the shape and size of the |${\rm H\,{\small {II}}}$| regions (right-hand panel).

Mock 2D images at z = 6.8 as observed by SKA1-low with σN = 4.35 mK rms noise level. The left-hand, middle, and right-hand panels show the image of the noiseless 21-cm signal, noise, and mock data with an SNR = 1.4. All the images have an FoV of 4.6 × 4.6 deg2 (500 h − 1 cMpc on a side).
Figure 9.

Mock 2D images at z = 6.8 as observed by SKA1-low with σN = 4.35 mK rms noise level. The left-hand, middle, and right-hand panels show the image of the noiseless 21-cm signal, noise, and mock data with an SNR = 1.4. All the images have an FoV of 4.6 × 4.6 deg2 (500 h − 1 cMpc on a side).

The effect of thermal noise on the observed 21-cm cold-spot size distribution is shown in Fig. 10, which plots the measured size distributions from a mock 3D image cube with SKA1-low (21-cm+noise, black solid), noiseless 21-cm signal (21 cm, red dashed), and noise cube (noise, blue dotted). The 3σ error due to the thermal noise (inner error bars) and the total uncertainty including the 3σ sample variance error (outer error bars) are shown. The left-hand (right) panel corresponds to the two mock observations with SKA1-low having an rms noise level of 4.35 mK (2.0 mK) per resolution element for an interferometer with a 2 km maximum baseline.

Top: 21-cm cold-spot size distributions measured from the pure 3D 21-cm data cube with a 2 km baseline angular resolution (red dashed), noise cube (blue dotted), and mock SKA1-low observation with σN = 4.35 mK (left-hand panel) and σN = 2.0 mK (right-hand panel) rms noise level at 2.3 arcmin resolution. The curves show the best-fitting modified Schechter functions. Bottom: absolute difference between the pure 21-cm signal and mock observations. The inner error bars indicate the 3σ statistical uncertainties due to the thermal noise. The outer error bars include 3σ statistical error from the sample variance (see Section 3.2). The grey dotted–dashed curve is the best-fitting modified Schechter function in the fully resolved 21-cm signal in the RT simulation.
Figure 10.

Top: 21-cm cold-spot size distributions measured from the pure 3D 21-cm data cube with a 2 km baseline angular resolution (red dashed), noise cube (blue dotted), and mock SKA1-low observation with σN = 4.35 mK (left-hand panel) and σN = 2.0 mK (right-hand panel) rms noise level at 2.3 arcmin resolution. The curves show the best-fitting modified Schechter functions. Bottom: absolute difference between the pure 21-cm signal and mock observations. The inner error bars indicate the 3σ statistical uncertainties due to the thermal noise. The outer error bars include 3σ statistical error from the sample variance (see Section 3.2). The grey dotted–dashed curve is the best-fitting modified Schechter function in the fully resolved 21-cm signal in the RT simulation.

First, as shown in the left-hand panel in Fig. 10, an SKA1-low with σN = 4.35 mK permits us to measure the 21-cm cold-spot size distribution using a low SNR 3D image cube (black solid) within ≲50 per cent of the noiseless case (red dashed).

The thermal noise, however, introduces another important systematic bias. The size distribution measured from the low SNR image cube is shifted systematically towards lower sizes beyond the statistical error. This bias occurs because the noise splits up the cold spots into smaller ones when the positive excursion sets of the noise fluctuation occur inside 21-cm cold spots. We thus call this systematic bias the splitting bias. The bottom panels in Fig. 10 indeed show that originally large 21-cm cold spots are split and re-populate the small end of the size distribution.

To understand the mechanism of the splitting bias, Fig. 11 shows the 1D profiles of the brightness temperature contrast for the same three data sets as above. The cold spots B and D are split into two separate ones in the mock observation. This occurs when the positive excursion sets of noise exceed the contrast of the underlying 21-cm cold spots, ΔTcoldspot ≈ 5-10 mK. Because the noise is a Gaussian random field, the probability that the positive excursion of noise exceeds the 21-cm cold-spot contrast is |$P(>\Delta T_{\rm coldspot})=\frac{1}{2}[1-{\rm erf}(\frac{\Delta T_{\rm coldspot}}{\sqrt{2}\sigma _{\rm N}})]\approx 1-12\hbox{\,per\,cent}$| for ΔTcoldspot ≈ 5-10 mK, where erf is the error function. This is a noticeable effect for σN = 4.35 mK (as shown in the left-hand panel of Fig. 10) where the ∼2σ peaks of noise contaminate the signal inducing the splitting bias. On the other hand, this means that only high sigma peaks act as contaminants to the granulometric measurement of the size distribution. This is the reason why a low SNR image cube still permits a reasonable recovery of the underlying size distribution of 21-cm cold spots.

Origin of a splitting bias due to thermal noise. The 1D profiles of the brightness temperature contrast of pure 21-cm signal (red dashed line), noise (blue dotted), and mock data, i.e. 21-cm+noise (black solid) along the perpendicular direction on the sky. The true (red shaded) and observed (black shaded) cold spots are marked as shaded regions. We label the four example 21-cm cold spots with A, B, C, and D for the ease of discussion.
Figure 11.

Origin of a splitting bias due to thermal noise. The 1D profiles of the brightness temperature contrast of pure 21-cm signal (red dashed line), noise (blue dotted), and mock data, i.e. 21-cm+noise (black solid) along the perpendicular direction on the sky. The true (red shaded) and observed (black shaded) cold spots are marked as shaded regions. We label the four example 21-cm cold spots with A, B, C, and D for the ease of discussion.

Note that an opposite effect also occurs: a creation of spurious cold spots due to the negative excursion sets of noise (for example see the cold spot C in Fig. 11). However, because these spurious cold spots are always small (R ≲ 5 h − 1 cMpc), the net effect of noise is a systematic bias shifting the size distribution towards smaller sizes.

At a noise level of σN = 2.0 mK (i.e. SNR = 3.1), shown in the right-hand panel of Fig. 10, the splitting bias becomes much smaller. The probability that contaminating noise peaks occurs for 21-cm cold spots with ΔTcoldspot ≈ 5-10 mK is only P(>ΔTcoldspot) < 1 per cent.

The level of the systematic bias entering into the cold-spot size distribution measurement does not scale linearly with the noise level (and hence, with the integration time). There is a jump in the quality of the recovery of the 21-cm cold-spot size distribution as a function of the rms noise/integration time. When the rms noise passes the threshold required by the expected brightness temperature contrast of 21-cm cold spots, the quality of the measurement jumps up. Above this threshold, increasing the integration time gradually improves the quality, but the gain is not as pronounced.

The statistical uncertainty of the thermal noise is negligibly small and is shown as the small inner error bars in Fig. 10. This is because in a large FoV (4.6 × 4.6 deg2) the variation of cold-spot sizes due to the thermal noise tends to average out. As the large end of the size distribution has fewer cold spots to average out the noise error, the statistical error bars increase slightly at larger sizes. On the other hand, the error bars become negligibly small for smaller cold spots as there are many samples.

7 RECOVERY OF H ii REGION SIZE DISTRIBUTION: 2D DATA SETS

In the previous section, we have considered a data cube that contains the full 3D information about the 21-cm signal. In real-world observations, such full 3D information representing a fixed redshift may not be available due to the light-cone effect and a bad quality or noise contamination in some frequency slices which must be discarded from the final analysis. Therefore, in this section, we consider more restricted observations of 2D images. We examine the prospects and requirements for recovering the |${\rm H\,{\small {II}}}$| region size distribution from a 2D tomographic image. Choosing only one frequency slice may be too drastic. However, we use this case to illustrate the effects of reducing the frequency width of the 21-cm data set.

The qualitative effects of the FoV, angular resolution, and noise on 2D images are the same as the case for 3D data cubes, i.e. (1) a finite FoV introduces a statistical error due to sample variance, but no systematic bias; (2) a limited angular resolution introduces a smoothing bias; and (3) thermal noise introduces a splitting bias, but a small statistical error. However, quantitatively the situation differs because the sample variance becomes extremely large for a 2D image and dominates the errors. Therefore, in the following sub-sections, we examine in detail the role of sample variance in 2D images.

7.1 Effect of the FoV for 2D images

The error on the 2D size distribution measurement is sample variance dominated. Fig. 12 shows the effect of a finite FoV on the 2D size distribution of 21-cm cold spots obtained from 2D images. 2D radii, R, of cold spots are defined as the apparent perpendicular lengths of the cold spots’ radii in the 2D image. We analyse a single-frequency image which corresponds to a channel width of 0.45 MHz. For a better recovery of the size distribution (black solid curve), the granulometric analysis of a larger FoV 21-cm image (red solid histogram, 4.62 deg2 FoV) is preferred. Averaging over many 2.32 deg2 FoV single-pointing images (black dashed curve) works equally well. The bottom panel shows that the sample variance error of 4.62 deg2 FoV data is smaller than for the 2.32 deg2 FoV data by a factor of 2, because of the factor of 4 increase in the area probed. Although the statistical error due to sample variance differs only by a factor of 2, the large end (R ≳ 10 h − 1 cMpc) of the size distribution is difficult to determine by a single-pointing FoV image. A large variation (due to the difficulty in sampling many large cold spots) could produce a catastrophic error in the measured 2D size distribution (see blue dashed histogram). While such a big error can also occur for large FoV data, the probability is much lower. Therefore, we conclude that increasing the area of the sky probed by increasing the FoV by an interferometric mosaicking or multibeaming (multiple EoR windows) technique will be important when analysing single-frequency 21-cm images (see Section 8 for discussion).

Effect of a finite FoV on the measured 2D size distribution from a 21-cm image in mock tomographic data. Top panel: the red solid (blue dashed) histogram shows the 21-cm cold-spot size distribution measured from a 2D image slice with 4.62 deg2 FoV (2.32 deg2 FoV) from SKA1-low at noiseless limit. The 1σ error bars due to the sample variance are shown. The red (blue) shaded regions brackets the maximum and minimum values appeared in 100 random images with 4.62 deg2 FoV (2.32 deg2 FoV). The black solid (dashed) curve shows the best-fitting modified Schechter function to the mean of the size distributions measured from 100 random images. Bottom panel: magnitude of the sample variance uncertainty determined from the covariance matrix. The figure shows that a larger FoV (or multiple FoVs) is preferred to measure the 21-cm cold-spot size distribution reliably.
Figure 12.

Effect of a finite FoV on the measured 2D size distribution from a 21-cm image in mock tomographic data. Top panel: the red solid (blue dashed) histogram shows the 21-cm cold-spot size distribution measured from a 2D image slice with 4.62 deg2 FoV (2.32 deg2 FoV) from SKA1-low at noiseless limit. The 1σ error bars due to the sample variance are shown. The red (blue) shaded regions brackets the maximum and minimum values appeared in 100 random images with 4.62 deg2 FoV (2.32 deg2 FoV). The black solid (dashed) curve shows the best-fitting modified Schechter function to the mean of the size distributions measured from 100 random images. Bottom panel: magnitude of the sample variance uncertainty determined from the covariance matrix. The figure shows that a larger FoV (or multiple FoVs) is preferred to measure the 21-cm cold-spot size distribution reliably.

Having a large effective FoV is also important for controlling the thermal noise uncertainty. The statistical uncertainty due to the thermal noise is smaller for a large FoV data as the thermal noise error couples with the FoV. This is because the noise randomly modifies the shape and size of 21-cm cold spots, introducing a statistical uncertainty in the measured size distribution (see Section 6.3). For a fixed rms noise level, a larger FoV allows the random modifications to be averaged out because there are many cold spots. As the number of larger cold spots is smaller in a small FoV data, the noise error increases for larger cold-spot sizes.

Therefore, having a large sky coverage by mosaicking/multibeaming is also advantageous for reducing the statistical error bars due to the thermal noise. Note that, however, a large sky coverage does not reduce the splitting bias (Section 6.3), which is only reduced by having a lower absolute value of the rms noise level after a longer integration time.

7.2 Performance of a successful recovery

Finally, we test the performance of the granulometric measurement of the 21-cm cold-spot size distribution once all the requirements are satisfied. Fig. 13 shows the expected results for SKA1-low about the granulometric measurement of the 21-cm cold spots using a mock 4.62 deg2 FoV image at z = 6.8 after interferometric mosaic imaging by patching many single-pointing data. The image is formed with a 0.45 MHz channel width. The black points show the mock measurements and the black curve shows the best-fitting modified Schechter function. The red curve and histogram refers to an ideal case in the absence of noise. The blue histogram shows the spurious noise cold-spot size distribution.

Top: 21-cm cold-spot size distributions measured from a noiseless 2D image with a 2 km baseline angular resolution (red dashed), noise image (blue dotted), and a mock SKA1-low imaging observation with σN = 4.35 mK (left-hand panel) and σN = 2.0 mK (right-hand panel) rms noise level. The curves show the best-fitting modified Schechter functions. Bottom: absolute difference between the noiseless 21-cm signal and mock observations. The inner error bars indicate the 1σ statistical uncertainties due to the thermal noise. The outer error bars include 1σ statistical error from the sample variance. The grey dotted–dashed curve is the best-fitting modified Schechter function in the fully resolved 21-cm signal.
Figure 13.

Top: 21-cm cold-spot size distributions measured from a noiseless 2D image with a 2 km baseline angular resolution (red dashed), noise image (blue dotted), and a mock SKA1-low imaging observation with σN = 4.35 mK (left-hand panel) and σN = 2.0 mK (right-hand panel) rms noise level. The curves show the best-fitting modified Schechter functions. Bottom: absolute difference between the noiseless 21-cm signal and mock observations. The inner error bars indicate the 1σ statistical uncertainties due to the thermal noise. The outer error bars include 1σ statistical error from the sample variance. The grey dotted–dashed curve is the best-fitting modified Schechter function in the fully resolved 21-cm signal.

With an rms noise σN = 4.35 mK (SNR = 1.4 at the resolution element), as shown in the left-hand panel, the splitting bias due to thermal noise is still noticeable on the measured size distribution (black). The splitting bias is within the statistical uncertainties when both sample variance and noise errors are included. For the SKA1-low data with σN = 2.0 mK,17 the splitting bias becomes negligibly small. Thus, successfully recovering the 21-cm cold-spot size distribution at 2 km baseline resolution level is possible with the SKA1-low when the mosaicking/multibeaming technique is used to increase the effective sky coverage.

Fig. 14 shows the expected improvement for SKA2 if it is extended to longer intermediate-scale baselines. The green points are the mock SKA2 measurements, and the green curve with triangles indicates the best-fitting modified Schechter function. As already discussed in Section 6.2, we should note that the finite angular resolution introduces a fundamental instrumental limitation for SKA1-low (the best-fitting modified Schechter function is shown as the blue curve with squares). The smoothing bias systematically shifts the measured size distribution to sizes larger than in the case of a fully resolved 21-cm signal (red curve with open circles). The bias could be as large as a factor of ∼2 for SKA1-low, which exceeds the total statistical uncertainties. For SKA2, assuming the complete uv-coverage extends to 4 km baselines, the data with σN ≈ 2 mK noise at 1.4 arcmin resolution reduce the smoothing bias. At R > 6 h − 1 cMpc, i.e. well above the angular resolution smoothing scale (FWHM ≈ 2 h − 1 cMpc), the SKA2 will be able to recover the underlying true cold-spot size distribution (red curve) well within the statistical uncertainties using a 4.62 deg2 FoV and SNR ≈ 3 image. At this quality, the sample variance error is the dominant source of uncertainty. Thus, increasing sky coverage is necessary to measure the size distribution better. For the smaller end of the size distribution, increasing angular resolution is necessary to prevent a large smoothing bias.

Performance of the granulometric measurement of 21-cm cold-spot size distribution once all the requirements are satisfied (see the text for details). The green filled points show the size distribution measured from SKA2 with a σN = 2.0 mK noise level. The frequency channel is chosen to be 0.23 MHz wide to match the 1.4 arcmin angular resolution of SKA2. The red, blue, and green curves show the best-fitting modified Schechter functions of the size distributions measured from the fully resolved noiseless 21-cm signal, SKA1-low, and SKA2. The inner error bars are only with 1σ noise error and the outer error bar is the 1σ noise+sample variance error.
Figure 14.

Performance of the granulometric measurement of 21-cm cold-spot size distribution once all the requirements are satisfied (see the text for details). The green filled points show the size distribution measured from SKA2 with a σN = 2.0 mK noise level. The frequency channel is chosen to be 0.23 MHz wide to match the 1.4 arcmin angular resolution of SKA2. The red, blue, and green curves show the best-fitting modified Schechter functions of the size distributions measured from the fully resolved noiseless 21-cm signal, SKA1-low, and SKA2. The inner error bars are only with 1σ noise error and the outer error bar is the 1σ noise+sample variance error.

8 DISCUSSIONS

In this section, we discuss the implications and the importance of measuring the |${\rm H\,{\small {II}}}$| region size distribution from 21-cm tomography to understand the EoR in a wider context. A possible observing strategy and baseline design are also discussed.

8.1 Synergy between 21-cm power spectra and tomography

Mellema et al. (2015) and Koopmans et al. (2015) speculated about the synergy between a 21-cm power spectrum analysis and tomography. Here, we present an explicit example to support their arguments.

Fig. 15 shows two noiseless 21-cm tomographic images with SKA1-low using the RT simulation and a Gaussian random field which has a 21-cm power spectrum identical to the one from the RT simulation. By eye, we can discern clear morphological differences between the two images. However, given that their power spectra are identical, how can we tell whether they are two different realizations of the same reionization process or intrinsically different models?

Top: 2D images of 21-cm signal in the tomography from the RT simulation (left) and a Gaussian random field (right) with the same power spectrum. The mock images are by SKA1-low in the noiseless limit. Bottom: binary images of cold spots. The white regions represents the cold-spots ΔT21 < 0.
Figure 15.

Top: 2D images of 21-cm signal in the tomography from the RT simulation (left) and a Gaussian random field (right) with the same power spectrum. The mock images are by SKA1-low in the noiseless limit. Bottom: binary images of cold spots. The white regions represents the cold-spots ΔT21 < 0.

Fig. 16 shows that the 21-cm cold-spot size distribution measurement from 21-cm tomography using the granulometric analysis has the capability to break this degeneracy in 21-cm power spectrum measurements. This clearly illustrates, although for an extreme example, that tomographic observations add valuable information to the measurement of the 21-cm signal.

Comparison of cold-spot size distributions for two models with identical power spectra. The results are from the RT simulation (red filled circles) and the Gaussian random field (blue open circles). The histograms are the measured size distributions and the curves are the best-fitting modified Schechter functions. The solid curves in the bottom panel show the 21-cm power spectra of the two models measured from the mock SKA1-low in the noiseless limit. The dashed curve shows the case without the effect of angular resolution. The figure clearly illustrates the ability to break the degeneracy in the power spectrum analysis by size distribution measured from 21-cm tomography.
Figure 16.

Comparison of cold-spot size distributions for two models with identical power spectra. The results are from the RT simulation (red filled circles) and the Gaussian random field (blue open circles). The histograms are the measured size distributions and the curves are the best-fitting modified Schechter functions. The solid curves in the bottom panel show the 21-cm power spectra of the two models measured from the mock SKA1-low in the noiseless limit. The dashed curve shows the case without the effect of angular resolution. The figure clearly illustrates the ability to break the degeneracy in the power spectrum analysis by size distribution measured from 21-cm tomography.

Of course, the information from the power spectrum and tomography are complementary to each other. For example, the 21-cm power spectrum contains information about the clustering of |${\rm H\,{\small {II}}}$| regions, which is not captured by size distribution. Furthermore, the higher order multipoles of the power spectrum can potentially quantify the strength and nature of correlation between the underlying matter distribution and the |${\rm H\,{\small {II}}}$| region distribution (Majumdar et al. 2016a,b). Therefore, we expect a clear synergy between radio interferometric observations of 21-cm power spectra and imaging tomography. Hence, we place an emphasis on optimizing and balancing the design of future 21-cm experiments for both power spectrum and tomography to achieve an optimal scientific return.

8.1.1 Role of intermediate baselines

How well tomographic data can distinguish different models will of course depend on the (angular) resolution of the images, which is set by the length of the longest baselines used in the construction of the tomographic data. Fig. 17 shows the effect of angular resolution on the 21-cm power spectrum and the cold-spot size distribution measured from 21-cm tomography. It shows the size distributions and power spectra measured with resolutions ≈ 5.70, 2.85, and1.43 arcmin. These correspond to maximum baseline lengths of 1 km (red), 2 km (blue), and 4 km (green) at z = 6.8. Note that all angular resolutions used are smaller than or comparable to the characteristic size [ ∼ 10 h − 1 cMpc ( ∼ 5.5 arcmin)] of |${\rm H\,{\small {II}}}$| regions.

Same as Fig. 16, but varying the maximum baseline used for the mock observation: 1 km (red lines and symbols) and 2 km (blue) for SKA1-low, and 4 km (green) for SKA2. Note that y-axis is shown in dF(<R)/d ln R to highlight the change in the shape. The figure illustrates the need for longer intermediate baselines for distinguishing the observed 21-cm cold-spot size distribution.
Figure 17.

Same as Fig. 16, but varying the maximum baseline used for the mock observation: 1 km (red lines and symbols) and 2 km (blue) for SKA1-low, and 4 km (green) for SKA2. Note that y-axis is shown in dF(<R)/d ln R to highlight the change in the shape. The figure illustrates the need for longer intermediate baselines for distinguishing the observed 21-cm cold-spot size distribution.

For a mock observation using a maximum baseline of 1 km, the size distributions of both the RT simulation and the Gaussian random field start to resemble each other, as they converge to the same shape with decreasing angular resolution. In fact, the only difference is due to the volume-filling factor of cold spots. A low angular resolution (large smoothing scale) dictates the shape of the size distribution dF(<R)/dR [and dQ(<R)/dR]. The signature of underlying |${\rm H\,{\small {II}}}$| region size distribution is erased, or not significant. While the particular level of angular resolution required to distinguish models differs depending on the stage of reionization, high angular resolution is clearly necessary for differentiating the shapes of size distributions. The power spectra are obviously indistinguishable by design at all angular resolutions.

Therefore, the merit of intermediate baselines and high angular resolution for this particular data set is clear from Fig. 17. Here, the measurement of the |${\rm H}\,{\small {II}}$| region size distribution can only break the degeneracy in the 21-cm power spectrum measurements when baselines longer than 2 km are used. Of course, this conclusion depends on the actual size distribution of ionized regions. For later stages, where the size distribution peaks at larger sizes, maximum baselines of 1 km would still suffice.

8.2 Observing strategy and baseline design

Overall, we suggest that a gradual roll-out of longer intermediate baselines in the SKA2 phase will be beneficial to quantify the reionization morphology from 21-cm tomography. Since the granulometric size distribution measurement can work with a moderate SNR (∼3) imaging, it is more important to sample a large range of k modes in order to eliminate the systematic biases than to increase the SNR of a limited range of large-scale k modes (which may be preferred for power spectra analysis). In this regard, a tiered radio survey (Koopmans et al. 2015) combining a deep narrow field (a single beam pointing) and a shallow wide field (multibeaming) could also benefit the measurement of |${\rm H\,{\small {II}}}$| region size distributions from 21-cm tomography. The tiered observation would deliver a layer of images with a sufficient SNR and FoV for a given angular scale for the |${\rm H\,{\small {II}}}$| size distribution measurement. Therefore, using the deep-narrow- and shallow-wide-field images, we may be able to separately measure the small and large ends of the size distribution, which could make effective use of the integration time and compromise with other science cases such as the power spectrum measurement. In addition, we expect a sweet spot in the baseline configuration, as the splitting bias from noise and the smoothing bias from finite angular resolution have an opposite effect on the size distribution. Further investigation of optimal observing strategy and baseline design for 21-cm tomography will certainly benefit the SKA EoR science cases. Naturally, the first aim should be a detection of a direct 21-cm image at large scales with compact dense core stations, and a large-scale pilot imaging is certainly valuable for testing the imaging algorithm/calibration/foreground removal for 21-cm tomography. Once the pilot phase is over, a larger gain will be achieved by extending to longer intermediate baselines, which would provide a robust physical interpretation and the unbiased size distribution of |${\rm H\,{\small {II}}}$| regions.

8.3 Comparison with other size measures

The literature contains several other algorithms for characterizing the size distribution of |${\rm H\,{\small {II}}}$| regions during reionization, e.g. (1) volumes of topologically connected regions, or the equivalent radii of these (a.k.a. the friends-of-friends method, Iliev et al. 2006; Shin, Trac & Cen 2008); (2) the largest radius at which the average ionized fraction around a pixel is above a certain threshold (e.g. 90 per cent, spherical average method, Zahn et al. 2007; McQuinn et al. 2007); (3) distance from a random pixel in |${\rm H\,{\small {II}}}$| regions to the edge of the |${\rm H\,{\small {II}}}$| region containing the pixel (mean-free path method, Mesinger & Furlanetto 2007); (4) equivalent radius of Watershed basin (Watershed algorithm, Lin et al. 2016).

In the terminology introduced by Lin et al. (2016), the mean-free path and spherical average methods are both biased and diffusive in that even in the case of an ensemble of equally sized, non-overlapping, spherical |${\rm H\,{\small {II}}}$| regions, they produce a range of sizes (Lin et al. 2016). The other two methods, just as granulometry, would reproduce the correct, unbiased size distribution.

The friends-of-friends method focuses on the volume of connected regions, which means that due to the percolative nature of reionization, already at around 10 per cent ionization most of the ionized volume will be part of one large region. It therefore is more suitable to test the percolation process (Furlanetto & Oh 2016).

The watershed method was recently proposed by Lin et al. (2016). It is commonly used in image processing and it segments the data into discrete bubbles. Lin et al. (2016) showed it to produce good bubble size distributions. However, unlike granulometry, it requires an additional tunable parameter to suppress Poisson noise.

The reason that so many different methods have been proposed is because there is no unique way to capture with a simple size distribution the complex morphology of ionized regions during reionization. Friedrich et al. (2011a) and Lin et al. (2016) compared these different methods and explored the similarities and differences between their results. Malloy & Lidz (2013) showed that the matched filter technique can be used to measure the bubble size distribution although the associated systematic bias is still to be understood. Each of these methods can in principle be used to derive its answer for the size distribution in real data and use this as a metric to compare to simulation results. However, because they each use a different approach we should not compare the results between different methods. Future studies will reveal which method(s) are most useful to constrain reionization parameters from 21-cm tomographic data.

9 CONCLUSIONS

Using 21-cm tomography, one can directly measure a fundamental quantity of the reionization process, i.e. the size distribution of the |${\rm H\,{\small {II}}}$| regions, in a model-independent way. The central questions that we have dealt with in this paper are – what can we learn from 21-cm tomography using a next generation radio interferometer such as the SKA? What are the observational requirements for achieving a good science return from such a 21-cm tomography? We summarize our main conclusions as follows.

  • Granulometric analysis of the 21-cm tomographic data allows the measurement of the |${\rm H\,{\small {II}}}$| region size distribution.

     We have introduced a novel technique, called ‘granulometry’, to quantify the |${\rm H\,{\small {II}}}$| region size distribution in a mathematically well-formulated framework, but which is in practice rather simple to implement. The technique attempts to trace the underlying probability distribution function of the |${\rm H\,{\small {II}}}$| region sizes. This places the previously not so well-defined concept of ‘the morphology of reionization’ on a firm mathematical foundation. Using mock interferometric observations of RT simulations, we have shown that the granulometric analysis of 21-cm tomographic data allows us to recover the |${\rm H\,{\small {II}}}$| region size distribution with the SKA; the theoretical and observational systematics, requirements, and observing strategies have also been examined in detail.

     The measured size distribution is well described by a modified Schechter function. Even with our simplest application of the granulometric analysis, the 21-cm cold spots work as an excellent tracer of |${\rm H\,{\small {II}}}$| regions during the second half of reionization (i.e. |$Q_{{\rm H\,{\small {II}}}} \ge 0.4$|⁠). The size statistics of cold spots can therefore be directly interpreted as that of the |${\rm H\,{\small {II}}}$| regions. For the first half of reionization, attention must be paid to the possible confusion between |${\rm H\,{\small {II}}}$| regions and the voids of density fluctuations in the 21-cm cold-spot size distribution. Although this is not a fundamental limitation of the granulometric analysis, an improved method to separate them for a physical interpretation of the |${\rm H\,{\small {II}}}$| region size statistics must be used in future work.

  • Observational requirements to recover the |${\rm H\,{\small {II}}}$| region size distribution from 21-cm tomography are attainable with the SKA. An ideal observing strategy would be a moderate SNR, wide-field mosaic/multibeamed imaging with additional longer intermediate baselines ( ∼ 2-4 km).

     To measure the cold-spot size distribution using 21-cm tomography of an actual radio interferometric observation, we considered two limiting cases: (1) the measurement from a 3D image cube and (2) the measurement from a 2D image slice. The fundamental requirements for recovering the true cold-spot size distribution is to have a high angular resolution (i.e. intermediate baselines of ∼ 2-4 km) and low sample variance (i.e. large sky coverage), both of which are achievable by the SKA by employing an appropriate observational strategy. Each of the requirements and systematics are discussed below.

     A finite angular resolution, even if it is below the characteristic scale of |${\rm H\,{\small {II}}}$| regions, introduces the most important source of systematic bias (‘smoothing bias’) for the measured 21-cm cold-spot size distribution at all scales. This smoothing bias skews the true size distribution to larger sizes. Although the current baseline distribution of SKA1-low (maximum 2 km baseline) limits the bias below approximately ∼50 per cent, a higher angular resolution is desirable for an accurate unbiased measurement. Controlling the smoothing bias is a key to successfully recover the statistical characterization of sizes and morphology of |${\rm H\,{\small {II}}}$| regions.

     The sample variance due to a finite FoV and frequency sampling is a main source of statistical uncertainties. Analysis of a 3D image cube suffers little from the finite FoV because of the large-frequency samples. The data from a single-pointing observation of SKA1-low allows us to measure the cold-spot size distribution with a small statistical uncertainty. On the other hand, for the analysis of a 2D image slice from 21-cm tomography, the sample variance must be reduced to small enough values for a reasonable measurement. A single-pointing observation suffers from a large sample variance in this case. In fact, because of the propagation of sample variance to the statistical uncertainty of the thermal noise, lowering the sample variance is also advantageous to reduce the statistical error. Mosaicking/multibeaming techniques for 21-cm imaging tomography will be desirable to increase the sky coverage (i.e. to lower sample variance).

     The thermal noise is not a major obstacle. This can be already manageable with SKA1-low. A moderate SNR (≲3) of 21-cm images still permits the recovery the cold-spot size distribution within ∼50 per cent. The required rms noise level is ∼ 4 mK, which should be achievable by the SKA1-low data. The largest observational systematics is the bias, instead of a statistical error, caused by the thermal noise (‘splitting bias’). The splitting bias artificially skews the measured size distribution to smaller sizes. However, its effect becomes negligible even for SNR ≈ 3 imaging (with the rms noise level of ∼ 2 mK). A long, but manageable, integration time reduces the effect of noise to a negligible level.

  • Requirements for the science beyond power spectra and the synergy between 21-cm power spectrum and tomography.

Our results put a significant importance on having intermediate baselines in radio interferometers. This important factor should be taken into account for the design of a future extension of SKA1-low to SKA2, and other radio telescopes. The intermediate baselines are needed to fully exploit the synergy between 21-cm power spectrum and tomography. For science beyond power spectra, the availability of intermediate baselines is a pre-requisite for distinguishing the size distributions and morphology of |${\rm H\,{\small {II}}}$| regions across different reionization models (Section 8.1).

The intermediate baselines are necessary for reducing a fundamental instrumental limitation by the angular smoothing bias. In addition, they may be beneficial for overcoming a more immediate challenge of calibration such as the scintillation noise due to ionospheric turbulence. Provided that the future extension of SKA will invest on additional longer intermediate baselines, a promising observing strategy for 21-cm tomography is an interferometric mosaicking/multibeaming imaging.

Finally, the granulometric analysis introduced in this paper is only the tip of the iceberg of the entire spectrum of tools in mathematical morphology and stochastic geometry. They provide powerful means for quantifying the morphology of reionization based on a firm mathematical foundation and theory. Armed with this foundation, the synergy between 21-cm power spectrum and tomography provides many opportunities for directly probing the reionization morphology and extending 21-cm science beyond power spectrum.

Acknowledgments

KK acknowledge support from Richard Ellis and the European Research Council Advanced Grant FP7/669253. KK would also like to thank for the hospitality offered by the Department of Astronomy at Stockholm University during one visit, where this work was initiated. Some of the work was also conducted during the workshop supported by the Munich Institute for Astro- and Particle Physics (MIAPP) of the DFG cluster of excellence ‘Origin and Structure of the Universe’. SM would like to acknowledge financial assistance from the European Research Council under ERC grant number 638743-FIRSTDAWN. We acknowledge PRACE for awarding us computational time under PRACE4LOFAR grants 2012061089 and 2014102339 and access to resource Curie based in France at CEA and to resource SuperMUC at LRZ. This work was supported by the Science and Technology Facilities Council (grant numbers ST/F002858/1 and ST/I000976/1); and the Southeast Physics Network (SEPNet).

6

This project was executed using two allocations of Tier-0 time, 2012061089 and 2014102339, awarded by the Partnership for Advanced Computing in Europe (PRACE).

8

This is a well-defined operation in mathematical morphology, which in turn can be formulated in terms of more fundamental set-theoretical operations, Minkowski addition (⊕) and subtraction (⊖), as XSR ≡ (XSR) ⊕ SR for a symmetric structuring element (e.g. Dougherty & Lotufo 2003). For the detailed mathematical foundation, see Matheron (1975), Serra (1983), Dougherty & Lotufo (2003), and Chiu et al. (2013). In practice, this opening operation (∘) is easy to use as it is implemented as a part of widely used high-level programming languages and standard libraries such as python and scipy packages.

9

It is obvious that for the cases of non-overlapping spherical or circular regions, the granulometric measure returns the true size distribution.

11

Note however that the design of the real phase 2 of SKA has not been determined yet and may not involve an increase of the core size. An alternative example could be an increase of the sensitivity for the same core size as SKA1-low.

12

For simplicity, we did not use a light-cone in this work. Although light-cone cubes including redshift–space distortions must be used for a more sophisticated assessment, we wanted to avoid extra complications because our main point is to introduce the new granulometric analysis techniques in 21-cm tomography. The size of the coeval volume corresponds to 43.6 MHz in the frequency direction, which corresponds to a light-cone extending from z = 6.1 to 8.13 if centred at z = 7. We are effectively assuming that the evolution of the |${\rm H\,{\small {I}}}$| structure is slow enough during this period so that we can use the entire frequency range to approximately represent the state of z = 7.

13

We are aware that this estimate of the SKA1-low sensitivity might be optimistic. When a more realistic set-up of interferometric imaging is taken into account, to achieve ∼ 3-5 mK rms noise level it could take an integration time longer than what estimated here. The most important parameter that directly affects our analysis and conclusion is the rms sensitivity, σN. The integration time must be regarded only as a rule of thumb. Therefore, we quote the rms sensitivity rather than the integration time in this paper.

14

The mean radius of |${\rm H\,{\small {II}}}$| regions evaluated from the fitting formula differs only by 2.5 per cent from the direct integration of the differential size distribution measured from the simulation. As the measured median radius is also 4.18 h − 1 cMpc, we only show the mean radius of |${\rm H\,{\small {II}}}$| regions.

15

Note that the void filling factor Qvoid is not exactly 0.5, which we naively expect from the linear perturbation theory. Because the density perturbation of the IGM is becoming mildly non-linear during the EoR, underdense regions (voids) fill up more volume than the overdense regions. Therefore, the void filling factor is slightly larger than 0.5.

16

This definition is more convenient when there is contamination from voids. Normalizing the size distribution to unity produces an artificial difference between the size distributions of 21-cm cold spots and |${\rm H\,{\small {II}}}$| regions at larger sizes R. While the void contamination is mostly confined at smaller R, when normalized to unity the size distribution at larger R appears to be lower to compensate for the increase at small R. This trivial error is avoided by normalizing the size distribution to the volume-filling factor.

17

Although this integration time could unlikely be achieved with SKA1-low for a single FoV, this rms level could be possible with SKA2 phase with a reasonable integration time. One could also apply larger angular or frequency smoothing to enhance the SNR (but must beware of the smoothing bias in the measurement).

18

Strictly speaking, dilation (erosion) and Minkowski addition (subtraction) are not an identical operation. The two operations are identical when a structuring element is symmetrical. Since we employ a sphere as a structuring element in this paper, we use the two terms interchangeably.

REFERENCES

Ali
Z. S.
et al. ,
2015
,
ApJ
,
809
,
61

Bardeen
J. M.
,
Bond
J. R.
,
Kaiser
N.
,
Szalay
A. S.
,
1986
,
ApJ
,
304
,
15

Barkana
R.
,
Loeb
A.
,
2006
,
MNRAS
,
372
,
L43

Bharadwaj
S.
,
Pandey
S. K.
,
2005
,
MNRAS
,
358
,
968

Bond
J. R.
,
Efstathiou
G.
,
1987
,
MNRAS
,
226
,
655

Bowman
J. D.
et al. ,
2013
,
Publ. Astron. Soc. Aust.
,
30
,
31

Chapman
E.
et al. ,
2015
,
Advancing Astrophysics with the Square Kilometre Array (AASKA14)
.
Giardini Naxos
,
Italy
, p.
5

Chiu
S. N.
,
Stoyan
D.
,
Kendall
W. S.
,
Mecke
J.
,
2013
,
Stochastic Geometry and its Applications
3rd edn.
John Wiley & Sons Inc
,
West Sussex, UK

Ciardi
B.
,
Madau
P.
,
2003
,
ApJ
,
596
,
1

Condon
J. J.
,
Ransom
S. M.
,
2016
,
Essential Radio Astronomy
.
Princeton Univ. Press
,
Princeton, NJ

Cooray
A.
,
2005
,
MNRAS
,
363
,
1049

Cooray
A.
,
Li
C.
,
Melchiorri
A.
,
2008
,
Phys. Rev. D
,
77
,
103506

Datta
K. K.
,
Bharadwaj
S.
,
Choudhury
T. R.
,
2007
,
MNRAS
,
382
,
809

Datta
K. K.
,
Majumdar
S.
,
Bharadwaj
S.
,
Choudhury
T. R.
,
2008
,
MNRAS
,
391
,
1900

Datta
K. K.
,
Bharadwaj
S.
,
Choudhury
T. R.
,
2009
,
MNRAS
,
399
,
L132

Datta
K. K.
,
Friedrich
M. M.
,
Mellema
G.
,
Iliev
I. T.
,
Shapiro
P. R.
,
2012a
,
MNRAS
,
424
,
762

Datta
K. K.
,
Mellema
G.
,
Mao
Y.
,
Iliev
I. T.
,
Shapiro
P. R.
,
Ahn
K.
,
2012b
,
MNRAS
,
424
,
1877

Datta
K. K.
,
Jensen
H.
,
Majumdar
S.
,
Mellema
G.
,
Iliev
I. T.
,
Mao
Y.
,
Shapiro
P. R.
,
Ahn
K.
,
2014
,
MNRAS
,
442
,
1491

Datta
K. K.
,
Ghara
R.
,
Majumdar
S.
,
Choudhury
T. R.
,
Bharadwaj
S.
,
Roy
H.
,
Datta
A.
,
2016
,
J. Astrophys. Astron.
,
37
,
27

Dixon
K. L.
,
Iliev
I. T.
,
Mellema
G.
,
Ahn
K.
,
Shapiro
P. R.
,
2016
,
MNRAS
,
456
,
3011

Dougherty
E. R.
,
Lotufo
R. A.
,
2003
,
Hands-on Morphological Image Processing
.
SPIE
,
Washington, USA

Field
G. B.
,
1958
,
Proc. IRE
,
46
,
240

Field
G. B.
,
1959
,
ApJ
,
129
,
536

Friedrich
M. M.
,
Mellema
G.
,
Alvarez
M. A.
,
Shapiro
P. R.
,
Iliev
I. T.
,
2011a
,
MNRAS
,
413
,
1353

Friedrich
M. M.
,
Mellema
G.
,
Alvarez
M. A.
,
Shapiro
P. R.
,
Iliev
I. T.
,
2011b
,
MNRAS
,
413
,
1353

Furlanetto
S. R.
,
Oh
S. P.
,
2016
,
MNRAS
,
457
,
1813

Furlanetto
S. R.
,
Zaldarriaga
M.
,
Hernquist
L.
,
2004a
,
ApJ
,
613
,
1

Furlanetto
S. R.
,
Zaldarriaga
M.
,
Hernquist
L.
,
2004b
,
ApJ
,
613
,
16

Furlanetto
S. R.
,
Oh
S. P.
,
Briggs
F. H.
,
2006
,
Phys. Rep.
,
433
,
181

Geil
P. M.
,
Mutch
S. J.
,
Poole
G. B.
,
Angel
P. W.
,
Duffy
A. R.
,
Mesinger
A.
,
Wyithe
J. S. B.
,
2015
,
MNRAS
,
462
,
804

Gleser
L.
,
Nusser
A.
,
Ciardi
B.
,
Desjacques
V.
,
2006
,
MNRAS
,
370
,
1329

Harker
G. J. A.
et al. ,
2009
,
MNRAS
,
393
,
1449

Harnois-Déraps
J.
,
Pen
U.-L.
,
Iliev
I. T.
,
Merz
H.
,
Emberson
J. D.
,
Desjacques
V.
,
2013
,
MNRAS
,
436
,
540

Hong
S. E.
,
Ahn
K.
,
Park
C.
,
Kim
J.
,
Iliev
I. T.
,
Mellema
G.
,
2014
,
J. Korean Astron. Soc.
,
47
,
49

Iliev
I. T.
,
Mellema
G.
,
Pen
U.-L.
,
Merz
H.
,
Shapiro
P. R.
,
Alvarez
M. A.
,
2006
,
MNRAS
,
369
,
1625

Komatsu
E.
et al. ,
2009
,
ApJS
,
180
,
330

Koopmans
L.
et al. ,
2015
,
Proc. Sci., The Cosmic Dawn and Epoch of Reionisation with SKA
.
SISSA
,
Trieste
,
PoS(AASKA14)001

Lee
K.-G.
,
Cen
R.
,
Gott
J. R.
,
III
,
Trac
H.
,
2008
,
ApJ
,
675
,
8

Lin
Y.
,
Oh
S. P.
,
Furlanetto
S. R.
,
Sutter
P. M.
,
2016
,
MNRAS
,
461
,
3361

Madau
P.
,
Meiksin
A.
,
Rees
M. J.
,
1997
,
ApJ
,
475
,
429

Majumdar
S.
,
Bharadwaj
S.
,
Datta
K. K.
,
Choudhury
T. R.
,
2011
,
MNRAS
,
413
,
1409

Majumdar
S.
,
Bharadwaj
S.
,
Choudhury
T. R.
,
2012
,
MNRAS
,
426
,
3178

Majumdar
S.
,
Mellema
G.
,
Datta
K. K.
,
Jensen
H.
,
Choudhury
T. R.
,
Bharadwaj
S.
,
Friedrich
M. M.
,
2014
,
MNRAS
,
443
,
2843

Majumdar
S.
,
Datta
K. K.
,
Ghara
R.
,
Mondal
R.
,
Choudhury
T. R.
,
Bharadwaj
S.
,
Ali
S. S.
,
Datta
A.
,
2016a
,
J. Astrophys. Astron.
,
37
,
32

Majumdar
S.
et al. ,
2016b
,
MNRAS
,
456
,
2080

Malloy
M.
,
Lidz
A.
,
2013
,
ApJ
,
767
,
68

Matheron
G.
,
1975
,
Random Sets and Integral Geometry
.
John Wiley & Sons Inc
,
West Sussex, UK

McQuinn
M.
,
Lidz
A.
,
Zahn
O.
,
Dutta
S.
,
Hernquist
L.
,
Zaldarriaga
M.
,
2007
,
MNRAS
,
377
,
1043

Mellema
G.
,
Iliev
I. T.
,
Alvarez
M. A.
,
Shapiro
P. R.
,
2006a
,
New Astron.
,
11
,
374

Mellema
G.
,
Iliev
I. T.
,
Pen
U.-L.
,
Shapiro
P. R.
,
2006b
,
MNRAS
,
372
,
679

Mellema
G.
,
Koopmans
L.
,
Shukla
H.
,
Datta
K. K.
,
Mesinger
A.
,
Majumdar
S.
,
2015
,
Proc. Sci., HI tomographic Imaging of the Cosmic Dawn and Epoch of Reionization with SKA
.
SISSA
,
Trieste
,
PoS(AASKA14)010

Mesinger
A.
,
Furlanetto
S.
,
2007
,
ApJ
,
669
,
663

Mondal
R.
,
Bharadwaj
S.
,
Majumdar
S.
,
Bera
A.
,
Acharyya
A.
,
2015
,
MNRAS
,
449
,
L41

Mondal
R.
,
Bharadwaj
S.
,
Majumdar
S.
,
2017
,
MNRAS
,
464
,
2992

Morales
M. F.
,
Wyithe
J. S. B.
,
2010
,
ARA&A
,
48
,
127

Muñoz
J. B.
,
Ali-Haïmoud
Y.
,
Kamionkowski
M.
,
2015
,
Phys. Rev. D
,
92
,
083508

Norberg
P.
,
Baugh
C. M.
,
Gaztañaga
E.
,
Croton
D. J.
,
2009
,
MNRAS
,
396
,
19

Paciga
G.
et al. ,
2013
,
MNRAS
,
433
,
639

Pillepich
A.
,
Porciani
C.
,
Matarrese
S.
,
2007
,
ApJ
,
662
,
1

Planck Collaboration XIII
,
2015
,
A&A
,
594
,
A13

Pritchard
J. R.
,
Loeb
A.
,
2012
,
Rep. Prog. Phys.
,
75
,
086901

Robertson
B. E.
et al. ,
2013
,
ApJ
,
768
,
71

Serra
J.
,
1983
,
Image Analysis and Mathematical Morphology
.
Academic Press, Inc.
,
San Diego, CA

Sheth
R. K.
,
van de Weygaert
R.
,
2004
,
MNRAS
,
350
,
517

Shin
M.-S.
,
Trac
H.
,
Cen
R.
,
2008
,
ApJ
,
681
,
756

Thompson
A. R.
,
Moran
J. M.
,
Swenson
G. W. Jr
,
2001
,
Interferometry and Synthesis in Radio Astronomy
, 2nd edn,
Wiley
,
New York

Vedantham
H.
,
Udaya Shankar
N.
,
Subrahmanyan
R.
,
2012
,
ApJ
,
745
,
176

Wyithe
J. S. B.
,
Loeb
A.
,
2004
,
Nature
,
432
,
194

Wyithe
S.
,
Geil
P.
,
Kim
H.
,
2015
,
Proc. Sci., Imaging HII Regions from Galaxies and Quasars During Reionisation with SKA
.
SISSA
,
Trieste
,
PoS(AASKA14)015

Yatawatta
S.
et al. ,
2013
,
A&A
,
550
,
A136

Yoshiura
S.
,
Shimabukuro
H.
,
Takahashi
K.
,
Matsubara
T.
,
2016
,
MNRAS
,
465
,
394

Zahn
O.
,
Lidz
A.
,
McQuinn
M.
,
Dutta
S.
,
Hernquist
L.
,
Zaldarriaga
M.
,
Furlanetto
S. R.
,
2007
,
ApJ
,
654
,
12

Zaroubi
S.
et al. ,
2012
,
MNRAS
,
425
,
2964

APPENDIX A: BASIC OPERATIONS IN MATHEMATICAL MORPHOLOGY

Mathematical morphology (Matheron 1975; Serra 1983) defines a set of operations to formulate an algebra of shapes. In this appendix, we present an intuitive explanation of these basic morphological operations using diagrams. For a mathematically rigorous treatment, the reader is referred to Matheron (1975), Serra (1983), Dougherty & Lotufo (2003), and Chiu et al. (2013).

Suppose that a binary field (image), X, is probed by a symmetric structuring element, S. There are four elemental operations as follows.

The Minkowski addition (dilation),18 denoted by a symbol ⊕, is defined as the union of a binary field, X, and a structuring element, S, as the centre of the structuring element is moved inside the binary field. Fig. A1 (top) shows how this operation works. The Minkowski addition enlarges the original binary field with a structuring element.

Diagrams describing how the Minkowski addition and subtraction work. The grey areas indicate the resulting binary images. The centre of the structuring element S is indicated by a cross. The hatched region and the circle are drawn as a guide.
Figure A1.

Diagrams describing how the Minkowski addition and subtraction work. The grey areas indicate the resulting binary images. The centre of the structuring element S is indicated by a cross. The hatched region and the circle are drawn as a guide.

The Minkowski subtraction (erosion), denoted by a symbol ⊖, is a dual to the Minkowski addition. It is defined as the intersection of a binary field, X, with the centre of a structuring element, S, as the structuring element moves inside the binary field. Fig. A1 (bottom) shows how the Minkowski subtraction works. Only the parts that can fit the structuring element inside the binary field remain. The Minkowski subtraction shrinks the original binary field with a structuring element.

The morphological opening, denoted by a symbol ∘, is then defined by a consecutive operation: Minkowski subtraction followed by Minkowski addition. The morphological opening of a binary field, X, by a structuring element, S, is expressed as XS ≡ (XS) ⊕ S. Fig. A2 (top) shows the initial and final results of the morphological opening operation. A step-by-step algebra of shapes defined in this framework of mathematical morphology is shown in Fig. A2 (bottom).

Diagrams describing how the morphological opening works. The meaning of the colour and symbols are same as in Fig. A1.
Figure A2.

Diagrams describing how the morphological opening works. The meaning of the colour and symbols are same as in Fig. A1.

The morphological closing is a dual operation to the morphological opening. Although here we do not use this operation, we define it for completeness. The morphological closing, denoted by a symbol •, is defined by a consecutive operation: Minkowski addition followed by Minkowski subtraction, XS ≡ (XS) ⊖ S.