-
MAPPRAISER: A massively parallel map-making framework for multi-kilo pixel CMB experiments
Authors:
Hamza El Bouhargani,
Aygul Jamal,
Dominic Beck,
Josquin Errard,
Laura Grigori,
Radek Stompor
Abstract:
Forthcoming cosmic microwave background (CMB) polarized anisotropy experiments have the potential to revolutionize our understanding of the Universe and fundamental physics. The sought-after, tell-tale signatures will, however, be distributed over the voluminous data sets which these experiments will collect. These data sets will need to be efficiently processed, and unwanted contributions due to astrophysical, environmental, and instrumental effects characterized and efficiently mitigated, in order to uncover the signatures. This poses a significant challenge to data analysis methods, techniques, and software tools, which will not only have to cope with huge volumes of data but also do so with unprecedented precision, driven by the demanding science goals posed for the new experiments. Solvers of very large linear systems of equations are a keystone of efficient CMB data analysis. Such systems appear in very diverse contexts throughout CMB data analysis pipelines; however, they typically display similar algebraic structures and can therefore be solved using similar numerical techniques. Linear systems arising in the so-called map-making problem are among the most prominent and common ones. In this work we present a massively parallel, flexible and extensible framework, comprising a numerical library, MIDAPACK, and a high-level code, MAPPRAISER, which provide tools for solving such systems efficiently. The framework implements iterative solvers based on conjugate gradient techniques, both enlarged and preconditioned with different preconditioners. We demonstrate the framework on simulated examples reflecting basic characteristics of the forthcoming data sets issued by ground-based and satellite-borne instruments, executing it on as many as 16,384 compute cores. The software is developed as an open source project freely available to the community at: https://github.com/B3Dcmb/midapack .
Submitted 10 May, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Accelerating linear system solvers for time domain component separation of cosmic microwave background data
Authors:
J. Papež,
L. Grigori,
R. Stompor
Abstract:
Component separation is one of the key stages of any modern cosmic microwave background (CMB) data analysis pipeline. It is an inherently non-linear procedure and typically involves a series of sequential solutions of linear systems with similar, albeit not identical, system matrices, derived for different data models of the same data set. Sequences of this kind arise, for instance, in the maximization of the data likelihood with respect to foreground parameters or in sampling of their posterior distribution. However, they are also common in many other contexts. In this work we consider solving the component separation problem directly in the measurement (time) domain, which can have a number of important advantages over the more standard pixel-based methods, in particular if non-negligible time-domain noise correlations are present, as is commonly the case. The time-domain-based approach implies, however, significant computational effort due to the need to manipulate the full volume of the time-domain data set. To address this challenge, we propose and study efficient solvers adapted to solving time-domain-based component separation systems and their sequences, and which are capable of capitalizing on information derived from the previous solutions. This is achieved either by adapting the initial guess of the subsequent system or through so-called subspace recycling, which allows the construction of progressively more efficient two-level preconditioners. We report an overall speed-up over solving the systems independently of a factor of nearly 7, or 5, in the worked examples inspired respectively by the likelihood maximization and likelihood sampling procedures we consider in this work.
Submitted 1 June, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Solving linear equations with messenger-field and conjugate gradients techniques - an application to CMB data analysis
Authors:
J. Papez,
L. Grigori,
R. Stompor
Abstract:
We discuss linear system solvers invoking a messenger field and compare them with (preconditioned) conjugate gradient approaches. We show that the messenger-field techniques correspond to fixed-point iterations of an appropriately preconditioned initial system of linear equations. We then argue that a conjugate gradient solver applied to the same preconditioned system, or equivalently a preconditioned conjugate gradient solver using the same preconditioner and applied to the original system, will in general ensure at least comparable and typically better performance in terms of the number of iterations to convergence and time-to-solution. We illustrate our conclusions on two common examples drawn from Cosmic Microwave Background data analysis: Wiener filtering and map-making. In addition, and contrary to the standard lore in the CMB field, we show that the performance of the preconditioned conjugate gradient solver can depend significantly on the starting vector. This observation seems of particular importance for map-making of high signal-to-noise sky maps and should therefore be of relevance for the next generation of CMB experiments.
Submitted 22 October, 2018; v1 submitted 9 March, 2018;
originally announced March 2018.
-
Accelerating Cosmic Microwave Background map-making procedure through preconditioning
Authors:
Mikolaj Szydlarski,
Laura Grigori,
Radek Stompor
Abstract:
Estimation of the sky signal from sequences of time-ordered data is one of the key steps in Cosmic Microwave Background (CMB) data analysis, commonly referred to as the map-making problem. Some of the most popular and general methods proposed for this problem involve solving generalised least squares (GLS) equations with non-diagonal noise weights given by a block-diagonal matrix with Toeplitz blocks. In this work we study new map-making solvers potentially suitable for applications to the largest anticipated data sets. They are based on iterative conjugate gradient (CG) approaches enhanced with novel, parallel, two-level preconditioners. We apply the proposed solvers to examples of simulated non-polarised and polarised CMB observations, and a set of idealised scanning strategies with sky coverage ranging from nearly the full sky down to small sky patches. We discuss in detail their implementation for massively parallel computational platforms and their performance for a broad range of parameters characterising the simulated data sets. We find that our best new solver can outperform the carefully optimised standard solvers used today by a factor of as much as 5 in terms of the convergence rate and a factor of up to 4 in terms of the time to solution, and does so without significantly increasing the memory consumption or the volume of inter-processor communication. The performance of the new algorithms is also found to be more stable and robust, and less dependent on specific characteristics of the analysed data set. We therefore conclude that the proposed approaches are well suited to successfully address the challenges posed by new and forthcoming CMB data sets.
Submitted 15 December, 2014; v1 submitted 13 August, 2014;
originally announced August 2014.
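The GLS map-making system described in this abstract can be illustrated on a toy problem. The sketch below is not the solver from the paper and does not use the MIDAPACK API: it assumes a dense one-hot pointing matrix `P`, a single banded Toeplitz inverse noise block `N_inv`, and a simple Jacobi (hit-count) preconditioner standing in for the two-level preconditioners studied in the paper. All names (`pcg`, `apply_A`, `apply_Minv`) are hypothetical.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite system."""
    x = np.zeros_like(b)
    r = b - apply_A(x)          # initial residual
    z = apply_Minv(r)           # preconditioned residual
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

rng = np.random.default_rng(0)
n_samples, n_pix = 600, 30

# Toy pointing: each time sample observes one pixel; every pixel is hit at least once.
pix = np.concatenate([np.arange(n_pix), rng.integers(0, n_pix, n_samples - n_pix)])
P = np.zeros((n_samples, n_pix))
P[np.arange(n_samples), pix] = 1.0

# Assumed inverse noise covariance: a banded symmetric Toeplitz matrix,
# diagonally dominant and therefore positive definite.
c = np.zeros(n_samples)
c[:3] = [1.0, -0.3, -0.1]
lag = np.abs(np.arange(n_samples)[:, None] - np.arange(n_samples)[None, :])
N_inv = c[lag]

# Simulated time-ordered data: sky signal plus white noise.
m_true = rng.standard_normal(n_pix)
d = P @ m_true + 0.1 * rng.standard_normal(n_samples)

# GLS normal equations: (P^T N^-1 P) m = P^T N^-1 d, applied matrix-free.
apply_A = lambda m: P.T @ (N_inv @ (P @ m))
b = P.T @ (N_inv @ d)

# Jacobi preconditioner: inverse of diag(P^T diag(N^-1) P) = c[0] * hit counts.
hits = P.sum(axis=0)
apply_Minv = lambda r: r / (c[0] * hits)

m_hat, n_iter = pcg(apply_A, b, apply_Minv)
```

In a realistic setting the pointing matrix would be stored sparsely and the Toeplitz blocks applied with FFTs rather than as dense matrices; the dense form here only keeps the example short.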
-
Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)
Authors:
Mikolaj Szydlarski,
Pierre Esterie,
Joel Falcou,
Laura Grigori,
R. Stompor
Abstract:
Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas new, cutting-edge science goals have recently been proposed, requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose a formidable challenge for the currently existing implementations of the transforms.
This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and Graphics Processing Units (GPU). It also discusses their performance, alone and embedded within a top-level, MPI-based parallelisation layer ported from the S2HAT library, in terms of their accuracy, overall efficiency and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi") outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors for problem sizes motivated by our target applications. Performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in the calculation of the transforms, emphasising those with a major impact on their overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.
Submitted 1 April, 2013; v1 submitted 1 June, 2011;
originally announced June 2011.
-
Spherical harmonic transform with GPUs
Authors:
Ioan O. Hupca,
Joel Falcou,
Laura Grigori,
Radek Stompor
Abstract:
We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPU). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the computation of the transforms, retaining the efficient parallel framework of the original code. We detail optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the Fortran90 version. We also present performance comparisons of a single CPU plus GPU unit with the S2HAT code running on either a single processor or 4 processors. In particular, we find that use of the latest generation of GPUs, such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by a factor of as much as 18 with respect to S2HAT executed on one core, and by a factor of as much as 5.5 with respect to S2HAT on 4 cores, with the overall performance being limited by the Fast Fourier transforms. The work presented here has been performed in the context of Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability.
Submitted 6 October, 2010;
originally announced October 2010.