-
The New Worlds Simulations: Large-scale Simulations across Three Cosmologies
Authors:
Katrin Heitmann,
Thomas Uram,
Nicholas Frontiere,
Salman Habib,
Adrian Pope,
Silvio Rizzi,
Joe Insley
Abstract:
In this paper we describe the set of ``New Worlds Simulations'', three very large cosmology simulations, Qo'noS, Vulcan, and Ferenginar, that were carried out on the Summit supercomputer with the Hardware/Hybrid Accelerated Cosmology Code, HACC. The gravity-only simulations follow the evolution of structure in the Universe by each employing 12,288^3 particles in (3 Gpc/h)^3 volumes, leading to a mass resolution of m_p~10^9 Msun/h. The simulations cover three different cosmologies, one LambdaCDM model, consistent with measurements from Planck, one simulation with massive neutrinos, and one simulation with a varying dark energy equation of state. All simulations have the same phases to allow a detailed comparison of the results and the investigation of the impact of different cosmological parameters. We present measurements of some basic statistics, such as matter power spectra, correlation function, halo mass function and concentration-mass relation and investigate the differences due to the varying cosmologies. Given the large volume and high resolution, these simulations provide excellent bases for creating synthetic skies. A subset of the data is made publicly available as part of this paper.
Submitted 11 June, 2024;
originally announced June 2024.
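The mass resolution quoted in the abstract above follows from the box size and particle count. The sketch below is a rough consistency check, not the authors' calculation: it assumes equal-mass particles that carry the total matter density, and the matter density parameter `OMEGA_M_ASSUMED = 0.31` is a Planck-like placeholder that does not appear in the abstract. The function name `particle_mass` is likewise illustrative.

```python
# Back-of-the-envelope particle mass for a gravity-only N-body run.
# Critical density today, in units of h^2 Msun / Mpc^3.
RHO_CRIT = 2.775e11


def particle_mass(box_mpc_h, n_side, omega_m):
    """Return the particle mass in Msun/h.

    box_mpc_h: box side length in Mpc/h
    n_side:    particles per dimension (n_side**3 particles in total)
    omega_m:   total matter density parameter (assumed, see below)
    """
    volume = box_mpc_h ** 3        # (Mpc/h)^3
    n_particles = n_side ** 3
    return omega_m * RHO_CRIT * volume / n_particles


# Assumption: a Planck-like matter density; the abstract does not state it.
OMEGA_M_ASSUMED = 0.31

# Box of 3 Gpc/h = 3000 Mpc/h per side with 12,288^3 particles.
m_p = particle_mass(3000.0, 12288, OMEGA_M_ASSUMED)
print(f"m_p ~ {m_p:.2e} Msun/h")
```

Under these assumptions the result is on the order of 10^9 Msun/h, in line with the quoted m_p~10^9 Msun/h. The exact value depends on the cosmological parameters of each run, and the treatment of massive neutrinos may change which density enters the estimate.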
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
Submitted 31 March, 2023;
originally announced April 2023.
-
Extreme Scale Survey Simulation with Python Workflows
Authors:
A. S. Villarreal,
Yadu Babuji,
Tom Uram,
Daniel S. Katz,
Kyle Chard,
Katrin Heitmann
Abstract:
The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will soon carry out an unprecedented wide, fast, and deep survey of the sky in multiple optical bands. The data from LSST will open up a new discovery space in astronomy and cosmology, simultaneously providing clues toward addressing burning issues of the day, such as the origin of dark energy and the nature of dark matter, while at the same time yielding data that will, in turn, pose fresh new questions. To prepare for the imminent arrival of this remarkable data set, it is crucial that the associated scientific communities be able to develop the software needed to analyze it. Computational power now available allows us to generate synthetic data sets that can be used as a realistic training ground for such an effort. This effort raises its own challenges -- the need to generate very large simulations of the night sky, scaling up simulation campaigns to large numbers of compute nodes across multiple computing centers with different architectures, and optimizing the complex workload around memory requirements and widely varying wall clock times. We describe here a large-scale workflow that melds together Python code to steer the workflow, Parsl to manage the large-scale distributed execution of workflow components, and containers to carry out the image simulation campaign across multiple sites. Taking advantage of these tools, we developed an extreme-scale computational framework and used it to simulate five years of observations for 300 square degrees of sky area. We describe our experiences and lessons learned in developing this workflow capability, and highlight how the scalability and portability of our approach enabled us to efficiently execute it on up to 4000 compute nodes on two supercomputers.
Submitted 24 September, 2021;
originally announced September 2021.
-
Farpoint: A High-Resolution Cosmology Simulation at the Gigaparsec Scale
Authors:
Nicholas Frontiere,
Katrin Heitmann,
Esteban Rangel,
Patricia Larsen,
Adrian Pope,
Imran Sultan,
Thomas Uram,
Salman Habib,
Silvio Rizzi,
Joe Insley
Abstract:
In this paper we introduce the Farpoint simulation, the latest member of the Hardware/Hybrid Accelerated Cosmology Code (HACC) gravity-only simulation family. The domain covers a volume of (1000$h^{-1}$Mpc)$^3$ and evolves close to two trillion particles, corresponding to a mass resolution of $m_p\sim 4.6\cdot 10^7 h^{-1}$M$_\odot$. These specifications enable comprehensive investigations of the galaxy-halo connection, capturing halos down to small masses. Further, the large volume resolves scales typical of modern surveys with good statistical coverage of high mass halos. The simulation was carried out on the GPU-accelerated system Summit, one of the fastest supercomputers currently available. We provide specifics about the Farpoint run and present an initial set of results. The high mass resolution facilitates precise measurements of important global statistics, such as the halo concentration-mass relation and the correlation function down to small scales. Selected subsets of the simulation data products are publicly available via the HACC Simulation Data Portal.
Submitted 28 February, 2022; v1 submitted 4 September, 2021;
originally announced September 2021.
-
AxonEM Dataset: 3D Axon Instance Segmentation of Brain Cortical Regions
Authors:
Donglai Wei,
Kisuk Lee,
Hanyu Li,
Ran Lu,
J. Alexander Bae,
Zequan Liu,
Lifu Zhang,
Márcia dos Santos,
Zudi Lin,
Thomas Uram,
Xueying Wang,
Ignacio Arganda-Carreras,
Brian Matejek,
Narayanan Kasthuri,
Jeff Lichtman,
Hanspeter Pfister
Abstract:
Electron microscopy (EM) enables the reconstruction of neural circuits at the level of individual synapses, which has been transformative for scientific discoveries. However, due to the complex morphology, an accurate reconstruction of cortical axons has become a major challenge. Worse still, there is no publicly available large-scale EM dataset from the cortex that provides dense ground truth segmentation for axons, making it difficult to develop and evaluate large-scale axon reconstruction methods. To address this, we introduce the AxonEM dataset, which consists of two 30x30x30 um^3 EM image volumes from the human and mouse cortex, respectively. We thoroughly proofread over 18,000 axon instances to provide dense 3D axon instance segmentation, enabling large-scale evaluation of axon reconstruction methods. In addition, we densely annotate nine ground truth subvolumes for training, per each data volume. With this, we reproduce two published state-of-the-art methods and provide their evaluation results as a baseline. We publicly release our code and data at https://connectomics-bazaar.github.io/proj/AxonEM/index.html to foster the development of advanced methods.
Submitted 12 July, 2021;
originally announced July 2021.
-
Toward Real-time Analysis of Experimental Science Workloads on Geographically Distributed Supercomputers
Authors:
Michael Salim,
Thomas Uram,
J. Taylor Childers,
Venkat Vishwanath,
Michael E. Papka
Abstract:
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing policies are prohibitive when faced with scarce instrument time. To bridge this divide, we introduce Balsam: a distributed orchestration platform enabling workflows at the edge to securely and efficiently trigger analytics tasks across a user-managed federation of HPC execution sites. We describe the architecture of the Balsam service, which provides a workflow management API, and distributed sites that provision resources and schedule scalable, fault-tolerant execution. We demonstrate Balsam in efficiently scaling real-time analytics from two DOE light sources simultaneously onto three supercomputers (Theta, Summit, and Cori), while maintaining low overheads for on-demand computing, and providing a Python library for seamless integration with existing ecosystems of data analysis tools.
Submitted 2 July, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Workflows Community Summit: Bringing the Scientific Workflows Community Together
Authors:
Rafael Ferreira da Silva,
Henri Casanova,
Kyle Chard,
Dan Laney,
Dong Ahn,
Shantenu Jha,
Carole Goble,
Lavanya Ramakrishnan,
Luc Peterson,
Bjoern Enders,
Douglas Thain,
Ilkay Altintas,
Yadu Babuji,
Rosa M. Badia,
Vivien Bonazzi,
Taina Coleman,
Michael Crusoe,
Ewa Deelman,
Frank Di Natale,
Paolo Di Tommaso,
Thomas Fahringer,
Rosa Filgueira,
Grigori Fursin,
Alex Ganose,
Bjorn Gruning
, et al. (20 additional authors not shown)
Abstract:
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. As a result, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community that were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.
Submitted 16 March, 2021;
originally announced March 2021.
-
DESC DC2 Data Release Note
Authors:
LSST Dark Energy Science Collaboration,
Bela Abolfathi,
Robert Armstrong,
Humna Awan,
Yadu N. Babuji,
Franz Erik Bauer,
George Beckett,
Rahul Biswas,
Joanne R. Bogart,
Dominique Boutigny,
Kyle Chard,
James Chiang,
Johann Cohen-Tanugi,
Andrew J. Connolly,
Scott F. Daniel,
Seth W. Digel,
Alex Drlica-Wagner,
Richard Dubois,
Eric Gawiser,
Thomas Glanzman,
Salman Habib,
Andrew P. Hearin,
Katrin Heitmann,
Fabio Hernandez,
Renée Hložek
, et al. (32 additional authors not shown)
Abstract:
In preparation for cosmological analyses of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), the LSST Dark Energy Science Collaboration (LSST DESC) has created a 300 deg$^2$ simulated survey as part of an effort called Data Challenge 2 (DC2). The DC2 simulated sky survey, in six optical bands with observations following a reference LSST observing cadence, was processed with the LSST Science Pipelines (19.0.0). In this Note, we describe the public data release of the resulting object catalogs for the coadded images of five years of simulated observations along with associated truth catalogs. We include a brief description of the major features of the available data sets. To enable convenient access to the data products, we have developed a web portal connected to Globus data services. We describe how to access the data and provide example Jupyter Notebooks in Python to aid first interactions with the data. We welcome feedback and questions about the data release via a GitHub repository.
Submitted 13 June, 2022; v1 submitted 12 January, 2021;
originally announced January 2021.
-
Toward an Automated HPC Pipeline for Processing Large Scale Electron Microscopy Data
Authors:
Rafael Vescovi,
Hanyu Li,
Jeffery Kinnison,
Murat Keceli,
Misha Salim,
Narayanan Kasthuri,
Thomas D. Uram,
Nicola Ferrier
Abstract:
We present a fully modular and scalable software pipeline for processing electron microscope (EM) images of brain slices into 3D visualization of individual neurons and demonstrate an end-to-end segmentation of a large EM volume using a supercomputer. Our pipeline scales multiple packages used by the EM community with minimal changes to the original source codes. We tested each step of the pipeline individually, on a workstation, a cluster, and a supercomputer. Furthermore, we can compose workflows from these operations using a Balsam database that can be triggered during the data acquisition or with the use of different front ends and control the granularity of the pipeline execution. We describe the implementation of our pipeline and modifications required to integrate and scale up existing codes. The modular nature of our environment enables diverse research groups to contribute to the pipeline without disrupting the workflow, i.e. new individual codes can be easily integrated for each step on the pipeline.
Submitted 6 November, 2020;
originally announced November 2020.
-
The LSST DESC DC2 Simulated Sky Survey
Authors:
LSST Dark Energy Science Collaboration,
Bela Abolfathi,
David Alonso,
Robert Armstrong,
Éric Aubourg,
Humna Awan,
Yadu N. Babuji,
Franz Erik Bauer,
Rachel Bean,
George Beckett,
Rahul Biswas,
Joanne R. Bogart,
Dominique Boutigny,
Kyle Chard,
James Chiang,
Chuck F. Claver,
Johann Cohen-Tanugi,
Céline Combet,
Andrew J. Connolly,
Scott F. Daniel,
Seth W. Digel,
Alex Drlica-Wagner,
Richard Dubois,
Emmanuel Gangler,
Eric Gawiser
, et al. (55 additional authors not shown)
Abstract:
We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program represents a unique modeling effort that stresses this interconnectivity in a way that has not been attempted before. This effort encompasses a full end-to-end approach: starting from a large N-body simulation, through setting up LSST-like observations including realistic cadences, through image simulations, and finally processing with Rubin's LSST Science Pipelines. This last step ensures that we generate data products resembling those to be delivered by the Rubin Observatory as closely as is currently possible. The simulated DC2 sky survey covers six optical bands in a wide-fast-deep (WFD) area of approximately 300 deg^2 as well as a deep drilling field (DDF) of approximately 1 deg^2. We simulate 5 years of the planned 10-year survey. The DC2 sky survey has multiple purposes. First, the LSST DESC working groups can use the dataset to develop a range of DESC analysis pipelines to prepare for the advent of actual data. Second, it serves as a realistic testbed for the image processing software under development for LSST by the Rubin Observatory. In particular, simulated data provide a controlled way to investigate certain image-level systematic effects. Finally, the DC2 sky survey enables the exploration of new scientific ideas in both static and time-domain cosmology.
Submitted 26 January, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
-
The Last Journey. I. An Extreme-Scale Simulation on the Mira Supercomputer
Authors:
Katrin Heitmann,
Nicholas Frontiere,
Esteban Rangel,
Patricia Larsen,
Adrian Pope,
Imran Sultan,
Thomas Uram,
Salman Habib,
Hal Finkel,
Danila Korytov,
Eve Kovacs,
Silvio Rizzi,
Joe Insley
Abstract:
The Last Journey is a large-volume, gravity-only, cosmological N-body simulation evolving more than 1.24 trillion particles in a periodic box with a side-length of 5.025Gpc. It was implemented using the HACC simulation and analysis framework on the BG/Q system, Mira. The cosmological parameters are chosen to be consistent with the results from the Planck satellite. A range of analysis tools have been run in situ to enable a diverse set of science projects, and at the same time, to keep the resulting data amount manageable. Analysis outputs have been generated starting at redshift z~10 to allow for construction of synthetic galaxy catalogs using a semi-analytic modeling approach in post-processing. As part of our in situ analysis pipeline we employ a new method for tracking halo sub-structures, introducing the concept of subhalo cores. The production of multi-wavelength synthetic sky maps is facilitated by generating particle lightcones in situ, also beginning at z~10. We provide an overview of the simulation set-up and the generated data products; a first set of analysis results is presented. A subset of the data is publicly available.
Submitted 8 January, 2021; v1 submitted 2 June, 2020;
originally announced June 2020.
-
The Mira-Titan Universe. III. Emulation of the Halo Mass Function
Authors:
Sebastian Bocquet,
Katrin Heitmann,
Salman Habib,
Earl Lawrence,
Thomas Uram,
Nicholas Frontiere,
Adrian Pope,
Hal Finkel
Abstract:
We construct an emulator for the halo mass function over group and cluster mass scales for a range of cosmologies, including the effects of dynamical dark energy and massive neutrinos. The emulator is based on the recently completed Mira-Titan Universe suite of cosmological $N$-body simulations. The main set of simulations spans 111 cosmological models with 2.1 Gpc boxes. We extract halo catalogs in the redshift range $z=[0.0, 2.0]$ and for masses $M_{200\mathrm{c}}\geq 10^{13}M_\odot/h$. The emulator covers an 8-dimensional hypercube spanned by {$Ω_\mathrm{m}h^2$, $Ω_\mathrm{b}h^2$, $Ω_νh^2$, $σ_8$, $h$, $n_s$, $w_0$, $w_a$}; spatial flatness is assumed. We obtain smooth halo mass functions by fitting piecewise second-order polynomials to the halo catalogs and employ Gaussian process regression to construct the emulator while keeping track of the statistical noise in the input halo catalogs and uncertainties in the regression process. For redshifts $z\lesssim1$, the typical emulator precision is better than $2\%$ for $10^{13}-10^{14} M_\odot/h$ and $<10\%$ for $M\simeq 10^{15}M_\odot/h$. For comparison, fitting functions using the traditional universal form for the halo mass function can be biased at up to 30\% at $M\simeq 10^{14}M_\odot/h$ for $z=0$. Our emulator is publicly available at \url{https://github.com/SebastianBocquet/MiraTitanHMFemulator}.
Submitted 5 August, 2020; v1 submitted 26 March, 2020;
originally announced March 2020.
-
Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows
Authors:
Michael A. Salim,
Thomas D. Uram,
J. Taylor Childers,
Prasanna Balaprakash,
Venkatram Vishwanath,
Michael E. Papka
Abstract:
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their scheduling lifecycle. The ensembles execute in a pilot "launcher" which (i) ensures concurrent, load-balanced execution of arbitrary serial and parallel programs with heterogeneous processor requirements, (ii) requires no modification of user applications, (iii) is tolerant of task-level faults and provides several options for error recovery, (iv) stores provenance data (e.g., task history, error logs) in the database, and (v) supports dynamic workflows, in which tasks are created or killed at runtime. Here, we present the design and Python implementation of the Balsam service and launcher. The efficacy of this system is illustrated using two case studies: hyperparameter optimization of deep neural networks, and high-throughput single-point quantum chemistry calculations. We find that the unique combination of flexible job-packing and automated scheduling with dynamic (pilot-managed) execution facilitates excellent resource utilization. The scripting overheads typically needed to manage resources and launch workflows on supercomputers are substantially reduced, accelerating workflow development and execution.
Submitted 18 September, 2019;
originally announced September 2019.
-
Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Authors:
Wushi Dong,
Murat Keceli,
Rafael Vescovi,
Hanyu Li,
Corey Adams,
Elise Jennings,
Samuel Flender,
Tom Uram,
Venkatram Vishwanath,
Nicola Ferrier,
Narayanan Kasthuri,
Peter Littlewood
Abstract:
Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture has demonstrated leading performance for segmenting structures from this data. However, the training of the network is computationally expensive. In order to reduce the training time, we implemented synchronous and data-parallel distributed training using the Horovod library, which is different from the asynchronous training scheme used in the published FFN code. We demonstrated that our distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer. Our trained models achieved a similar level of inference performance, but took less training time compared to previous methods. Our study on the effects of different batch sizes on FFN training suggests ways to further improve training efficiency. Our findings on optimal learning rate and batch sizes agree with previous works.
Submitted 9 December, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
The Outer Rim Simulation: A Path to Many-Core Supercomputers
Authors:
Katrin Heitmann,
Hal Finkel,
Adrian Pope,
Vitali Morozov,
Nicholas Frontiere,
Salman Habib,
Esteban Rangel,
Thomas Uram,
Danila Korytov,
Hillary Child,
Samuel Flender,
Joe Insley,
Silvio Rizzi
Abstract:
We describe the Outer Rim cosmological simulation, one of the largest high-resolution N-body simulations performed to date, aimed at promoting science to be carried out with large-scale structure surveys. The simulation covers a volume of (4.225Gpc)^3 and evolves more than one trillion particles. It was executed on Mira, a BlueGene/Q system at the Argonne Leadership Computing Facility. We discuss some of the computational challenges posed by a system like Mira, a many-core supercomputer, and how the simulation code, HACC, has been designed to overcome these challenges. We have carried out a large range of analyses on the simulation data and we report on the results as well as the data products that have been generated. The full data set generated by the simulation totals more than 5PB of data, making data curation and data handling a large challenge in and of itself. The simulation results have been used to generate synthetic catalogs for large-scale structure surveys, including DESI and eBOSS, as well as CMB experiments. A detailed catalog for the LSST DESC data challenges has been created as well. We publicly release some of the Outer Rim halo catalogs, downsampled particle information, and lightcone data.
Submitted 28 April, 2019; v1 submitted 26 April, 2019;
originally announced April 2019.
-
HACC Cosmological Simulations: First Data Release
Authors:
Katrin Heitmann,
Thomas D. Uram,
Hal Finkel,
Nicholas Frontiere,
Salman Habib,
Adrian Pope,
Esteban Rangel,
Joseph Hollowed,
Danila Korytov,
Patricia Larsen,
Benjamin S. Allen,
Kyle Chard,
Ian Foster
Abstract:
We describe the first major public data release from cosmological simulations carried out with Argonne's HACC code. This initial release covers a range of datasets from large gravity-only simulations. The data products include halo information for multiple redshifts, down-sampled particles, and lightcone outputs. We provide data from two very large LCDM simulations as well as beyond-LCDM simulations spanning eleven w0-wa cosmologies. Our release platform uses Petrel, a research data service, located at the Argonne Leadership Computing Facility. Petrel offers fast data transfer mechanisms and authentication via Globus, enabling simple and efficient access to stored datasets. Easy browsing of the available data products is provided via a web portal that allows the user to navigate simulation products efficiently. The data hub will be extended by adding more types of data products and by enabling computational capabilities to allow direct interactions with simulation results.
Submitted 3 October, 2019; v1 submitted 26 April, 2019;
originally announced April 2019.
-
DESCQA: An Automated Validation Framework for Synthetic Sky Catalogs
Authors:
Yao-Yuan Mao,
Eve Kovacs,
Katrin Heitmann,
Thomas D. Uram,
Andrew J. Benson,
Duncan Campbell,
Sofía A. Cora,
Joseph DeRose,
Tiziana Di Matteo,
Salman Habib,
Andrew P. Hearin,
J. Bryce Kalmbach,
K. Simon Krughoff,
François Lanusse,
Zarija Lukić,
Rachel Mandelbaum,
Jeffrey A. Newman,
Nelson Padilla,
Enrique Paillas,
Adrian Pope,
Paul M. Ricker,
Andrés N. Ruiz,
Ananth Tenneti,
Cristian Vega-Martínez,
Risa H. Wechsler,
et al. (2 additional authors not shown)
Abstract:
The use of high-quality simulated sky catalogs is essential for the success of cosmological surveys. The catalogs have diverse applications, such as investigating signatures of fundamental physics in cosmological observables, understanding the effect of systematic uncertainties on measured signals and testing mitigation strategies for reducing these uncertainties, aiding analysis pipeline development and testing, and survey strategy optimization. The list of applications is growing with improvements in the quality of the catalogs and the details that they can provide. Given the importance of simulated catalogs, it is critical to provide rigorous validation protocols that enable both catalog providers and users to assess the quality of the catalogs in a straightforward and comprehensive way. For this purpose, we have developed the DESCQA framework for the Large Synoptic Survey Telescope Dark Energy Science Collaboration as well as for the broader community. The goal of DESCQA is to enable the inspection, validation, and comparison of an inhomogeneous set of synthetic catalogs via the provision of a common interface within an automated framework. In this paper, we present the design concept and first implementation of DESCQA. In order to establish and demonstrate its full functionality we use a set of interim catalogs and validation tests. We highlight several important aspects, both technical and scientific, that require thoughtful consideration when designing a validation framework, including validation metrics and how these metrics impose requirements on the synthetic sky catalogs.
Submitted 8 February, 2018; v1 submitted 27 September, 2017;
originally announced September 2017.
-
Adapting the serial Alpgen event generator to simulate LHC collisions on millions of parallel threads
Authors:
J. T. Childers,
T. D. Uram,
T. J. LeCompte,
M. E. Papka,
D. P. Benjamin
Abstract:
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.
Submitted 23 November, 2015;
originally announced November 2015.