-
Reproducing the results for NICER observation of PSR J0030+0451
Authors:
Chaitanya Afle,
Patrick R. Miles,
Silvina Caino-Lores,
Collin D. Capano,
Ingo Tews,
Karan Vahi,
Ewa Deelman,
Michela Taufer,
Duncan A. Brown
Abstract:
NASA's Neutron Star Interior Composition Explorer (NICER) observed X-ray emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian parameter measurements of the mass and the star's radius using pulse-profile modeling of the X-ray data. This paper reproduces their result using the open-source software X-PSI and publicly available data within expected statistical errors. We note the challenges we faced in reproducing the results and demonstrate that the analysis can be reproduced and reused in future works by changing the prior distribution for the radius and the sampler configuration. We find no significant change in the measurement of the mass and radius, demonstrating that the original result is robust to these changes. Finally, we provide a containerized working environment that facilitates third-party reproduction of the measurements of mass and radius of PSR J0030+0451 using the NICER observations.
Submitted 31 January, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Reproducibility of the First Image of a Black Hole in the Galaxy M87 from the Event Horizon Telescope (EHT) Collaboration
Authors:
Ria Patel,
Brandan Roachell,
Silvina Caino-Lores,
Ross Ketron,
Jacob Leonard,
Nigel Tan,
Duncan Brown,
Ewa Deelman,
Michela Taufer
Abstract:
This paper presents an interdisciplinary effort aiming to develop and share sustainable knowledge necessary to analyze, understand, and use published scientific results to advance reproducibility in multi-messenger astrophysics. Specifically, we target the breakthrough work associated with the generation of the first image of a black hole, called M87. The image was computed by the Event Horizon Telescope Collaboration. Based on the artifacts made available by EHT, we deliver documentation, code, and a computational environment to reproduce the first image of a black hole. Our deliverables support new discovery in multi-messenger astrophysics by providing all the necessary tools for generalizing methods and findings from the EHT use case. Challenges encountered during the reproducibility of EHT results are reported. The result of our effort is an open-source, containerized software package that enables the public to reproduce the first image of a black hole in the galaxy M87.
Submitted 20 May, 2022;
originally announced May 2022.
-
Astronomical Image Processing at Scale With Pegasus and Montage
Authors:
G. Bruce Berriman,
John C. Good,
Ewa Deelman,
Ryan Tanaka,
Karan Vahi
Abstract:
Image processing at scale is a powerful tool for creating new data sets and integrating them with existing data sets and performing analysis and quality assurance investigations. Workflow managers offer advantages in this type of processing, which involves multiple data access and processing steps. Generally, they enable automation of the workflow by locating data and resources, recovery from failures, and monitoring of performance. In this focus demo we demonstrate how the Pegasus Workflow Manager Python API manages image processing to create mosaics with the Montage Image Mosaic engine. Since 2001, Pegasus has been developed and maintained at USC/ISI. Montage was in fact one of the first applications used to design Pegasus and optimize its performance. Pegasus has since found application in many areas of science. LIGO exploited it in making discoveries of black holes. The Vera C. Rubin Observatory used it to compare the cost and performance of processing images on cloud platforms. While these are examples of projects at large scale, small team investigations on local clusters of machines can benefit from Pegasus as well.
Submitted 22 November, 2021;
originally announced November 2021.
-
Reproducing GW150914: the first observation of gravitational waves from a binary black hole merger
Authors:
Duncan A. Brown,
Karan Vahi,
Michela Taufer,
Von Welch,
Ewa Deelman
Abstract:
In 2016, LIGO and Virgo announced the first observation of gravitational waves from a binary black hole merger, known as GW150914. To establish the confidence of this detection, large-scale scientific workflows were used to measure the event's statistical significance. They used code written by LIGO/Virgo and were executed on the LIGO Data Grid. The codes are publicly available, but there has not yet been an attempt to directly reproduce the results, although several analyses have replicated the analysis, confirming the detection. We attempt to reproduce the result presented in the GW150914 discovery paper using publicly available code on the Open Science Grid. We show that we can reproduce the main result but we cannot exactly reproduce the LIGO analysis as the original data set used is not public. We discuss the challenges we encountered and make recommendations for scientists who wish to make their work reproducible.
Submitted 2 March, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Applicability study of the PRIMAD model to LIGO gravitational wave search workflows
Authors:
Dylan Chapp,
Danny Rorabaugh,
Duncan Brown,
Ewa Deelman,
Karan Vahi,
Von Welch,
Michela Taufer
Abstract:
The PRIMAD model, with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data), provides an abstract taxonomy to represent computational experiments and enforce reproducibility by design. In this paper, we assess the model's applicability to a set of Laser Interferometer Gravitational-Wave Observatory (LIGO) workflows from literature sources (i.e., published papers). Our work outlines the potential and limits of the model in terms of its abstraction levels and application process.
Submitted 10 April, 2019;
originally announced April 2019.
-
Cyberinfrastructure Requirements to Enhance Multi-messenger Astrophysics
Authors:
Philip Chang,
Gabrielle Allen,
Warren Anderson,
Federica B. Bianco,
Joshua S. Bloom,
Patrick R. Brady,
Adam Brazier,
S. Bradley Cenko,
Sean M. Couch,
Tyce DeYoung,
Ewa Deelman,
Zachariah B Etienne,
Ryan J. Foley,
Derek B Fox,
V. Zach Golkhou,
Darren R Grant,
Chad Hanna,
Kelly Holley-Bockelmann,
D. Andrew Howell,
E. A. Huerta,
Margaret W. G. Johnson,
Mario Juric,
David L. Kaplan,
Daniel S. Katz,
Azadeh Keivani
, et al. (17 additional authors not shown)
Abstract:
The identification of the electromagnetic counterpart of the gravitational wave event, GW170817, and discovery of neutrinos and gamma-rays from TXS 0506+056 heralded the new era of multi-messenger astrophysics. As the number of multi-messenger events rapidly grows over the next decade, the cyberinfrastructure requirements to handle the increase in data rates, data volume, need for event follow up, and analysis across the different messengers will also grow explosively. The cyberinfrastructure requirements to enhance multi-messenger astrophysics will be both a major challenge and an opportunity for astronomers, physicists, computer scientists and cyberinfrastructure specialists. Here we outline some of these requirements and argue for a distributed cyberinfrastructure institute for multi-messenger astrophysics to meet these challenges.
Submitted 11 March, 2019;
originally announced March 2019.
-
All-sky Search for Periodic Gravitational Waves in the O1 LIGO Data
Authors:
LIGO Scientific Collaboration,
Virgo Collaboration,
B. P. Abbott,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
T. Adams,
P. Addesso,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
M. Afrough,
B. Agarwal,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
B. Allen,
G. Allen,
A. Allocca,
P. A. Altin
, et al. (1020 additional authors not shown)
Abstract:
We report on an all-sky search for periodic gravitational waves in the frequency band 20-475 Hz and with a frequency time derivative in the range of [-1.0, +0.1]e-8 Hz/s. Such a signal could be produced by a nearby spinning and slightly non-axisymmetric isolated neutron star in our galaxy. This search uses the data from Advanced LIGO's first observational run, O1. No periodic gravitational wave signals were observed, and upper limits were placed on their strengths. The lowest upper limits on worst-case (linearly polarized) strain amplitude h0 are 4e-25 near 170 Hz. For a circularly polarized source (most favorable orientation), the smallest upper limits obtained are 1.5e-25. These upper limits refer to all sky locations and the entire range of frequency derivative values. For a population-averaged ensemble of sky locations and stellar orientations, the lowest upper limits obtained for the strain amplitude are 2.5e-25.
Submitted 15 July, 2017; v1 submitted 9 July, 2017;
originally announced July 2017.
-
Upper Limits on Gravitational Waves from Scorpius X-1 from a Model-Based Cross-Correlation Search in Advanced LIGO Data
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
B. P. Abbott,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
T. Adams,
P. Addesso,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
M. Afrough,
B. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
B. Allen,
G. Allen,
A. Allocca
, et al. (1024 additional authors not shown)
Abstract:
We present the results of a semicoherent search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1, using data from the first Advanced LIGO observing run. The search method uses details of the modelled, parametrized continuous signal to combine coherently data separated by less than a specified coherence time, which can be adjusted to trade off sensitivity against computational cost. A search was conducted over the frequency range from 25 Hz to 2000 Hz, spanning the current observationally-constrained range of the binary orbital parameters. No significant detection candidates were found, and frequency-dependent upper limits were set using a combination of sensitivity estimates and simulated signal injections. The most stringent upper limit was set at 175 Hz, with comparable limits set across the most sensitive frequency range from 100 Hz to 200 Hz. At this frequency, the 95% upper limit on signal amplitude h0 is 2.3e-25 marginalized over the unknown inclination angle of the neutron star's spin, and 8.03e-26 assuming the best orientation (which results in circularly polarized gravitational waves). These limits are a factor of 3-4 stronger than those set by other analyses of the same data, and a factor of about 7 stronger than the best upper limits set using initial LIGO data. In the vicinity of 100 Hz, the limits are a factor of between 1.2 and 3.5 above the predictions of the torque balance model, depending on inclination angle; if the most likely inclination angle of 44 degrees is assumed, they are within a factor of 1.7.
Submitted 16 November, 2019; v1 submitted 9 June, 2017;
originally announced June 2017.
-
GW170104: Observation of a 50-Solar-Mass Binary Black Hole Coalescence at Redshift 0.2
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
B. P. Abbott,
R. Abbott,
T. D. Abbott,
F. Acernese,
K. Ackley,
C. Adams,
T. Adams,
P. Addesso,
R. X. Adhikari,
V. B. Adya,
C. Affeldt,
M. Afrough,
B. Agarwal,
M. Agathos,
K. Agatsuma,
N. Aggarwal,
O. D. Aguiar,
L. Aiello,
A. Ain,
P. Ajith,
B. Allen,
G. Allen,
A. Allocca
, et al. (1026 additional authors not shown)
Abstract:
We describe the observation of GW170104, a gravitational-wave signal produced by the coalescence of a pair of stellar-mass black holes. The signal was measured on January 4, 2017 at 10:11:58.6 UTC by the twin advanced detectors of the Laser Interferometer Gravitational-Wave Observatory during their second observing run, with a network signal-to-noise ratio of 13 and a false alarm rate less than 1 in 70,000 years. The inferred component black hole masses are $31.2^{+8.4}_{-6.0}\,M_\odot$ and $19.4^{+5.3}_{-5.9}\,M_\odot$ (at the 90% credible level). The black hole spins are best constrained through measurement of the effective inspiral spin parameter, a mass-weighted combination of the spin components perpendicular to the orbital plane, $\chi_\mathrm{eff} = -0.12^{+0.21}_{-0.30}$. This result implies that spin configurations with both component spins positively aligned with the orbital angular momentum are disfavored. The source luminosity distance is $880^{+450}_{-390}~\mathrm{Mpc}$ corresponding to a redshift of $z = 0.18^{+0.08}_{-0.07}$. We constrain the magnitude of modifications to the gravitational-wave dispersion relation and perform null tests of general relativity. Assuming that gravitons are dispersed in vacuum like massive particles, we bound the graviton mass to $m_g \le 7.7 \times 10^{-23}~\mathrm{eV}/c^2$. In all cases, we find that GW170104 is consistent with general relativity.
Submitted 23 October, 2018; v1 submitted 6 June, 2017;
originally announced June 2017.
-
Creating A Galactic Plane Atlas With Amazon Web Services
Authors:
G. Bruce Berriman,
Ewa Deelman,
John Good,
Gideon Juve,
Jamie Kinney,
Ann Merrihew,
Mats Rynge
Abstract:
This paper describes by example how astronomers can use cloud-computing resources offered by Amazon Web Services (AWS) to create new datasets at scale. We have created from existing surveys an atlas of the Galactic Plane at 16 wavelengths from 1 μm to 24 μm with pixels co-registered at spatial sampling of 1 arcsec. We explain how open source tools support management and operation of a virtual cluster on AWS platforms to process data at scale, and describe the technical issues that users will need to consider, such as optimization of resources, resource costs, and management of virtual machine instances.
Submitted 23 December, 2013;
originally announced December 2013.
-
A Tale Of 160 Scientists, Three Applications, A Workshop and A Cloud
Authors:
G. Bruce Berriman,
Carolyn Brinkworth,
Dawn Gelino,
Dennis K. Wittman,
Ewa Deelman,
Gideon Juve,
Mats Rynge,
Jamie Kinney
Abstract:
The NASA Exoplanet Science Institute (NExScI) hosts the annual Sagan Workshops, thematic meetings aimed at introducing researchers to the latest tools and methodologies in exoplanet research. The theme of the Summer 2012 workshop, held from July 23 to July 27 at Caltech, was to explore the use of exoplanet light curves to study planetary system architectures and atmospheres. A major part of the workshop was to use hands-on sessions to instruct attendees in the use of three open source tools for the analysis of light curves, especially from the Kepler mission. Each hands-on session involved the 160 attendees using their laptops to follow step-by-step tutorials given by experts. We describe how we used the Amazon Elastic Compute Cloud (EC2) to run these applications.
Submitted 16 November, 2012;
originally announced November 2012.
-
Data Sharing Options for Scientific Workflows on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to Astronomy: A Study of Cost and Performance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Gideon Juve,
Moira Regelson,
Peter Plavchan
Abstract:
Cloud computing is a powerful new technology that is widely used in the business world. Recently, we have been investigating the benefits it offers to scientific computing. We have used three workflow applications to compare the performance of processing data on the Amazon EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas will support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.
Submitted 22 October, 2010;
originally announced October 2010.
-
The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance
Authors:
G. Bruce Berriman,
Ewa Deelman,
Paul Groth,
Gideon Juve
Abstract:
We have used the Montage image mosaic engine to investigate the cost and performance of processing images on the Amazon EC2 cloud, and to inform the requirements that higher-level products impose on provenance management technologies. We will present a detailed comparison of the performance of Montage on the cloud and on the Abe high performance cluster at the National Center for Supercomputing Applications (NCSA). Because Montage generates many intermediate products, we have used it to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with provenance management technologies such as the "Provenance Aware Service Oriented Architecture" (PASOA).
Submitted 24 June, 2010;
originally announced June 2010.
-
Pipeline-Centric Provenance Model
Authors:
Paul Groth,
Ewa Deelman,
Gideon Juve,
Gaurang Mehta,
Bruce Berriman
Abstract:
In this paper we propose a new provenance model which is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We generalize the class of applications the approach is relevant to and propose a pipeline-centric provenance model. Finally, we evaluate the benefits in terms of storage needed by the approach when applied to an astronomy application.
Submitted 24 May, 2010;
originally announced May 2010.
-
Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Authors:
Joseph C. Jacob,
Daniel S. Katz,
G. Bruce Berriman,
John Good,
Anastasia C. Laity,
Ewa Deelman,
Carl Kesselman,
Gurmeet Singh,
Mei-Hui Su,
Thomas A. Prince,
Roy Williams
Abstract:
Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.
Submitted 24 May, 2010;
originally announced May 2010.
-
The Role of Provenance Management in Accelerating the Rate of Astronomical Research
Authors:
G. Bruce Berriman,
Ewa Deelman
Abstract:
The availability of vast quantities of data through electronic archives has transformed astronomical research. It has also enabled the creation of new products, models and simulations, often from distributed input data and models, that are themselves made electronically available. These products will only provide maximal long-term value to astronomers when accompanied by records of their provenance; that is, records of the data and processes used in the creation of such products. We use the creation of image mosaics with the Montage grid-enabled mosaic engine to emphasize the necessity of provenance management and to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with one technology, the "Provenance Aware Service Oriented Architecture" (PASOA), that stores provenance information at each step in the computation of a mosaic. The results inform the technical specifications of provenance management systems, including the need for extensible systems built on common standards. Finally, we describe examples of provenance management technology emerging from the fields of geophysics and oceanography that have applicability to astronomy applications.
Submitted 19 May, 2010;
originally announced May 2010.
-
Scientific Workflow Applications on Amazon EC2
Authors:
Gideon Juve,
Ewa Deelman,
Karan Vahi,
Gaurang Mehta,
Bruce Berriman,
Benjamin P. Berman,
Phil Maechling
Abstract:
The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and "pay as you go" usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
Submitted 15 May, 2010;
originally announced May 2010.
-
Metadata and provenance management
Authors:
Ewa Deelman,
Bruce Berriman,
Ann Chervenak,
Oscar Corcho,
Paul Groth,
Luc Moreau
Abstract:
Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared and further processed and analyzed among collaborators. In order to facilitate sharing and data interpretation, data need to carry with them metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of the approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes.
Submitted 14 May, 2010;
originally announced May 2010.