-
The Future of Astronomical Data Infrastructure: Meeting Report
Authors:
Michael R. Blanton,
Janet D. Evans,
Dara Norman,
William O'Mullane,
Adrian Price-Whelan,
Luca Rizzi,
Alberto Accomazzi,
Megan Ansdell,
Stephen Bailey,
Paul Barrett,
Steven Berukoff,
Adam Bolton,
Julian Borrill,
Kelle Cruz,
Julianne Dalcanton,
Vandana Desai,
Gregory P. Dubois-Felsmann,
Frossie Economou,
Henry Ferguson,
Bryan Field,
Dan Foreman-Mackey,
Jaime Forero-Romero,
Niall Gaffney,
Kim Gillies,
Matthew J. Graham,
et al. (47 additional authors not shown)
Abstract:
The astronomical community is grappling with the increasing volume and complexity of data produced by modern telescopes, due to difficulties in reducing, accessing, analyzing, and combining archives of data. To address this challenge, we propose the establishment of a coordinating body, an "entity," with the specific mission of enhancing the interoperability, archiving, distribution, and production of both astronomical data and software. This report is the culmination of a workshop held in February 2023 on the Future of Astronomical Data Infrastructure. Attended by 70 scientists and software professionals from ground-based and space-based missions and archives spanning the entire spectrum of astronomical research, the group deliberated on the prevailing state of software and data infrastructure in astronomy, identified pressing issues, and explored potential solutions. In this report, we describe the ecosystem of astronomical data, its existing flaws, and the many gaps, duplication, inconsistencies, barriers to access, drags on productivity, missed opportunities, and risks to the long-term integrity of essential data sets. We also highlight the successes and failures in a set of deep dives into several different illustrative components of the ecosystem, included as an appendix.
Submitted 7 November, 2023;
originally announced November 2023.
-
Toward Enabling Reproducibility for Data-Intensive Research using the Whole Tale Platform
Authors:
Kyle Chard,
Niall Gaffney,
Mihael Hategan,
Kacper Kowalik,
Bertram Ludaescher,
Timothy McPhillips,
Jarek Nabrzyski,
Victoria Stodden,
Ian Taylor,
Thomas Thelen,
Matthew J. Turk,
Craig Willis
Abstract:
Whole Tale http://wholetale.org is a web-based, open-source platform for reproducible research supporting the creation, sharing, execution, and verification of "Tales" for the scientific research community. Tales are executable research objects that capture the code, data, and environment along with narrative and workflow information needed to re-create computational results from scientific studies. Creating reproducible research objects that enable reproducibility, transparency, and re-execution for computational experiments requiring significant compute resources or utilizing massive data is an especially challenging open problem. We describe opportunities, challenges, and solutions for facilitating reproducibility in data- and compute-intensive research, which we call "Tales at Scale," using the Whole Tale computing platform. We highlight challenges and solutions in frontend responsiveness needs, gaps in current middleware design and implementation, network restrictions, containerization, and data access. Finally, we discuss challenges in packaging computational experiment implementations for portable data-intensive Tales and outline future work.
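To make the idea of a Tale as a bundle of "code, data, and environment along with narrative" concrete, the following is a minimal sketch of a manifest that records those pieces and checks file integrity before re-execution. It is not the Whole Tale platform's actual format; the class `TaleManifest`, its fields (`environment_image`, `entrypoint`, `files`), and the use of SHA-256 checksums are all hypothetical choices made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaleManifest:
    """Hypothetical manifest for an executable research object (not Whole Tale's schema)."""

    title: str
    environment_image: str  # hypothetical container image reference for the environment
    entrypoint: str         # hypothetical command that re-creates the results
    narrative: str = ""     # free-text description of the workflow
    # path -> {"role": "code" | "data" | ..., "sha256": hex digest}
    files: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_file(self, path: str, content: bytes, role: str) -> None:
        # Record a checksum so a later re-execution can detect changed inputs.
        self.files[path] = {"role": role, "sha256": hashlib.sha256(content).hexdigest()}

    def verify(self, path: str, content: bytes) -> bool:
        # True only if the file is listed and its content matches the recorded checksum.
        entry = self.files.get(path)
        return entry is not None and entry["sha256"] == hashlib.sha256(content).hexdigest()

    def to_json(self) -> str:
        # Stable, human-readable serialization of the whole manifest.
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```

Recording checksums rather than copying files keeps the manifest small, which matters for the data-intensive "Tales at Scale" case where the data may live outside the package.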
Submitted 12 May, 2020;
originally announced May 2020.
-
A data-driven method for quantifying the impact of a genetic circuit on its host
Authors:
Aqib Hasnain,
Subhrajit Sinha,
Yuval Dorfan,
Amin Espah Borujeni,
Yongjin Park,
Paul Maschhoff,
Uma Saxena,
Joshua Urrutia,
Niall Gaffney,
Diveena Becker,
Atsede Siba,
Narendra Maheshri,
Ben Gordon,
Chris Voigt,
Enoch Yeung
Abstract:
Genetic circuits are designed to implement specific logic in living cells while keeping the burden on the host cell minimal. However, manipulating the genome often has a significant impact on the host for various reasons (use of the cell machinery to express new genes, gene toxicity, interactions with native genes, etc.). In this work we use Koopman operator theory to construct data-driven models of transcriptomic-level dynamics from noisy and temporally sparse RNAseq measurements. We show how Koopman models can be used to quantify the impact of genetic circuits on their host. We consider an experimental example, using high-throughput RNAseq measurements collected from wild-type E. coli, single gate components transformed into E. coli, and a NAND circuit composed of individual gates in E. coli, to explore how Koopman subspace functions encode increasing circuit interference with E. coli chassis dynamics. The algorithm provides a novel method for quantifying the impact of synthetic biological circuits on host-chassis dynamics.
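The abstract does not spell out the fitting procedure. As a minimal sketch, the simplest data-driven Koopman approximation, dynamic mode decomposition (DMD), uses the raw measurements themselves as observables and fits a linear operator K with x_{t+1} ≈ K x_t by least squares over consecutive snapshots. This is a simplified stand-in, not the authors' method, which learns richer observable functions from noisy, temporally sparse RNAseq data; the function name `fit_koopman_dmd` is hypothetical.

```python
import numpy as np


def fit_koopman_dmd(snapshots: np.ndarray) -> np.ndarray:
    """Least-squares estimate of a linear operator K with x_{t+1} ~= K x_t.

    snapshots: array of shape (n_features, n_timepoints), one column per time point.
    Uses identity observables (plain DMD); extended DMD would first lift each
    column through a dictionary of nonlinear functions.
    """
    x_now = snapshots[:, :-1]   # states at times 0 .. T-2
    x_next = snapshots[:, 1:]   # states at times 1 .. T-1
    # Minimize ||x_next - K x_now||_F via the Moore-Penrose pseudoinverse.
    return x_next @ np.linalg.pinv(x_now)


def koopman_eigenvalues(k: np.ndarray) -> np.ndarray:
    """Eigenvalues of the fitted operator; their magnitudes describe mode growth or decay."""
    return np.linalg.eigvals(k)
```

For noise-free data generated by a linear system whose trajectory spans the state space, the fit recovers the generating matrix exactly; with real measurements the eigenvalues and modes of K are the quantities one would compare across conditions.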
Submitted 13 September, 2019;
originally announced September 2019.
-
The demise of the filesystem and multi level service architecture
Authors:
William O'Mullane,
Niall Gaffney,
Frossie Economou,
Arfon M. Smith,
J. Ross Thomson,
Tim Jenness
Abstract:
Many astronomy data centres still rely on filesystems. Industry has moved on; current practice in computing infrastructure is to achieve Big Data scalability using object stores rather than POSIX file systems. This presents us with opportunities for portability and reuse of the software underlying processing and archive systems, but it also causes problems for legacy implementations in current data centres.
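The practical difference between the two models can be illustrated with a toy in-memory object store: objects live in a flat key space, are written whole rather than seeked into or appended to, and "directories" exist only as shared key prefixes that a listing call groups by a delimiter. This is a minimal sketch for illustration only; `ToyObjectStore` and its methods are hypothetical and do not reproduce any particular vendor's API, although prefix/delimiter listing is a common pattern in cloud object stores.

```python
class ToyObjectStore:
    """Flat key -> bytes mapping; there are no real directories, only key prefixes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are replaced whole; there is no seek/append as with POSIX files.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_objects(self, prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes) under prefix, emulating one directory level."""
        keys: list[str] = []
        prefixes: set[str] = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Deeper keys are collapsed into a pseudo-directory entry.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)
```

Code written against POSIX paths (open, seek, rename of directories) has no direct equivalent here, which is one source of the problems for legacy data centre software mentioned above.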
Submitted 31 July, 2019; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Petabytes to Science
Authors:
Amanda E. Bauer,
Eric C. Bellm,
Adam S. Bolton,
Surajit Chaudhuri,
A. J. Connolly,
Kelle L. Cruz,
Vandana Desai,
Alex Drlica-Wagner,
Frossie Economou,
Niall Gaffney,
J. Kavelaars,
J. Kinney,
Ting S. Li,
B. Lundgren,
R. Margutti,
G. Narayan,
B. Nord,
Dara J. Norman,
W. O'Mullane,
S. Padhi,
J. E. G. Peek,
C. Schafer,
Megan E. Schwamb,
Arfon M. Smith,
Erik J. Tollerud,
et al. (2 additional authors not shown)
Abstract:
A Kavli Foundation-sponsored workshop on the theme "Petabytes to Science" was held from 12th to 14th February 2019 in Las Vegas. The aim of this workshop was to discuss important trends and technologies that may support astronomy. We also tackled how to better shape the workforce for the new trends and how we should approach education and public outreach. This document was coauthored during the workshop and edited in the weeks after. It summarizes the discussions and highlights many recommendations that came out of the workshop.
We shall distill parts of this document and formulate potential white papers for the decadal survey.
Submitted 17 November, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
FanStore: Enabling Efficient and Scalable I/O for Distributed Deep Learning
Authors:
Zhao Zhang,
Lei Huang,
Uri Manor,
Linjing Fang,
Gabriele Merlo,
Craig Michoski,
John Cazes,
Niall Gaffney
Abstract:
Emerging Deep Learning (DL) applications introduce heavy I/O workloads on computer clusters. The inherently long-lasting, repeated, and random file access pattern can easily saturate the metadata and data services and negatively impact other users. In this paper, we present FanStore, a transient runtime file system that optimizes DL I/O on existing hardware/software stacks. FanStore distributes datasets to the local storage of compute nodes and maintains a global namespace. Using system call interception, distributed metadata management, and generic data compression, FanStore provides a POSIX-compliant interface with native hardware throughput in an efficient and scalable manner. Users do not have to make intrusive code changes to use FanStore and take advantage of the optimized I/O. Our experiments with benchmarks and real applications show that FanStore can scale DL training to 512 compute nodes with over 90% scaling efficiency.
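The core idea of spreading a dataset across node-local storage while exposing one global namespace can be sketched in a few lines. This toy assumes a stable hash of the file path to pick the owning node and uses `zlib` as the generic compressor; `ToyFanStore` and its methods are hypothetical, system call interception is not modeled, and for brevity the metadata map is a single dict rather than the distributed metadata management the abstract describes.

```python
import hashlib
import zlib


class ToyFanStore:
    """Hypothetical sketch: one global namespace over per-node local stores."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        # One dict per compute node stands in for that node's local disk.
        self.local_stores: list[dict[str, bytes]] = [dict() for _ in range(num_nodes)]
        # Global metadata: path -> (owning node, uncompressed size in bytes).
        self.metadata: dict[str, tuple[int, int]] = {}

    def owner(self, path: str) -> int:
        # Stable hash, so every node computes the same owner without asking a server.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_nodes

    def put(self, path: str, data: bytes) -> None:
        node = self.owner(path)
        self.local_stores[node][path] = zlib.compress(data)  # compress once, read many times
        self.metadata[path] = (node, len(data))

    def read(self, path: str) -> bytes:
        node, _size = self.metadata[path]  # KeyError plays the role of "file not found"
        return zlib.decompress(self.local_stores[node][path])

    def listdir(self) -> list[str]:
        # The global namespace: every file, regardless of which node holds it.
        return sorted(self.metadata)
```

Because DL training rereads the same files every epoch, paying the compression cost once at staging time and decompressing on each read trades CPU for reduced storage footprint and I/O volume.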
Submitted 27 September, 2018;
originally announced September 2018.
-
Computing Environments for Reproducibility: Capturing the "Whole Tale"
Authors:
Adam Brinckman,
Kyle Chard,
Niall Gaffney,
Mihael Hategan,
Matthew B. Jones,
Kacper Kowalik,
Sivakumar Kulasekaran,
Bertram Ludäscher,
Bryce D. Mecum,
Jarek Nabrzyski,
Victoria Stodden,
Ian J. Taylor,
Matthew J. Turk,
Kandace Turner
Abstract:
The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process, transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create "living publications" or "tales". The Whole Tale focuses on the full spectrum of science, empowering both users in the long tail of science and power users with demands for access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment.
Submitted 1 May, 2018;
originally announced May 2018.
-
Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments
Authors:
Bertram Ludaescher,
Kyle Chard,
Niall Gaffney,
Matthew B. Jones,
Jaroslaw Nabrzyski,
Victoria Stodden,
Matthew Turk
Abstract:
We present an overview of the recently funded "Merging Science and Cyberinfrastructure Pathways: The Whole Tale" project (NSF award #1541450). Our approach has two nested goals: 1) deliver an environment that enables researchers to create a complete narrative of the research process, including exposure of the data-to-publication lifecycle, and 2) systematically and persistently link research publications to their associated digital scholarly objects such as the data, code, and workflows. To enable this, Whole Tale will create an environment where researchers can collaborate on data, workspaces, and workflows and then publish them for future adoption or modification. Published data and applications can either be consumed directly by users through the Whole Tale environment or be integrated into existing or future domain Science Gateways.
Submitted 28 October, 2016;
originally announced October 2016.
-
Femtosecond X-ray Laser induced transient electronic phase change observed in fullerene C60
Authors:
Brian Abbey,
Ruben A. Dilanian,
Connie Darmanin,
Rebecca A. Ryan,
Corey T. Putkunz,
Andrew V. Martin,
Victor Streltsov,
Michael W. M. Jones,
Naylyn Gaffney,
Felix Hofmann,
Garth J. Williams,
Sebastian Boutet,
Marc Messerschmidt,
M. Marvin Siebert,
Sophie Williams,
Evan Curwood,
Eugeniu Balaur,
Andrew G. Peele,
Keith A. Nugent,
Harry M. Quiney
Abstract:
X-ray Free-Electron Lasers (XFELs) deliver X-ray pulses with a coherent flux that is approximately eight orders of magnitude greater than that available from a modern third-generation synchrotron source. The power density in an XFEL pulse may be so high that it can modify the electronic properties of a sample on a femtosecond timescale. Exploration of the interaction of intense coherent X-ray pulses and matter is both of intrinsic scientific interest and of critical importance to the interpretation of experiments that probe the structures of materials using high-brightness femtosecond XFEL pulses. In this letter, we report observations of the diffraction of extremely intense 32 fs nanofocused X-ray pulses by a powder sample of crystalline C60. We find that the diffraction pattern at the highest available incident power exhibits significant structural signatures that are absent in data obtained either at third-generation synchrotron sources or from XFEL sources operating at low output power. These signatures are consistent with a highly ordered structure that does not correspond to any previously known phase of crystalline C60. We argue that these data indicate the formation of a transient phase produced by a dynamic electronic distortion induced by inner-shell ionisation of at least one carbon atom in each C60 molecule.
Submitted 24 September, 2012;
originally announced September 2012.
-
Ten Year Review of Queue Scheduling of the Hobby-Eberly Telescope
Authors:
Matthew Shetrone,
Mark E. Cornell,
James R. Fowler,
Niall Gaffney,
Benjamin Laws,
Jeff Mader,
Cloud Mason,
Stephen Odewahn,
Brian Roman,
Sergey Rostopchin,
Donald P. Schneider,
James Umbarger,
Amy Westfall
Abstract:
This paper presents a summary of the first 10 years of operating the Hobby-Eberly Telescope (HET) in queue mode. The scheduling can be quite complex but has worked effectively for obtaining the most science possible with this uniquely designed telescope. The queue must handle dozens of separate scientific programs, the involvement of a number of institutions with individual Telescope Allocation Committees as well as engineering and instrument commissioning. We have continuously revised our queue operations as we have learned from experience. The flexibility of the queue and the simultaneous availability of three instruments, along with a staff trained for all aspects of telescope and instrumentation operation, have allowed optimum use to be made of variable weather conditions and have proven to be especially effective at accommodating targets of opportunity and engineering tasks. In this paper we review the methodology of the HET queue along with its strengths and weaknesses.
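The abstract describes the queue only qualitatively. As a minimal sketch of one scheduling decision, the function below filters requests by current seeing and instrument availability, lets targets of opportunity jump the queue, and otherwise prefers higher priority and then the tightest seeing requirement so that good conditions are not spent on programs that could tolerate worse weather. This is not the HET's actual algorithm; the `Request` fields, the priority convention, and the instrument names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    program: str
    priority: int            # hypothetical convention: lower number = higher priority
    max_seeing: float        # worst acceptable seeing in arcsec (hypothetical field)
    instrument: str
    target_of_opportunity: bool = False


def pick_next(requests: list[Request], seeing: float,
              available_instruments: set[str]) -> Optional[Request]:
    """Return the best request observable under current conditions, or None."""
    eligible = [
        r for r in requests
        if seeing <= r.max_seeing and r.instrument in available_instruments
    ]
    if not eligible:
        return None
    # Sort key: ToOs first (False < True), then priority, then tightest seeing limit.
    return min(eligible, key=lambda r: (not r.target_of_opportunity, r.priority, r.max_seeing))
```

Re-running such a selection whenever conditions change is what lets a queue exploit variable weather, which a fixed classical schedule cannot do.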
Submitted 26 May, 2007;
originally announced May 2007.