Showing 1–25 of 25 results for author: Babuji, Y

  1. arXiv:2406.17710  [pdf, other]

    cs.DC

    GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS

    Authors: Alok Kamatar, Valerie Hayot-Sasson, Yadu Babuji, Andre Bauer, Gourav Rattihalli, Ninad Hogade, Dejan Milojicic, Kyle Chard, Ian Foster

    Abstract: Application energy efficiency can be improved by executing each application component on the compute element that consumes the least energy while also satisfying time constraints. In principle, the function as a service (FaaS) paradigm should simplify such optimizations by abstracting away compute location, but existing FaaS systems do not provide for user transparency over application energy cons…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 10 figures

  2. arXiv:2403.19257  [pdf, other]

    cs.DC

    UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving

    Authors: Yifei Li, Ryan Chard, Yadu Babuji, Kyle Chard, Ian Foster, Zhuozhao Li

    Abstract: Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 13 pages, 13 figures, IPDPS2024

  3. arXiv:2307.11060  [pdf, ps, other]

    cs.SE

    The Changing Role of RSEs over the Lifetime of Parsl

    Authors: Daniel S. Katz, Ben Clifford, Yadu Babuji, Kevin Hunter Kesling, Anna Woodard, Kyle Chard

    Abstract: This position paper describes the Parsl open source research software project and its various phases over seven years. It defines four types of research software engineers (RSEs) who have been important to the project in those phases; we believe this is also applicable to other research software projects.

    Submitted 20 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: 3 pages

  4. arXiv:2304.14244  [pdf, other]

    cs.DC

    Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis

    Authors: Nicholson Collier, Justin M. Wozniak, Abby Stevens, Yadu Babuji, Mickaël Binois, Arindam Fadikar, Alexandra Würth, Kyle Chard, Jonathan Ozik

    Abstract: COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas includ…

    Submitted 10 May, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

  5. Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources

    Authors: Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster

    Abstract: Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach…

    Submitted 15 March, 2023; originally announced March 2023.

  6. funcX: Federated Function as a Service for Science

    Authors: Zhuozhao Li, Ryan Chard, Yadu Babuji, Ben Galewsky, Tyler Skluzacek, Kirill Nagaitsev, Anna Woodard, Ben Blaiszik, Josh Bryan, Daniel S. Katz, Ian Foster, Kyle Chard

    Abstract: funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and superc…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2005.04215

  7. Extended Abstract: Productive Parallel Programming with Parsl

    Authors: Kyle Chard, Yadu Babuji, Anna Woodard, Ben Clifford, Zhuozhao Li, Mihael Hategan, Ian Foster, Mike Wilde, Daniel S. Katz

    Abstract: Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions-wrapping either Python or external applications-to indicate that these functions may be executed concurrently. Developers can then link together…

    Submitted 4 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Journal ref: ACM SIGAda Ada Letters 40 (2), 73-75, 2020

  8. arXiv:2110.02827  [pdf, other]

    cs.DC cond-mat.mtrl-sci cs.LG

    Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing

    Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster

    Abstract: Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: camera-ready version for ML in HPC Environments 2021

  9. Extreme Scale Survey Simulation with Python Workflows

    Authors: A. S. Villarreal, Yadu Babuji, Tom Uram, Daniel S. Katz, Kyle Chard, Katrin Heitmann

    Abstract: The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will soon carry out an unprecedented wide, fast, and deep survey of the sky in multiple optical bands. The data from LSST will open up a new discovery space in astronomy and cosmology, simultaneously providing clues toward addressing burning issues of the day, such as the origin of dark energy and the nature of dark matter, w…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Proceeding for eScience 2021, 9 pages, 5 figures

  10. arXiv:2108.13521  [pdf, other]

    cs.DC

    ExaWorks: Workflows for Exascale

    Authors: Aymen Al-Saadi, Dong H. Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Justin M. Wozniak

    Abstract: Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms.…

    Submitted 30 August, 2021; originally announced August 2021.

  11. arXiv:2106.07036  [pdf, other]

    q-bio.BM cs.LG

    Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening

    Authors: Austin Clyde, Thomas Brettin, Alexander Partin, Hyunseung Yoo, Yadu Babuji, Ben Blaiszik, Andre Merzky, Matteo Turilli, Shantenu Jha, Arvind Ramanathan, Rick Stevens

    Abstract: We propose a benchmark to study surrogate model accuracy for protein-ligand docking. We share a dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. Our work shows surrogate docking models have six orders of magnitude more throughput than standa…

    Submitted 30 June, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

  12. Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux, et al. (33 additional authors not shown)

    Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i…

    Submitted 9 June, 2021; originally announced June 2021.

  13. Workflows Community Summit: Bringing the Scientific Workflows Community Together

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Dan Laney, Dong Ahn, Shantenu Jha, Carole Goble, Lavanya Ramakrishnan, Luc Peterson, Bjoern Enders, Douglas Thain, Ilkay Altintas, Yadu Babuji, Rosa M. Badia, Vivien Bonazzi, Taina Coleman, Michael Crusoe, Ewa Deelman, Frank Di Natale, Paolo Di Tommaso, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Alex Ganose, Bjorn Gruning, et al. (20 additional authors not shown)

    Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) pla…

    Submitted 16 March, 2021; originally announced March 2021.

  14. arXiv:2101.04855  [pdf, other]

    astro-ph.CO astro-ph.IM

    DESC DC2 Data Release Note

    Authors: LSST Dark Energy Science Collaboration, Bela Abolfathi, Robert Armstrong, Humna Awan, Yadu N. Babuji, Franz Erik Bauer, George Beckett, Rahul Biswas, Joanne R. Bogart, Dominique Boutigny, Kyle Chard, James Chiang, Johann Cohen-Tanugi, Andrew J. Connolly, Scott F. Daniel, Seth W. Digel, Alex Drlica-Wagner, Richard Dubois, Eric Gawiser, Thomas Glanzman, Salman Habib, Andrew P. Hearin, Katrin Heitmann, Fabio Hernandez, Renée Hložek, et al. (32 additional authors not shown)

    Abstract: In preparation for cosmological analyses of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), the LSST Dark Energy Science Collaboration (LSST DESC) has created a 300 deg$^2$ simulated survey as part of an effort called Data Challenge 2 (DC2). The DC2 simulated sky survey, in six optical bands with observations following a reference LSST observing cadence, was processed with th…

    Submitted 13 June, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: 25 pages, 3 figures; 9 tables. A detailed changelog can be found in Appendix A. To obtain data, visit the DESC Data Portal at https://data.lsstdesc.org/

  15. arXiv:2010.06574  [pdf, other]

    cs.DC cs.CE q-bio.QM

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    Authors: Aymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin, et al. (11 additional authors not shown)

    Abstract: The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol accelerating…

    Submitted 13 October, 2020; originally announced October 2020.

  16. arXiv:2010.05926  [pdf, other]

    astro-ph.IM astro-ph.CO

    The LSST DESC DC2 Simulated Sky Survey

    Authors: LSST Dark Energy Science Collaboration, Bela Abolfathi, David Alonso, Robert Armstrong, Éric Aubourg, Humna Awan, Yadu N. Babuji, Franz Erik Bauer, Rachel Bean, George Beckett, Rahul Biswas, Joanne R. Bogart, Dominique Boutigny, Kyle Chard, James Chiang, Chuck F. Claver, Johann Cohen-Tanugi, Céline Combet, Andrew J. Connolly, Scott F. Daniel, Seth W. Digel, Alex Drlica-Wagner, Richard Dubois, Emmanuel Gangler, Eric Gawiser, et al. (55 additional authors not shown)

    Abstract: We describe the simulated sky survey underlying the second data challenge (DC2) carried out in preparation for analysis of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) by the LSST Dark Energy Science Collaboration (LSST DESC). Significant connections across multiple science domains will be a hallmark of LSST; the DC2 program represents a unique modeling effort that stresses…

    Submitted 26 January, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 39 pages, 19 figures, version accepted for publication in ApJS

  17. arXiv:2006.02431  [pdf, other]

    q-bio.BM cs.LG q-bio.QM stat.ML

    Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

    Authors: Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

    Abstract: Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort,…

    Submitted 27 May, 2020; originally announced June 2020.

    Comments: 11 pages, 5 figures

  18. funcX: A Federated Function Serving Fabric for Science

    Authors: Ryan Chard, Yadu Babuji, Zhuozhao Li, Tyler Skluzacek, Anna Woodard, Ben Blaiszik, Ian Foster, Kyle Chard

    Abstract: Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are ava…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.04907

  19. arXiv:1908.04907  [pdf, other]

    cs.DC

    Serverless Supercomputing: High Performance Function as a Service for Science

    Authors: Ryan Chard, Tyler J. Skluzacek, Zhuozhao Li, Yadu Babuji, Anna Woodard, Ben Blaiszik, Steven Tuecke, Ian Foster, Kyle Chard

    Abstract: Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), or be offloaded to special…

    Submitted 13 August, 2019; originally announced August 2019.

  20. Parsl: Pervasive Parallel Programming in Python

    Authors: Yadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde, Kyle Chard

    Abstract: High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking h…

    Submitted 17 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

  21. arXiv:1811.11213  [pdf, other]

    cs.LG cs.DC stat.ML

    DLHub: Model and Data Serving for Science

    Authors: Ryan Chard, Zhuozhao Li, Kyle Chard, Logan Ward, Yadu Babuji, Anna Woodard, Steve Tuecke, Ben Blaiszik, Michael J. Franklin, Ian Foster

    Abstract: While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and ser…

    Submitted 27 November, 2018; originally announced November 2018.

    Comments: 10 pages, 8 figures, conference paper

  22. arXiv:1705.00070  [pdf, other]

    cs.DC

    Enabling Interactive Analytics of Secure Data using Cloud Kotta

    Authors: Yadu N. Babuji, Kyle Chard, Eamon Duede

    Abstract: Research, especially in the social sciences and humanities, is increasingly reliant on the application of data science methods to analyze large amounts of (often private) data. Secure data enclaves provide a solution for managing and analyzing private data. However, such enclaves do not readily support discovery science---a form of exploratory or interactive analysis by which researchers execute a…

    Submitted 28 April, 2017; originally announced May 2017.

    Comments: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC USA, June 2017 (ScienceCloud 2017), 7 pages

  23. arXiv:1610.03108  [pdf, other]

    cs.DC

    Cloud Kotta: Enabling Secure and Scalable Data Analytics in the Cloud

    Authors: Yadu N. Babuji, Kyle Chard, Aaron Gerow, Eamon Duede

    Abstract: Distributed communities of researchers rely increasingly on valuable, proprietary, or sensitive datasets. Given the growth of such data, especially in fields new to data-driven, computationally intensive research like the social sciences and humanities, coupled with what are often strict and complex data-use agreements, many research communities now require methods that allow secure, scalable and…

    Submitted 18 October, 2016; v1 submitted 10 October, 2016; originally announced October 2016.

    Comments: A version of this paper is forthcoming at BigData 2016

  24. arXiv:1610.03105  [pdf, other]

    cs.DC

    A Secure Data Enclave and Analytics Platform for Social Scientists

    Authors: Yadu N. Babuji, Kyle Chard, Aaron Gerow, Eamon Duede

    Abstract: Data-driven research is increasingly ubiquitous and data itself is a defining asset for researchers, particularly in the computational social sciences and humanities. Entire careers and research communities are built around valuable, proprietary or sensitive datasets. However, many existing computation resources fail to support secure and cost-effective storage of data while also enabling secure a…

    Submitted 10 October, 2016; originally announced October 2016.

    Comments: Forthcoming eScience 2016

  25. Evaluating Distributed Execution of Workloads

    Authors: Matteo Turilli, Yadu Nand Babuji, Andre Merzky, Ming Tai Ha, Michael Wilde, Daniel S. Katz, Shantenu Jha

    Abstract: Resource selection and task placement for distributed execution poses conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation diffic…

    Submitted 2 November, 2021; v1 submitted 31 May, 2016; originally announced May 2016.