-
Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels
Authors:
Ka Hei Martin Kwok,
Matti Kortelainen,
Giuseppe Cerati,
Alexei Strelchenko,
Oliver Gutsche,
Allison Reinsvold Hall,
Steve Lantz,
Michael Reid,
Daniel Riley,
Sophie Berkman,
Seyong Lee,
Hammad Ather,
Boyana Norris,
Cong Wang
Abstract:
Next generation High-Energy Physics (HEP) experiments are presented with significant computational challenges, both in terms of data volume and processing power. Using compute accelerators, such as GPUs, is one of the promising ways to provide the necessary computational power to meet the challenge. The current programming models for compute accelerators often involve using architecture-specific programming languages promoted by the hardware vendors and hence limit the set of platforms that the code can run on. Developing software with platform restrictions is especially unfeasible for HEP communities as it takes significant effort to convert typical HEP algorithms into ones that are efficient for compute accelerators. Multiple performance portability solutions have recently emerged and provide an alternative path for using compute accelerators, which allow the code to be executed on hardware from different vendors. We apply several portability solutions, such as Kokkos, SYCL, C++17 std::execution::par and Alpaka, on two mini-apps extracted from the mkFit project: p2z and p2r. These apps include basic kernels for a Kalman filter track fit, such as propagation and update of track parameters, for detectors at a fixed z or fixed r position, respectively. The two mini-apps explore different memory layout formats.
We report on the development experience with different portability solutions, as well as their performance on GPUs and many-core CPUs, measured as the throughput of the kernels on hardware from different GPU and CPU vendors such as NVIDIA, AMD and Intel.
Submitted 25 January, 2024;
originally announced January 2024.
-
The U.S. CMS HL-LHC R&D Strategic Plan
Authors:
Oliver Gutsche,
Tulika Bose,
Margaret Votava,
David Mason,
Andrew Melo,
Mia Liu,
Dirk Hufnagel,
Lindsey Gray,
Mike Hildreth,
Burt Holzman,
Kevin Lannon,
Saba Sehrish,
David Sperka,
James Letts,
Lothar Bauerdick,
Kenneth Bloom
Abstract:
The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.
Submitted 4 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
A Ceph S3 Object Data Store for HEP
Authors:
Nick Smith,
Bo Jayatilaka,
David Mason,
Oliver Gutsche,
Alison Peisker,
Robert Illingworth,
Chris Jones
Abstract:
We present a novel data format design that obviates the need for data tiers by storing individual event data products in column objects. The objects are stored and retrieved through Ceph S3 technology, with a layout designed to minimize metadata volume and maximize data processing parallelism. Performance benchmarks of data storage and retrieval are presented.
Submitted 27 November, 2023;
originally announced November 2023.
-
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
Authors:
Mohammad Atif,
Meghna Bhattacharya,
Paolo Calafiura,
Taylor Childers,
Mark Dewing,
Zhihua Dong,
Oliver Gutsche,
Salman Habib,
Kyle Knoepfel,
Matti Kortelainen,
Ka Hei Martin Kwok,
Charles Leggett,
Meifeng Lin,
Vincent Pascuzzi,
Alexei Strelchenko,
Vakhtang Tsulaia,
Brett Viren,
Tianle Wang,
Beomki Yeo,
Haiwang Yu
Abstract:
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::par and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
Submitted 27 June, 2023;
originally announced June 2023.
-
Detector R&D needs for the next generation $e^+e^-$ collider
Authors:
A. Apresyan,
M. Artuso,
J. Brau,
H. Chen,
M. Demarteau,
Z. Demiragli,
S. Eno,
J. Gonski,
P. Grannis,
H. Gray,
O. Gutsche,
C. Haber,
M. Hohlmann,
J. Hirschauer,
G. Iakovidis,
K. Jakobs,
A. J. Lankford,
C. Pena,
S. Rajagopalan,
J. Strube,
C. Tully,
C. Vernieri,
A. White,
G. W. Wilson,
S. Xie
, et al. (3 additional authors not shown)
Abstract:
The 2021 Snowmass Energy Frontier panel wrote in its final report "The realization of a Higgs factory will require an immediate, vigorous and targeted detector R&D program". Both linear and circular $e^+e^-$ collider efforts have developed a conceptual design for their detectors and are aggressively pursuing a path to formalize these detector concepts. The U.S. has world-class expertise in particle detectors, and is eager to play a leading role in the next generation $e^+e^-$ collider, currently slated to become operational in the 2040s. It is urgent that the U.S. organize its efforts to provide leadership and make significant contributions in detector R&D. These investments are necessary to build and retain the U.S. expertise in detector R&D and future projects, enable significant contributions during the construction phase and maintain its leadership in the Energy Frontier regardless of the choice of the collider project. In this document, we discuss areas where the U.S. can and must play a leading role in the conceptual design and R&D for detectors for $e^+e^-$ colliders.
Submitted 26 June, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
The Future of High Energy Physics Software and Computing
Authors:
V. Daniel Elvira,
Steven Gottlieb,
Oliver Gutsche,
Benjamin Nachman,
S. Bailey,
W. Bhimji,
P. Boyle,
G. Cerati,
M. Carrasco Kind,
K. Cranmer,
G. Davies,
V. D. Elvira,
R. Gardner,
K. Heitmann,
M. Hildreth,
W. Hopkins,
T. Humble,
M. Lin,
P. Onyisi,
J. Qiang,
K. Pedro,
G. Perdue,
A. Roberts,
M. Savage,
P. Shanahan
, et al. (3 additional authors not shown)
Abstract:
Software and Computing (S&C) are essential to all High Energy Physics (HEP) experiments and many theoretical studies. The size and complexity of S&C are now commensurate with that of experimental instruments, playing a critical role in experimental design, data acquisition/instrumental control, reconstruction, and analysis. Furthermore, S&C often plays a leading role in driving the precision of theoretical calculations and simulations. Within this central role in HEP, S&C has been immensely successful over the last decade. This report looks forward to the next decade and beyond, in the context of the 2021 Particle Physics Community Planning Exercise ("Snowmass") organized by the Division of Particles and Fields (DPF) of the American Physical Society.
Submitted 8 November, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Snowmass Computational Frontier: Topical Group Report on Experimental Algorithm Parallelization
Authors:
G. Cerati,
K. Heitmann,
W. Hopkins,
J. Bennett,
T. Y. Chen,
V. V. Gligorov,
O. Gutsche,
S. Habib,
M. Kortelainen,
C. Leggett,
R. Mandelbaum,
N. Whitehorn,
M. Williams
Abstract:
The substantial increase in data volume and complexity expected from future experiments will require significant investment to prepare experimental algorithms. These algorithms include physics object reconstruction, calibrations, and processing of observational data. In addition, the changing computing architecture landscape, which will be primarily composed of heterogeneous resources, will continue to pose major challenges with regard to algorithmic migration. Portable tools need to be developed that can be shared among the frontiers (e.g., for code execution on different platforms) and opportunities, such as forums or cross-experimental working groups, need to be provided where experiences and lessons learned can be shared between experiments and frontiers. At the same time, individual experiments also need to invest considerable resources to develop algorithms unique to their needs (e.g., for facilities dedicated to the experiment), and ensure that their specific algorithms will be able to efficiently exploit external heterogeneous computing facilities. Common software tools represent a cost-effective solution, providing ready-to-use software solutions as well as a platform for R&D work. These are particularly important for small experiments which typically do not have dedicated resources needed to face the challenges imposed by the evolving computing technologies. Workforce development is a key concern across frontiers and experiments, and additional support is needed to provide career opportunities for researchers working in the field of experimental algorithm development. Finally, cross-discipline collaborations going beyond high-energy physics are a key ingredient to address the challenges ahead and more support for such collaborations needs to be created. This report targets future experiments, observations and experimental algorithm development for the next 10-15 years.
Submitted 15 September, 2022;
originally announced September 2022.
-
Portability: A Necessary Approach for Future Scientific Software
Authors:
Meghna Bhattacharya,
Paolo Calafiura,
Taylor Childers,
Mark Dewing,
Zhihua Dong,
Oliver Gutsche,
Salman Habib,
Xiangyang Ju,
Michael Kirby,
Kyle Knoepfel,
Matti Kortelainen,
Martin Kwok,
Charles Leggett,
Meifeng Lin,
Vincent R. Pascuzzi,
Alexei Strelchenko,
Brett Viren,
Beomki Yeo,
Haiwang Yu
Abstract:
Today's world of scientific software for High Energy Physics (HEP) is powered by x86 code, while the future will be much more reliant on accelerators like GPUs and FPGAs. The portable parallelization strategies (PPS) project of the High Energy Physics Center for Computational Excellence (HEP/CCE) is investigating solutions for portability techniques that will allow the coding of an algorithm once, and the ability to execute it on a variety of hardware products from many vendors, especially including accelerators. We think without these solutions, the scientific success of our experiments and endeavors is in danger, as software development could be expert driven and costly to be able to run on available hardware infrastructure. We think the best solution for the community would be an extension to the C++ standard with a very low entry bar for users, supporting all hardware forms and vendors. We are very far from that ideal though. We argue that in the future, as a community, we need to request and work on portability solutions and strive to reach this ideal.
Submitted 15 March, 2022;
originally announced March 2022.
-
Coffea -- Columnar Object Framework For Effective Analysis
Authors:
Nicholas Smith,
Lindsey Gray,
Matteo Cremonesi,
Bo Jayatilaka,
Oliver Gutsche,
Allison Hall,
Kevin Pedro,
Maria Acosta,
Andrew Melo,
Stefano Belforte,
Jim Pivarski
Abstract:
The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will discuss our experience in implementing analysis of CMS data using the coffea framework along with a discussion of the user experience and future directions.
Submitted 6 August, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
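The columnar style described in the coffea abstract above can be illustrated with a minimal sketch that uses only NumPy, not coffea itself. The array name `muon_pt`, the 25 GeV threshold, the toy data and both helper functions are hypothetical and chosen only for illustration; the sketch contrasts a per-event Python loop with a single boolean mask applied to a whole column.

```python
import numpy as np

# Hypothetical toy data: one transverse-momentum value (GeV) per event.
rng = np.random.default_rng(seed=42)
muon_pt = rng.exponential(scale=30.0, size=10_000)


def select_event_loop(pt, threshold):
    """Row-wise style: visit one event at a time in Python."""
    selected = []
    for value in pt:
        if value > threshold:
            selected.append(value)
    return np.array(selected)


def select_columnar(pt, threshold):
    """Columnar style: build one boolean mask and apply it to the whole column."""
    return pt[pt > threshold]


loop_result = select_event_loop(muon_pt, 25.0)
columnar_result = select_columnar(muon_pt, 25.0)
```

Both functions return the same selected values in the same order; the columnar version expresses the cut as one array operation, which is the kind of operation the abstract says is implemented with NumPy or awkward-array.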
-
Response to NITRD, NCO, NSF Request for Information on "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan"
Authors:
J. Amundson,
J. Annis,
C. Avestruz,
D. Bowring,
J. Caldeira,
G. Cerati,
C. Chang,
S. Dodelson,
D. Elvira,
A. Farahi,
K. Genser,
L. Gray,
O. Gutsche,
P. Harris,
J. Kinney,
J. B. Kowalkowski,
R. Kutschke,
S. Mrenna,
B. Nord,
A. Para,
K. Pedro,
G. N. Perdue,
A. Scheinker,
P. Spentzouris,
J. St. John
, et al. (5 additional authors not shown)
Abstract:
We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society (e.g., economy, education, science) in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.
Submitted 4 November, 2019;
originally announced November 2019.
-
New Technologies for Discovery
Authors:
Z. Ahmed,
A. Apresyan,
M. Artuso,
P. Barry,
E. Bielejec,
F. Blaszczyk,
T. Bose,
D. Braga,
S. A. Charlebois,
A. Chatterjee,
A. Chavarria,
H. -M. Cho,
S. Dalla Torre,
M. Demarteau,
D. Denisov,
M. Diefenthaler,
A. Dragone,
F. Fahim,
C. Gee,
S. Habib,
G. Haller,
J. Hogan,
B. J. P. Jones,
M. Garcia-Sciveres,
G. Giacomini
, et al. (58 additional authors not shown)
Abstract:
For the field of high energy physics to continue to have a bright future, priority within the field must be given to investments in both evolutionary and transformational detector development that is coordinated across the national laboratories and with the university community, international partners and other disciplines. While the fundamental science questions addressed by high energy physics have never been more compelling, there is acute awareness of the challenging budgetary and technical constraints when scaling current technologies. Furthermore, many technologies are reaching their sensitivity limit and new approaches need to be developed to overcome the currently irreducible technological challenges. This situation is unfolding against a backdrop of declining funding for instrumentation, both at the national laboratories and in particular at the universities. This trend has to be reversed for the country to continue to play a leadership role in particle physics, especially in this most promising era of imminent new discoveries that could finally break the hugely successful, but limited, Standard Model of fundamental particle interactions. In this challenging environment it is essential that the community invest anew in instrumentation and optimize the use of the available resources to develop new innovative, cost-effective instrumentation, as this is our best hope to successfully accomplish the mission of high energy physics. This report summarizes the current status of instrumentation for high energy physics, the challenges and needs of future experiments and indicates high priority research areas.
Submitted 10 August, 2019; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Using Big Data Technologies for HEP Analysis
Authors:
Matteo Cremonesi,
Claudio Bellini,
Bianny Bian,
Luca Canali,
Vasileios Dimakopoulos,
Peter Elmer,
Ian Fisk,
Maria Girone,
Oliver Gutsche,
Siew-Yan Hoh,
Bo Jayatilaka,
Viktor Khristenko,
Andrea Luiselli,
Andrew Melo,
Evangelos Evangelos,
Dominick Olivito,
Jacopo Pazzini,
Jim Pivarski,
Alexey Svyatkovskiy,
Marco Zanetti
Abstract:
The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and new approaches have been developed in industry to meet the need to retrieve information as quickly as possible when analyzing PB and EB datasets. Providing the scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother.
In this paper, we are presenting the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the application of the new tools both quantitatively, by measuring the performances, and qualitatively, focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data in 5 hours, collected by the CMS experiment, to 1 TB of data in a format suitable for physics analysis.
The second goal is achieved by implementing multiple physics use-cases in Apache Spark using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability and portability are compared to the ones of a traditional ROOT-based workflow.
Submitted 21 January, 2019;
originally announced January 2019.
-
HEP Software Foundation Community White Paper Working Group -- Data Organization, Management and Access (DOMA)
Authors:
Dario Berzano,
Riccardo Maria Bianchi,
Ian Bird,
Brian Bockelman,
Simone Campana,
Kaushik De,
Dirk Duellmann,
Peter Elmer,
Robert Gardner,
Vincent Garonne,
Claudio Grandi,
Oliver Gutsche,
Andrew Hanushevsky,
Burt Holzman,
Bodhitha Jayatilaka,
Ivo Jimenez,
Michel Jouvin,
Oliver Keeble,
Alexei Klimentov,
Valentin Kuznetsov,
Eric Lancon,
Mario Lassnig,
Miron Livny,
Carlos Maltzahn,
Shawn McKee
, et al. (13 additional authors not shown)
Abstract:
Without significant changes to data organization, management, and access (DOMA), HEP experiments will find scientific output limited by how fast data can be accessed and digested by computational resources. In this white paper we discuss challenges in DOMA that HEP experiments, such as the HL-LHC, will face as well as potential ways to address them. A research and development timeline to assess these changes is also proposed.
Submitted 30 November, 2018;
originally announced December 2018.
-
HEP Software Foundation Community White Paper Working Group --- Visualization
Authors:
Matthew Bellis,
Riccardo Maria Bianchi,
Sebastien Binet,
Ciril Bohak,
Benjamin Couturier,
Hadrien Grasland,
Oliver Gutsche,
Sergey Linev,
Alex Martyniuk,
Thomas McCauley,
Edward Moyse,
Alja Mrak Tadel,
Mark Neubauer,
Jeremi Niedziela,
Leo Piilonen,
Jim Pivarski,
Martin Ritter,
Tai Sakuma,
Matevz Tadel,
Barthélémy von Haller,
Ilija Vukotic,
Ben Waugh
Abstract:
In modern High Energy Physics (HEP) experiments visualization of experimental data has a key role in many activities and tasks across the whole data chain: from detector development to monitoring, from event generation to reconstruction of physics objects, from detector simulation to data analysis, and all the way to outreach and education. In this paper, the definition, status, and evolution of data visualization for HEP experiments will be presented. Suggestions for the upgrade of data visualization tools and techniques in current experiments will be outlined, along with guidelines for future experiments. This paper expands on the summary content published in the HSF Roadmap Community White Paper [HSF-CWP-2017-01].
Submitted 26 November, 2018;
originally announced November 2018.
-
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
Authors:
Lothar Bauerdick,
Riccardo Maria Bianchi,
Brian Bockelman,
Nuno Castro,
Kyle Cranmer,
Peter Elmer,
Robert Gardner,
Maria Girone,
Oliver Gutsche,
Benedikt Hegner,
José M. Hernández,
Bodhitha Jayatilaka,
David Lange,
Mark S. Neubauer,
Daniel S. Katz,
Lukasz Kreczko,
James Letts,
Shawn McKee,
Christoph Paus,
Kevin Pedro,
Jim Pivarski,
Martin Ritter,
Eduardo Rodrigues,
Tai Sakuma,
Elizabeth Sexton-Kennedy
, et al. (4 additional authors not shown)
Abstract:
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
Submitted 9 April, 2018;
originally announced April 2018.
-
A Roadmap for HEP Software and Computing R&D for the 2020s
Authors:
Johannes Albrecht,
Antonio Augusto Alves Jr,
Guilherme Amadio,
Giuseppe Andronico,
Nguyen Anh-Ky,
Laurent Aphecetche,
John Apostolakis,
Makoto Asai,
Luca Atzori,
Marian Babik,
Giuseppe Bagliesi,
Marilena Bandieramonte,
Sunanda Banerjee,
Martin Barisits,
Lothar A. T. Bauerdick,
Stefano Belforte,
Douglas Benjamin,
Catrin Bernius,
Wahid Bhimji,
Riccardo Maria Bianchi,
Ian Bird,
Catherine Biscarat,
Jakob Blomer,
Kenneth Bloom,
Tommaso Boccali
, et al. (285 additional authors not shown)
Abstract:
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
Submitted 19 December, 2018; v1 submitted 18 December, 2017;
originally announced December 2017.
-
CMS Analysis and Data Reduction with Apache Spark
Authors:
Oliver Gutsche,
Luca Canali,
Illia Cremer,
Matteo Cremonesi,
Peter Elmer,
Ian Fisk,
Maria Girone,
Bo Jayatilaka,
Jim Kowalkowski,
Viktor Khristenko,
Evangelos Motesnitsalis,
Jim Pivarski,
Saba Sehrish,
Kacper Surdy,
Alexey Svyatkovskiy
Abstract:
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies, have emerged from industry and open source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and tools, promising a fresh look at analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. Moreover, these new tools are typically actively developed by large communities, often profiting from industry resources, and under open source licensing. These factors result in a boost for adoption and maturity of the tools and for the communities supporting them, while also helping to reduce the cost of ownership for the end-users. In this talk, we are presenting studies of using Apache Spark for end user data analysis. We are studying the HEP analysis workflow separated into two thrusts: the reduction of centrally produced experiment datasets and the end-analysis up to the publication plot. Studying the first thrust, CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. We are presenting the progress of this 2-year project with first results of scaling up Spark-based HEP analysis. Studying the second thrust, we are presenting studies on using Apache Spark for a CMS Dark Matter physics search, comparing Spark's feasibility, usability and performance to the ROOT-based analysis.
Submitted 31 October, 2017;
originally announced November 2017.
-
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Authors:
Burt Holzman,
Lothar A. T. Bauerdick,
Brian Bockelman,
Dave Dykstra,
Ian Fisk,
Stuart Fuess,
Gabriele Garzoglio,
Maria Girone,
Oliver Gutsche,
Dirk Hufnagel,
Hyunwoo Kim,
Robert Kennedy,
Nicolo Magini,
David Mason,
Panagiotis Spentzouris,
Anthony Tiradani,
Steve Timm,
Eric W. Vaandering
Abstract:
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers in demonstrating the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. In addition, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.
Submitted 29 September, 2017;
originally announced October 2017.
-
Big Data in HEP: A comprehensive use case study
Authors:
Oliver Gutsche,
Matteo Cremonesi,
Peter Elmer,
Bo Jayatilaka,
Jim Kowalkowski,
Jim Pivarski,
Saba Sehrish,
Cristina Mantilla Suarez,
Alexey Svyatkovskiy,
Nhan Tran
Abstract:
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication physics plots. We will discuss advantages and disadvantages of each approach and give an outlook on further studies needed.
Submitted 12 March, 2017;
originally announced March 2017.
-
ASCR/HEP Exascale Requirements Review Report
Authors:
Salman Habib,
Robert Roser,
Richard Gerber,
Katie Antypas,
Katherine Riley,
Tim Williams,
Jack Wells,
Tjerk Straatsma,
A. Almgren,
J. Amundson,
S. Bailey,
D. Bard,
K. Bloom,
B. Bockelman,
A. Borgland,
J. Borrill,
R. Boughezal,
R. Brower,
B. Cowan,
H. Finkel,
N. Frontiere,
S. Fuess,
L. Ge,
N. Gnedin,
S. Gottlieb
, et al. (29 additional authors not shown)
Abstract:
This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.
Submitted 31 March, 2016; v1 submitted 30 March, 2016;
originally announced March 2016.
-
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Authors:
Salman Habib,
Robert Roser,
Tom LeCompte,
Zach Marshall,
Anders Borgland,
Brett Viren,
Peter Nugent,
Makoto Asai,
Lothar Bauerdick,
Hal Finkel,
Steve Gottlieb,
Stefan Hoeche,
Paul Sheldon,
Jean-Luc Vay,
Peter Elmer,
Michael Kirby,
Simon Patton,
Maxim Potekhin,
Brian Yanny,
Paolo Calafiura,
Eli Dart,
Oliver Gutsche,
Taku Izubuchi,
Adam Lyon,
Don Petravick
Abstract:
Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material.
Submitted 28 October, 2015;
originally announced October 2015.
-
Observation of the rare $B^0_s \to \mu^+ \mu^-$ decay from the combined analysis of CMS and LHCb data
Authors:
The CMS and LHCb Collaborations,
V. Khachatryan,
A. M. Sirunyan,
A. Tumasyan,
W. Adam,
T. Bergauer,
M. Dragicevic,
J. Erö,
M. Friedl,
R. Frühwirth,
V. M. Ghete,
C. Hartl,
N. Hörmann,
J. Hrubec,
M. Jeitler,
W. Kiesenhofer,
V. Knünz,
M. Krammer,
I. Krätschmer,
D. Liko,
I. Mikulec,
D. Rabady,
B. Rahbaran
, et al. (2807 additional authors not shown)
Abstract:
A joint measurement is presented of the branching fractions $B^0_s \to \mu^+ \mu^-$ and $B^0 \to \mu^+ \mu^-$ in proton-proton collisions at the LHC by the CMS and LHCb experiments. The data samples were collected in 2011 at a centre-of-mass energy of 7 TeV, and in 2012 at 8 TeV. The combined analysis produces the first observation of the $B^0_s \to \mu^+ \mu^-$ decay, with a statistical significance exceeding six standard deviations, and the best measurement of its branching fraction so far. Furthermore, evidence for the $B^0 \to \mu^+ \mu^-$ decay is obtained with a statistical significance of three standard deviations. The branching fraction measurements are statistically compatible with SM predictions and impose stringent constraints on several theories beyond the SM.
Submitted 17 August, 2015; v1 submitted 17 November, 2014;
originally announced November 2014.
-
A ROOT-based Client-Server Event Display for the ZEUS Experiment
Authors:
U. Fricke,
C. Genta,
O. Gutsche,
S. Hanlon,
E. Heaphy,
R. Kaczorowski,
O. Kind,
R. Mankel,
J. Rautenberg,
K. Wrona
Abstract:
A new event visualization tool for the ZEUS experiment is nearing completion, which will provide the functionality required by the new detector components implemented during the recently achieved HERA luminosity upgrade. The new design is centered around a client-server concept, which allows random access to the ZEUS central event store as well as to events taken online via the HTTP protocol, even from remote locations. The client is a lightweight C++ application, and the ROOT system is used as the underlying toolkit. Particular attention has been given to a smooth integration of 3-dimensional and layered 2-dimensional visualizations. The functionality of the server and of the client application with its graphical user interface is presented.
Submitted 28 May, 2003;
originally announced May 2003.