-
A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization
Authors:
Daoce Wang,
Pascal Grosset,
Jesus Pulido,
Tushar M. Athawale,
Jiannan Tian,
Kai Zhao,
Zarija Lukić,
Axel Huebl,
Zhe Wang,
James Ahrens,
Dingwen Tao
Abstract:
Multi-resolution methods such as Adaptive Mesh Refinement (AMR) can enhance storage efficiency for HPC applications generating vast volumes of data. However, their applicability is limited and cannot be universally deployed across all applications. Furthermore, integrating lossy compression with multi-resolution techniques to further boost storage efficiency encounters significant barriers. To this end, we introduce an innovative workflow that facilitates high-quality multi-resolution data compression for both uniform and AMR simulations. Initially, to extend the usability of multi-resolution techniques, our workflow employs a compression-oriented Region of Interest (ROI) extraction method, transforming uniform data into a multi-resolution format. Subsequently, to bridge the gap between multi-resolution techniques and lossy compressors, we optimize three distinct compressors, ensuring their optimal performance on multi-resolution data. Lastly, we incorporate an advanced uncertainty visualization method into our workflow to understand the potential impacts of lossy compression. Experimental evaluation demonstrates that our workflow achieves significant compression quality improvements.
Submitted 11 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
AMReX and pyAMReX: Looking Beyond ECP
Authors:
Andrew Myers,
Weiqun Zhang,
Ann Almgren,
Thierry Antoun,
John Bell,
Axel Huebl,
Alexander Sinn
Abstract:
AMReX is a software framework for the development of block-structured mesh applications with adaptive mesh refinement (AMR). AMReX was initially developed and supported by the AMReX Co-Design Center as part of the U.S. DOE Exascale Computing Project, and is continuing to grow post-ECP. In addition to adding new functionality and performance improvements to the core AMReX framework, we have also developed a Python binding, pyAMReX, that provides a bridge between AMReX-based application codes and the data science ecosystem. pyAMReX provides zero-copy application GPU data access for AI/ML, in situ analysis and application coupling, and enables rapid, massively parallel prototyping. In this paper we review the overall functionality of AMReX and pyAMReX, focusing on new developments, new functionality, and optimizations of key operations. We also summarize capabilities of ECP projects that used AMReX and provide an overview of new, non-ECP applications.
Submitted 18 March, 2024;
originally announced March 2024.
-
A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability
Authors:
Lois Curfman McInnes,
Michael Heroux,
David E. Bernholdt,
Anshu Dubey,
Elsa Gonsiorowski,
Rinku Gupta,
Osni Marques,
J. David Moulton,
Hai Ah Nam,
Boyana Norris,
Elaine M. Raybourn,
Jim Willenbring,
Ann Almgren,
Ross Bartlett,
Kita Cranfill,
Stephen Fickas,
Don Frederick,
William Godoy,
Patricia Grubel,
Rebecca Hartman-Baker,
Axel Huebl,
Rose Lynch,
Addi Malviya Thakur,
Reed Milewicz,
Mark C. Miller
, et al. (9 additional authors not shown)
Abstract:
Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. DOE Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software, its sustainability, and the trustworthiness of the results that it produces. Members of the IDEAS project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This paper discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.
Submitted 16 February, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
State of In Situ Visualization in Simulations: We are fast. But are we inspiring?
Authors:
Axel Huebl,
Arianna Formenti,
Marco Garten,
Jean-Luc Vay
Abstract:
Visualization of dynamic processes in scientific high-performance computing is an immensely data intensive endeavor. Application codes have recently demonstrated scaling to full-size Exascale machines, and generating high-quality data for visualization is consequently on the machine-scale, easily spanning 100s of TBytes of input to generate a single video frame. In situ visualization, the technique to consume the many-node decomposed data in-memory, as exposed by applications, is the dominant workflow. Although in situ visualization has achieved tremendous progress in the last decade, scaling to system-size together with the application codes that produce its data, there is one important question that we cannot skip: is what we produce insightful and inspiring?
Submitted 30 September, 2023;
originally announced October 2023.
-
Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms
Authors:
Hannah Klion,
Revathi Jambunathan,
Michael E. Rowan,
Eloise Yang,
Donald Willcox,
Jean-Luc Vay,
Remi Lehe,
Andrew Myers,
Axel Huebl,
Weiqun Zhang
Abstract:
Relativistic magnetic reconnection is a non-ideal plasma process that is a source of non-thermal particle acceleration in many high-energy astrophysical systems. Particle-in-cell (PIC) methods are commonly used for simulating reconnection from first principles. While much progress has been made in understanding the physics of reconnection, especially in 2D, the adoption of advanced algorithms and numerical techniques for efficiently modeling such systems has been limited. With the GPU-accelerated PIC code WarpX, we explore the accuracy and potential performance benefits of two advanced Maxwell solver algorithms: a non-standard finite difference scheme (CKC) and an ultrahigh-order pseudo-spectral method (PSATD). We find that for the relativistic reconnection problem, CKC and PSATD qualitatively and quantitatively match the standard Yee-grid finite-difference method. CKC and PSATD both admit a time step that is 40% longer than Yee, resulting in a ~40% faster time to solution for CKC, but no performance benefit for PSATD when using a current deposition scheme that satisfies Gauss's law. Relaxing this constraint maintains accuracy and yields a 30% speedup. Unlike Yee and CKC, PSATD is numerically stable at any time step, allowing for a larger time step than with the finite-difference methods. We found that increasing the time step 2.4-3 times over the standard Yee step still yields accurate results, but only translates to modest performance improvements over CKC due to the current deposition scheme used with PSATD. Further optimization of this scheme will likely improve the effective performance of PSATD.
Submitted 20 April, 2023;
originally announced April 2023.
-
From Compact Plasma Particle Sources to Advanced Accelerators with Modeling at Exascale
Authors:
Axel Huebl,
Remi Lehe,
Edoardo Zoni,
Olga Shapoval,
Ryan T. Sandberg,
Marco Garten,
Arianna Formenti,
Revathi Jambunathan,
Prabhat Kumar,
Kevin Gott,
Andrew Myers,
Weiqun Zhang,
Ann Almgren,
Chad E. Mitchell,
Ji Qiang,
David Grote,
Alexander Sinn,
Severin Diederichs,
Maxence Thevenet,
Luca Fedeli,
Thomas Clark,
Neil Zaim,
Henri Vincenti,
Jean-Luc Vay
Abstract:
Developing complex, reliable advanced accelerators requires a coordinated, extensible, and comprehensive approach in modeling, from source to the end of beam lifetime. We present highlights in Exascale Computing to scale accelerator modeling software to the requirements set for contemporary science drivers. In particular, we present the first laser-plasma modeling on an exaflop supercomputer using the US DOE Exascale Computing Project WarpX. Leveraging developments for Exascale, the new DOE SCIDAC-5 Consortium for Advanced Modeling of Particle Accelerators (CAMPA) will advance numerical algorithms and accelerate community modeling codes in a cohesive manner: from beam source, over energy boost, transport, injection, storage, to application or interaction. Such start-to-end modeling will enable the exploration of hybrid accelerators, with conventional and advanced elements, as the next step for advanced accelerator modeling. Following open community standards, we seed an open ecosystem of codes that can be readily combined with each other and machine learning frameworks. These will cover ultrafast to ultraprecise modeling for future hybrid accelerator design, even enabling virtual test stands and twins of accelerators that can be used in operations.
Submitted 18 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Next Generation Computational Tools for the Modeling and Design of Particle Accelerators at Exascale
Authors:
Axel Huebl,
Remi Lehe,
Chad E. Mitchell,
Ji Qiang,
Robert D. Ryne,
Ryan T. Sandberg,
Jean-Luc Vay
Abstract:
Particle accelerators are among the largest, most complex devices. To meet the challenges of increasing energy, intensity, accuracy, compactness, complexity and efficiency, increasingly sophisticated computational tools are required for their design and optimization. It is key that contemporary software take advantage of the latest advances in computer hardware and scientific software engineering practices, delivering speed, reproducibility and feature composability for the aforementioned challenges. A new open source software stack is being developed at the heart of the Beam pLasma Accelerator Simulation Toolkit (BLAST) by LBNL and collaborators, providing new particle-in-cell modeling codes capable of exploiting the power of GPUs on Exascale supercomputers. Combined with advanced numerical techniques, such as mesh-refinement, and intrinsic support for machine learning, these codes are primed to provide ultrafast to ultraprecise modeling for future accelerator design and operations.
Submitted 9 August, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization
Authors:
Lipeng Wan,
Axel Huebl,
Junmin Gu,
Franz Poeschel,
Ana Gainaru,
Ruonan Wang,
Jieyang Chen,
Xin Liang,
Dmitry Ganyushin,
Todd Munson,
Ian Foster,
Jean-Luc Vay,
Norbert Podhorszki,
Kesheng Wu,
Scott Klasky
Abstract:
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient read and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80%.
Submitted 15 July, 2021;
originally announced July 2021.
-
Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
Authors:
Franz Poeschel,
Juncheng E,
William F. Godoy,
Norbert Podhorszki,
Scott Klasky,
Greg Eisenhauer,
Philip E. Davis,
Lipeng Wan,
Ana Gainaru,
Junmin Gu,
Fabian Koller,
René Widera,
Michael Bussmann,
Axel Huebl
Abstract:
This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need of strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.
Submitted 19 January, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code
Authors:
Michael E. Rowan,
Axel Huebl,
Kevin N. Gott,
Jack Deslippe,
Maxence Thévenet,
Remi Lehe,
Jean-Luc Vay
Abstract:
Maintaining computational load balance is important to the performant behavior of codes which operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches and explicitly target GPU architectures, exploring the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the optimal data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a laser-ion acceleration test problem run with up to 6144 GPUs on Summit, the enhanced dynamic load balancing achieves 62% to 74% (88% when running on 6 GPUs) of the theoretically predicted maximum speedup; for the 96-GPU case, we find that dynamic load balancing improves performance relative to baselines without load balancing (3.8x speedup) and with static load balancing (1.2x speedup). Our results provide important insights into dynamic load balancing and performance assessment, and are particularly relevant in the context of distributed memory applications run on GPUs.
Submitted 22 April, 2021;
originally announced April 2021.
-
Porting WarpX to GPU-accelerated platforms
Authors:
A. Myers,
A. Almgren,
L. D. Amorim,
J. Bell,
L. Fedeli,
L. Ge,
K. Gott,
D. P. Grote,
M. Hogan,
A. Huebl,
R. Jambunathan,
R. Lehe,
C. Ng,
M. Rowan,
O. Shapoval,
M. Thévenet,
J. -L. Vay,
H. Vincenti,
E. Yang,
N. Zaïm,
W. Zhang,
Y. Zhao,
E. Zoni
Abstract:
WarpX is a general purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered, lessons learned, and give current performance results on a series of relevant benchmark problems.
Submitted 2 September, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library
Authors:
Alexander Matthes,
René Widera,
Erik Zenker,
Benjamin Worpitz,
Axel Huebl,
Michael Bussmann
Abstract:
We present an analysis on optimizing performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when tuning key parameters of the algorithm. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to prove that Alpaka allows for platform-specific tuning with a single source code. In addition we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code for bleeding edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architecture as well as IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating point operation performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOPS/s on a P100 and over 1 TFLOPS/s on a KNL system.
Submitted 30 June, 2017;
originally announced June 2017.
-
On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective
Authors:
Axel Huebl,
Rene Widera,
Felix Schmitt,
Alexander Matthes,
Norbert Podhorszki,
Jong Youl Choi,
Scott Klasky,
Michael Bussmann
Abstract:
We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.
Submitted 1 June, 2017;
originally announced June 2017.
-
In situ, steerable, hardware-independent and data-structure agnostic visualization with ISAAC
Authors:
Alexander Matthes,
Axel Huebl,
René Widera,
Sebastian Grottel,
Stefan Gumhold,
Michael Bussmann
Abstract:
The computation power of supercomputers grows faster than the bandwidth of their storage and network. Applications using hardware accelerators like Nvidia GPUs, in particular, cannot save enough data to be analyzed in a later step. There is a high risk of losing important scientific information. We introduce the in situ template library ISAAC, which enables arbitrary applications like scientific simulations to visualize their data live, without the need for deep copy operations or data transformation, using the very same compute node and hardware accelerator the data is already residing on. Arbitrary metadata can be added to the renderings and user-defined steering commands can be asynchronously sent back to the running application. Using an aggregating server, ISAAC streams the interactive visualization video and enables users to access their applications from everywhere.
Submitted 28 November, 2016;
originally announced November 2016.
-
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
Authors:
Erik Zenker,
René Widera,
Axel Huebl,
Guido Juckeland,
Andreas Knüpfer,
Wolfgang E. Nagel,
Michael Bussmann
Abstract:
With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, our presented approach relies heavily on abstract meta-programming techniques, which are essential to focus on fine-grained tuning rather than code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka. We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs and NVIDIA GPUs.
Submitted 12 June, 2016; v1 submitted 9 June, 2016;
originally announced June 2016.
-
Alpaka - An Abstraction Library for Parallel Kernel Acceleration
Authors:
Erik Zenker,
Benjamin Worpitz,
René Widera,
Axel Huebl,
Guido Juckeland,
Andreas Knüpfer,
Wolfgang E. Nagel,
Michael Bussmann
Abstract:
Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform.
The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits parallelism and memory hierarchies on a node at all levels available in current hardware. By doing so, it achieves platform and performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator. All hardware types (multi- and many-core CPUs, GPUs and other accelerators) are supported and can be programmed in the same way. The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization.
Running Alpaka applications on a new (and supported) platform requires the change of only one source code line instead of a lot of #ifdefs.
Submitted 26 February, 2016;
originally announced February 2016.