A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization

Authors: Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukić, Axel Huebl, Zhe Wang, James Ahrens, Dingwen Tao

Abstract: Multi-resolution methods such as Adaptive Mesh Refinement (AMR) can enhance storage efficiency for HPC applications generating vast volumes of data. However, their applicability is limited and cannot be universally deployed across all applications. Furthermore, integrating lossy compression with multi-resolution techniques to further boost storage efficiency encounters significant barriers. To this end, we introduce an innovative workflow that facilitates high-quality multi-resolution data compression for both uniform and AMR simulations. Initially, to extend the usability of multi-resolution techniques, our workflow employs a compression-oriented Region of Interest (ROI) extraction method, transforming uniform data into a multi-resolution format. Subsequently, to bridge the gap between multi-resolution techniques and lossy compressors, we optimize three distinct compressors, ensuring their optimal performance on multi-resolution data. Lastly, we incorporate an advanced uncertainty visualization method into our workflow to understand the potential impacts of lossy compression. Experimental evaluation demonstrates that our workflow achieves significant compression quality improvements.

Submitted 11 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: accepted by SC '24

arXiv:2403.12179 [pdf, other]

AMReX and pyAMReX: Looking Beyond ECP

Authors: Andrew Myers, Weiqun Zhang, Ann Almgren, Thierry Antoun, John Bell, Axel Huebl, Alexander Sinn

Abstract: AMReX is a software framework for the development of block-structured mesh applications with adaptive mesh refinement (AMR). AMReX was initially developed and supported by the AMReX Co-Design Center as part of the U.S. DOE Exascale Computing Project, and is continuing to grow post-ECP. In addition to adding new functionality and performance improvements to the core AMReX framework, we have also developed a Python binding, pyAMReX, that provides a bridge between AMReX-based application codes and the data science ecosystem. pyAMReX provides zero-copy application GPU data access for AI/ML, in situ analysis and application coupling, and enables rapid, massively parallel prototyping. In this paper we review the overall functionality of AMReX and pyAMReX, focusing on new developments, new functionality, and optimizations of key operations. We also summarize capabilities of ECP projects that used AMReX and provide an overview of new, non-ECP applications.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 12 pages, 1 figure, submitted to the International Journal of High Performance Computing Applications

arXiv:2311.02010 [pdf, other]

A cast of thousands: How the IDEAS Productivity project has advanced software productivity and sustainability

Authors: Lois Curfman McInnes, Michael Heroux, David E. Bernholdt, Anshu Dubey, Elsa Gonsiorowski, Rinku Gupta, Osni Marques, J. David Moulton, Hai Ah Nam, Boyana Norris, Elaine M. Raybourn, Jim Willenbring, Ann Almgren, Ross Bartlett, Kita Cranfill, Stephen Fickas, Don Frederick, William Godoy, Patricia Grubel, Rebecca Hartman-Baker, Axel Huebl, Rose Lynch, Addi Malviya Thakur, Reed Milewicz, Mark C. Miller, et al. (9 additional authors not shown)

Abstract: Computational and data-enabled science and engineering are revolutionizing advances throughout science and society, at all scales of computing. For example, teams in the U.S. DOE Exascale Computing Project have been tackling new frontiers in modeling, simulation, and analysis by exploiting unprecedented exascale computing capabilities, building an advanced software ecosystem that supports next-generation applications and addresses disruptive changes in computer architectures. However, concerns are growing about the productivity of the developers of scientific software, its sustainability, and the trustworthiness of the results that it produces. Members of the IDEAS project serve as catalysts to address these challenges through fostering software communities, incubating and curating methodologies and resources, and disseminating knowledge to advance developer productivity and software sustainability. This paper discusses how these synergistic activities are advancing scientific discovery, mitigating technical risks by building a firmer foundation for reproducible, sustainable science at all scales of computing, from laptops to clusters to exascale and beyond.

Submitted 16 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: 12 pages, 1 figure

arXiv:2310.00469 [pdf, other]

State of In Situ Visualization in Simulations: We are fast. But are we inspiring?

Authors: Axel Huebl, Arianna Formenti, Marco Garten, Jean-Luc Vay

Abstract: Visualization of dynamic processes in scientific high-performance computing is an immensely data intensive endeavor. Application codes have recently demonstrated scaling to full-size Exascale machines, and generating high-quality data for visualization is consequently on the machine-scale, easily spanning 100s of TBytes of input to generate a single video frame. In situ visualization, the technique to consume the many-node decomposed data in-memory, as exposed by applications, is the dominant workflow. Although in situ visualization has achieved tremendous progress in the last decade, scaling to system-size together with the application codes that produce its data, there is one important question that we cannot skip: is what we produce insightful and inspiring?

Submitted 30 September, 2023; originally announced October 2023.

Comments: 2 pages + references, 1 figure, accepted lightning talk abstract for ISAV23 (in conjunction with SC23)

arXiv:2304.10566 [pdf, other]

doi 10.3847/1538-4357/acd75b

Particle-in-Cell Simulations of Relativistic Magnetic Reconnection with Advanced Maxwell Solver Algorithms

Authors: Hannah Klion, Revathi Jambunathan, Michael E. Rowan, Eloise Yang, Donald Willcox, Jean-Luc Vay, Remi Lehe, Andrew Myers, Axel Huebl, Weiqun Zhang

Abstract: Relativistic magnetic reconnection is a non-ideal plasma process that is a source of non-thermal particle acceleration in many high-energy astrophysical systems. Particle-in-cell (PIC) methods are commonly used for simulating reconnection from first principles. While much progress has been made in understanding the physics of reconnection, especially in 2D, the adoption of advanced algorithms and numerical techniques for efficiently modeling such systems has been limited. With the GPU-accelerated PIC code WarpX, we explore the accuracy and potential performance benefits of two advanced Maxwell solver algorithms: a non-standard finite difference scheme (CKC) and an ultrahigh-order pseudo-spectral method (PSATD). We find that for the relativistic reconnection problem, CKC and PSATD qualitatively and quantitatively match the standard Yee-grid finite-difference method. CKC and PSATD both admit a time step that is 40% longer than Yee, resulting in a ~40% faster time to solution for CKC, but no performance benefit for PSATD when using a current deposition scheme that satisfies Gauss's law. Relaxing this constraint maintains accuracy and yields a 30% speedup. Unlike Yee and CKC, PSATD is numerically stable at any time step, allowing for a larger time step than with the finite-difference methods. We found that increasing the time step 2.4-3 times over the standard Yee step still yields accurate results, but only translates to modest performance improvements over CKC due to the current deposition scheme used with PSATD. Further optimization of this scheme will likely improve the effective performance of PSATD.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 19 pages, 10 figures. Submitted to ApJ

arXiv:2303.12873 [pdf, other]

From Compact Plasma Particle Sources to Advanced Accelerators with Modeling at Exascale

Authors: Axel Huebl, Remi Lehe, Edoardo Zoni, Olga Shapoval, Ryan T. Sandberg, Marco Garten, Arianna Formenti, Revathi Jambunathan, Prabhat Kumar, Kevin Gott, Andrew Myers, Weiqun Zhang, Ann Almgren, Chad E. Mitchell, Ji Qiang, David Grote, Alexander Sinn, Severin Diederichs, Maxence Thevenet, Luca Fedeli, Thomas Clark, Neil Zaim, Henri Vincenti, Jean-Luc Vay

Abstract: Developing complex, reliable advanced accelerators requires a coordinated, extensible, and comprehensive approach in modeling, from source to the end of beam lifetime. We present highlights in Exascale Computing to scale accelerator modeling software to the requirements set for contemporary science drivers. In particular, we present the first laser-plasma modeling on an exaflop supercomputer using the US DOE Exascale Computing Project WarpX. Leveraging developments for Exascale, the new DOE SCIDAC-5 Consortium for Advanced Modeling of Particle Accelerators (CAMPA) will advance numerical algorithms and accelerate community modeling codes in a cohesive manner: from beam source, over energy boost, transport, injection, storage, to application or interaction. Such start-to-end modeling will enable the exploration of hybrid accelerators, with conventional and advanced elements, as the next step for advanced accelerator modeling. Following open community standards, we seed an open ecosystem of codes that can be readily combined with each other and machine learning frameworks. These will cover ultrafast to ultraprecise modeling for future hybrid accelerator design, even enabling virtual test stands and twins of accelerators that can be used in operations.

Submitted 18 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 4 pages, 3 figures, presented at the 20th Advanced Accelerator Concepts Workshop (AAC22)

arXiv:2208.02382 [pdf, ps, other]

doi 10.18429/JACoW-NAPAC2022-TUYE2

Next Generation Computational Tools for the Modeling and Design of Particle Accelerators at Exascale

Authors: Axel Huebl, Remi Lehe, Chad E. Mitchell, Ji Qiang, Robert D. Ryne, Ryan T. Sandberg, Jean-Luc Vay

Abstract: Particle accelerators are among the largest, most complex devices. To meet the challenges of increasing energy, intensity, accuracy, compactness, complexity and efficiency, increasingly sophisticated computational tools are required for their design and optimization. It is key that contemporary software take advantage of the latest advances in computer hardware and scientific software engineering practices, delivering speed, reproducibility and feature composability for the aforementioned challenges. A new open source software stack is being developed at the heart of the Beam pLasma Accelerator Simulation Toolkit (BLAST) by LBNL and collaborators, providing new particle-in-cell modeling codes capable of exploiting the power of GPUs on Exascale supercomputers. Combined with advanced numerical techniques, such as mesh-refinement, and intrinsic support for machine learning, these codes are primed to provide ultrafast to ultraprecise modeling for future accelerator design and operations.

Submitted 9 August, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

Comments: 4 pages, 8 figures; NAPAC22, Invited Oral, TUYE2

MSC Class: 78-10

ACM Class: I.6.0; D.2.12; D.2.13

Journal ref: NAPAC22, 2022

arXiv:2107.07108 [pdf, other]

doi 10.1109/TPDS.2021.3100784

Improving I/O Performance for Exascale Applications through Online Data Layout Reorganization

Authors: Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky

Abstract: The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient read and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80%.

Submitted 15 July, 2021; originally announced July 2021.

Comments: 12 pages, 15 figures, accepted by IEEE Transactions on Parallel and Distributed Systems

Journal ref: IEEE Transactions on Parallel and Distributed Systems, 2021

arXiv:2107.06108 [pdf]

doi 10.1007/978-3-030-96498-6_6

Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

Authors: Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl

Abstract: This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need for strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.

Submitted 19 January, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: 18 pages, 9 figures, SMC2021, supplementary material at https://zenodo.org/record/4906276

arXiv:2104.11385 [pdf, other]

doi 10.1145/3468267.3470614

In-Situ Assessment of Device-Side Compute Work for Dynamic Load Balancing in a GPU-Accelerated PIC Code

Authors: Michael E. Rowan, Axel Huebl, Kevin N. Gott, Jack Deslippe, Maxence Thévenet, Remi Lehe, Jean-Luc Vay

Abstract: Maintaining computational load balance is important to the performant behavior of codes which operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if improperly load balanced. We present enhancements to traditional load balancing approaches and explicitly target GPU architectures, exploring the resulting performance. A key component of our enhancements is the introduction of several GPU-amenable strategies for assessing compute work. These strategies are implemented and benchmarked to find the optimal data collection methodology for in-situ assessment of GPU compute work. For the fully kinetic particle-in-cell code WarpX, which supports MPI+CUDA parallelism, we investigate the performance of the improved dynamic load balancing via a strong scaling-based performance model and show that, for a laser-ion acceleration test problem run with up to 6144 GPUs on Summit, the enhanced dynamic load balancing achieves 62% to 74% (88% when running on 6 GPUs) of the theoretically predicted maximum speedup; for the 96-GPU case, we find that dynamic load balancing improves performance relative to baselines without load balancing (3.8x speedup) and with static load balancing (1.2x speedup). Our results provide important insights into dynamic load balancing and performance assessment, and are particularly relevant in the context of distributed memory applications run on GPUs.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 11 pages, 8 figures. Paper accepted in the Platform for Advanced Scientific Computing Conference (PASC '21), July 5 to 9, 2021, Geneva, Switzerland

Journal ref: PASC 2021: Proceedings of the Platform for Advanced Scientific Computing Conference

arXiv:2101.12149 [pdf, other]

doi 10.1016/j.parco.2021.102833

Porting WarpX to GPU-accelerated platforms

Authors: A. Myers, A. Almgren, L. D. Amorim, J. Bell, L. Fedeli, L. Ge, K. Gott, D. P. Grote, M. Hogan, A. Huebl, R. Jambunathan, R. Lehe, C. Ng, M. Rowan, O. Shapoval, M. Thévenet, J. -L. Vay, H. Vincenti, E. Yang, N. Zaïm, W. Zhang, Y. Zhao, E. Zoni

Abstract: WarpX is a general purpose electromagnetic particle-in-cell code that was originally designed to run on many-core CPU architectures. We describe the strategy followed to allow WarpX to use the GPU-accelerated nodes on OLCF's Summit supercomputer, a strategy we believe will extend to the upcoming machines Frontier and Aurora. We summarize the challenges encountered, lessons learned, and give current performance results on a series of relevant benchmark problems.

Submitted 2 September, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

Comments: 11 pages, 5 figures, accepted by Parallel Computing. Minor revisions, results unchanged

Journal ref: Parallel Computing, Volume 108, 2021, 102833

arXiv:1706.10086 [pdf, other]

doi 10.1007/978-3-319-67630-2_36

Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library

Authors: Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl, Michael Bussmann

Abstract: We present an analysis on optimizing performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when tuning key parameters of the algorithm. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to prove that Alpaka allows for platform-specific tuning with a single source code. In addition we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code for bleeding edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architecture as well as IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating point operation performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOPS/s on a P100 and over 1 TFLOPS/s on a KNL system.

Submitted 30 June, 2017; originally announced June 2017.

Comments: Accepted paper for the P^3MA workshop at the ISC 2017 in Frankfurt

Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 496-514, 2017

arXiv:1706.00522 [pdf, other]

doi 10.1007/978-3-319-67630-2_2

On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

Authors: Axel Huebl, Rene Widera, Felix Schmitt, Alexander Matthes, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Michael Bussmann

Abstract: We implement and benchmark parallel I/O methods for the fully-manycore driven particle-in-cell code PIConGPU. Identifying throughput and overall I/O size as a major challenge for applications on today's and future HPC systems, we present a scaling law characterizing performance bottlenecks in state-of-the-art approaches for data reduction. Consequently, we propose, implement and verify multi-threaded data-transformations for the I/O library ADIOS as a feasible way to trade underutilized host-side compute potential on heterogeneous systems for reduced I/O latency.

Submitted 1 June, 2017; originally announced June 2017.

Comments: 15 pages, 5 figures, accepted for DRBSD-1 in conjunction with ISC'17

ACM Class: D.4.8; B.4.3; I.6.6

Journal ref: J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 15-29, 2017

arXiv:1611.09048 [pdf, other]

doi 10.14529/jsfi160403

In situ, steerable, hardware-independent and data-structure agnostic visualization with ISAAC

Authors: Alexander Matthes, Axel Huebl, René Widera, Sebastian Grottel, Stefan Gumhold, Michael Bussmann

Abstract: The computation power of supercomputers grows faster than the bandwidth of their storage and network. Especially applications using hardware accelerators like Nvidia GPUs cannot save enough data to be analyzed in a later step. There is a high risk of losing important scientific information. We introduce the in situ template library ISAAC which enables arbitrary applications like scientific simulations to live visualize their data without the need of deep copy operations or data transformation using the very same compute node and hardware accelerator the data is already residing on. Arbitrary meta data can be added to the renderings and user defined steering commands can be asynchronously sent back to the running application. Using an aggregating server, ISAAC streams the interactive visualization video and enables users to access their applications from everywhere.

Submitted 28 November, 2016; originally announced November 2016.

Journal ref: Supercomputing Frontiers and Innovations, [S.l.], v. 3, n. 4, p. 30-48, oct. 2016

arXiv:1606.02862 [pdf, other]

doi 10.1007/978-3-319-46079-6_21

Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

Authors: Erik Zenker, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann

Abstract: With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. Towards utilizing their full potential, it is worth investigating performance portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGPUs, our presented approach relies heavily on abstract meta-programming techniques, which are essential to focus on fine-grained tuning rather than code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka. We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs and NVIDIA GPUs.

Submitted 12 June, 2016; v1 submitted 9 June, 2016; originally announced June 2016.

Comments: 9 pages, 3 figures, accepted at IWOPH 2016

Journal ref: Lecture Notes in Computer Science, 9945, pp 293-301, 2016

arXiv:1602.08477 [pdf, other]

doi 10.1109/IPDPSW.2016.50

Alpaka - An Abstraction Library for Parallel Kernel Acceleration

Authors: Erik Zenker, Benjamin Worpitz, René Widera, Axel Huebl, Guido Juckeland, Andreas Knüpfer, Wolfgang E. Nagel, Michael Bussmann

Abstract: Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits parallelism and memory hierarchies on a node at all levels available in current hardware. By doing so, it allows achieving platform and performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator. All hardware types (multi- and many-core CPUs, GPUs and other accelerators) are supported and can be programmed in the same way. The Alpaka C++ template interface allows for straightforward extension of the library to support other accelerators and specialization of its internals for optimization. Running Alpaka applications on a new (and supported) platform requires the change of only one source code line instead of a lot of #ifdefs.

Submitted 26 February, 2016; originally announced February 2016.

Comments: 10 pages, 10 figures

Showing 1–16 of 16 results for author: Huebl, A