John Blaas

Boulder, Colorado, United States
291 followers 277 connections

Publications

  • Clushible: Tidal Wave-Like Configuration with Ansible

    SC23: HPCSYSPROS Workshop

    Configuration of HPC nodes is an important aspect of maintaining any HPC cluster. Our flagship HPE/Cray EX supercomputer, Derecho, has approximately 2,500 compute nodes and is susceptible to power interruptions from external factors such as lightning-induced power sags and utility mishaps. These events challenged us to find an acceptable mean time to recovery. Ansible is our chosen configuration management system, but it struggles with single large-scale configuration runs even after per-run optimizations such as tuning the fork count and enabling pipelining. We needed a method to perform a large blast of configuration within a short time period, either to get the system back to a functional state or to apply some level of remediation such as security updates. We therefore wrote a utility, Clushible, which wraps Ansible with ClusterShell's Python API to scale out Ansible execution, reducing our standard full-system run from multiple hours to minutes.

  • Stateless Provisioning: Modern Practice in HPC

    SC18: HPCSYSPROS Workshop

    We outline a model for creating a continuous integration and continuous delivery workflow targeted at provisioning CPIO-based initramfs images that are used to run computational work nodes in a bare-metal cluster running RHEL or CentOS.

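The scale-out idea in the Clushible abstract above (many small Ansible runs instead of one large one) can be illustrated with a minimal sketch. This is not the Clushible implementation, which is not shown in this profile. It assumes that the node list is split into fixed-size batches and that each batch gets its own `ansible-playbook` invocation restricted with `--limit`. The function names, the `site.yml` playbook, the node names, and the batch size are all hypothetical. The sketch only builds the command lines. Dispatching them in parallel (for example through ClusterShell) is left out.

```python
import shlex
from typing import Iterator, List


def chunk_nodes(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size node names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def build_commands(nodes: List[str], playbook: str,
                   batch_size: int, forks: int = 25) -> List[List[str]]:
    """Build one ansible-playbook argument list per batch of nodes.

    Each command is limited to its own batch via --limit, so the batches
    can be run independently of one another (hypothetical layout).
    """
    commands = []
    for batch in chunk_nodes(nodes, batch_size):
        commands.append([
            "ansible-playbook", playbook,
            "--limit", ",".join(batch),   # restrict the run to this batch
            "--forks", str(forks),        # per-run parallelism inside Ansible
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical ten-node cluster, split into batches of four.
    example_nodes = [f"node{i:04d}" for i in range(1, 11)]
    for cmd in build_commands(example_nodes, "site.yml", batch_size=4):
        print(shlex.join(cmd))
```

With ten nodes and a batch size of four, the sketch produces three commands, the last covering the two remaining nodes. Choosing the batch size is a trade-off between the number of concurrent Ansible processes and the work each one has to do.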

Organizations

  • ACM SIGHPC Syspros

    Chair

    - Present
  • CaRCC Systems-Facing track

    Steering Committee Member

    - Present
  • ACM SIGHPC Syspros

    Member at Large

