UNIVERSE-HPC aims to define a training curriculum framework, spanning from undergraduate to continuing professional development level, for Research Software Engineers (RSEs) specialising in high performance computing (HPC). As part of this project, we explore ways to make training more accessible for audiences at various skill levels. In March, with HPC resources kindly provided by ARCHER2, we piloted an online foundational course for researchers in Southampton who were interested in learning fundamental HPC skills.
One of the outputs of UNIVERSE-HPC is to make existing training materials available, particularly in the areas of Research Software Engineering and HPC. These materials are written in an easily reusable Markdown format in a GitHub repository under a Creative Commons Licence, for anyone to incorporate into their own courses or reuse however they wish. So far, the project has made 28 different modules available, from introductory aspects of mathematics and Python to topics such as advanced version control, unit testing, and scientific computing. One of the more recent ones is an introductory HPC course originally developed by EPCC, which is currently being ported into this format and was piloted in March this year, using ARCHER2 as the learning platform. The topics included introductions to supercomputing, parallel and distributed computation, and computer simulations. These gave the attendees the opportunity to gain experience with compiling example HPC applications written using OpenMP and the Message Passing Interface (MPI). The two technologies take different approaches to carrying out computation in parallel within a program, so that it can complete much faster.
Delivering any form of computational training can be a challenge at the best of times, particularly with the breadth of software required on attendees' machines, and navigating the technical complexities and problems typically encountered during a workshop. Providing training for high performance computing presents a new set of challenges, particularly around navigating the new principles, paradigms, technologies and architectures within a learning narrative that doesn't overwhelm the learners.
The Real Thing™ as a Learning Platform
Our foundational course aims to provide a broad introduction to HPC, both conceptually and practically, covering the key topics in introductory detail but also allowing the learners hands-on experience with the technologies typically associated with HPC, such as batch job schedulers like Slurm, and parallelisation implementations like OpenMP and MPI. OpenMP and MPI are very useful if computations would otherwise take a long time. OpenMP allows a program to parallelise computation across multiple threads by specifying parts of the program that can be computed in parallel. By contrast, MPI allows parallel computation at a larger scale, across many copies of the same program running simultaneously, which communicate data with one another via messages.
In addition to providing a wealth of training courses itself, ARCHER2 also allows external training providers to access its HPC resources on request. The huge benefit of hosting training on such a platform is that, as well as gaining practical experience on real HPC infrastructure (the UK's National Supercomputing Service), learners need only follow an ARCHER2 registration procedure and install an SSH client program on their machines. An SSH client allows attendees to connect remotely to other machines and infrastructures like ARCHER2, which has these technologies and tools (and many others) already installed for use. This greatly simplifies setup for novices and helps avoid dedicating a lot of time at the start of the workshop to resolving installation problems. In the case of Slurm, a local installation wouldn't make a lot of sense anyway, given that HPC users do not need to install their own batch schedulers on their machines.
"I liked the ability to use the HPC directly and link this to the terminology used throughout the course." - learner at March 2024 pilot
The other huge benefit is that ARCHER2 allowed learners to keep their access for a week after the pilot, which meant they had the option to continue to explore the infrastructure and apply what they had learned.
Supported Self-Learning
"The topic is very new to me and I found the content very helpful for understanding what HPC is, how it works, and when it can be useful. The instructors were excellent, always around to help and were excellent at helping us through the various issues." - learner at March 2024 pilot
Our previous pilot last August used a hybrid format, with introductory lessons delivered using a live coding, instructor-led approach, and later lessons provided in self-learning sessions. From the post-workshop survey, overall, learners found the pilot an effective learning experience (8.2/10, n=9) that compared favourably with previous online training courses learners had attended (7.7/10, n=9).
The March pilot was entirely self-learning, with instructors on hand to help out with any difficulties, and learners able to proceed entirely at their own pace. This approach was well received (7.2/10, n=9), with some lauding it ("Loved that it was self paced, a good range of topics but were all well detailed"), although others would have liked more practical exercises. It was also noted that a hybrid approach would have been of benefit here too.
We also made use of Oxford's Gutenberg training platform again, deployed at Southampton, which hosted the training materials and was very well received (8.7/10, n=9). In particular, we used two of its more advanced training tools. Firstly, instructors were able to monitor progress through the material in real time as learners ticked off exercises when they finished them. We found this very useful for measuring progress, although some learners mentioned that this was something they would sometimes forget to do, so we're working on incorporating reminders into the system. Secondly, we made use of its material annotations feature, which allows learners and instructors alike to add comments to passages of the material. These can be monitored in real time, which is a very useful way to rapidly record and track issues as they arise.
We're currently improving the course based on the feedback we've received, prior to making these modules available as part of the UNIVERSE-HPC course materials.