By Krishna Kumar, Institute Fellow, University of Cambridge
The University of Cambridge, with the support of the Software Sustainability Institute, is organising a workshop, Containers for HPC: A Workshop on Singularity and Containers in HPC and Cloud, on 29 and 30 June 2017. The aim of the workshop is to give an overview of container technologies in the context of Research Computing, with a specific focus on enabling HPC and GPU workloads.
The main focus will be on Singularity, which is available on the current HPC system at Cambridge and is also the chosen technology for the new Cambridge Service for Data Driven Discovery (CSD3). Alongside a few special keynote talks, an afternoon session will cover practical examples, from running a container on HPC to building your own Singularity container images.
Further information and registration are available on the Containers for HPC workshop website. Places are limited, so please register only if you have a genuine interest in and need for this container technology and are familiar with Linux environments.
Containers for High Performance Computing
Reproducible research has been receiving increasing attention across universities and research laboratories. There has been a gradual adoption of open source tools and open standards in research, which facilitates collaboration and enables reproducibility. However, as the complexity of research tools grows, capturing the development environment and its dependencies becomes critical for reproducibility.
In the past few years, there has been increased adoption of container technologies such as Docker and Singularity, along with orchestration platforms such as Kubernetes. Containers wrap up a piece of software in a complete file system that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that a container will always run the same way, regardless of the environment it is deployed in. Containers, especially Docker, have been promoted as a critical tool for reproducible research.
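As an illustration, packaging an analysis this way usually starts from a recipe that lists the base system, the dependencies and the code. A minimal Dockerfile sketch (the file names `requirements.txt` and `analysis.py` are hypothetical placeholders for a research project's dependency list and analysis script):

```dockerfile
# Start from a pinned base image so the runtime is fixed
FROM python:3.6-slim

# Install the project's declared dependencies
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt

# Add the analysis code itself
COPY analysis.py /app/
WORKDIR /app

# Default command when the container is run
CMD ["python", "analysis.py"]
```

Because everything from the base operating system image down to the exact dependency versions is captured in the recipe, anyone who builds or pulls the resulting image runs the analysis in the same environment.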
Traditionally, research communities conduct large-scale simulations and data analysis on HPC clusters. These clusters are typically set up with slightly different configurations of hardware, operating system and software. Recreating an environment to re-run code exactly as it was executed on another cluster is beyond the abilities, free time and account privileges of most researchers who might want to reproduce the results. It is also often impossible for HPC managers to install new libraries and support the ever-growing array of research tools required by an increasingly diverse research community. The ability to run containers on HPC clusters allows researchers and HPC administrators alike to facilitate reproducible research on large-scale clusters, which has not been possible before. However, certain container technologies, such as Docker, require privileged access to the underlying operating system, which is often not granted on a university-wide HPC cluster.
Singularity is a container implementation and an engine for running containers without requiring privileged (root) access. Containers allow researchers to isolate the software environment needed to produce a result from the configuration and operating system of the computer on which the analysis is run. This means that your colleague at University X can run the analysis in exactly the same way on their cluster as you run it on your cluster at University Y; all it requires is sharing a Git repository (code) and a container image (which includes the dependency libraries, software and tools).
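A sketch of what this workflow might look like (the script name `analysis.py` is hypothetical, and the exact commands vary between Singularity versions). A Singularity definition file describes the image, much like a Dockerfile:

```
Bootstrap: docker
From: ubuntu:16.04

%post
    # Install the software environment the analysis needs
    apt-get update && apt-get install -y python3

%runscript
    # Command executed by "singularity run"
    exec python3 /opt/analysis.py "$@"
```

Your colleague can then run the shared image on their own cluster, as an ordinary user, without root access:

```shell
# Build the image (on a machine where you have root), then copy it
# to the cluster alongside the Git repository with the code
singularity build analysis.img Singularity

# On the cluster: run the default runscript, or a specific command
singularity run analysis.img
singularity exec analysis.img python3 /opt/analysis.py
```

Because the image bundles the operating system libraries and tools, the only cluster-side requirement is that Singularity itself is installed.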
Docker Containers for Reproducible Research Workshop.