agave-container-workshop-20180806

Advanced Singularity

When using containers for scientific analysis, there are generally three ingredients being combined:

  1. The dependencies for the code or software that you want to run (e.g. Python, TensorFlow)
  2. The code or software package itself
  3. The data that the code consumes to do the analysis

In general, the first two live inside the container. The third typically changes each time you run the software, so it usually lives outside the container.

Let’s start by cementing what you learned in the previous sections on Docker and Singularity and putting it in this context.

Exercise 1 (15-20 minutes)

Build a Singularity container that implements a simple Tensorflow image classifier. GPUs are not required for this example; it will just use CPUs.

The image classifier script is available “out of the box” here: https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/imagenet/classify_image.py

This script represents your “code”: something you might have written yourself that you want to package up into a portable container.

You also need to choose an image to classify; you might call it “cat.png”, for example. This represents your data. Data normally isn’t packaged into a container, though it could be; there are better ways to preserve and share data than containers.

TensorFlow publishes working Docker images on Docker Hub that you can use to provide all of the dependencies. For example, the first line of your Dockerfile might look like:

  FROM tensorflow/tensorflow:1.5.0-py3

The non-containerized version of the image classifier would be invoked with something like:

  python /classify_image.py --image_file cat.png

You can use a Singularity recipe file or a Dockerfile to help you. For reference, look back at the previous material or the respective Singularity and Docker documentation.
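
If you go the Dockerfile route, one possible starting point is sketched below. This is only a sketch: the ADD line fetches the classifier script from the URL above, and the image tag and file locations are choices, not requirements.

  FROM tensorflow/tensorflow:1.5.0-py3

  # Pull the example classifier script into the image (same URL as above)
  ADD https://raw.githubusercontent.com/tensorflow/models/master/tutorials/image/imagenet/classify_image.py /classify_image.py

Once the image is converted to Singularity (here assumed to be named classifier.simg), running the classifier against your data file looks like:

  singularity exec classifier.simg python /classify_image.py --image_file cat.png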

Using HPC Environments

Conducting analyses on high performance computing clusters happens through very different patterns of interaction than running analyses on a VM. When you log in, you are on a node that is shared with many people. Trying to run jobs on that node is not “high performance” at all; login nodes are intended only for moving files, editing files, and launching jobs.

Most jobs on an HPC cluster are neither interactive nor real-time. When you submit a job to the scheduler, you must tell it what resources you need (e.g. how many nodes and what type) and what you want to run. The scheduler then finds resources matching your requirements and runs the job for you when it can.

For example, suppose you want to run the command:

  singularity exec docker://python:latest /usr/local/bin/python

On an HPC system, your job submission script would look something like:

  #!/bin/bash
  #
  #SBATCH -p sb.q                       # Queue name
  #SBATCH -N 1                          # Total number of nodes requested (68 cores/node)
  #SBATCH -c 1                          # Number of cores requested
  #SBATCH -t 30                         # Run time in minutes
  #SBATCH -o example.out                # Standard out goes to this file
  #SBATCH -e example.err                # Standard err goes to this file

  source ~/.bash_profile
  module load singularity
  singularity exec docker://python:latest /usr/local/bin/python --version

This example is for the Slurm scheduler, a popular one used by many systems. Each of the #SBATCH lines looks like a comment to the bash interpreter, but the scheduler reads them to know what resources to reserve for you.
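
Assuming the script above is saved as example.slurm (the filename is just an example), you would submit it and check on its progress with standard Slurm commands:

  # Submit the job script to the scheduler
  sbatch example.slurm
  # List your queued and running jobs
  squeue -u $USER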

It is usually possible to get an interactive session as well. At the University of Hawaii, the command “srun” is used to get an interactive development session. For example:

  srun -I --pty -p sb.q -t 30 /bin/bash

The example above will give you a 30 minute interactive session on the “sb.q” partition of the system.

Every HPC cluster is a little different, but they almost universally have a “User’s Guide” that serves as a quick reference for helpful commands and lays out guidelines for being a “good citizen” on the system. For Hawaii’s HPC systems, the website is: https://www.hawaii.edu/its/ci/

How do HPC systems fit into the development workflow?

A few things to consider when using HPC systems:

  1. Using sudo is not allowed on HPC systems, and building a Singularity container from scratch requires sudo. That means you have to build your containers on a different development system. However, you can pull a Docker image directly on HPC systems (see the example after this list).
  2. If you need to edit text files, command-line text editors don’t support using a mouse, so working efficiently has a learning curve. Some desktop text editors can edit files over SSH, which lets you use a local editor and save the changes directly to the HPC system.
  3. Singularity is in the process of changing image formats. Depending on the version of Singularity running on the HPC system, the newer squashFS-based .simg format may not work.
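
For example, pulling a Docker image from Docker Hub and converting it into a local Singularity image does not require sudo, so it can be done directly on the cluster (the Python image here is just an example):

  module load singularity
  singularity pull docker://python:3.6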

Directory Mount Points

from http://singularity.lbl.gov/docs-mount

To mount a bind path inside the container, a bind point must be defined within the container. The bind point is a directory within the container that Singularity can use to bind a directory on the host system. This means that if you want to bind to a point within the container such as /global, that directory must already exist within the container.

On the University of Hawaii HPC systems, most files should be housed on the /lus filesystem. To use that filesystem within your containers, be sure to make a mount point inside the container. This can be done in either a Dockerfile or a Singularity recipe.

For example, in your Dockerfile, you could include:

RUN mkdir /lus
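
The equivalent in a Singularity recipe is a mkdir in the %post section:

  %post
      mkdir /lus

At runtime, you can also bind the host directory explicitly if it is not mounted automatically (mycontainer.simg is a placeholder name):

  singularity exec -B /lus mycontainer.simg ls /lus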

Singularity and MPI

Singularity supports MPI fairly well. Since (by default) the network is the same inside and outside the container, the communication between containers usually just works. The more complicated bit is making sure that the container has the right set of MPI libraries. MPI is an open specification, but there are several implementations (OpenMPI, MVAPICH2, and Intel MPI, to name three) with some non-overlapping feature sets. If the host and container are running different MPI implementations, or even different versions of the same implementation, hilarity may ensue.

The general rule is that the MPI version inside the container should be the same as, or newer than, the version on the host. You may be thinking that this is not good for the portability of your container, and you are right. Containerizing MPI applications is not terribly difficult with Singularity, but it comes at the cost of additional requirements for the host system.

Many HPC systems have high-speed, low-latency networks that require special drivers. InfiniBand, Aries, and OmniPath are three examples of these interconnects. When running MPI jobs, if the container doesn’t have the right libraries, it won’t be able to use those special interconnects to communicate between nodes.

Because you may have to build your own MPI-enabled Singularity images (to get the versions to match), here is a Singularity 2.3-compatible recipe showing what it might look like:

  # Copyright (c) 2015-2016, Gregory M. Kurtzer. All rights reserved.
  # 
  # "Singularity" Copyright (c) 2016, The Regents of the University of     California,
  # through Lawrence Berkeley National Laboratory (subject to receipt of any
  # required approvals from the U.S. Dept. of Energy).  All rights reserved.
  
  BootStrap: debootstrap
  OSVersion: xenial
  MirrorURL: http://us.archive.ubuntu.com/ubuntu/
  
  
  %runscript
      echo "This is what happens when you run the container..."
  
  
  %post
      echo "Hello from inside the container"
      sed -i 's/$/ universe/' /etc/apt/sources.list
      apt update
      apt -y --allow-unauthenticated install vim build-essential wget gfortran bison libibverbs-dev libibmad-dev libibumad-dev librdmacm-dev libmlx5-dev libmlx4-dev
      wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz
      tar xvf mvapich2-2.1.tar.gz
      cd mvapich2-2.1
      ./configure --prefix=/usr/local
      make -j4
      make install
      /usr/local/bin/mpicc examples/hellow.c -o /usr/bin/hellow

You could also build everything in a Dockerfile and convert the image to Singularity at the end.
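
For example, if you push the resulting Docker image to Docker Hub (the image name below is hypothetical), a recent Singularity (2.4 or newer) can build a local image directly from it:

  singularity build mympiapp.simg docker://username/mpi-app:latest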

Once you have a working MPI container, invoking it would look something like:

  mpirun -np 4 singularity exec ./mycontainer.img /app.py arg1 arg2

This will use the host MPI libraries to launch the processes in parallel and, assuming the image has what it needs, can work across many nodes.
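
In a batch job, that mpirun line simply goes at the bottom of a submission script like the one shown earlier. A sketch (queue name, node and task counts, and paths are placeholders):

  #!/bin/bash
  #
  #SBATCH -p sb.q                       # Queue name
  #SBATCH -N 2                          # Total number of nodes requested
  #SBATCH -n 4                          # Total number of MPI tasks
  #SBATCH -t 30                         # Run time in minutes

  module load singularity
  mpirun -np 4 singularity exec ./mycontainer.img /app.py arg1 arg2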

On a single node, you can also use the MPI installed inside the container to run in parallel (usually you don’t want this):

  singularity exec ./mycontainer.img mpirun -np 4 /app.py arg1 arg2

Singularity and GPU Computing

GPU support in Singularity is (usually) fantastic.

Since Singularity supports Docker containers, it is usually fairly simple to utilize GPUs for machine learning code like TensorFlow. The following example is from Maverick, TACC’s GPU system:

  # Work from a compute node
  srun -I --pty -p sb.q -t 30 /bin/bash
  # Load the singularity module
  module load singularity
  # Pull your image
  singularity pull docker://nvidia/caffe:latest
  
  singularity exec --nv caffe-latest.img caffe device_query -gpu 0

Please note that the --nv flag specifically passes the GPU drivers into the container. If you leave it out, as in the command below, the GPU will not be detected:

  singularity exec caffe-latest.img caffe device_query -gpu 0

For TensorFlow, you can pull their latest GPU image directly and use it as follows:

  # Change to your $WORK directory
  cd $WORK
  # Get the software
  git clone https://github.com/tensorflow/models.git ~/models
  # Pull the image
  singularity pull docker://tensorflow/tensorflow:latest-gpu
  # Run the code
  singularity exec --nv tensorflow-latest-gpu.img python $HOME/models/tutorials/image/mnist/convolutional.py

You probably noticed that we clone the models repository into your $HOME directory. This is because your $HOME and $WORK directories are only available inside the container if the top-level directories /home and /work exist inside the container. In the case of tensorflow-latest-gpu.img, the /work directory does not exist, so any files there are inaccessible to the container.

You may be thinking “what about overlayFS?” Stampede2 supports it, but the Linux kernel on the other systems does not, so it had to be disabled in our Singularity install. This may change as new Singularity versions are released.
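
One workaround is to build your own image on top of the TensorFlow one and create the missing mount point yourself, just as with /lus earlier. A minimal Singularity recipe sketch:

  BootStrap: docker
  From: tensorflow/tensorflow:latest-gpu

  %post
      mkdir /work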

Exercise 2 (5-10 minutes)

On the Hawaii HPC cluster, try running a batch analysis that classifies one or more images using the image classifier container you created in Exercise 1. You will need: