Running MPI or OpenMP jobs
What is MPI?
MPI stands for Message Passing Interface and is essentially a standardised means of exchanging messages between multiple computers running a parallel program across distributed memory.
When you run a job across multiple compute nodes in an HPC cluster, each node is a separate physical computer, and typically each node works on a portion of the overall computing problem. The challenge is then to synchronize the actions of the parallel processes, exchange data between nodes, and provide command and control over the entire parallel job.
MPI defines a standard suite of functions for these tasks.
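To make the idea of message passing more concrete, here is a rough single-machine analogy in Python. It is not MPI: it uses only the standard library's `threading` and `queue` modules, and the names `parallel_sum`, `send`, `recv`, and `rank` are hypothetical stand-ins for MPI concepts. Rank 0 splits a problem into portions, sends one portion to each worker, and then waits until every worker has sent back a partial result, which mirrors the distribute, exchange, and synchronize steps described above.

```python
import queue
import threading


def parallel_sum(data, n_ranks=4):
    """Sum `data` by passing messages between n_ranks workers.

    Illustration only, not MPI: each hypothetical 'rank' owns an inbox queue,
    loosely mimicking MPI point-to-point send/receive on one machine.
    """
    inboxes = [queue.Queue() for _ in range(n_ranks)]

    def send(dest, msg):
        # Deliver a message to the destination rank's inbox.
        inboxes[dest].put(msg)

    def recv(rank):
        # Block until a message arrives for this rank.
        return inboxes[rank].get()

    def worker(rank):
        chunk = recv(rank)            # receive this rank's portion of the problem
        send(0, (rank, sum(chunk)))   # send the partial result back to rank 0

    workers = [threading.Thread(target=worker, args=(r,)) for r in range(1, n_ranks)]
    for t in workers:
        t.start()

    # Rank 0 splits the data into n_ranks strided chunks and keeps chunk 0 itself.
    chunks = [data[r::n_ranks] for r in range(n_ranks)]
    for r in range(1, n_ranks):
        send(r, chunks[r])
    total = sum(chunks[0])

    # Gather one partial result from every worker (the synchronisation point).
    for _ in range(1, n_ranks):
        _, partial = recv(0)
        total += partial

    for t in workers:
        t.join()
    return total


print(parallel_sum(list(range(1, 101))))  # 5050
```

In a real MPI program each rank would be a separate process, usually on a different node, and the sends and receives would go over the cluster network rather than through in-memory queues.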
Running an MPI Job
For non-MPI jobs we have been using either the combination of --nodes and --ntasks-per-node, or --ntasks, as Slurm options (in our sbatch job scripts/submit files or with the srun command) to specify the number of cores we would like to use. Another option that can be used is --cpus-per-task.
These options can be combined in many ways, and several different combinations can produce the same, or a similar, allocation of resources for your job.
For example, --nodes=3 --ntasks=3 --cpus-per-task=3 is equivalent in terms of resource allocation (9 cores spread over 3 nodes) to --ntasks=9 --ntasks-per-node=3, but Slurm and MPI see the two requests differently: in the first case 3 processes are launched, each with 3 cores, whereas in the second case 9 processes are launched, each with 1 core.
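The arithmetic behind this equivalence can be sketched in Python. The helper `allocation` is hypothetical and uses a deliberately simplified model of Slurm: one process is launched per task, each task gets `cpus_per_task` cores, and the node count is either given with --nodes or derived from --ntasks-per-node. Real allocations also depend on memory requests, hyperthreading, and site policy.

```python
import math


def allocation(ntasks, cpus_per_task=1, nodes=None, ntasks_per_node=None):
    """Return (processes, total_cores, nodes) for a simplified Slurm request.

    Hypothetical helper: ignores memory, hyperthreading and scheduler policy.
    """
    if nodes is None:
        if ntasks_per_node is None:
            # With --ntasks alone Slurm picks the node count; this model cannot.
            raise ValueError("need nodes or ntasks_per_node to fix the node count")
        nodes = math.ceil(ntasks / ntasks_per_node)
    processes = ntasks                     # one process launched per task
    total_cores = ntasks * cpus_per_task   # each task gets cpus_per_task cores
    return processes, total_cores, nodes


# --nodes=3 --ntasks=3 --cpus-per-task=3
print(allocation(ntasks=3, cpus_per_task=3, nodes=3))  # (3, 9, 3)
# --ntasks=9 --ntasks-per-node=3
print(allocation(ntasks=9, ntasks_per_node=3))         # (9, 9, 3)
```

Both requests reserve 9 cores on 3 nodes; only the number of processes differs, which is what matters to MPI.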
Examples
Consider the following examples, which show a variety of ways to request cores depending on whether you are running MPI (or distributed) jobs or OpenMP (single-node parallel) jobs.
MPI (or distributed)
- you use MPI and do not care about where those cores are distributed: --ntasks=16
- you want to launch 16 independent processes (no communication): --ntasks=16
- you want those cores to spread across distinct nodes: --ntasks=16 --ntasks-per-node=1 or --ntasks=16 --nodes=16
- you want 10 processes to spread across 5 nodes, with two processes per node: --ntasks=10 --ntasks-per-node=2
- you want 10 processes to stay on the same node: --ntasks=10 --ntasks-per-node=10
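The layouts in the list above can be checked with a small Python sketch. The function `tasks_per_node` is hypothetical and models Slurm in a simplified way: with --ntasks-per-node it fills each node up to that limit, with --nodes it spreads tasks as evenly as possible, and with --ntasks alone it returns None because Slurm chooses the placement. When tasks do not divide evenly, real placement depends on Slurm's distribution settings and the free CPUs on each node.

```python
def tasks_per_node(ntasks, ntasks_per_node=None, nodes=None):
    """Sketch how ntasks might be laid out across nodes (simplified model).

    Hypothetical helper: returns one entry per node, or None when only
    --ntasks is given, since Slurm then decides the placement itself.
    """
    if ntasks_per_node is not None:
        # Fill nodes up to ntasks_per_node each; any remainder goes on a last node.
        full, rest = divmod(ntasks, ntasks_per_node)
        return [ntasks_per_node] * full + ([rest] if rest else [])
    if nodes is not None:
        # Spread tasks as evenly as possible over the requested nodes.
        base, extra = divmod(ntasks, nodes)
        return [base + 1] * extra + [base] * (nodes - extra)
    return None  # --ntasks alone: placement is left to the scheduler


print(tasks_per_node(16, ntasks_per_node=1))   # sixteen nodes with 1 task each
print(tasks_per_node(10, ntasks_per_node=2))   # [2, 2, 2, 2, 2]
print(tasks_per_node(10, ntasks_per_node=10))  # [10]
print(tasks_per_node(16))                      # None
```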