Running Jobs¶
Batch jobs (sbatch)¶
sbatch submits a batch script to Slurm. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified,
sbatch will read in a script from standard input. The batch script may contain options preceded with “#SBATCH” before any executable commands in the script.
sbatch will stop processing further #SBATCH directives once the first non-comment non-whitespace line has been reached in the script.
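The stopping rule above can be illustrated with a small sketch. The helper name list_active_directives is hypothetical (it is not part of Slurm), and the awk rules only approximate how sbatch scans a script: they print the #SBATCH lines that appear before the first executable command.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print the #SBATCH lines that sit
# above the first executable command, i.e. the ones sbatch would still read.
list_active_directives() {
    awk '
        /^[[:space:]]*$/  { next }          # blank lines do not end the header
        /^#SBATCH/        { print; next }   # a directive inside the header
        /^[[:space:]]*#/  { next }          # shebang and ordinary comments
                          { exit }          # first command: stop scanning
    ' "$1"
}

# Demo script: the second --time comes after a command, so it is ignored.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo

# a normal comment
#SBATCH --time=00:05:00
echo "work starts here"
#SBATCH --time=10:00:00
EOF

list_active_directives "$demo"
rm -f "$demo"
```

Running the sketch prints only the --job-name and the first --time directive, which is why all #SBATCH lines should be kept above the first command.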
Job scripts¶
A job script (a.k.a. submit file) is a plain text file in which you specify and request cluster resources and list, in sequence,
the commands that you want to execute (like applications) as you would on the command prompt.
The script is passed to the sbatch command, which submits it to the Slurm scheduler.
Below are example Slurm job scripts for different clusters. The first is a text file called slurm_test.sub.
#!/bin/bash
#SBATCH --partition=shared #Selecting “shared” Queue
#SBATCH --job-name="hello" #Name of Jobs (displayed in squeue)
#SBATCH --nodes=2 #No of nodes to run job
#SBATCH --ntasks-per-node=10 #No of cores to use per node
#SBATCH --time=00:05:00 #Maximum time for job to run
#SBATCH --mem=2G #Amount of memory per node
#SBATCH --output=slurm.%N.%j.out #Output file for stdout (optional)
#SBATCH --error=slurm.%N.%j.err #Error file for stderr (optional)
cd $SLURM_SUBMIT_DIR #Change to submission directory
module load helloworld/1.1 #Load up hello module for program to run
mpirun -np 20 helloworld #Run helloworld on 20 cores with MPI (20 = nodes * ntasks-per-node)
echo $SLURM_NODELIST > nodes #Record the nodes the code runs on to file nodes
Many more examples are available for you to use in gitlab: https://gitlab.surrey.ac.uk/rcs/eureka-examples
#!/bin/bash
#SBATCH --partition=2080ti #Selecting partition based on GPU type
#SBATCH --job-name="hello" #Name of Jobs (displayed in squeue)
#SBATCH --nodes=1 #No of nodes to run job
#SBATCH --cpus-per-task=10 #No of CPU cores per task (default - 4 CPU cores per GPU selected)
#SBATCH --time=00:05:00 #Maximum time for job to run
#SBATCH --mem=20G #Amount of memory per node
#SBATCH --gpus=2 #Selecting 2 x GPUs
#SBATCH --output=slurm.%N.%j.out #Output file for stdout (optional)
#SBATCH --error=slurm.%N.%j.err #Error file for stderr (optional)
# If using an apptainer image
apptainer exec oras://container-registry.surrey.ac.uk/shared-containers/pytorch:latest python mnist_pytorch.py > torch.out
# If using a docker image
# apptainer exec docker://<container_image_url> python mnist_pytorch.py > torch.out
Many more examples are available for you to use in gitlab: https://gitlab.surrey.ac.uk/rcs/ai-surrey-slurm-examples
The #SBATCH directives in your job script (or submit file) define the resources requested for compute jobs. Request only what you need.
The general format is: #SBATCH --<option>=<value>.
Not all of these directives have to be specified; if one is omitted, a default is applied on submission. Directives must be placed at the top of the file, before any executable commands.
Tip
Here are some example job scripts for the AISURREY cluster to help get you started.
sbatch options¶
Some common options you might want to use in your job submit file:
- --nodes=<number>:
Number of nodes requested
- --ntasks-per-node=<number>:
Number of processes to run per node
- --ntasks=<number>:
Total number of processes
- --mem=<number>:
Total memory per node
- --mem-per-cpu=<number>:
Memory per allocated CPU core
- --constraint=<attribute>:
Node property to request (e.g. avx, IB,OP)
- --partition=<partition_name>:
Request specified partition or queue
- --job-name=<myjobname>:
Name of job
- --error=<slurm.err>:
File to which stderr is written
- --output=<example.out>:
File to which stdout is written
- --time=<hh:mm:ss>:
Maximum time the job may run
- --exclusive:
Exclusive access to node
- --gpus-per-node=2:
Number of GPUs requested per node
Slurm environment variables¶
$SLURM_XXXX variables are built-in environment variables that Slurm sets for each job. Using them in your scripts makes the scripts
more automated and portable. In the helloworld example script above, the Slurm environment variable $SLURM_SUBMIT_DIR
is used so that when the job runs, it changes to the directory from which it was submitted before running anything else.
Environment Variables:
- $SLURM_JOB_ID:
ID of job allocation
- $SLURM_SUBMIT_DIR:
Directory from which the job was submitted
- $SLURM_JOB_NODELIST:
List of nodes allocated to the job
- $SLURM_NTASKS:
Total number of tasks (processes) for the job
- $SLURM_ARRAY_TASK_ID:
Index for array task
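As a minimal sketch of how these variables might be used together, the snippet below reads them with fallback values so that it can also be tried outside a job. The fallbacks, the helper job_log_name and the run_<jobid>.log naming pattern are assumptions made for illustration; inside a real job Slurm supplies the values.

```shell
#!/bin/bash
# Fallbacks (the values after ":-") are assumptions for this sketch so that it
# also runs outside Slurm; inside a job Slurm sets the real values.
job_id="${SLURM_JOB_ID:-0}"
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
ntasks="${SLURM_NTASKS:-1}"
nodelist="${SLURM_JOB_NODELIST:-${HOSTNAME:-localhost}}"

# Build a per-job log file name; "run_<jobid>.log" is an illustrative pattern.
job_log_name() {
    printf 'run_%s.log\n' "$1"
}

cd "$submit_dir" || exit 1
echo "Job $job_id: $ntasks task(s) on $nodelist, log: $(job_log_name "$job_id")"

# In an MPI job the task count could replace a hard-coded number, e.g.:
#   mpirun -np "$ntasks" helloworld
```

Reading the task count from $SLURM_NTASKS means the script does not need editing when the --nodes or --ntasks-per-node request changes.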
Submit a batch job¶
Once you have created your job script/submit file, you need to submit it to the job queue for the job to run.
This is done using the command sbatch <job_script>.
For example, to submit the job script slurm_test.sub created earlier:
[abc123@login1(eureka2) ~]$ sbatch slurm_test.sub
Submitted batch job 40145
Once submitted, your job is allocated a job id number, which is used to reference a job and interact with it after it has been submitted.
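If you want to reuse the job ID in a script, one option is to extract it from the message that sbatch prints. The sketch below assumes the message has the form shown above; parse_job_id is a hypothetical helper name, and the message is hard-coded here so the example runs without a scheduler.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" message.
# parse_job_id is a hypothetical helper name, not a Slurm command.
parse_job_id() {
    local message="$1"
    local id="${message##* }"        # keep the text after the last space
    case "$id" in
        ''|*[!0-9]*) return 1 ;;     # reject anything that is not a number
        *) printf '%s\n' "$id" ;;
    esac
}

# In a real session: message=$(sbatch slurm_test.sub)
message="Submitted batch job 40145"
job_id=$(parse_job_id "$message") && echo "Job ID: $job_id"
```

Alternatively, sbatch --parsable prints only the job ID (followed by the cluster name, if there is one), which avoids parsing the message at all.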
Tip
To learn how to cancel a running job, or for other job management commands, see managingjobs.
Interactive jobs (srun)¶
An interactive job connects you to an interactive shell on one or more compute nodes. This can be helpful as a debugging tool when creating job scripts for batch submission. It lets you experiment on compute nodes with command options and environment variables, with immediate feedback (helpful in determining your workflow!).
Note
An interactive job does not bypass the queue: it is submitted to the Slurm scheduler and assigned to a compute node in the same way as a batch job.
Submit an interactive job¶
To run an interactive job, use the srun command.
srun asks Slurm to allocate resources; with --pty bash, it then opens a shell on the allocated node for interactive work.
Resources for interactive sessions can be requested using the same options as sbatch (see sbatch options)
by adding them as arguments to the srun command, e.g. srun --<option>=<value> --<option>=<value> --pty bash.
[abc123@login1(eureka2) ~]$ srun -N 1 --exclusive --constraint=avx2 --time=02:00:00 --pty bash
[abc123@node14(eureka) ~]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
108098 shared bash abc123 R 0:05 1 node14
[abc123@node14(eureka) ~]$ exit
exit
[abc123@login1(eureka2) ~]$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[abc123@login1(eureka2) ~]$
srun can also run a single command and release the allocation as soon as the command finishes.
To do this, put the command at the end of the srun line:
srun --<option>=<value> --<option>=<value> <command to run>.
[abc123@login1(eureka2) ~]$ module load helloworld/1.1
[abc123@login1(eureka2) ~]$ srun -N 1 --constraint=avx2 --time=02:00:00 mpirun -np 28 helloworld
Hello world from processor node19.swmgmt.eureka, rank 1 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 4 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 5 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 9 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 12 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 16 out of 28 processors
Hello world from processor node19.swmgmt.eureka, rank 17 out of 28 processors
Tip
To learn how to cancel a running job, or for other job management commands, see managingjobs.
Array jobs (submitting a batch of jobs)¶
Often you may need to submit hundreds of jobs over a list or an index. In these cases you should avoid creating and submitting hundreds of separate job scripts. Instead, submit all of the jobs from a single job script using an array job. Array jobs are a quick and compact way to submit and collectively manage sets of similar jobs.
The example below demonstrates an array job submitted for the indices 1 to 16,
specified by the #SBATCH --array=1-16 directive. The index is then referred to in the script
through the Slurm environment variable $SLURM_ARRAY_TASK_ID.
#!/bin/bash
#SBATCH --job-name=array
#SBATCH --array=1-16 #Array job indices/range for $SLURM_ARRAY_TASK_ID (can be incremented if desired)
#SBATCH --time=01:00:00
#SBATCH --partition=shared
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --error=array_%A_%a.err #Error file label by job number and index.
# Print the task id.
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID > test_"$SLURM_ARRAY_TASK_ID"
In the example script above, the %A_%a notation is filled in with the master job id (%A) and the array task id (%a).
This is a simple way to create output files in which the file name is different for each job in the array.
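A common use of the task index is to pick one input per array task from a list. The sketch below assumes a text file with one input name per line; the file contents, the helper nth_line and the fallback index of 1 (used when the script runs outside Slurm) are all illustrative assumptions.

```shell
#!/bin/bash
# Outside Slurm the task ID falls back to 1 (an assumption for local testing).
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Print line number $2 of file $1; nth_line is a hypothetical helper name.
nth_line() {
    sed -n "${2}p" "$1"
}

# Illustrative input list with one input name per line.
list=$(mktemp)
printf '%s\n' molecule_a.xyz molecule_b.xyz molecule_c.xyz > "$list"

input=$(nth_line "$list" "$task_id")
echo "Task $task_id would process: $input"
rm -f "$list"
```

With --array=1-3, each of the three tasks would select a different line of the list, so a single job script covers every input.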
There are different ways of specifying the array indices, depending on the jobs you have. Examples are shown below:
# A job array with array tasks numbered from 0 to 31.
#SBATCH --array=0-31
# A job array with array tasks numbered 1, 2, 5, 19, 27.
#SBATCH --array=1,2,5,19,27
# A job array with array tasks numbered 1, 3, 5 and 7.
#SBATCH --array=1-7:2
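To check which task IDs a specification describes before submitting, the index syntax can be expanded locally. This is a rough sketch only: expand_array_spec is a hypothetical helper, not a Slurm tool, and it covers just the comma list, range and stepped range forms shown above.

```shell
#!/bin/bash
# Expand an --array index specification into the task IDs it describes.
# expand_array_spec is a hypothetical helper; it handles comma lists, ranges
# (first-last) and stepped ranges (first-last:step), and nothing else.
expand_array_spec() {
    local spec="$1" part range step first last
    local IFS=','
    for part in $spec; do
        case "$part" in
            *-*)
                range="${part%%:*}"            # "1-7" from "1-7:2"
                step=1
                if [ "$part" != "$range" ]; then
                    step="${part##*:}"         # "2" from "1-7:2"
                fi
                first="${range%%-*}"
                last="${range##*-}"
                seq "$first" "$step" "$last"
                ;;
            *)
                echo "$part"
                ;;
        esac
    done | paste -s -d ' ' -
}

expand_array_spec "1-7:2"        # prints: 1 3 5 7
expand_array_spec "1,2,5,19,27"  # prints: 1 2 5 19 27
```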
To cancel an entire array job:
scancel <Job ID Number>
To cancel a specific task in an array job:
scancel <jobid>_<taskid>
To cancel a range of specific tasks in an array job:
scancel <jobid>_[<taskid>-<taskid>]
Job dependencies¶
sbatch can also support workflows that involve multiple steps or checkpoints:
the --dependency option lets you launch a job on the condition that another job has completed (or completed successfully).
To submit a job that starts only after a specified job has completed successfully, use: sbatch --dependency=afterok:<Job Number> <Job script>.
For example, the command below submits the job script dependant_job.sub so that it will only start once job number 106178 has completed successfully.
[abc123@login1(eureka2) ~]$ sbatch --dependency=afterok:106178 dependant_job.sub
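For multi-step workflows, the dependency option can be assembled in a script. The sketch below is a dry run that only prints the sbatch command it would issue; dependency_flag is a hypothetical helper name, and step1.sub and step2.sub in the comments are illustrative file names.

```shell
#!/bin/bash
# Build a --dependency option from a condition and one or more job IDs.
# dependency_flag is a hypothetical helper name, not a Slurm command.
dependency_flag() {
    local condition="$1"
    shift
    local flag="--dependency=${condition}"
    local id
    for id in "$@"; do
        flag="${flag}:${id}"
    done
    printf '%s\n' "$flag"
}

# Dry run: print the command instead of submitting it.
echo sbatch "$(dependency_flag afterok 106178)" dependant_job.sub

# A real chain could capture each ID with "sbatch --parsable", for example:
#   first=$(sbatch --parsable step1.sub)
#   sbatch "$(dependency_flag afterok "$first")" step2.sub
```

The afterok condition waits for the listed jobs to finish successfully, whereas afterany starts the dependent job once they have ended in any state.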
Benchmarking and scaling¶
Benchmarking and scaling are very important when running simulations on HPC clusters: they help maximise your output and minimise wasted resources.
One of the most important things to do before running production workloads is to benchmark your problem against the number of cores you might use, to ensure you are asking for the correct amount of resources.
Below are plots of a benchmarking exercise of a DFT B3LYP energy calculation of 2 molecules:
Both plots show the time of the calculation as a function of the number of cores. The calculation scales quite well, with significant reductions in time up to 20 cores in both cases.
However, in these examples there is no improvement beyond 20 cores, and therefore no benefit in running the calculation on any more cores.
In fact, requesting more cores in your Slurm job script would:
- Waste resources which could be allocated to another job.
- Potentially slow the calculation down. Although not shown here, asking for too many cores can result in a significant slowdown due to parallel overhead, turning the above plots into U-plots!
Tip
You do not need to measure the scaling of every simulation, but using model problems to inform yourself is a very good idea, so that you can gauge the correct amount of resources to request when running day-to-day jobs.
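One way to summarise a benchmarking run is to turn the timings into speedup and parallel efficiency. The sketch below does this with awk; scaling_table is a hypothetical helper, the first row is treated as the baseline, and the timings are made-up numbers for illustration only, not measurements from the plots above.

```shell
#!/bin/bash
# Compute speedup and parallel efficiency from "cores seconds" pairs.
# The first row is taken as the baseline. scaling_table is a hypothetical name.
scaling_table() {
    awk '
        NR == 1 { base_cores = $1; base_time = $2 }
        {
            speedup = base_time / $2
            efficiency = speedup / ($1 / base_cores)
            printf "%d %.2f %.2f\n", $1, speedup, efficiency
        }
    '
}

# Made-up timings for illustration only (cores, wall time in seconds).
# Output columns: cores, speedup, efficiency.
printf '%s\n' "1 1000" "10 125" "20 100" "40 100" | scaling_table
```

In this invented data set the speedup stops growing after 20 cores while the efficiency halves from 0.50 to 0.25 at 40 cores, which is the kind of signal that suggests requesting fewer cores.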