8. Eureka documentation

A collection of topics about working with the Eureka HPC cluster.

The Eureka HPC service consists of our main shared HPC clusters. These are open to anyone at the University, whereas other clusters at Surrey are owned by, and reserved for the private use of, particular research groups.

If you would like to request access to the Eureka clusters, please see Requesting access to HPC.

If you are looking to purchase HPC compute resources for your research, we would always encourage you to invest in Eureka; see Purchasing HPC servers for more information.

8.1. Eureka cluster overview

Eureka is a heterogeneous cluster running CentOS 7 Linux as its operating system.

The compute resources on Eureka can be used for a wide variety of workloads, including large parallel jobs and high-memory jobs, and the cluster also offers a small amount of GPU capacity.

Eureka Cluster Specification

OS                CentOS 7
Fabric            Intel Omni-Path (OP) and InfiniBand (IB)
Parallel storage  BeeGFS, 56 TB
Standard storage  NFS, 7.5 TB
Scheduler/Queue   Slurm
Login node        1 x eureka.surrey.ac.uk

Count  Node type      CPU                               RAM     Fabric
16     CPU node       Intel Xeon E5-2660 v4 @ 2.0 GHz   128 GB  Omni-Path
38     CPU node       Intel Xeon Gold 5120 @ 2.20 GHz   192 GB  Omni-Path
13     CPU node       Intel Xeon E5-2680 v2 @ 2.80 GHz  128 GB  Omni-Path
2      High-mem node  Intel Xeon Gold 5120 @ 2.20 GHz   375 GB  Omni-Path
8      CPU node       Intel Xeon E5-2470-0 @ 2.30 GHz   64 GB   InfiniBand
12     CPU node       Intel Xeon E5-2670 @ 2.60 GHz     64 GB   InfiniBand
2      CPU node       Intel Xeon E5-2670 v2 @ 2.50 GHz  128 GB  InfiniBand
6      CPU node       Intel Xeon E5-2697 v2 @ 2.70 GHz  128 GB  InfiniBand
7      High-mem node  Intel Xeon E5-2670 v2 @ 2.50 GHz  256 GB  InfiniBand
1      High-mem node  Intel Xeon E5-2670 v1 @ 2.60 GHz  256 GB  InfiniBand
3      GPU node       Intel Xeon E5-2670 v2 @ 2.50 GHz  256 GB  InfiniBand, Nvidia Tesla K20m

[Image: eureka-diagram.png (cluster topology diagram)]

Software

The Eureka cluster hosts a wide variety of software and development tools relevant to science, engineering and statistical computing workloads: compilers such as GCC and Intel, scripting languages such as Julia, R and Python, as well as a wide range of standard software such as Matlab, Mathematica, LAMMPS, CASTEP and GROMACS, plus many more.

If there is an application you need installed on the cluster, please submit a request; see Request new software.

If you have your own builds of software, you are free to build and use your own versions in your HPC storage areas.
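
Installed software is made available through environment modules. As a quick sketch, the standard module commands below can be used to discover and load packages (gcc is just an illustrative choice here; run module avail to see what actually exists on the cluster):

$ module avail            # list all software available on the cluster
$ module load gcc         # load the default version of a package
$ module list             # show the modules currently loaded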

8.2. Using Slurm on Eureka

This section covers some of the specifics of using Slurm on the Eureka cluster.

For general information about the Slurm scheduler please see Slurm - job scheduler

8.2.1. Eureka partitions (Queues)

On Eureka there are 5 different partitions (queues) to which you can submit your jobs: shared, high-mem, gpu, debug_all and debug_latest.

  • The shared partition contains most nodes on Eureka.

  • high-mem contains a few nodes with a large memory-to-core ratio.

  • gpu contains a few nodes with GPU cards.

  • debug_latest contains a node with the newest CPU features, such as AVX2 and AVX-512.

  • debug_all contains a node suitable for code that has been built without AVX2 support.

The configurations of the partitions are summarised in the table below:

Name          Nodes  Time limit                      Purpose
debug_all     1      60 min maximum                  Debugging jobs that can eventually run across all nodes.
debug_latest  1      60 min maximum                  Debugging jobs that can eventually run on the latest nodes and take advantage of the newest features.
gpu           3      1 day default, 1 week maximum   Jobs that require GPU cards.
high-mem      10     1 day default, 1 week maximum   Jobs with high memory requirements (current threshold: nodes with >= 12 GB/core).
shared        90     1 day default, 1 week maximum   Day-to-day production jobs.
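
As a quick illustration, a job is pointed at one of these partitions with the --partition directive. The sketch below targets the debug_latest partition; the job name, time limit and executable are placeholder values:

#!/bin/bash

#SBATCH --partition=debug_latest   # debug node with the newest CPU features
#SBATCH --job-name="debug-test"
#SBATCH --ntasks=1
#SBATCH --time=00:30:00            # must stay within the 60 min partition maximum
#SBATCH --output=debug_test.out

cd $SLURM_SUBMIT_DIR

./my_program                       # placeholder for your own executable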

Eureka is a heterogeneous cluster containing different types of nodes and two different low-latency network fabrics. This influences how you use the cluster and the resources you request in your Slurm job scripts.

The two low-latency network fabrics are Intel Omni-Path (op) and InfiniBand (ib). Most nodes on the op fabric are newer nodes that support at least the AVX2 instruction set; nodes on the ib fabric do not support AVX2 instructions.

If a piece of software can only run on AVX2-enabled nodes, this is usually indicated in the module's name, so you should know when you load the module. In many cases this is not an issue, and programs will simply run on any node regardless of instruction set. Furthermore, nodes have differing numbers of cores: some have 28, 24, 20 or 16.

Note

If you are running multi-node parallel jobs, you will need to consider which fabric you are using, op or ib. Jobs cannot span two different fabrics, e.g. you cannot use a mixture of nodes from the ib and op fabrics.

8.2.2. How to Submit Jobs to the right nodes

To allow users to submit jobs to the correct type of nodes, we have enabled the #SBATCH --constraint directive in Slurm. This allows you to submit your jobs to the specific types of nodes you wish to run on, via their features.

To see the sets of features available, the command sinfo -o "%R %.6D %.4c  %.6m %.30f" | column -t, or showcluster, gives a summary of all nodes: the number of cores they have, their memory, the partitions they belong to, and the features you can use to select them via a constraint.

[abc123@login7(eureka) Python_example]$ showcluster
PARTITION     NODES  CPUS  MEMORY  AVAIL_FEATURES
shared        8      16    64216   e5-2470-0,v1,ib
shared        33     28    191678  gold-5120,avx2,avx512,op
shared        2      20    128679  e5-2670-v2,v2,ib
shared        13     20    128706  e5-2680-v2,galaxy,op
shared        16     28    128658  e5-2660-v4,avx2,v4,op
shared        6      24    128711  e5-2697-v2,v2,ib
shared        12     16    64171+  e5-2670,v1,ib
debug_latest  1      28    191908  gold-5120,avx2,avx512,op
debug_all     1      16    64216   e5-2470-0,v1,ib
high_mem      2      28    385204  gold-5120,avx2,avx512,op
high_mem      7      20    257695  e5-2670-v2,v2,ib
high_mem      1      16    257695  e5-2670,v1,ib
gpu           3      20    257695  e5-2670-v2,ib

The output above can be daunting at first when trying to find the correct options for #SBATCH --constraint, but most users fall into a few common scenarios when submitting jobs to Eureka. Note that nodes have more than one feature, so if you ask for avx2 nodes this includes any node with that feature. The number of CPU cores each node has is also important.

  • Case 1 #SBATCH --constraint=[ib|op]

Request nodes exclusively on either the Omni-Path fabric or the InfiniBand fabric, but not a mixture of both. This is for when you want to run a parallel multi-node job and do not care about the CPU instruction set.

  • Case 2 #SBATCH --constraint=avx512

Request only nodes with AVX-512. This is for when you want to run a single-node or parallel multi-node job on nodes with the AVX-512 instruction set.

  • Case 3 #SBATCH --constraint=avx2

Request nodes with at least the avx2 feature. This is for when you want to run a single-node or parallel multi-node job on nodes with at least the AVX2 instruction set.

  • Case 4 #SBATCH --constraint=ib

Request nodes only on the InfiniBand network fabric. This is for when you want to run parallel multi-node jobs entirely on the InfiniBand fabric.

  • Case 5 #SBATCH --constraint=op

Request nodes only on the Omni-Path network fabric. This is for when you want to run parallel multi-node jobs entirely on the Omni-Path fabric.

  • Case 6 #SBATCH --constraint=e5-2660-v4

Request only nodes with the e5-2660-v4 CPU model. This is for when you want to run a single-node or parallel multi-node job on these specific nodes.

  • Case 7 #SBATCH --constraint="e5-2660-v4|gold-5120"

Request nodes with either the e5-2660-v4 or the gold-5120 CPU model. This is for when you want to run a single-node or parallel multi-node job on a mixture of nodes with either CPU.
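
Putting this together, a minimal sketch of a submission script using one of these constraints is shown below. It reuses the helloworld module from the examples in the next section; the core count and time limit are placeholder values:

#!/bin/bash

#SBATCH --partition=shared
#SBATCH --job-name="avx2-test"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00
#SBATCH --constraint=avx2          # Case 3: any node with at least the avx2 feature
#SBATCH --output=avx2-test.out

cd $SLURM_SUBMIT_DIR

module load helloworld/1.1

mpirun -np 16 helloworld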

8.2.3. Consider the number of cores on the nodes

When requesting resources on different sets of nodes, it is important to take into account the number of cores on the nodes you have requested.

For example, if you request nodes only on ib and you are running a large parallel job then, depending on your job's requirements, you could ask for a maximum of 16 cores per node so that you can utilise any node on the ib fabric. If you were to ask for 24 cores per node, you would restrict yourself to only 6 nodes on the ib fabric.

Alternatively, you can use #SBATCH --ntasks= to specify the total number of cores, rather than specifying #SBATCH --nodes=2 and #SBATCH --ntasks-per-node=10. This allows you to maximise the number of cores you can use, since Slurm will allocate you cores on any node. Whether to use this depends on the type of job you are running and whether balancing the workload across nodes is important for your simulations.

For example, instead of the script below:

#!/bin/bash

#SBATCH --partition=shared
#SBATCH --job-name="hello"
#SBATCH --nodes=2            #<----- Request 2 nodes
#SBATCH --ntasks-per-node=10 #<----- Request 10 core per node so there must be 2 nodes with 10 cores available
#SBATCH --time=00:05:00
#SBATCH --constraint=[ib|op]
#SBATCH --mem=2G
#SBATCH --output=helloworld.out

cd $SLURM_SUBMIT_DIR

module load helloworld/1.1

mpirun -np 20 helloworld

echo $SLURM_NODELIST > nodes

You could use:

#!/bin/bash

#SBATCH --partition=shared
#SBATCH --job-name="hello"
#SBATCH --ntasks=20     #<--------  20 cores anywhere they can be found, no "per node" restriction
#SBATCH --time=00:05:00
#SBATCH --constraint=[ib|op]
#SBATCH --mem=2G
#SBATCH --output=helloworld.out


cd $SLURM_SUBMIT_DIR

module load helloworld/1.1

mpirun -np 20 helloworld

echo $SLURM_NODELIST > nodes

You can also submit to a range of node counts: Slurm allows specifying a range, e.g. --nodes=2-15. This means that your job will start as soon as at least two nodes are available; if, say, 10 nodes are available, you will be allocated all 10.

#!/bin/bash

#SBATCH --partition=shared
#SBATCH --job-name="hello"
#SBATCH --nodes=2-15
#SBATCH --ntasks-per-node=18
#SBATCH --time=00:01:00
#SBATCH --constraint=op
#SBATCH --mem=2G
#SBATCH --output=helloworld.out

cd $SLURM_SUBMIT_DIR

module load helloworld/1.1

# Compute the total task count from tasks-per-node and the number of nodes actually allocated
NTASKS=$(( SLURM_NTASKS_PER_NODE * SLURM_JOB_NUM_NODES ))
mpirun -np $NTASKS helloworld

8.2.4. Eureka Accounting and Usage

Usage accounting and monitoring dashboards for Eureka are available at:

  • XDMoD: http://eureka-monitor.eps.surrey.ac.uk/xdmod/

  • Ganglia: http://eureka-monitor.eps.surrey.ac.uk/ganglia

8.3. Eureka quick start

To help users get started quickly, we have created a repository of working submission script examples for a variety of programs currently on the cluster, and hosted them on GitLab.

If you would like to contribute any example scripts yourself, please let us know.

The repository can be accessed at: https://gitlab.surrey.ac.uk/rcs/eureka-examples

If you are logged into Eureka and have set up your GitLab account and configured SSH access to it, you can clone the repository into your space with the command below:

$ git clone git@gitlab.surrey.ac.uk:rcs/eureka-examples.git

The repository contains a variety of scripts:

  1. Example starter scripts (ready to be customised):

    • Interactive jobs

    • Raw Submission scripts

    • Raw Array job scripts

  2. A few ready-made submission scripts & example inputs for specific software:

    • Lammps

    • Matlab

    • Castep

    • and more…

  3. An introductory “hello world” exercise to get started with submitting jobs to Eureka.

8.4. Job priority and “Fairshare” on Eureka

8.4.1. Fairshare

On Eureka, each user is associated with a Slurm - job scheduler account, usually related to their research group, department or Faculty. Users belong to accounts, and accounts have shares associated with them based on their contribution/investment to Eureka. These shares reflect how much of the cluster that research group/department has invested in.

Different groups across the University have contributed different amounts of resources to Eureka. To serve this variety of groups and contributions/investments, a method of fairly adjudicating job priority is required; this is the goal of Fairshare. Fairshare gives users who have not fully used their contribution/investment higher priority for their jobs over jobs from groups that have used more than their contribution/investment. The cluster is a limited resource, and Fairshare allows us to ensure everyone gets a fair opportunity to use it, regardless of how big or small their group is.

8.4.1.1. Fairshare example

To see how much your group/account has used of its fairshare, the command sshare -a --account=<account name> shows a summary of this information.

The account we use in this example is chemistry. The first line of the sshare output gives the summary for the whole account, with the additional lines giving a per-user summary.

Example output from an sshare command:
[abc123@login7(eureka) chem_reservation]$ sshare -a --account=chemistry
            Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ---------
chemistry                            8423    0.112157   735603420      0.169374   0.351074
chemistry                bobby          1    0.007477           0      0.011292   0.351074
chemistry                 tony          1    0.007477           0      0.011292   0.351074
chemistry                susan          1    0.007477   203592409      0.055052   0.006076
chemistry                user1          1    0.007477           0      0.011292   0.351074
chemistry                user2          1    0.007477           0      0.011292   0.351074
chemistry              someguy          1    0.007477           0      0.011292   0.351074
chemistry              chemist          1    0.007477          38      0.011292   0.351074
chemistry                user3          1    0.007477   517403742      0.122502   0.000012
RawShares:

Chemistry has 8423 RawShares. Each user in the account draws on the RawShares of its parent account; this means that all users in chemistry pull from the account's total share and do not have their own individual sub-shares. Thus all users in this account have full access to the account's full share.

NormShares:

NormShares is the chemistry account's RawShares divided by the total number of RawShares given out to all accounts on the cluster. It is the fraction of the cluster the account has contributed/invested; for the chemistry account this is about 11.2% of Eureka.

RawUsage:

RawUsage is the amount of usage the account/user has accrued on Eureka, adjusted by the half-life set for the cluster, which is 30 days. This means that usage in the last 30 days counts at full cost, usage from 30-60 days ago counts half, and usage from 60-90 days ago counts one quarter. RawUsage is therefore the aggregate of the account's past usage with this half-life weighting applied. The RawUsage for the account is the sum of the RawUsage of each user, so sshare is an effective way to figure out which users have contributed most to the account's score.

EffectvUsage:

EffectvUsage is the account's RawUsage divided by the total RawUsage for the cluster; it is the percentage of the cluster the account has actually used. Chemistry has used about 16.9% of the cluster.

Fairshare:

The Fairshare score is calculated using the formula f = 2^(-EffectvUsage/NormShares). From this number we can assess how much of its contribution/investment in Eureka an account is using.
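
For example, taking the chemistry account above: f = 2^(-0.169374/0.112157) = 2^(-1.51) ≈ 0.351, which matches the FairShare value reported in the sshare output.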

1.0:

Un-used. The account has not run any jobs recently.

1.0 > f > 0.5:

Under-utilization. The account is under-utilizing its share. For example, if the fairshare score is 0.75, the account has recently under-utilized its share of the resources 1:2.

0.5:

Average utilization. The account on average is using exactly as much as their share.

0.5 > f > 0:

Over-utilization. The account has overused its share. For example, if the fairshare score is 0.25, the account has recently over-utilized its share of the cluster 2:1.

0:

No share left. The account has vastly overused their share.

8.4.2. Job priority

Individual job priority is calculated from an account's fairshare and the job's age. Job priority is an integer that determines a job's position in the pending queue relative to other jobs. The first component of job priority is the Fairshare score. The second component is job age, which accrues over time and reaches its maximum value at 7 days: as a job sits in the queue waiting to be scheduled, its priority gradually increases. Thus even jobs from accounts with low priority will eventually run, due to the growth of their job-age priority.

These two components together make up an individual job's priority. You can see this for specific jobs using the sprio command.
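
For example (the job ID shown is purely illustrative):

$ sprio -j 123456        # show the priority factors for a single pending job
$ sprio -u abc123        # show the priority factors for all of a user's pending jobs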

Important

Fairshare does not stop jobs from running; it only influences their priority relative to other jobs. There are no quotas or limits on how much a user can submit/run on Eureka, and all submitted jobs will eventually run.

Note

This material is based on the explanation of fairshare at https://www.rc.fas.harvard.edu/resources/documentation/fairshare/.

8.5. Matlab on Eureka

Some notes on using Matlab specifically in the context of the Eureka Cluster.

Note

Unlimited workers for Matlab Parallel Server are currently set up for Matlab R2019a/R2019b/R2020a.

Tip

Setting up Matlab to submit to the cluster via its GUI can be tricky; please see Get HPC support if you need assistance/advice.

8.5.1. Submitting Matlab jobs

8.5.1.1. Slurm submission script

Matlab code can be run straight from the command line and can therefore be submitted for execution on the cluster via a Slurm submission script. This lets you bypass the GUI and simply run the code you have developed on the cluster.

Examples can be found in our git repo: https://gitlab.surrey.ac.uk/rcs/eureka-examples
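
For illustration, a minimal sketch of such a submission script is shown below; the script name my_analysis.m is a placeholder, and the module version matches the one used later in this section:

#!/bin/bash

#SBATCH --partition=shared
#SBATCH --job-name="matlab-batch"
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --output=matlab_job.out

cd $SLURM_SUBMIT_DIR

module load matlab/R2019b

# -batch runs the named script non-interactively and exits when it finishes (available from R2019a)
matlab -batch "my_analysis"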

8.5.1.2. Matlab GUI

If you access the Eureka cluster via our RemoteLabs web portal, you can use the Matlab graphical user interface.

Note

See RemoteLabs web portal for more information on how to access it.

When using the Matlab GUI within RemoteLabs, you can add a Eureka cluster profile, enabling you to run jobs on the cluster and use parallel pools for parallel for-loops etc. To use this functionality, you need to set up the cluster profile as shown below:

  1. Launch Matlab by opening a terminal in the Remote Desktop session, loading the Matlab R2019b module, and then executing the matlab command.

    [abc123@vis1(eureka) ~]$ module load matlab/R2019b
    [abc123@vis1(eureka) ~]$ matlab
    MATLAB is selecting SOFTWARE OPENGL rendering.
    
  2. Create a slurmprofile for the cluster:

    From Matlab's top task bar: Environment > Parallel > Create and Manage Clusters > Add Cluster Profile > Slurm. This will create a blank Slurm profile called slurmprofile1.

    [Screenshots: matlab-clustermenu.png and matlab-clusterprofile.png]

  3. Select slurmprofile1 on the left, right-click and rename it to “eureka”. Then select Edit, set the settings shown below, leaving the rest unchanged, and click Done when finished.

    [Screenshots of the profile settings: cmr.png, nw.png, rt.png, nwr.png]

  4. Once the above is set up, you can validate the profile by clicking the Validation button. If everything is configured correctly, all validation tests should pass, as shown below:

    [Screenshot: con-test.png (all validation tests passing)]

8.5.1.3. Remote job submission

Caution

This method is experimental and not yet fully tested

It is possible to submit Matlab jobs to run on Eureka directly from within Matlab. For more information, please see: https://www.mathworks.com/help/parallel-computing/batch-processing.html.

In order to submit jobs to Eureka you must first do the following:

  1. At the Matlab prompt, straight after Matlab is opened, you must set your machine's fully qualified hostname:

    >> pctconfig('hostname','myhostname.eps.surrey.ac.uk')

  2. Some example submit code:

    • create parallel code example to work with

      >> edit parallel_example
      tic;
      parfor i = 1:5000
          c(:,i) = eig(rand(1000));
      end
      toc;
      delete(gcp)
      
      
    • Submit the code to run on Eureka

      >> c = parcluster('Eureka');                        % Choose the cluster profile to use
      >> c.AdditionalProperties.time = '24:00:00';        % Set the job time limit
      >> c.AdditionalProperties.constraints = "[ib|op]";  % Set the Slurm constraint for the job
      >> j = c.batch('parallel_example','Pool',10);       % Submit the job to Eureka to run on 10 workers (cores)
      >> diary(j)                                         % Display the output from the job
      
      
  3. A second example:

    • Open an interactive parallel pool, run your code, then delete the pool.

      >> parpool('Eureka',10)                     % start a parallel pool on Eureka with 10 workers (cores)
      >> tic ; parfor (i=1:5000) ; c(:,i) = eig(rand(1000)); end ; toc
      >> delete(gcp)                              % delete the current parallel pool
      

8.5.2. Monitoring a Matlab job

Jobs submitted to the cluster can be monitored and managed through Matlab too. The monitoring window is accessed via the top task bar: Environment > Parallel > Monitor Jobs.

[Screenshot: monitor-jobs.png (the Matlab job monitor)]