6. Eureka2 documentation

A collection of topics about working with the Eureka2 HPC cluster.

The Eureka HPC service consists of our main, shared HPC clusters. These are open for use by anyone at the University, whereas other clusters at Surrey are owned by, and reserved for the private use of, particular research groups.

If you would like to request access to the Eureka clusters, please see Requesting access to HPC.

If you are looking to purchase HPC compute resources for your research, we would always encourage you to invest in Eureka; see Purchasing HPC servers for more information.

6.1. Eureka2 cluster overview

Eureka2 is our most modern shared HPC cluster. It is open for use by anyone at the University of Surrey.

Eureka2 is currently a homogeneous cluster; however, this is unlikely to remain the case as we add new nodes and partitions in the future.

6.1.1. Cluster specification

Eureka2 cluster specification

OS                     Rocky Linux 8
Fabric                 Mellanox InfiniBand EDR 100 Gb/s
Parallel storage       BeeGFS, 70 TB
Standard storage       NFS, 30 GB personal quota
Scheduler/queue        Slurm
Open OnDemand          https://eureka2-ondemand.surrey.ac.uk
Login node             eureka2.surrey.ac.uk
32 x CPU node          2x AMD EPYC 7452 @ 3.3 GHz, 512 GB RAM
4 x high-memory node   2x AMD EPYC 7452 @ 3.3 GHz, 2 TB RAM
2 x GPU node           2x AMD EPYC 7513 @ 3.7 GHz, 512 GB RAM, 3x Nvidia A100 80 GB
Total CPU cores        2400+
Total memory           25 TB
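
To check the live node and partition details for yourself, Slurm's standard query commands can be run from the login node; a brief sketch is shown below (the node name gpu-node01 is taken from the GPU table later on this page).

# Summarise partitions and node states
sinfo

# Node-oriented view including CPUs, memory and state
sinfo -N -l

# Full details for a single node, e.g. one of the GPU nodes
scontrol show node gpu-node01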

6.1.2. Software

The Eureka clusters host a wide variety of software and development tools relevant to science, engineering and statistical computing workloads: compilers such as GCC and Intel, scripting languages such as Julia, R and Python, as well as a wide range of standard software such as MATLAB, Mathematica, LAMMPS, CASTEP and GROMACS, plus many more…

If there is an application you need installed on the cluster, please submit a request. See Request new software.

If you have your own builds of software, you are free to build and use your own versions in your HPC storage areas.
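
Centrally installed software is typically made available through an environment modules system on HPC clusters; assuming that is the case here, the sketch below shows the usual workflow (the package name python is illustrative only).

# List the software packages made available to you
module avail

# Load a package into your environment (name/version are illustrative)
module load python

# Show what is currently loaded
module list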

6.1.3. BeeGFS Parallel scratch storage

The BeeGFS filesystem has been tuned to get the best possible performance from the system.

The optimum configuration for performance yielded the following results in benchmark tests:

Peak write                       47 GB/s
Peak read                        48 GB/s
Aggregate write @ 128 threads    46.3 GB/s
Aggregate read @ 128 threads     48 GB/s
Single-thread read               3.6 GB/s
Single-thread write              3.6 GB/s

These results are based on sequential reads/writes and an “N to N” file-to-thread ratio (one file per thread).

[Figure: BeeGFS sequential IO benchmark]

Sequential IO showed no significant performance improvement when increasing the number of threads beyond 128.

We conducted similar benchmarks using random reads/writes (rather than sequential), and these yielded interesting results, with some continued performance gains beyond 128 threads.

[Figure: BeeGFS random IO benchmark]

Random IO showed some continued performance improvement beyond 128 threads.

The Eureka2 BeeGFS storage summary:

  • 2 storage servers

  • 48 NVMe drives

  • 6 storage targets per server

  • Total usable capacity of 70 TB
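
To see how much space is currently available, you can query the filesystem from the login node. The sketch below assumes the BeeGFS client utilities are installed and uses /beegfs as a placeholder for the actual scratch mount point.

# Per-target capacity and usage, if the BeeGFS client utilities are available
beegfs-df

# Overall usage of the scratch mount point (path is a placeholder)
df -h /beegfs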

6.2. Using Slurm on Eureka2

This section covers some of the specifics of using Slurm on the Eureka clusters.

For general information about the Slurm scheduler, please see Slurm - job scheduler.

6.2.1. Eureka2 partitions (Queues)

On Eureka2 there are four partitions (queues) to which you can submit your jobs: shared, debug, high_mem and gpu.

The configurations of the partitions are summarised in the table below; an example submission script follows the table.

Name       Node count   Time limit                       Purpose
debug      2            4 hrs default, 8 hrs maximum     Debugging jobs that can eventually run across all nodes.
shared     30           1 day default, 1 week maximum    Day-to-day production jobs.
high_mem   4            1 day default, 1 week maximum    Jobs that require a large amount of memory.
gpu        2            1 day default, 1 week maximum    Jobs that require GPU compute.
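
As a reference, here is a minimal sketch of a batch script targeting the shared partition; the job name, resource amounts and program name are illustrative and should be adjusted for your workload.

#!/bin/bash
#SBATCH --job-name=my_job        # illustrative job name
#SBATCH --partition=shared       # one of: shared, debug, high_mem, gpu
#SBATCH --nodes=1
#SBATCH --ntasks=4               # illustrative number of cores
#SBATCH --time=02:00:00          # requested wall time, within the limits above
#SBATCH --output=%x-%j.out       # output file named after the job name and job ID

# Your commands go here (illustrative)
srun ./my_program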

6.2.2. Eureka2 accounting and usage

XDMoD: https://eureka2-xdmod.surrey.ac.uk/

  • A graphical user interface with extensive graphing and analytical capabilities.

  • Detailed utilization metrics including number of jobs, CPU hours, wait times, job size, etc.

  • A customizable Metric Explorer where users can generate custom plots comparing multiple metrics.

  • A custom report builder for the automatic generation of detailed periodic reports.

6.3. Eureka2 quick start

To help users get started quickly, we recommend using the Eureka2 Open OnDemand web interface: https://eureka2-ondemand.surrey.ac.uk
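
If you prefer the command line, a typical first session looks like the sketch below; abc123 stands in for your University username and job_script.sh is a placeholder for a batch script such as the examples elsewhere on this page.

# Log in to the Eureka2 login node
ssh abc123@eureka2.surrey.ac.uk

# Submit a batch script and check the state of your jobs
sbatch job_script.sh
squeue -u abc123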

6.4. Eureka2 GPUs

Eureka2 currently has 6x Nvidia A100 80GB GPUs. A number of these GPUs are partitioned into smaller GPUs (Multi-Instance GPUs, or MIG), allowing us to run more GPU jobs simultaneously. For more information on MIG, please see Nvidia’s documentation: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

The table below details the types of GPU currently available on the cluster:

Type      Total count   Node                           Description
1g.10gb   7             gpu-node01                     1 compute instance & 10 GB memory
2g.20gb   3             gpu-node01                     2 compute instances & 20 GB memory
3g.40gb   4             2x gpu-node01, 2x gpu-node02   3 compute instances & 40 GB memory
a100      2             gpu-node02                     A non-MIG’d A100 with 80 GB memory

Use the following options to submit a job to the gpu partition using the default job QoS:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:<type>:<number_of_gpus>

For example, to request 2x 2g.20gb GPUs for your job you would add #SBATCH --gres=gpu:2g.20gb:2 to your submission script; to request a single full A100 GPU, add #SBATCH --gres=gpu:a100:1.
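
Putting this together, a minimal sketch of a GPU job script is shown below; apart from the partition and --gres lines, the values and program name are illustrative.

#!/bin/bash
#SBATCH --job-name=gpu_job         # illustrative job name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2g.20gb:1       # one 2g.20gb MIG instance; see the table above
#SBATCH --cpus-per-task=4          # illustrative
#SBATCH --time=04:00:00            # illustrative
#SBATCH --output=%x-%j.out

# Confirm which GPU or MIG instance the job was allocated
nvidia-smi -L

# Run your GPU application (illustrative)
srun ./my_gpu_program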

The number and type of MIG GPUs are subject to change in the future as we work out the best layout for users’ needs. Any changes will be announced on the Eureka HPC teams channel in the Research Computing Community Team.

6.5. Job priority and “Fairshare” on Eureka2

6.5.1. Fairshare

On Eureka2, each user is associated with a Slurm - job scheduler account, usually related to their research group, department or Faculty. Users belong to accounts, and accounts have shares associated with them based on their contribution/investment to Eureka2. These shares reflect how much of the cluster that research group/department has invested in.

Different groups across the University have contributed different amounts of resources to Eureka2. To serve this variety of groups and contribution/investment levels, a method of fairly adjudicating job priority is required. This is the goal of Fairshare. Fairshare gives users who have not fully used their contribution/investment higher priority for their jobs on the cluster than jobs from groups that have used more than their contribution/investment. The cluster is a limited resource, and Fairshare allows us to ensure everyone gets a fair opportunity to use it, regardless of how big or small their group is.

6.5.1.1. Fairshare example

To see how much of its Fairshare your group/account has used, run sshare -a --account=<account name> for a summary of this information.

The account we use in this example is chemistry. The first line of the sshare output gives the summary for the whole account, with the additional lines giving a summary per user on the account.

Example output from a sshare command
[abc123@login1(eureka2) chem_reservation]$ sshare -a --account=chemistry
            Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ---------
chemistry                            8423    0.112157   735603420      0.169374   0.351074
chemistry                bobby          1    0.007477           0      0.011292   0.351074
chemistry                 tony          1    0.007477           0      0.011292   0.351074
chemistry                susan          1    0.007477   203592409      0.055052   0.006076
chemistry                user1          1    0.007477           0      0.011292   0.351074
chemistry                user2          1    0.007477           0      0.011292   0.351074
chemistry              someguy          1    0.007477           0      0.011292   0.351074
chemistry              chemist          1    0.007477          38      0.011292   0.351074
chemistry                user3          1    0.007477   517403742      0.122502   0.000012
RawShares:

Chemistry has 8423 RawShares. Each user of the account holds a RawShare of its parent; this means that all the users in chemistry pull from the total share of the account and do not have their own individual sub-shares of it. Thus all users in the account have full access to the account’s full share.

NormShares:

NormShares is the chemistry account’s RawShares divided by the total number of RawShares given out to all accounts on the cluster. NormShares is therefore the fraction of the cluster that the account has contributed/invested; for the chemistry account this is about 11.2% of Eureka2.

RawUsage:

RawUsage is the amount of usage the account/user has accumulated on Eureka2. RawUsage is adjusted by the half-life set for the cluster, which is 30 days. This means that usage in the last 30 days counts at full cost, usage from 60 days ago costs half, and usage from 90 days ago costs one quarter. So RawUsage is the aggregate of the account’s past usage with this half-life weighting factor. The RawUsage for the account is the sum of the RawUsage for each user, so sshare is an effective way to figure out which users have contributed the most to the account’s score.

EffectvUsage:

EffectvUsage is the account’s RawUsage divided by the total RawUsage for the cluster. Thus EffectvUsage is the fraction of the cluster the account has actually used; chemistry has used about 16.9% of the cluster.

Fairshare:

The Fairshare score is calculated using the formula f = 2^(-EffectvUsage / NormShares). From this number we can assess how much of its contribution/investment in Eureka2 an account is using; the scale below explains how to interpret the score, and a worked example follows it.

1.0:

Un-used. The account has not run any jobs recently.

1.0 > f > 0.5:

Under-utilization. The account is under-utilizing their share. For example, if the fairshare score is 0.75 an account has recently underutilized their share of the resources 1:2.

0.5:

Average utilization. The account on average is using exactly as much as their share.

0.5 > f > 0:

Over-utilization. The account has overused their share. For example, if the fairshare score is 0.25 an account has recently over-utilized their share of the cluster 2:1.

0:

No share left. The account has vastly overused their share.
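
As a worked example, plugging the chemistry account’s figures from the sshare output above into the formula reproduces its FairShare column:

f = 2^(-EffectvUsage / NormShares)
  = 2^(-0.169374 / 0.112157)
  = 2^(-1.51)
  ≈ 0.351   (matching the reported FairShare of 0.351074)

Because chemistry’s EffectvUsage (about 16.9%) is larger than its NormShares (about 11.2%), the account is over-utilizing its share, so its score falls below 0.5.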

6.5.2. Job priority

Individual job priorities are calculated based on an account’s Fairshare and the job’s age. Job priority is an integer that determines the position of a job in the pending queue relative to other jobs. The first component of job priority is the Fairshare score. The second component is job age, which accrues over time and reaches its maximum value at 7 days. As a job sits in the queue waiting to be scheduled, its priority gradually increases due to the job’s age. Thus even jobs from accounts with low priority will eventually run due to the growth in their job age priority.

These two components are put together to make up an individual job’s priority. You can see this for specific jobs by using the sprio command.
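
For example, to inspect the priority components of your own pending jobs (the job ID below is a placeholder):

# Show the priority components (fairshare, age, etc.) for all of your pending jobs
sprio -u $USER

# Show the components for a single job, in long format
sprio -l -j 123456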

Important

Fairshare does not stop jobs from running; it only influences their priority relative to other jobs. There are no quotas or limits on how much a user can submit/run on Eureka2, and all submitted jobs will eventually run.

Note

This material is based on the explanation of fairshare from Harvard FAS Research Computing: https://www.rc.fas.harvard.edu/resources/documentation/fairshare/