3. HPC data storage
3.1. HPC Cluster local storage
Clusters at Surrey usually have two local filesystems. Each has a specific function and purpose, which affects how you work with and manage your data.
| Cluster Name | Home Directory (NFS) | Parallel Scratch (BeeGFS) | Parallel Scratch Paths |
|---|---|---|---|
| Eureka2 | 30 GB (personal quota, fixed) | 70 TB (FS total) | /parallel_scratch/<username> |
| Eureka | 7.5 TB (FS total) | 56 TB (FS total) | /users/<username>/parallel_scratch (a symbolic link to /mnt/beegfs/users/<username>) |
3.1.1. HPC home directory
Your home directory, /users/<username>, is where you are taken when you log in to a cluster. This filesystem is local to the cluster and is separate from the standard university home directory, e.g. /user/HS100/<username>.
This space is where you should store data you want to keep, such as input files and outputs from your simulations, or code you are developing (which should also be pushed to GitLab) and its executables.
This space is a communal storage area for all users, and all data in it is backed up. Apart from the fixed quota on Eureka2 (see the table above), there are currently no limits on how much each user can store here. However, this space is not for permanent research data storage: data must be moved elsewhere for long-term storage once you have finished with your work/project.
For research data storage please see: https://filestorage.surrey.ac.uk.
For code development please see our university GitLab: https://gitlab.surrey.ac.uk.
3.1.2. HPC parallel scratch
Your parallel scratch path, shown in the table above, is the space for your scratch data: heavy parallel input/output and read/write workloads during simulations. This includes any temporary files written by your code during a simulation, large data sets that need to be read or written before, during or after a simulation, and any excessive output such as writing thousands of lines of data.
This space is a communal storage area and NO data in this space is backed up. There are currently no limits on how much each user can store in this space, but it is only for temporary data storage while running simulations. Any important data you write to this area and want to keep should be copied back to your home directory, /users/<username>.
Note
Please practise good citizenship in this space: once your simulations are finished, clean up any temporary files you no longer need.
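As a starting point, the following sketch lists candidate files for clean-up (the default path and the `*.tmp`/30-day criteria are illustrative assumptions; adjust them to your own cluster and workflow):

```shell
# Sketch: list files in your scratch area not modified for 30+ days.
# SCRATCH's default and the '*.tmp' pattern are illustrative assumptions;
# adjust both to your own cluster and workflow.
SCRATCH=${SCRATCH:-/parallel_scratch/$USER}
if [ -d "$SCRATCH" ]; then
    find "$SCRATCH" -name '*.tmp' -mtime +30 -print
    # Once happy with the listed files, replace -print with -delete.
fi
```

Reviewing the list with -print before switching to -delete avoids removing files you still need.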
3.1.3. Checking your HPC Local storage usage
To check how much space you are using in your home directory /users/<username>, you can use the command:
du -hs /users/<username>
or
ncdu /users/<username>
[abc123@login7(eureka) ~]$ du -hs /users/abc123
399M /users/abc123/
[abc123@login7(eureka) ~]$
To check how much space you are using in your parallel scratch directory (see the table above for the path on each cluster), you can use the command:
du -hs /users/<username>/parallel_scratch   # On Eureka
du -hs /parallel_scratch/<username>         # On Eureka2
or
ncdu /users/<username>/parallel_scratch   # On Eureka
ncdu /parallel_scratch/<username>         # On Eureka2
[abc123@login7(eureka) ~]$ du -hs /users/abc123/parallel_scratch/
30G /users/abc123/parallel_scratch/
[abc123@login7(eureka) ~]$
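du and ncdu report your own usage; to see how full a whole filesystem is, df works on any path inside it. A quick sketch (run it on any directory of interest, e.g. your scratch path):

```shell
# Report size, used and available space for the filesystem holding a path.
df -h "$HOME"
# The same works for scratch, e.g.: df -h /parallel_scratch/$USER
```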
Tip
If your simulations are deterministic, you can get away with keeping just the input files once you're finished with the generated data/your project.
Tip
If you are familiar enough with the home and parallel storage areas, you can create a symbolic link from your home directory to the parallel scratch area for convenience:
[abc123@login(eureka2) ~]$ ln -s /parallel_scratch/$USER ~/parallel_scratch
3.2. Transferring data onto HPC
There are multiple ways to transfer data to and from the cluster you use. The main ways are scp and rsync, or, for Windows users, an SFTP client.
Below are the hostnames to use for the respective HPC clusters:
Eureka: eureka.surrey.ac.uk
Eureka2: eureka2.surrey.ac.uk
Kara: kara.ati.surrey.ac.uk
Kara02: kara02.eps.surrey.ac.uk
3.2.1. scp (Linux/Mac)
To securely copy data to a remote host:
$ scp -r <Directory> username@remotehost:/path/to/remotedir/
Examples:
[abc123@login7(eureka) ~]$ scp -r IMPORTANT_DATA abc123@myhost:~
abc123@myhost's password:
DATA_FILE_4.txt 100% 0 0.0KB/s 00:00
DATA_FILE_3.txt 100% 0 0.0KB/s 00:00
DATA_FILE_1.txt 100% 0 0.0KB/s 00:00
[abc123@myhost ~]$ scp -r IMPORTANT_INPUT_FILES abc123@eureka:~
abc123@myhost's password:
DATA_FILE_4.txt 100% 0 0.0KB/s 00:00
DATA_FILE_3.txt 100% 0 0.0KB/s 00:00
DATA_FILE_1.txt 100% 0 0.0KB/s 00:00
3.2.2. rsync (Linux/Mac)
To synchronise a directory from a local machine to a remote machine (or vice versa):
$ rsync -avz <Directory> user@remotehost:/path/to/remotedir/
Examples:
[abc123@myhost ~]$ rsync -avz IMPORTANT_INPUT_FILES abc123@eureka:/users/abc123/
abc123@myhost's password:
sending incremental file list
IMPORTANT_INPUT_FILES/
IMPORTANT_INPUT_FILES/INPUT_FILE_1.in
IMPORTANT_INPUT_FILES/INPUT_FILE_2.in
IMPORTANT_INPUT_FILES/INPUT_FILE_3.in
IMPORTANT_INPUT_FILES/INPUT_FILE_4.in
sent 306 bytes received 99 bytes 810.00 bytes/sec
total size is 0 speedup is 0.00
[abc123@login7(eureka) ~]$ rsync -avz IMPORTANT_DATA abc123@myhost:/user/HS204/abc123/
abc123@myhost's password:
sending incremental file list
IMPORTANT_DATA/
IMPORTANT_DATA/DATA_FILE_1.txt
IMPORTANT_DATA/DATA_FILE_2.txt
IMPORTANT_DATA/DATA_FILE_3.txt
IMPORTANT_DATA/DATA_FILE_4.txt
sent 290 bytes received 92 bytes 69.45 bytes/sec
total size is 0 speedup is 0.00
Caution
Please ensure you use the trailing "/" exactly as shown above to avoid overwriting any folders/data; rsync's behaviour changes depending on whether a source path ends in a trailing slash.
Note:
rsync is very useful for copying, moving and backing up/synchronising data, but it is easy to make a mistake in a command: slashes in paths make a big difference. Read a guide, and test commands first (rsync's -n/--dry-run flag previews a transfer without making changes), to ensure you are doing exactly what you want.
3.2.3. Windows data transfer methods
MobaXterm allows you to transfer files to/from a cluster using a pseudo-terminal, so you can use all of the rsync and scp commands mentioned above.
MobaXterm also has an SFTP (Secure File Transfer Protocol) function, which allows drag-and-drop transfer of data.
Note:
Windows-based editors (e.g. Notepad++) may put an extra "carriage return" (^M) character at the end of each line of text.
This will cause problems for most Linux-based applications. To correct this, run the built-in utility dos2unix on each ASCII file you transfer to Eureka from Windows. An example is shown below:
[abc123@login7(eureka) ~]$ dos2unix example.txt
dos2unix: converting file example.txt to Unix format ...
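If dos2unix is not available on a machine, the same fix can be sketched with GNU sed (the filename here is just an example):

```shell
# Make a sample file with Windows (CRLF) line endings, then strip the \r.
f=$(mktemp)
printf 'line one\r\nline two\r\n' > "$f"
sed -i 's/\r$//' "$f"      # GNU sed: remove the trailing carriage returns
cat "$f"                   # now plain Unix (LF) line endings
rm "$f"
```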
3.3. Working with data on HPC local storage
If you need to work with files stored on the HPC local storage there are a number of ways you can do this.
If you’re comfortable working in the terminal, you can connect via SSH and use all of the CLI tools you are used to, such as Vim, Emacs and Nano.
If you prefer a graphical user interface you can get a desktop session via the RemoteLabs web portal.
See connecting-hpc for more information.
3.3.1. Remote Development with Visual Studio Code
Microsoft Visual Studio Code has a feature that allows you to connect to a remote filesystem via SSH and work with your files remotely.
https://code.visualstudio.com/docs/remote/ssh
To use this you will need to have Microsoft Visual Studio Code installed on your workstation.
If you have a university-managed machine, you can open a support ticket to request an install.
If you are using a personal/self-managed machine, you can install it yourself.
Note
Visual Studio Code is not installed on the clusters' login nodes because the application uses a lot of system resources, particularly when multiple instances run inside multiple user sessions.
We encourage you to use this feature which will enable you to work with the files in the HPC local storage as if they were files stored locally on your workstation. This will help to keep development workloads off the clusters and improve usability for all.
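VS Code's Remote-SSH feature reads your local ~/.ssh/config, so a host entry like the following sketch (abc123 is a placeholder username; the hostname is Eureka's from the list above) lets you pick the cluster from the editor's host list:

```
Host eureka
    HostName eureka.surrey.ac.uk
    User abc123
```

With this in place, `ssh eureka` also works as a shortcut from any terminal.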