HPC data storage¶
HPC local storage¶
HPC clusters at Surrey each have dedicated high performance filesystems. These filesystems are specific to each cluster and accessible to each node of the cluster via the cluster’s private storage network.
Each has a specific function and purpose, which affects how you work with and manage your data.
- User Home Directory: /users/<username>
- User Home Directory Quota: 30 GB (Personal Quota - fixed)
- User Home Directory backed up?: Yes
- High performance Parallel Scratch: /parallel_scratch/<username>
- High performance Parallel Scratch size: 105 TB (FS Total)
- Filesystem: BeeGFS

- User Home Directory: /mnt/fast/nobackup/users/<username>
- User Home Directory Quota: 200 GB
- User Home Directory backed up?: No
- High performance Parallel Scratch: /mnt/fast/nobackup/scratch4weeks
- High performance Parallel Scratch size: 160 TB (FS Total)
- High performance Parallel Scratch retention policy: Data is automatically deleted after 4 weeks of no activity on the file.
- Filesystem: wekafs

- User Home Directory: /users/<username>
- User Home Directory Quota: None, 7.5 TB (Filesystem Total)
- User Home Directory backed up?: Yes
- High performance Parallel Scratch: /users/<username>/parallel_scratch (this is a symbolic link to /mnt/beegfs/users/<username>)
- High performance Parallel Scratch size: 56 TB (FS Total)
- Filesystem: BeeGFS
Tip
More specific detail on each cluster’s individual local high performance storage can be found on the HPC clusters pages.
User home directory¶
Personal HPC storage space
Your home directory is your personal dedicated storage area on the HPC cluster.
This filesystem is local to the cluster and is separate from the standard university home directory, e.g. /user/HS100/<username>.
This space is where you should store data you want to keep, such as input files and outputs from your simulations, or code you are developing/working on (this should also be pushed to GitLab) and its subsequent executables.
This space is a communal storage area for all users, and on some clusters this space is backed up. Usually, there is a quota applied to the User home directories to limit the amount of data each user can store in this area. For details on the specifics of data storage areas on each cluster, please see the tabs above.
Warning
This space is not for permanent research data storage. Research data requiring long-term storage and protection should be transferred to a project space on the network file store or SharePoint.
For research data storage, please see: https://filestorage.surrey.ac.uk.
For code development, please see our university GitLab.
For more information on NFS
HPC high performance scratch¶
High performance storage for temporary storage of scratch data
Each cluster has a scratch storage space for temporary storage of data generated by your jobs or data that will be processed by your jobs.
The path to these areas can be seen in the tabs at the top of the page. This area is the storage space for your scratch data, and is intended for heavy parallel input/output (read/write) workloads during simulations. This includes any temporary files that your code writes during a simulation, large data sets that need to be read or written before, during or after a simulation, and any excessive output such as writing thousands of lines of data.
This space is a communal storage area and data in this space is NOT BACKED UP! There are currently no limits on how much each user can store in this space, but it is only for temporary data storage while running simulations. Any important data you write to this area and want to keep should be copied back to your home directory /users/<username>, or off the cluster to a project space on the network file store or SharePoint.
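The workflow described above (work in scratch, then copy what you want to keep elsewhere) can be sketched as a small shell function. This is a minimal sketch, not a site-provided tool: the function name run_in_scratch, the job.XXXXXX directory naming and the example paths are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster software).
# run_in_scratch SCRATCH_ROOT RESULTS_DIR COMMAND [ARGS...]
# Runs COMMAND in a fresh directory under SCRATCH_ROOT, copies everything it
# wrote to RESULTS_DIR, then removes the scratch copy.
run_in_scratch() {
    scratch_root=$1
    results_dir=$2
    shift 2

    # Create a uniquely named working directory under the scratch root.
    workdir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1

    # Run the command inside the working directory and remember its exit status.
    ( cd "$workdir" && "$@" )
    status=$?

    # Copy the job's files back; keep the scratch copy if this step fails.
    mkdir -p "$results_dir" && cp -R "$workdir/." "$results_dir/" || return 1

    # Clean up the scratch copy only after the copy back has succeeded.
    rm -rf "$workdir"
    return "$status"
}

# Example (paths and program name are placeholders):
#   run_in_scratch "/parallel_scratch/$USER" "$HOME/results/run1" ./my_simulation input.in
```

The function returns the exit status of the command it ran, so a calling job script can still detect a failed simulation after the files have been copied back.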
Note
Please practice good citizenship in this space and ensure you clean up any temporary files which are written during simulations that you don’t want after they are finished. Abandoned data on this space may get deleted.
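One way to find leftover files is to list those that have not been modified for some time. The sketch below assumes modification time is a reasonable proxy for "abandoned"; the function name list_stale_files is hypothetical, and the criterion a cluster's own clean-up policy uses may differ (for example, it may be based on access time).

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster software).
# list_stale_files DIR DAYS
# Prints regular files under DIR that were last modified more than DAYS days ago.
list_stale_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Example (path is a placeholder): files untouched for more than three weeks.
#   list_stale_files "/parallel_scratch/$USER" 21
```

The function only lists files; review the output before removing anything.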
Checking your HPC local storage usage¶
To check how much space you are using in your user home directory, you can use the command:
du -hs </path/to/homedir>
or
ncdu </path/to/homedir>
[abc123@login1(eureka2) ~]$ du -hs /users/abc123
399M /users/abc123
To check how much space you are using in the scratch directory, you can use the du or ncdu command and provide the full path to your directory:
du -hs </path/to/directory>
or
ncdu </path/to/directory>
[abc123@login1(eureka2) ~]$ du -hs /parallel_scratch/abc123
30G /parallel_scratch/abc123
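When the total is larger than expected, it can help to see which subdirectories account for most of it. The following is a minimal sketch using du, sort and head; the function name largest_dirs is hypothetical, and sizes are printed in KiB so that a plain numeric sort works.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster software).
# largest_dirs DIR [COUNT]
# Prints the COUNT largest entries one level below DIR, largest first (sizes in KiB).
# The first line is the total for DIR itself.
largest_dirs() {
    du -k -d 1 "$1" | sort -rn | head -n "${2:-10}"
}

# Example (path is a placeholder):
#   largest_dirs /users/abc123 5
```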
Tip
If your simulations are deterministic, you probably only need to keep the input files once you’re finished with the data generated/your project.
Transferring data to/from HPC storage¶
There are multiple ways to transfer data to and from the cluster you use. The main ones are SCP and rsync on the command line or, for Windows users, a GUI SFTP client. Each method is detailed below.
Note
The hostnames to use when transferring data to or from the respective HPC clusters are:
Eureka: eureka.surrey.ac.uk
Eureka2: eureka2.surrey.ac.uk
Kara02: kara02.eps.surrey.ac.uk
AISURREY: datamove1.surrey.ac.uk, datamove2.surrey.ac.uk or datamove3.surrey.ac.uk
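If you script your transfers, the hostnames above can be kept in one place. This is a small sketch only: the function name transfer_host and the lowercase cluster keys are assumptions, and for AISURREY it returns just the first of the three listed hosts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster software).
# transfer_host CLUSTER
# Prints the data-transfer hostname listed above for CLUSTER.
transfer_host() {
    case $1 in
        eureka)   echo "eureka.surrey.ac.uk" ;;
        eureka2)  echo "eureka2.surrey.ac.uk" ;;
        kara02)   echo "kara02.eps.surrey.ac.uk" ;;
        aisurrey) echo "datamove1.surrey.ac.uk" ;;  # datamove2 and datamove3 are also listed
        *)        echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Example (abc123 is a placeholder username):
#   scp -r IMPORTANT_DATA "abc123@$(transfer_host eureka2):/users/abc123/"
```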
Open OnDemand¶
If the cluster has Open OnDemand you can log in and use the “Files” feature to manage your files on the cluster via a web interface.
SCP (Linux/macOS)¶
To securely copy data to a remote host:
$ scp -r <Directory_to_copy> username@remotehost:/path/to/remotedir/
Examples:
[abc123@login7(eureka) ~]$ scp -r IMPORTANT_DATA abc123@myhost:~
abc123@myhost's password:
DATA_FILE_4.txt 100% 0 0.0KB/s 00:00
DATA_FILE_3.txt 100% 0 0.0KB/s 00:00
DATA_FILE_1.txt 100% 0 0.0KB/s 00:00
[abc123@myhost ~]$ scp -r IMPORTANT_INPUT_FILES abc123@eureka:~
abc123@eureka's password:
DATA_FILE_4.txt 100% 0 0.0KB/s 00:00
DATA_FILE_3.txt 100% 0 0.0KB/s 00:00
DATA_FILE_1.txt 100% 0 0.0KB/s 00:00
rsync (Linux/macOS)¶
To synchronise a directory from a local machine to a remote machine (or vice versa):
$ rsync -avz <Directory> user@remotehost:/path/to/remotedir/
Examples:
[abc123@myhost ~]$ rsync -avz IMPORTANT_INPUT_FILES abc123@eureka:/users/abc123/
abc123@eureka's password:
sending incremental file list
IMPORTANT_INPUT_FILES/
IMPORTANT_INPUT_FILES/INPUT_FILE_1.in
IMPORTANT_INPUT_FILES/INPUT_FILE_2.in
IMPORTANT_INPUT_FILES/INPUT_FILE_3.in
IMPORTANT_INPUT_FILES/INPUT_FILE_4.in
sent 306 bytes received 99 bytes 810.00 bytes/sec
total size is 0 speedup is 0.00
[abc123@login7(eureka) ~]$ rsync -avz IMPORTANT_DATA abc123@myhost:/user/HS204/abc123/
abc123@myhost's password:
sending incremental file list
IMPORTANT_DATA/
IMPORTANT_DATA/DATA_FILE_1.txt
IMPORTANT_DATA/DATA_FILE_2.txt
IMPORTANT_DATA/DATA_FILE_3.txt
IMPORTANT_DATA/DATA_FILE_4.txt
sent 290 bytes received 92 bytes 69.45 bytes/sec
total size is 0 speedup is 0.00
Caution
PLEASE ENSURE you use the trailing “/” exactly as shown above to avoid overwriting any folders/data, as the rsync command is sensitive to the way in which trailing slashes are used.
Danger
Rsync is very useful for copying, moving and backing up/synchronising data, but it is very easy to make a mistake in a command: slashes in paths make a big difference.
Read a guide and test your commands first to make sure they do exactly what you want.
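One way to test a command safely is rsync's -n (--dry-run) option, which lists what would be transferred without changing anything. The sketch below wraps it in a function; the name preview_sync is hypothetical, and the comments summarise how a trailing slash on the source changes the result.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cluster software).
# preview_sync SRC DEST
# Shows what rsync would transfer from SRC to DEST without copying anything.
preview_sync() {
    rsync -avzn "$1" "$2"
}

# Trailing slash on the source:
#   rsync -avz IMPORTANT_DATA  dest/   creates dest/IMPORTANT_DATA/...
#   rsync -avz IMPORTANT_DATA/ dest/   copies the contents directly into dest/
#
# Example (abc123 and the paths are placeholders):
#   preview_sync IMPORTANT_INPUT_FILES abc123@eureka:/users/abc123/
```

Once the dry-run listing matches what you expect, run the same command without -n.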
Windows data transfer methods¶
MobaXterm allows you to transfer files to/from a cluster using a pseudo-terminal, so you can use all the previously mentioned rsync and SCP commands.
MobaXterm also has an SFTP (Secure File Transfer Protocol) function, which allows drag-and-drop style transfer of data.
Note
Windows-based editors (e.g. Notepad++) may put an extra “carriage return” (^M) character at the end of each line of text.
This will cause problems for most Linux-based applications. To correct this problem, run the built-in utility dos2unix on each ASCII file you transfer to Eureka from Windows. An example is shown below:
[abc123@login7(eureka) ~]$ dos2unix example.txt
dos2unix: converting file example.txt to Unix format ...
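To check whether a file actually contains carriage returns before converting it, or to strip them on a system where dos2unix is not available, standard tools are enough. This is a minimal sketch using grep and sed; the function names has_crlf and strip_crlf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the cluster software).
# A literal carriage return character, built portably with printf.
CR=$(printf '\r')

# has_crlf FILE: succeeds if any line of FILE ends with a carriage return.
has_crlf() {
    grep -q "${CR}\$" "$1"
}

# strip_crlf FILE: removes a trailing carriage return from every line of FILE.
strip_crlf() {
    tmp=$(mktemp) || return 1
    # Write the cleaned text to a temporary file, then overwrite the original
    # in place so its permissions are kept.
    sed "s/${CR}\$//" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example:
#   has_crlf example.txt && strip_crlf example.txt
```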
Working with data on HPC local storage¶
If you need to work with files stored on the HPC local storage there are a number of ways you can do this.
If the cluster has Open OnDemand you can log in and use the web interface to work with your files on the cluster’s local storage.
If you’re comfortable working in the terminal, you can use an SSH connection and use all the CLI tools you are used to, such as Vim, Emacs, Nano etc.
If you prefer a graphical user interface, you can get a desktop session via the RemoteLabs web portal.
See connecting-hpc for more information.
Remote development with Visual Studio Code¶
Microsoft Visual Studio Code has a feature that allows you to connect to a remote filesystem via SSH and work with your files remotely.
https://code.visualstudio.com/docs/remote/ssh
To use this, you will need to have Microsoft Visual Studio Code installed on your workstation.
If you have a university managed machine, you can raise an HPC support ticket to request an installation.
If you are using a personal/self managed machine you can install this yourself.
Note
Visual Studio Code is not installed on the cluster’s login nodes as the application uses a lot of system resources, particularly with multiple instances of the program running inside multiple user sessions.
We encourage you to use this feature which will enable you to work with the files in the HPC local storage as if they were files stored locally on your workstation. This will help to keep development workloads off the clusters and improve usability for all.