HPC Etiquette
Appropriate Usage of Research Computing Facilities
The compute facilities should only be used for work relating to University of Surrey academic research. They should not be used for personal or any other purposes, and all use is subject to the University’s IT acceptable use policy.
Any malicious abuse or misuse of the computing facilities may lead to suspension of your access. This includes any deliberate attempt to circumvent job schedulers to consume more than your entitled share of resources.
If your jobs are causing significant issues or detrimental impacts to the cluster, they may be stopped by IT services to preserve the system. In emergency situations, when the system is in danger of crashing, immediate action may be taken without prior notification.
Local Rules
Help others.
If you see someone asking a question in the Teams Community and you think you can help, please respond. By sharing your expertise, we can collectively elevate the quality of research across the university.
Do not claim or request more resources than you need.
Benchmark Your Tasks: Before submitting your HPC jobs, benchmark your tasks to measure how many processing cores, how much RAM, and how many GPUs they actually use. This ensures that your resource requests accurately reflect your requirements.
Avoid Excess: Requesting more resources than necessary can lead to longer wait times for everyone and waste valuable computational power.
Fairshare Impact: Consistently underusing requested resources may negatively affect your Fairshare, resulting in longer queue times in the future. See Job priority and “Fairshare”.
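As an illustration of the benchmarking advice above, the following minimal sketch compares what a finished test job used with what it requested. It assumes you have already read the elapsed time, total CPU time, allocated cores, and peak memory for the job (for example from Slurm's `sacct` accounting fields `Elapsed`, `TotalCPU`, `AllocCPUS`, `MaxRSS` and `ReqMem`). The function names and the figures are hypothetical and are not part of any cluster tooling.

```python
def parse_duration(text: str) -> float:
    """Convert a Slurm-style duration ([DD-]HH:MM:SS or MM:SS.mmm) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields (e.g. "MM:SS" has no hours).
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Fraction of the allocated core-time that was spent doing CPU work."""
    available = parse_duration(elapsed) * alloc_cpus
    return parse_duration(total_cpu) / available if available else 0.0


def memory_efficiency(max_rss_gb: float, req_mem_gb: float) -> float:
    """Fraction of the requested memory that the job actually touched."""
    return max_rss_gb / req_mem_gb if req_mem_gb else 0.0


# Hypothetical figures for one finished benchmark job.
cpu = cpu_efficiency(total_cpu="02:00:00", elapsed="01:00:00", alloc_cpus=8)
mem = memory_efficiency(max_rss_gb=6.0, req_mem_gb=32.0)
print(f"CPU efficiency: {cpu:.0%}, memory efficiency: {mem:.0%}")
```

With these made-up numbers the job used a quarter of its allocated core-time and under a fifth of its requested memory, which suggests the next submission could ask for fewer cores and less RAM.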
Test your code.
Running buggy code wastes both your time and cluster resources. Take a moment to sanity check your code and run small-scale tests before launching large jobs. If you are unsure how to develop effective tests, the RSE team is available to support you.
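A small-scale test can be as simple as running your core routine on an input whose answer you can work out by hand. The sketch below is only an illustration: `mean_pairwise_distance` is a hypothetical stand-in for whatever your own code computes, and the checks would normally run in seconds on a tiny input before you submit the full-size job.

```python
import math


def mean_pairwise_distance(points):
    """Average Euclidean distance over all distinct pairs of points."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
            count += 1
    return total / count if count else 0.0


def run_sanity_checks():
    # A 3-4-5 right triangle has pairwise distances 3, 4 and 5, so the mean is 4.
    triangle = [(0, 0), (3, 0), (0, 4)]
    assert math.isclose(mean_pairwise_distance(triangle), 4.0)
    # Edge case: a single point has no pairs and must not divide by zero.
    assert mean_pairwise_distance([(1, 1)]) == 0.0
    return True


if __name__ == "__main__":
    run_sanity_checks()
    print("sanity checks passed")
```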
Do not assume—please ask.
If you are uncertain whether a practice is acceptable or whether you are using a resource correctly, ask for help. Asking can prevent costly mistakes, and your questions may highlight areas where our documentation could be improved.
Avoid manipulating the job scheduler.
Diagnose First: If your job is idle or delayed, first check whether your resource requests, partition choice, or job dependencies might be the cause. The HPC job scheduler (Slurm) page describes the scheduling policies and priority systems you need to understand to diagnose the issue.
Respect the System: Do not try to bypass scheduling policies by creating unnecessary processes, exploiting loopholes, or otherwise manipulating the system. If you are unsure why your job is not starting, contact the Research Computing team for assistance.
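When diagnosing a pending job, the reason Slurm reports for it (for example in the `NODELIST(REASON)` column of `squeue`) is usually the best starting point. The sketch below maps a few common Slurm reason codes to plain-language hints. The hint wording and the `explain_pending` helper are illustrative assumptions, not site policy, and the table is far from exhaustive.

```python
# Hints keyed by the reason Slurm reports for a pending job.
# The reason codes are standard Slurm ones; the hint text is illustrative.
PENDING_HINTS = {
    "Priority": "Other jobs are ahead of yours in the queue; waiting is normal.",
    "Resources": "The requested resources are busy; check whether a smaller request would do.",
    "Dependency": "The job is waiting for another job that it depends on.",
    "PartitionTimeLimit": "The requested time exceeds the partition limit; lower it or change partition.",
    "ReqNodeNotAvail": "A requested node is unavailable, for example down or reserved.",
}


def explain_pending(reason: str) -> str:
    """Return a hint for a reason string such as '(Priority)'."""
    # squeue wraps the reason in parentheses and may append details after a comma.
    code = reason.strip("()").split(",")[0].strip()
    return PENDING_HINTS.get(
        code, "Unrecognised reason; contact the Research Computing team."
    )


print(explain_pending("(Dependency)"))
```

None of these reasons calls for working around the scheduler: either the job will start when its turn comes, or the request itself needs adjusting.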
Do not clog up the clusters’ login nodes.
Intensive tasks must not be run on the head or login nodes, which are reserved for activities such as editing code, running job scripts, submitting jobs, or managing the queuing system. See also the Mistakes to Avoid page.