Attention
Advance Notice: Bessemer will be retired at the end of the day on Friday 31st October 2025.
Using GPUs on Stanage
Stanage has two types of GPU node, which differ in GPU architecture (NVIDIA A100 and H100), the number of GPUs per node, and GPU interconnect technology, including bandwidth (see Stanage hardware specifications). At present you need to decide which node type to target when submitting a batch job or starting an interactive session on a worker node.
Before proceeding, ensure you’ve worked through our introductory GPU tutorial.
Interactive use of the GPUs
Note
See requesting an interactive session on Slurm if you’re not already familiar with the concept.
Attention
Interactive use of GPUs is strongly discouraged, as they are a valuable and limited resource. Please use interactive GPU sessions only for short debugging, essential visualisation, or compiling GPU-enabled software. All other GPU workloads must be submitted as batch jobs.
To start an interactive session with access to one GPU on a GPU node (see Stanage hardware specifications), use the first command below for an A100 node or the second for an H100 node:
srun --partition=gpu --qos=gpu --gres=gpu:1 --mem=82G --pty bash
srun --partition=gpu-h100 --qos=gpu --gres=gpu:1 --mem=82G --pty bash
Note: you can now request GPUs using --gpus=N on Stanage (as an alternative to --gres=gpu:N), following a recent Slurm upgrade.
Interactive sessions default to just 2 GB of CPU RAM, which is far less than the 80 GB of GPU RAM available on each NVIDIA A100 or H100 GPU. This mismatch can cause problems: for instance, data transfers between CPU and GPU can fail because of insufficient CPU-side memory.
The examples above request 82 GB of CPU RAM, giving you a slight buffer above the GPU's 80 GB.
Please also consider your --cpus-per-task and --time requests carefully: shorter sessions tend to start sooner.
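As a minimal sketch of how these options combine, the snippet below assembles an A100 interactive request that uses --gpus=1 in place of --gres=gpu:1 and sets explicit --cpus-per-task and --time values. The core count and time limit are illustrative assumptions, not site recommendations; the snippet only prints the command so that it can be reviewed before being run on a login node.

```shell
#!/bin/bash
# Illustrative values only: 4 CPU cores and a 30-minute limit are assumptions.
cores=4
limit=00:30:00

# Same A100 request as the first command above, with --gpus=1 instead of --gres=gpu:1.
cmd="srun --partition=gpu --qos=gpu --gpus=1 --mem=82G --cpus-per-task=${cores} --time=${limit} --pty bash"

# Print the assembled command for review; paste the printed line to start the session.
echo "$cmd"
```

For an H100 node, the same sketch would presumably use --partition=gpu-h100, as in the second command above.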
Warning
Using the H100 GPUs requires the --partition=gpu-h100 and --gres=gpu:1 arguments to be set in your submission scripts.
This ensures that use of the H100 GPUs is “opt in”, as their slightly different architecture from the existing A100 GPUs may necessitate changes to batch submission scripts and selected software versions.
Submitting GPU batch jobs
Note
See submitting jobs on Slurm if you’re not already familiar with the concept.
To run batch jobs on GPU nodes, ensure your job submission script includes a request for GPUs, e.g. for two GPUs use --gres=gpu:2 (the first script below targets the A100 nodes, the second the H100 nodes):
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=82G
# Your code below...
#!/bin/bash
#SBATCH --partition=gpu-h100
#SBATCH --qos=gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=82G
# Your code below...
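One possible first step in the body of such a script is to confirm that the job can see the GPUs it asked for. The sketch below assumes that Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list for jobs that request GPUs and that nvidia-smi is on the PATH of GPU nodes; count_gpus is a hypothetical helper introduced here for illustration.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=82G

# Hypothetical helper: count the entries in a comma-separated device list.
count_gpus() {
  if [ -z "$1" ]; then
    echo 0
  else
    echo "$1" | tr ',' '\n' | wc -l | tr -d ' '
  fi
}

# Assumption: Slurm sets CUDA_VISIBLE_DEVICES for jobs that request GPUs.
expected=2
found="$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
echo "Requested ${expected} GPUs; CUDA_VISIBLE_DEVICES lists ${found}"

# Show details of each visible GPU if the NVIDIA tool is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,memory.total --format=csv
else
  echo "nvidia-smi not found; is this a GPU node?"
fi
```

Run outside a job, the script reports zero listed GPUs, which makes it easy to tell a misconfigured request from a working one.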
Requesting GPUs and multiple CPU cores from the scheduler
To request four separate Slurm tasks within a job, each of which has eight CPU cores and with four (A100) GPUs available to the entire job (shared between tasks):
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:4 # 4 GPUs for job
Note that:
The GPUs are (unintuitively) shared between the Slurm tasks.
It’s not possible to request --gpus-per-node, --gpus-per-task or --gpus-per-socket on Stanage at this time (unlike on Bessemer).
Not all nodes have four GPUs (see Stanage hardware specifications).
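Because the GPUs are shared between tasks, each task may need to choose its own device. The sketch below shows one common pattern, not a Stanage-specific recommendation: each task narrows CUDA_VISIBLE_DEVICES to a single entry selected by its node-local task index. It assumes that every task sees the full comma-separated device list and that Slurm sets SLURM_LOCALID for tasks launched with srun; pick_gpu is a hypothetical helper, and the defaults exist only so the snippet also runs outside a job.

```shell
#!/bin/bash
# Hypothetical helper: print the entry at 0-based position $2
# of the comma-separated device list $1.
pick_gpu() {
  echo "$1" | cut -d, -f"$(( $2 + 1 ))"
}

# Inside a job step these are provided by Slurm (assumption);
# the fallback values are placeholders for running outside a job.
visible="${CUDA_VISIBLE_DEVICES:-0,1,2,3}"
local_id="${SLURM_LOCALID:-0}"

# Restrict this task to one GPU chosen by its local task index.
export CUDA_VISIBLE_DEVICES="$(pick_gpu "$visible" "$local_id")"
echo "task ${local_id} using GPU ${CUDA_VISIBLE_DEVICES}"
```

In a job script, this could be saved as a wrapper with the program launch appended after the export line, and the wrapper started once per task via srun.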
Stanage GPU Resources
GPU-enabled Software
Applications
None yet
Libraries
Development Tools
Training materials
The Research Software Engineering team have developed an undergraduate teaching module on CUDA; lecture notes and lecture recordings for that module are accessible here for anyone with a University account.