GPU nodes (Computer Science)

These GPU nodes were purchased for Bessemer by the Department of Computer Science (DCS) for use by DCS research staff, their collaborators and their research students.

Hardware specifications

Eight nodes (bessemer-node030 to bessemer-node037) each have:

  • Processors: 2x Intel Xeon Gold 6138 (2.00 GHz; 20 cores per CPU)

  • RAM: 192 GB (DDR4 @ 2666 MHz)

  • NUMA nodes: 2

  • GPUs: 4x NVIDIA Tesla V100 SXM2 (32 GB RAM each; NVLink interconnects between GPUs)

  • Networking: 25 Gbps Ethernet

  • Local storage: 140 GB of temporary storage under /scratch (2x SSD in RAID1)
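
If you want to confirm the resources that Slurm sees on one of these nodes (CPU count, memory and GPU Gres), you can query a node directly. A minimal example using the standard scontrol command (substitute any of the node names listed above):

scontrol show node bessemer-node030

For these nodes the Gres field in the output should report gpu:v100:4, matching the sinfo output shown later in this section.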

Requesting Access

Access to the nodes is managed by the RSE team. Access policy:

  • PhD students, researchers and staff in Computer Science can all request access to the nodes.

  • Access for others who are collaborating on projects with some Computer Science / RSE involvement can be granted on a case-by-case basis.

  • Access for Computer Science MSc and BSc students can be granted on a case-by-case basis.

A number of other users were granted access before this policy was developed.

To request access, complete this Google Form; someone within the RSE team will then respond with further information.

Using the nodes

There are several ways to access these nodes. The type of access granted for a job depends on which Slurm Account and Partition are requested at job submission time.

1. DCS test/debugging access

E.g. for short test batch jobs or for interactive debugging.

To submit a job via this route, you need to specify a *Partition* and *Account* when submitting a batch job or starting an interactive session (see the example below):

  • Partition: dcs-gpu-test

  • Account: dcs-res (members of DCS) or dcs-collab (collaborators of DCS)

  • QoS: do not specify one, i.e. do not use the --qos parameter.

Resource limits: each user can run a maximum of two of these jobs concurrently.
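
Below is a minimal sketch of a batch script for this route. The partition and account names are those listed above; the GPU count, CPU count, memory and run time are illustrative assumptions only and must fit within the partition's limits.

#!/bin/bash
#SBATCH --partition=dcs-gpu-test   # DCS test/debugging partition
#SBATCH --account=dcs-res          # or dcs-collab for collaborators of DCS
#SBATCH --gres=gpu:1               # illustrative: one of the node's four V100s
#SBATCH --cpus-per-task=4          # illustrative CPU request
#SBATCH --mem=16G                  # illustrative memory request
#SBATCH --time=00:15:00            # keep test jobs short

nvidia-smi                         # example workload: show the GPU(s) allocated to the job

An interactive debugging session can be requested in a similar way, e.g.:

srun --partition=dcs-gpu-test --account=dcs-res --gres=gpu:1 --time=00:15:00 --pty bash -i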

2. DCS access for larger jobs

If you want to run a longer job that uses up to all of the resources available in one of these nodes, you can specify a different Partition when submitting a batch job or starting an interactive session:

  • Partition: dcs-gpu

  • Account: dcs-res (members of DCS) or dcs-collab (collaborators of DCS)

  • QoS: do not specify one, i.e. do not use the --qos parameter.

Please only run batch jobs via this route: long-running interactive sessions with large resource requests are often an inefficient use of cluster resources. A sketch of a whole-node batch script is shown below.
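
Below is a minimal sketch of a batch script that requests most of a node's resources via this route. The exact CPU, memory and time values are illustrative assumptions and should be adjusted to your workload and the partition's limits.

#!/bin/bash
#SBATCH --partition=dcs-gpu         # DCS partition for larger jobs
#SBATCH --account=dcs-res           # or dcs-collab for collaborators of DCS
#SBATCH --nodes=1
#SBATCH --gres=gpu:4                # all four V100s in a node
#SBATCH --cpus-per-task=40          # all 40 cores (illustrative)
#SBATCH --mem=150G                  # illustrative; somewhat less than the 192 GB fitted per node
#SBATCH --time=24:00:00             # illustrative run time

srun my_gpu_program                 # hypothetical application; replace with your own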

Resource limits per job:

Checking Queue and Node Status

Using the squeue and sinfo Slurm commands it is possible to query the status of these nodes. Knowing how many jobs are queued for these nodes, and the status of the nodes themselves, can be helpful when estimating when your jobs will run.

squeue can be used to view running and queued jobs for specific partitions, using -p <partition_list>. Requesting non-default output fields via -o or -O, such as the time limit for jobs, can help you estimate when your jobs may begin to run.

squeue -p dcs-gpu,dcs-gpu-test -o "%.18i %.12j %.12u %.12b %.2t %.10M %.10l %R"

This will produce output similar to:

  JOBID         NAME         USER TRES_PER_NOD ST       TIME TIME_LIMIT NODELIST(REASON)
XXXXXXX     job_name     USERNAME   gres:gpu:1 PD       0:00    1:00:00 (Resources)
YYYYYYY     job_name     USERNAME   gres:gpu:1  R   12:34:56 7-00:00:00 bessemer-nodeNNN
...

sinfo can be used to query the status of nodes within a partition. For GPU nodes it is useful to also request Gres and GresUsed:

sinfo -p dcs-gpu -N -O "NodeList,Available,Gres,GresUsed,CPUsState"

When all GPUs in the partition are being used, the output will be similar to:

NODELIST            AVAIL               GRES                GRES_USED           CPUS(A/I/O/T)
bessemer-nodeNNN    up                  gpu:v100:4(S:0)     gpu:v100:4(IDX:0-3) 16/24/0/40
...
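
To narrow the queue listing to just your own jobs on these partitions, squeue's -u option can be combined with -p, for example (here $USER expands to your own username):

squeue -u $USER -p dcs-gpu,dcs-gpu-test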