GPU nodes purchased for Bessemer by the Department of Computer Science (DCS) for use by DCS research staff, their collaborators and their research students.
Eight nodes (bessemer-node030 to bessemer-node037) each have:
Processors | 2x Intel Xeon Gold 6138 (2.00GHz; 40 cores per CPU) |
RAM | 192GB (DDR4 @ 2666 MHz) |
NUMA nodes | 2x |
GPUs | 4x NVIDIA Tesla V100 SXM2 (32GB RAM each; NVLINK interconnects between GPUs) |
Networking | 25 Gbps Ethernet |
Local storage | 140 GB of temporary storage under |
Access to these nodes is managed by the RSE team. Access policy:
PhD students, researchers and staff in Computer Science can all request access to the nodes.
Others who are collaborating on projects with some Computer Science / RSE involvement can be granted access on a case-by-case basis.
Computer Science MSc students can be granted access on a case-by-case basis.
A number of other users were granted access before this policy was developed.
To request access, complete this Google Form; someone within the RSE team will then respond with further information.
There are several ways to access these nodes. The type of access granted for a job depends on which Slurm Account and Partition are requested at job submission time.
One route is intended for short test batch jobs or for interactive debugging.
To submit a job via this route, you need to specify a *Partition* and *Account* when submitting a batch job or starting an interactive session:
Partition: dcs-gpu-test
Account: dcs-res (members of DCS) or dcs-collab (collaborators of DCS)
QoS: do not specify one, i.e. do not use the --qos parameter.
Resource limits per job:
Exactly 1 or 2 GPUs must be requested
Default run-time: 30 minutes
Maximum run-time: 30 minutes
A job can request at most the number of CPU cores, amount of RAM and number of GPUs available in a single node, i.e. multi-node jobs are not permitted.
Each user can run a maximum of two of these jobs concurrently.
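The settings above could be combined into a batch script along the following lines. This is a minimal sketch rather than an official template: requesting GPUs with --gres=gpu:N is an assumption about the syntax this cluster expects, and the body of the script is only a placeholder that reports where the job is running.

```shell
#!/bin/bash
#SBATCH --partition=dcs-gpu-test
#SBATCH --account=dcs-res     # use dcs-collab instead if you are a collaborator of DCS
#SBATCH --gres=gpu:1          # assumption: GPUs requested via --gres; 1 or 2 are allowed here
#SBATCH --time=00:30:00       # matches the 30-minute maximum run-time
# Deliberately no --qos line.

# SLURM_JOB_ID is set by Slurm inside a job; fall back to a label when run by hand.
job_label="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_label} on $(hostname)"

# List the GPUs visible to the job if the NVIDIA tools are on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --list-gpus
else
    echo "nvidia-smi not found"
fi
```

Saved under a name of your choice (test_job.sh is a hypothetical example), the script would be submitted with sbatch test_job.sh. For interactive debugging, the same partition, account and GPU options can be passed to srun together with --pty bash -i.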
If you want to run a longer job that uses up to all the resources available in one of these nodes then you can specify a different Partition when submitting a batch job or starting an interactive session:
Partition: dcs-gpu
Account: dcs-res (members of DCS) or dcs-collab (collaborators of DCS)
QoS: do not specify one, i.e. do not use the --qos parameter.
Please only run batch jobs this way: long-running interactive sessions that are associated with large resource requests are often an inefficient way of using cluster resources.
Resource limits per job:
At least one GPU must be requested
Default run-time: 8 hours
Maximum run-time: 7 days
A job can request at most the number of CPU cores, amount of RAM and number of GPUs available in a single node, i.e. multi-node jobs are not permitted.
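A longer batch job on the dcs-gpu partition might look like the sketch below. As before, the --gres=gpu:N syntax is an assumption, the two-day time limit is an arbitrary value under the 7-day maximum, and count_gpus is a hypothetical helper written for this example. It relies on CUDA_VISIBLE_DEVICES, which Slurm typically sets for GPU jobs; that is worth confirming on this cluster.

```shell
#!/bin/bash
#SBATCH --partition=dcs-gpu
#SBATCH --account=dcs-res     # use dcs-collab instead if you are a collaborator of DCS
#SBATCH --nodes=1             # multi-node jobs are not permitted
#SBATCH --gres=gpu:4          # assumption: --gres syntax; at least 1, up to the 4 GPUs in a node
#SBATCH --time=2-00:00:00     # 2 days (arbitrary); must not exceed the 7-day maximum
# Deliberately no --qos line.

# Hypothetical helper: count the entries in a comma-separated GPU ID list.
count_gpus() {
    local ids
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    IFS=',' read -r -a ids <<< "$1"
    echo "${#ids[@]}"
}

# CUDA_VISIBLE_DEVICES is typically set by Slurm for GPU jobs (assumption for this cluster).
visible="${CUDA_VISIBLE_DEVICES:-}"
echo "GPUs visible to this job: $(count_gpus "$visible")"

# The real workload would follow here, e.g. a training script (hypothetical name):
# python train.py
```

Printing the GPU count at the start of the job makes it easy to check in the log that the allocation matches the request before a multi-day run gets under way.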