GPU nodes for specific Computer Science academics
Four academics in the Department of Computer Science (DCS) share two NVIDIA V100 GPU nodes in Bessemer:
| Academic | Node | Slurm Account name | Slurm Partition name |
|---|---|---|---|
|  |  | dcs-acad1 | dcs-acad (see notes below) |
|  |  | dcs-acad2 | dcs-acad (see notes below) |
|  |  | dcs-acad3 | dcs-acad (see notes below) |
|  |  | dcs-acad4 | dcs-acad (see notes below) |
Hardware specifications
bessemer-node041 and bessemer-node042
| Component | Specification |
|---|---|
| Processors | 2x Intel Xeon Gold 6138 (2.00 GHz; 20 cores per CPU) |
| RAM | 192 GB (DDR4 @ 2666 MHz) |
| NUMA nodes | 2x |
| GPUs | 4x NVIDIA Tesla V100 SXM2 (16 GB RAM each; NVLink interconnects between GPUs) |
| Networking | 25 Gbps Ethernet |
| Local storage | 140 GB of temporary storage under |
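To confirm these figures as Slurm sees them, you can query either node directly (a standard Slurm command; the exact output format varies by Slurm version):

```bash
# Show Slurm's configured view of one of these nodes
# (partitions, CPUs, memory, GRES/GPUs, current state):
scontrol show node bessemer-node041
```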
Requesting access
Users other than the listed academics who want access to these nodes should contact one of those academics.
That academic can then grant users access to the relevant Slurm Account (e.g. dcs-acad1) via this web interface.
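Once access has been granted, you can verify which Slurm Accounts your user is associated with (a standard Slurm command; the relevant dcs-acadX Account should appear in the output):

```bash
# List the Slurm Account associations for the current user:
sacctmgr show associations user=$USER format=Account%20,Partition%20
```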
Using the nodes
There are several ways to access these nodes. The type of access granted for a job depends on which Slurm Account and Partition are requested at job submission time. Only certain users have access to a given Account.
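For orientation, sinfo (another standard Slurm command) lists both Partitions described below and the nodes behind them:

```bash
# Show the state of the two Partitions covered in this section:
sinfo --partition=dcs-acad,dcs-acad-pre
```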
bessemer-node041 and bessemer-node042: non-preemptable access to half a node
Each of the four academics (plus their collaborators) has ring-fenced, on-demand access to the resources of half a node.
To submit a job via this route, you need to specify a Partition and an Account when submitting a batch job or starting an interactive session (see the example batch script after the resource limits below):
- Partition: dcs-acad
- Account: dcs-acadX (where X is 1, 2, 3 or 4 and varies between the academics)
- QoS: do not specify one, i.e. do not use the --qos parameter.
Resource limits per job:

- Default run-time: 8 hours
- Maximum run-time: 7 days
- CPU cores: 20
- GPUs: 2
- Memory: 96 GB
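As a minimal sketch (assuming standard Slurm sbatch syntax, with dcs-acad1 standing in for whichever dcs-acadX Account you belong to; the requested resource values are illustrative, chosen to sit within the per-job limits above), a batch script for this route could look like:

```bash
#!/bin/bash
#SBATCH --partition=dcs-acad
#SBATCH --account=dcs-acad1    # stand-in: use your own dcs-acadX Account
#SBATCH --gres=gpu:2           # up to 2 GPUs per job via this route
#SBATCH --cpus-per-task=10     # up to 20 CPU cores per job
#SBATCH --mem=48G              # up to 96 GB per job
#SBATCH --time=04:00:00        # default 8 hours, maximum 7 days
# Note: no --qos line, as explained above.

nvidia-smi                     # placeholder workload: report the allocated GPUs
```

An interactive session via the same route (again a sketch):

```bash
srun --partition=dcs-acad --account=dcs-acad1 --gres=gpu:1 --pty bash -i
```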
bessemer-node041 and bessemer-node042: preemptable access to both nodes
If any of the four academics (or their collaborators) wants to run a larger job that needs up to all the resources of one of these two nodes, they can specify a different Partition when submitting a batch job or starting an interactive session (see the example batch script at the end of this section):
- Partition: dcs-acad-pre
- Account: dcs-acadX (where X is 1, 2, 3 or 4 and varies between the academics)
- QoS: do not specify one, i.e. do not use the --qos parameter.
However, to facilitate fair sharing of these GPU nodes, jobs submitted via this route are preemptable: they will be stopped mid-execution if a job submitted to the dcs-acad partition (see above) requires those resources. When a job submitted via this route is preempted, it is terminated and re-queued.
Resource limits per job:

- CPU cores, RAM and GPUs: up to those of a single node, i.e. multi-node jobs are not permitted.
- Run-time: same default and maximum as above (8 hours / 7 days).
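As with the non-preemptable route, here is a hedged sketch of a whole-node batch script (dcs-acad1 is again a stand-in for your own Account; the checkpoint logic, file name checkpoint.chk and program my_simulation are hypothetical, included because re-queued jobs restart from the beginning):

```bash
#!/bin/bash
#SBATCH --partition=dcs-acad-pre
#SBATCH --account=dcs-acad1    # stand-in: use your own dcs-acadX Account
#SBATCH --nodes=1              # multi-node jobs are not permitted
#SBATCH --gres=gpu:4           # a whole node: all 4 V100s
#SBATCH --cpus-per-task=40     # all 40 CPU cores
#SBATCH --mem=160G             # illustrative: leaves headroom below the 192 GB total
#SBATCH --time=2-00:00:00      # up to the 7-day maximum

# Preempted jobs are terminated and re-queued, restarting from the top,
# so resume from an application checkpoint if one exists (hypothetical names):
if [ -f checkpoint.chk ]; then
    ./my_simulation --resume checkpoint.chk
else
    ./my_simulation
fi
```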