Slurm is a highly scalable cluster management and job scheduling system, used in Bessemer. As a cluster workload manager, Slurm has three key functions:
it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work,
it provides a framework for starting, executing, and monitoring work on the set of allocated nodes,
it arbitrates contention for resources by managing a queue of pending work.
Launch an interactive session on a worker node using the command:
srun --pty bash -i
You can request an interactive node with multiple CPU cores by using the command:
srun -c N --pty bash -i
The parameter “N” is the number of CPU cores, up to 4 per interactive job. Note that whether a multi-core interactive request succeeds depends on availability. During peak times, a request for a large number of CPU cores is unlikely to be granted interactively, so it may be better to submit your job non-interactively.
You can request additional memory (the parameter “NN” is the amount of memory in gigabytes):
srun --mem=NNG --pty bash -i
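The core and memory options described above can be combined in a single request. The following is a minimal sketch: the values (2 cores, 8 GB) are arbitrary examples rather than site recommendations, and the script only prints the command so that it can be checked before it is run.

```shell
#!/bin/bash
# Example values only (assumptions): 2 CPU cores and 8 GB of memory.
CORES=2
MEM_GB=8

# Build the interactive request from the two options described above.
CMD="srun -c ${CORES} --mem=${MEM_GB}G --pty bash -i"

# Print the command; paste it into a login-node shell to start the session.
echo "$CMD"
```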
You can submit your job using a shell script. A general job-submission shell script starts with the shebang line (“#!/bin/bash”) on the first line.
Next, you may specify additional options, such as memory, CPU count or time limit.
Load the appropriate modules if necessary.
module use "PATH"
module load "MODULE NAME"
Finally, run your program by using the Slurm “srun” command.
The following example script requests 40 CPU cores in total and 64 GB of memory. Notifications will be sent to an email address.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=64000
#SBATCH --mail-user=firstname.lastname@example.org

module load OpenMPI/3.1.3-GCC-8.2.0-2.31.1

srun --export=ALL program
A maximum of 40 cores can be requested per node in the general use queues.
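The example script sets the CPU and memory options but not the time limit mentioned earlier. As a sketch, assuming the standard sbatch option --time and arbitrary example values (4 tasks, 8000 MB, one hour), the resource directives at the top of a script could look like this:

```shell
#!/bin/bash
# Resources on a single node: 4 tasks.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# Memory per node; the default unit is megabytes.
#SBATCH --mem=8000
# Wall-clock limit in hours:minutes:seconds.
#SBATCH --time=01:00:00
```

A job that runs past its time limit is terminated by Slurm, so choose a value with some margin.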
Save the shell script (for example, as “submission.sh”) and submit it with the command:
sbatch submission.sh
Note the job submission number. For example:
Submitted batch job 1226
Check your output file when the job is finished.
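The job number in the confirmation line also determines the default output file name. The sketch below extracts the number from the sample line and builds that name; it assumes no output option was set in the script, in which case Slurm writes to slurm-<job number>.out in the directory the job was submitted from.

```shell
#!/bin/bash
# Sample confirmation line, as printed after submission.
MSG="Submitted batch job 1226"

# The job number is the last word of the confirmation line.
JOB_ID="${MSG##* }"

# Assumption: no --output option was set, so the default file name applies.
OUT_FILE="slurm-${JOB_ID}.out"
echo "Job ${JOB_ID} writes to ${OUT_FILE}"

# Show the output once the file exists.
if [ -f "$OUT_FILE" ]; then
    cat "$OUT_FILE"
fi
```

When submitting from another script, sbatch --parsable prints only the job number (followed by a semicolon and the cluster name on multi-cluster setups), which avoids parsing the sentence.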
Name your submission:
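As a minimal sketch, assuming the standard sbatch option --job-name is the intended mechanism (the name “my_analysis” is a placeholder), a job can be named with a directive such as:

```shell
#SBATCH --job-name=my_analysis
```

The name then appears in the job queue listing in place of the script file name.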
Specify nodes and tasks for MPI jobs:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
Specify the output file name:
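A minimal sketch, assuming the standard sbatch option --output; the file name “output.txt” is a placeholder:

```shell
#SBATCH --output=output.txt
```

The pattern %j in the file name is replaced by the job number, for example --output=result-%j.out, which keeps repeated runs from overwriting each other.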
For the full list of the available options please visit the Slurm manual webpage at https://slurm.schedmd.com/pdfs/summary.pdf.
Display the job queue. Jobs typically pass through several states in the course of their execution. The typical states are PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED.
Show job details:
Show details of the HPC nodes:
Delete a job from the queue:
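The four monitoring tasks listed above map onto standard Slurm commands. The sketch below is one possible choice (squeue, scontrol show job, sinfo and scancel), not necessarily the exact commands intended here. The job number 1226 and the run helper are hypothetical, and by default the script only prints each command instead of executing it.

```shell
#!/bin/bash
# Hypothetical job number; replace it with the one reported at submission.
JOB_ID=1226

# Print each command by default; set DRY_RUN=0 on the cluster to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run squeue -u "$USER"            # display your jobs in the queue
run scontrol show job "$JOB_ID"  # show the details of one job
run sinfo                        # list partitions and node states
run scancel "$JOB_ID"            # delete the job from the queue
```

The -u option restricts the queue listing to one user's jobs; without it, squeue lists the jobs of all users.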