
Slurm Workload Manager

Slurm is a highly scalable cluster management and job scheduling system, used on Bessemer. As a cluster workload manager, Slurm has three key functions:

  • it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work,
  • it provides a framework for starting, executing, and monitoring work on the set of allocated nodes,
  • it arbitrates contention for resources by managing a queue of pending work.

Request an Interactive Shell

Launch an interactive session on a worker node using the command:

srun --pty bash -i

You can request an interactive node with multiple CPU cores by using the command:

srun -c "N" --pty bash -i

The parameter “N” represents the number of CPU cores, up to 4 per interactive job. Please note that requesting multiple cores for an interactive session depends on availability. During peak times, it is unlikely that you will be able to obtain a large number of CPU cores interactively, so it may be a better approach to submit your job non-interactively.
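
For example, to request an interactive session with 2 CPU cores (the core count here is purely illustrative):

srun -c 2 --pty bash -i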

You can request additional memory (the parameter “NN” represents the amount of memory in gigabytes):

srun --mem="NN"G --pty bash -i
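
These options can be combined. For example, the following (with illustrative values) requests an interactive session with 2 cores and 8 GB of memory:

srun -c 2 --mem=8G --pty bash -i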

Submitting Non-Interactive Jobs

Write a job-submission shell script

You can submit your job using a shell script. A general job-submission shell script starts with the “shebang” line on its first line.

#!/bin/bash

Next, you may specify some additional options, such as memory, CPU, or time limits.

#SBATCH --"OPTION"="VALUE"

Load the appropriate modules if necessary.

module use "PATH"
module load "MODULE NAME"

Finally, run your program by using the Slurm “srun” command.

srun "PROGRAM"

The next example script requests 40 CPU cores in total and 64 GB of memory, and sends email notifications to the specified address.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --mem=64000
#SBATCH --mail-user="EMAIL ADDRESS"
#SBATCH --mail-type=ALL

module load OpenMPI/3.1.3-GCC-8.2.0-2.31.1

srun --export=ALL program

A maximum of 40 cores can be requested per node in the general-use queues.

Job Submission

Save the shell script (let’s say “submission.sh”) and use the command

sbatch submission.sh

Note the job submission number. For example:

Submitted batch job 1226
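
While the job is pending or running, you can check its state with squeue (1226 is the example job ID shown above):

squeue -j 1226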

Check your output file when the job has finished. Unless you set --output, Slurm writes output to a file named after the job ID in the submission directory:

cat slurm-1226.out

Additional options for job submission

Name your job:

#SBATCH --job-name=test_job

Specify nodes and tasks for MPI jobs:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16

Memory allocation (the value is in megabytes unless a unit suffix such as G is given):

#SBATCH --mem=16000

Specify the output file name (%j is replaced by the job ID):

#SBATCH --output=output.%j.test.out

Request a run-time limit (hh:mm:ss):

#SBATCH --time=00:30:00

Email notification:
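
The relevant options are --mail-user and --mail-type; for example (the address below is a placeholder):

#SBATCH --mail-user="EMAIL ADDRESS"
#SBATCH --mail-type=ALL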

For the full list of available options, please visit the Slurm manual webpage at https://slurm.schedmd.com/pdfs/summary.pdf.

Key Slurm Scheduler Commands

Display the job queue. Jobs typically pass through several states in the course of their execution: PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED.

squeue
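
To list only your own jobs, pass your username ("USERNAME" below is a placeholder) to the -u option:

squeue -u "USERNAME"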

Show job details:

sacct -v
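
To show accounting information for one specific job, sacct also accepts a job ID via -j ("JOB_ID" is a placeholder):

sacct -j "JOB_ID"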

Display details of the HPC nodes:

sinfo
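
For a node-oriented, long-format listing, you can add the -N and -l flags:

sinfo -N -l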

Delete a job from the queue:

scancel "JOB_ID"
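
For example, to cancel the job submitted earlier:

scancel 1226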