salloc

Important

Contrary to the description on this page and to the Slurm defaults, the salloc command on the Bessemer cluster (only) has been reconfigured to provide an alternative method of spawning an interactive session.

If no command is specified when invoking salloc, then after the resources are allocated a new shell is started inside the requested resources, rather than a new user shell being started on the same machine.
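
On Bessemer this means that invoking salloc without a command behaves like an interactive session on a compute node. A minimal sketch (the resource values are illustrative only):

$ salloc --mem=2G --time=01:00:00    # wait on the login node until the allocation is granted
$ hostname                           # on Bessemer this now runs inside the allocation, on a compute node
$ exit                               # leaving the shell relinquishes the allocation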

salloc is a SLURM scheduler command used to obtain a job allocation, which is a set of resources (nodes), possibly with some set of constraints (e.g. number of processors per node).

When salloc successfully obtains the requested allocation, it runs the command specified by the user on the current machine and then revokes the allocation once the command has finished.

If no command is specified, then by default salloc starts the user’s default shell on the same machine.

While in an sbatch job or interactive session, salloc can also be used to allocate part of the job's resources to specific srun sub-tasks.

Documentation

Documentation is available on the system using the command

$ man salloc

Usage

The salloc command is used to request an allocation from the SLURM scheduler and takes the same arguments as srun.

The salloc command can be used with a subsequent command or script to request an allocation and then run that command/script on the current machine (not on the allocated nodes/resources). When the command/script finishes, the allocation is revoked.

The command may be any program the user wishes. Some typical commands are shown below:

$ salloc --nodes=1 --ntasks-per-node=1 --mem-per-cpu=2G --time=01:00:00 srun mycommand

Or:

$ salloc --nodes=1 --ntasks-per-node=1 --mem-per-cpu=2G --time=01:00:00 srun myscript.sh

The following example runs the sleep 10 command with a single task on a single node and a 1 minute time limit:

$ salloc --nodes=1 --ntasks-per-node=1 --mem-per-cpu=2G --time=00:01:00 bash -c 'sleep 10'
salloc: Pending job allocation 2165971
salloc: job 2165971 queued and waiting for resources
salloc: job 2165971 has been allocated resources
salloc: Granted job allocation 2165971              #The sleep command starts
salloc: Relinquishing job allocation 2165971        #The sleep command has finished.
salloc: Job allocation 2165971 has been revoked.

This requests the allocation, waits 10 seconds while the sleep command runs, and then relinquishes the allocation.

To demonstrate that the command after salloc runs on the current machine and not on the allocated nodes/resources, see the example below:

$ hostname
bessemer-node001.shef.ac.uk
$ salloc --nodes=1 --ntasks-per-node=1 --mem-per-cpu=2G --time=01:00:00 hostname
salloc: Pending job allocation 2165974
salloc: job 2165974 queued and waiting for resources
salloc: job 2165974 has been allocated resources
salloc: Granted job allocation 2165974
bessemer-node001.shef.ac.uk
salloc: Relinquishing job allocation 2165974
salloc: Job allocation 2165974 has been revoked.

As the first hostname command shows, we are on the login node. When we execute the salloc command to reserve resources and instruct it to run hostname, it again reports the same host: the command executed by salloc runs on the current machine, not on the allocated nodes/resources.

When salloc is invoked without a command it will run the user’s default shell. This in effect provides an allocation to which srun jobs can be dispatched on the fly. This can be advantageous as scripts/commands run with srun start immediately, since the resources are already allocated.

This is another example, requesting a single-node job with 4 tasks (1 CPU per task), a total of 2 GB of memory and a 1 hour time limit, without specifying a command:

$ salloc --nodes=1 --ntasks-per-node=4 --mem=2G --time=01:00:00

The output will look like the following:

$ salloc --nodes=1 --ntasks-per-node=4 --mem=2G --time=01:00:00
salloc: Pending job allocation 2117564
salloc: job 2117564 queued and waiting for resources
salloc: job 2117564 has been allocated resources
salloc: Granted job allocation 2117564

The salloc command will wait until the resource request is fulfilled and then return a prompt on the login node (by running the default shell).
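
Once the prompt returns, you can confirm the job ID of the allocation via the SLURM_JOB_ID environment variable set in the spawned shell (shown here with the job ID from the example above):

$ echo $SLURM_JOB_ID
2117564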

Warning

When you are finished with your tasks, please ensure that you release/cancel your allocation using the scancel command: scancel $SLURM_JOB_ID, so that compute resources are not left allocated and idle.

The allocation will then be available for use with the srun command. You can see your running allocations with sacct:

$ sacct
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2117564      interacti+ interacti+       free          4    RUNNING      0:0
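
If your job history is long, the output can be restricted to the current allocation by passing the job ID explicitly (a brief sketch, assuming you are still in the shell spawned by salloc so that SLURM_JOB_ID is set):

$ sacct -j $SLURM_JOB_ID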

You can then dispatch the command hostname to each of the 4 tasks with srun:

$ srun hostname
bessemer-node001.shef.ac.uk
bessemer-node001.shef.ac.uk
bessemer-node001.shef.ac.uk
bessemer-node001.shef.ac.uk
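
srun does not have to use the whole allocation; for example, to dispatch the command to only 2 of the 4 allocated tasks (an illustrative sketch):

$ srun --ntasks=2 hostname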

Note that there is no need to supply the SLURM_JOB_ID variable to srun, as when salloc ran it spawned a subshell with this variable already set for the fulfilled allocation.

As seen above, the command given to srun is run once in each task. If you want to run a single task but with multiple cores, you can request them using the -c or --cpus-per-task arguments.
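
For example, a single task with 4 cores could be requested as follows (a sketch only; adjust the memory and time limit to suit your work):

$ salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=4 --mem=2G --time=01:00:00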

More detailed information on using the salloc command can be found by running salloc with the --help flag or by visiting the Slurm documentation page on salloc.