Embarrassingly Parallel

Overview

Slurm job arrays let you submit a single job script that is run multiple times as separate jobs with the same Slurm parameters. Use the --array Slurm argument to define the array indices, e.g. --array=1-10,12-15. The $SLURM_ARRAY_TASK_ID environment variable provides each job with its corresponding array index.

Example:

#!/bin/bash
#SBATCH --array=1-10

# Each job processes a different input file
srun python my_script.py input_${SLURM_ARRAY_TASK_ID}

In scientific computing, it is often necessary to run the same program multiple times with varying datasets or parameters.

When these runs do not depend on or communicate with each other, they can be executed in parallel as separate Slurm jobs. This type of parallelism is referred to as embarrassingly parallel.

Slurm provides a feature called job arrays, which allows users to efficiently submit and manage multiple independent instances of the same job script.

Array jobs enable you to manage large-scale workloads on the cluster. Alternative approaches are explored in the Parallel Computing tutorial.

Note

Although the examples here are drawn from scientific computing, job arrays are equally useful in many other fields. You might use them to analyse batches of text files, run different models on survey data, or automate large-scale media processing — anywhere you need to repeat a task across multiple inputs or parameters.

Introduction

Array jobs facilitate parallel computations. They are useful when you need to run a job multiple times with only minor variations. For example, you may need to execute 1000 jobs, each with a different random seed, or apply the same operation across multiple datasets. This can be accomplished with a single array job.

A Slurm job array consists of multiple jobs that share the same batch submission script. The --array directive specifies how many times the script should be executed, for instance:

#SBATCH --array=0-4

This directive creates an array of five jobs (tasks) indexed from 0 to 4. Each task is a copy of the submitted batch script and is queued in Slurm automatically. The SLURM_ARRAY_TASK_ID environment variable gives each task its unique index, which can be used to select input and output files.

[Figure: Array jobs]

--array via the command line

The --array option can also be specified as a command-line argument when using sbatch. This is useful for managing job arrays without modifying the script.
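For example, the following submits a script (here a placeholder name, my_job_script.sh) as a five-task array without editing the script itself:

$ sbatch --array=0-4 my_job_script.sh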

Important

Since array jobs create multiple identical job instances, it is crucial to understand their impact on the file system:

  • Does the script rely on libraries or environments stored in the working directory?

  • How much input data does each task require?

  • How much output data does each job generate?

For example, launching an array job with hundreds of tasks that depend on a Python environment stored on shared storage may cause significant file system load due to repeated access to thousands of files.

If you are unsure how your job will behave, seek guidance from the IT Services’ Research and Innovation team.

Your First Array Job

Note

The necessary scripts for the upcoming exercises are located in our hpc-examples repository. This repository is accessible on our Stanage HPC cluster. To utilise it, load the module:

module load hpc-examples

After loading, you can access the example scripts via the $HPC_EXAMPLES environment variable.

For example, you can then run slurm/pi.py in the following way:

python $HPC_EXAMPLES/slurm/pi.py

Let’s see an array job in practice. Look at the script ${HPC_EXAMPLES}/array/array_example.sh:

#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=200M
#SBATCH --output=array_example_%A_%a.out
#SBATCH --array=0-15

# Put the commands you want to run below:

# Job step
srun echo "I am array task number" $SLURM_ARRAY_TASK_ID

Submitting the job script with sbatch ${HPC_EXAMPLES}/array/array_example.sh will return a message such as:

Submitted batch job 5825026

This job ID belongs to the primary array job, which encompasses all individual tasks in the array. Each task is also assigned a unique array task ID.

As multiple jobs run simultaneously, each requires a unique output file to prevent overwriting. By default, Slurm names the output files as slurm-${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out.
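With the job ID above, the default output files would therefore be named slurm-5825026_0.out, slurm-5825026_1.out, and so on.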

You can override this with --output=FILENAME, using placeholders %A for the job ID and %a for the array task ID.

Once the jobs complete, output files will appear in your working directory:

$ ls
array_example_5825026_0.out   array_example_5825026_12.out  array_example_5825026_15.out
array_example_5825026_3.out  array_example_5825026_6.out  array_example_5825026_9.out
array_example_5825026_10.out  array_example_5825026_13.out  array_example_5825026_1.out
array_example_5825026_4.out  array_example_5825026_7.out  array_example.sh
array_example_5825026_11.out  array_example_5825026_14.out  array_example_5825026_2.out
array_example_5825026_5.out  array_example_5825026_8.out

You can inspect any output file using cat:

$ cat array_example_5825026_11.out
I am array task number 11

Important

Array indices do not need to be sequential. If specific tasks fail, you can re-run only those with --array=1,4. The --array argument can also be supplied directly to sbatch from the command line.
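For instance, because the command-line value overrides the value set in the script, you could re-run only tasks 3 and 7 of the earlier example (the task IDs here are illustrative) with:

$ sbatch --array=3,7 ${HPC_EXAMPLES}/array/array_example.sh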

More Examples

The following examples demonstrate how to use job arrays effectively by leveraging the $SLURM_ARRAY_TASK_ID environment variable. Whichever approach you choose, keep the following in mind:

  • You need a clear mapping between job indices and configurations, which could be filenames, pre-defined parameter sets, or external configuration files.

  • Ensure the mapping stays the same across multiple job runs, so that re-running a given index always processes the same input.

Processing Multiple Input Files

Often, computations require processing different input files. The $SLURM_ARRAY_TASK_ID variable can dynamically assign files to jobs:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1G
#SBATCH --array=0-29

srun ./my_application -input input_data_${SLURM_ARRAY_TASK_ID}
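
If your input files are not numbered to match the task IDs, you can build the mapping inside the script. The sketch below assumes a hypothetical data/ directory of CSV files; note that the glob must expand in the same order on every run for the mapping to stay consistent:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1G
#SBATCH --array=0-29

# Collect the input files into a Bash array and pick one by task ID
FILES=(data/*.csv)
INPUT=${FILES[$SLURM_ARRAY_TASK_ID]}

srun ./my_application -input "$INPUT"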

Hardcoding Arguments in the Batch Script

Suppose you want to run a π estimation simulation with five different seed values, each executing 2.5 million iterations. The following two examples show different ways to embed these arguments in the batch script.

Case-Based Argument Selection

The script ${HPC_EXAMPLES}/array/pi_array_hardcoded_case.sh uses a case statement to choose arguments:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-hardcoded
#SBATCH --output=pi-array-hardcoded_%a.out
#SBATCH --array=0-4

module load hpc-examples

case $SLURM_ARRAY_TASK_ID in
   0)  SEED=234 ;;
   1)  SEED=18  ;;
   2)  SEED=23  ;;
   3)  SEED=50  ;;
   4)  SEED=432 ;;
esac

python3 ${HPC_EXAMPLES}/slurm/pi.py 2500000 --seed=$SEED > pi_$SEED.json

Submit the script with:

$ module load hpc-examples
$ sbatch ${HPC_EXAMPLES}/array/pi_array_hardcoded_case.sh
Submitted batch job 5825718

Each task produces its own output, such as:

$ cat pi_18.json
{"pi_estimate": 3.1411456, "iterations": 2500000, "successes": 1963216}

Using Bash Arrays for Parameter Selection

An alternative approach using Bash arrays is demonstrated in the script ${HPC_EXAMPLES}/array/pi_array_hardcoded_array.sh:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-hardcoded-array
#SBATCH --output=pi-array-hardcoded-array_%a.out
#SBATCH --array=0-4

module load hpc-examples

SEED_ARRAY=(
   234
   18
   23
   50
   432
)

SEED=${SEED_ARRAY[$SLURM_ARRAY_TASK_ID]}

srun python3 ${HPC_EXAMPLES}/slurm/pi.py 2500000 --seed=$SEED > pi_$SEED.json

Submit the job with:

$ module load hpc-examples
$ sbatch ${HPC_EXAMPLES}/array/pi_array_hardcoded_array.sh

Reading Parameters from a File

Rather than hardcoding values, you can store them in a file and read them dynamically. For example, running pi.py with different iteration values:

Create a file named iterations.txt containing:

100
1000
50000
1000000

We can modify the previous script to read these values with sed (see man sed); the script is ${HPC_EXAMPLES}/array/pi_array_parameter.sh:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-parameter
#SBATCH --output=pi-array-parameter_%a.out
#SBATCH --array=1-4

ml hpc-examples

n=$SLURM_ARRAY_TASK_ID
iteration=`sed -n "${n} p" iterations.txt`      # Get n-th line (1-indexed) of the file
srun python3 ${HPC_EXAMPLES}/slurm/pi.py ${iteration} > pi_iter_${iteration}.json

This approach can be extended to read multiple parameters from CSV files or similar structured data formats.
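
For example, a minimal sketch assuming a hypothetical params.csv file with one "iterations,seed" pair per line (such as 100000,234) could use cut to split the columns:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-csv
#SBATCH --output=pi-array-csv_%a.out
#SBATCH --array=1-4

ml hpc-examples

# params.csv is a hypothetical file; each line holds "iterations,seed"
n=$SLURM_ARRAY_TASK_ID
line=$(sed -n "${n} p" params.csv)        # Get the n-th line (1-indexed)
iterations=$(echo "$line" | cut -d, -f1)  # First column: iteration count
seed=$(echo "$line" | cut -d, -f2)        # Second column: random seed

srun python3 ${HPC_EXAMPLES}/slurm/pi.py ${iterations} --seed=${seed} > pi_${iterations}_${seed}.json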

Grouping Multiple Runs in One Array Task (Advanced)

If your tasks are very short (a few minutes), launching numerous individual jobs can lead to scheduling inefficiencies and an overwhelming number of output files. In such cases, it is beneficial to group multiple tasks within a single array job.

Important

Ideally, each array job should run for at least 30 minutes. If your tasks are shorter than this, consider combining multiple runs into a single job to reduce scheduling overhead and improve efficiency.

A simple way to achieve this is by introducing a loop inside your Slurm script. For example, if you need to run a simulation with 50 different seed values, you can process them in groups of 10, reducing the number of array jobs to just 5. This significantly decreases the load on the scheduler.

An example implementation is provided in the script ${HPC_EXAMPLES}/array/pi_array_grouped.sh:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-grouped
#SBATCH --output=pi-array-grouped_%a.out
#SBATCH --array=0-4

ml hpc-examples

# Let's create a new folder for our output files
mkdir -p json_files

CHUNKSIZE=10
n=$SLURM_ARRAY_TASK_ID
indexes=`seq $((n*CHUNKSIZE)) $(((n + 1)*CHUNKSIZE - 1))`

for i in $indexes
do
   srun python3 ${HPC_EXAMPLES}/slurm/pi.py 1500000 --seed=$i > json_files/pi_$i.json
done
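
With these settings, each task handles CHUNKSIZE consecutive seeds: the task with $SLURM_ARRAY_TASK_ID equal to 2, for example, runs seeds 20 through 29, so the five tasks together cover seeds 0 to 49.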

Exercises

Array Job Exercises

Array-1: Compute n-grams with Array Jobs

Computing n-grams across the Gutenberg-Fiction dataset can take considerable time. Using array jobs is an efficient way to parallelise the process. Follow along with this example:

The following batch script ${HPC_EXAMPLES}/ngrams/array.sh calculates 3-grams in 20 batches, saving each result to a separate file:

#!/bin/bash
#SBATCH --mem=50G
#SBATCH --array=1-20
#SBATCH --time=00:15:00
#SBATCH --job-name=words-array

module load hpc-examples

mkdir -p ${TMPDIR}/ngrams-output/

python3 ${HPC_EXAMPLES}/ngrams/count.py ${HPC_EX_DATA}/Gutenberg-Fiction.zip \
  -n 3 --words \
  --start=$SLURM_ARRAY_TASK_ID --step=20 \
  -o ${TMPDIR}/ngrams-output/ngrams3-words-all-array_$SLURM_ARRAY_TASK_ID.out

# Next we merge the 20 individual output files into a single dataset and output 
# into the current working directory (since once the job completes $TMPDIR will be deleted).

python3 ${HPC_EXAMPLES}/ngrams/combine-counts.py ${TMPDIR}/ngrams-output/ngrams3-words-all-array_* -o ngrams3-words-all.out

The final output now contains all computed n-grams:

$ head -5 ngrams3-words-all.out
30224 ["i", "don", "t"]
18737 ["one", "of", "the"]
15954 ["out", "of", "the"]
14749 ["there", "was", "a"]
13122 ["it", "was", "a"]

Further Exercises

Array-2: Array Jobs and Random Seeds

Create an array job that runs ${HPC_EXAMPLES}/slurm/pi.py with different combinations of iteration counts and seed values. Save the results to separate files and keep the standard output (#SBATCH --output=FILE) distinct from the standard error (#SBATCH --error=FILE).

Array-3: Merging Outputs

Use the script ${HPC_EXAMPLES}/slurm/pi_aggregation.py to aggregate results from multiple output files. This will compute a more precise estimate of Pi.

Array-4: Applying Array Jobs to Your Own Work

Consider your typical workload. How could you divide it into smaller, independent tasks that can be processed in parallel using array jobs? Would it be more efficient to break larger tasks into multiple smaller ones?

(Advanced) Array-5: Using Advanced Indexing

Create a job array that runs every alternate index, such as 1, 3, 5, etc. The Slurm sbatch manual page provides helpful details.

Solution

You can specify a step size for the job array by adding a colon and a number after the range. For example, --array=1-X:2 runs indices 1, 3, 5, and so on up to X.

Array-6: Varying Memory Requirements

Construct an array job that runs ${HPC_EXAMPLES}/slurm/memory-use.py with five different memory requirements (50M, 100M, 500M, 1000M, 5000M). Request 250M of memory for the array job itself. Observe whether any of the jobs fail.

Is this an appropriate use of array jobs?

Solution

At a minimum, the 5G job should fail. The 500M and 1G jobs also exceed their requested memory, but Slurm tolerates slight overuse before terminating jobs, so they may still succeed.

This is an incorrect use of array jobs. Arrays are intended for multiple tasks with identical resource requirements, as each task is allocated the same resources.

What’s Next?

The next tutorial covers shared memory parallelism.


This material is adapted from the Aalto Scientific Computing documentation, licensed under CC BY 4.0. Changes were made to the original content.