Quick start

Beginners' quick start: run your first job on Stanage

This section outlines the absolute essentials to get you connected and submitting jobs. If a step isn’t clear, refer to the linked pages.

Before you start

You should:

  • Get an account: If you don’t yet have access, start here: Getting an Account

  • Be able to SSH in: You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN. Follow: Connecting to a cluster using SSH

Connect and use the login node appropriately

Open a terminal and SSH to Stanage. Once logged in you should land on a login node (e.g. login1).
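
For example (the hostname stanage.shef.ac.uk is assumed here; confirm it, and your username format, in Connecting to a cluster using SSH):

ssh <your-username>@stanage.shef.ac.uk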

  • Login nodes are for: editing files, preparing jobs, compiling, submitting jobs, checking results.

  • Login nodes are not for: running heavy computations.

If you need a refresher on basic shell commands, work through this workshop; you need to know how to move around and edit files.

Put your code and data in the right place

Know where your files should live: Using the wrong filesystem can significantly impact performance and reliability. Read: Filestores

Typically:

  • Home: small config, scripts, source code (not backed up; treat it accordingly).

  • Fastdata: working datasets and results (not backed up; treat it accordingly).

  • Shared project storage: datasets and results you care about; accessible from login nodes only (backed up).

  • Scratch: temporary per-job I/O (suited to heavy I/O and many small files; copy anything you need before the job finishes).

If you want to transfer files to/from the cluster, see Transferring files.
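
As an illustration, a single file can be copied from your own machine to your home directory on the cluster with scp, run from your own machine (results.csv is an example filename and the hostname is assumed to be the one you SSH to):

scp results.csv <your-username>@stanage.shef.ac.uk:~/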

Interactive jobs (shell access on a compute node)

If you need to run commands (including loading software and testing code), do it on a compute node (worker node) via an interactive job.

srun --pty bash -i

To request more memory or CPU cores, specify the resources explicitly:

srun --mem=8000 --cpus-per-task=2 --pty bash -i
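
You can also set a wall-time limit explicitly with the standard --time flag if the default interactive session length does not suit you (the one hour below is just an example):

srun --time=01:00:00 --mem=8000 --cpus-per-task=2 --pty bash -i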

Interactive job guidance lives here: Interactive Jobs

Load software (modules)

Most software on Stanage is provided via environment modules.

Basics:

module avail
module spider <software-name>
module load <some/module>
module list
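
For example, a typical session might look like the following (the Python module name is only illustrative; use module avail or module spider to find what is actually installed):

module avail python
module load Python/3.10.8-GCCcore-12.2.0   # illustrative name; check module avail
module list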

Proper guide: Activating software using Environment Modules

Your first batch job (copy/paste example)

Most work on Stanage is run as batch jobs: you write a job script, submit it, and the scheduler runs it on compute (worker) nodes.

Create a job script called hello.slurm using either the nano or vim text editor (see the 'create a text file' section of the workshop linked above):

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
#SBATCH --mem=1000M
#SBATCH --cpus-per-task=1

hostname
date
echo "Hello from $SLURM_JOB_ID"
sleep 30
echo "Done"

Submit the job:

sbatch hello.slurm

sbatch will respond with a job id.

Check the job's status in the queue:

squeue --me

Read the output:

cat slurm-<jobid>.out
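
By default Slurm writes the job's output to a slurm-<jobid>.out file in the directory you submitted from. If you would rather have named output and error files, these standard Slurm directives can be added to the script (the filenames are just an example; %j expands to the job id):

#SBATCH --output=hello-%j.out
#SBATCH --error=hello-%j.err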

Read Job Submission and Control for more examples.

Many software packages have example batch scripts: Software on Stanage.

Stopping a job

scancel <jobid>

Investigating failures

Start with:

  • your .out file (and any error file you configured)

  • sacct for exit state / resource usage (see the example below)

  • the debugging section in the docs
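
For the sacct check mentioned above, a query along these lines is a reasonable starting point (JobID, JobName, State, ExitCode, Elapsed and MaxRSS are standard sacct format fields):

sacct -j <jobid> --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS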

Choosing sensible resources

Start with something known, then refine — don’t guess.

  • Use the runtime and memory you observe on a laptop or workstation as a baseline (a single CPU core on the cluster may be comparable to, or slightly slower than, your own machine)

  • Add modest headroom for a first run (e.g. 1.5–2× runtime)

  • After the job finishes, check what it actually used with seff: seff <jobid>

  • Tighten time and memory requests in the next submission based on this output (see the sketch below)
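
As a purely hypothetical illustration: if seff reported roughly 2.5 GB of peak memory and 40 minutes of elapsed time, a tighter request for the next submission might look like this (the numbers are invented for the example):

#SBATCH --time=01:00:00
#SBATCH --mem=3000M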

Smaller, tighter requests usually start sooner. For more guidance, see: Choosing appropriate compute resources

Need help?

Before emailing support, check the documentation pages linked in the sections above.

Then contact research-it@sheffield.ac.uk with:

  • job id(s)

  • the job script

  • the output/error logs

  • what you expected vs what happened

Experienced users' quick start: essentials

You already know what a scheduler is and why login nodes are not for compute. This section highlights the Stanage-specific essentials for experienced HPC users.

If anything below feels unfamiliar, read the full docs page linked in that section.

Access and connection

  • You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN: Connecting to a cluster using SSH

  • To reduce how often you need to reauthenticate with MFA (connection reuse, persistent sessions, tmux, PuTTY settings), see: Tips for working with MFA

Filesystems and where to run I/O

Read Filestores first, then decide where to put data.

Minimal mental model:

  • Home: small, convenient, not your archive (not backed up; treat it accordingly)

  • Fastdata: big, your real work data (not backed up; treat it accordingly)

  • Shared project storage: datasets and results you care about; accessible from login nodes only (backed up)

  • Scratch: per-job temporary; stage in/out inside the job

If you’re doing heavy I/O, make sure you’ve read Filestores.

Interactive jobs (debugging / short tests)

Use srun for an interactive allocation:

srun --mem=8000 --cpus-per-task=2 --pty bash -i

Full guidance is within: Interactive Jobs

Software environment

Stanage uses Environment Modules for most centrally-provided software: Activating software using Environment Modules

Handy commands:

module avail
module spider <name>
module show <module>
module purge
module load some/module
module list

Minimal batch job script

#!/bin/bash
#SBATCH --job-name=<name>
#SBATCH --time=HH:MM:SS
#SBATCH --mem=<n>M
#SBATCH --cpus-per-task=<n>

set -euo pipefail
module purge
module load <what-you-need>

# (optional) stage to parscratch / local as per filestore guidance
# run your code
# (optional) stage results back
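
A filled-in version might look like the sketch below. The module name, script name, working directory and thread option are all placeholders: use whatever module avail shows and the locations described in Filestores.

#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --time=02:00:00
#SBATCH --mem=8000M
#SBATCH --cpus-per-task=4

set -euo pipefail
module purge
module load Python/3.10.8-GCCcore-12.2.0    # illustrative name; check module avail

cd /mnt/parscratch/users/$USER/myproject    # assumed fastdata location; see Filestores
python analyse.py --threads "$SLURM_CPUS_PER_TASK"   # analyse.py and --threads are placeholders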

Choosing resources

If you want jobs to start promptly and not get killed for exceeding limits, read: Choosing appropriate compute resources

Key ideas:

  • request only what you’ll use (fairshare and queueing exist for a reason)

  • set time/mem/CPU explicitly in scripts

  • understand --cpus-per-task vs --ntasks (OpenMP vs MPI; see the sketch below)
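
A rough sketch of the difference in practice; these are two alternative request patterns (core and task counts are arbitrary examples), not directives to combine as-is:

# OpenMP / multithreaded: one task with several CPUs on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# MPI: many tasks, typically one CPU each, launched with srun
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1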

Batch job submission and monitoring

Core docs: Job Submission and Control

Submit:

sbatch job.slurm

Monitor:

squeue --me

Cancel:

scancel <jobid>

Check accounting (after completion):

sacct -j <jobid>

Check efficiency using seff (after completion):

seff <jobid>

For job arrays, dependencies, and other workflow features, see: advanced job submission control.
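
As a minimal illustration of a job array (the 1-10 range is arbitrary; dependencies and array throttling are covered in the page linked above), add an --array directive to the job script and use the index Slurm provides:

#SBATCH --array=1-10

echo "Processing array index $SLURM_ARRAY_TASK_ID"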

Environment control: see basics and SLURM variables.

When using srun, we recommend --export=ALL (see srun versus mpirun or mpiexec)

Parallel jobs (MPI / OpenMP / arrays)

  • To choose the right approach for your workload (arrays, OpenMP, MPI, GPUs), see Parallel Computing.

GPUs

If you need GPUs, start with the GPU computing tutorial, then read Using GPUs on Stanage.

GPUs are a high-value resource, so request only what you need. Only request multiple GPUs if you’re sure your code can use them efficiently.
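
A generic Slurm GPU request is shown below; note that Stanage may also require a specific partition and/or QOS for GPU jobs, so check Using GPUs on Stanage for the exact directives rather than relying on this line alone:

#SBATCH --gres=gpu:1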

Debugging and profiling

Start with debugging failed jobs.

Then move to advanced job profiling and analysis when you need real answers.

Reference and FAQs

Support

If you contact research-it@sheffield.ac.uk, include:

  • job id(s)

  • job script(s)

  • output/error logs

  • relevant module list (module list) and key env vars

  • what you expected vs what happened