Quick start
Beginners quick start: run your first job on Stanage
This section outlines the absolute essentials to get you connected and submitting jobs. If a step isn’t clear, refer to the linked pages.
Before you start
You should:
Get an account: If you don’t yet have access, start here: Getting an Account
Be able to SSH in: You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN. Follow: Connecting to a cluster using SSH
Connect and use the login node appropriately
Open a terminal and SSH to Stanage.
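A minimal connection command, assuming the hostname given in the SSH docs (replace <your-username> with your university username):
ssh <your-username>@stanage.shef.ac.uk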
Once logged in you should land on a login node (e.g. login1).
Login nodes are for: editing files, preparing jobs, compiling, submitting jobs, checking results.
Login nodes are not for: running heavy computations.
If you need a refresher on basic shell commands, work through this workshop (you need to know how to move around and edit files).
Put your code and data in the right place
Know where your files should live: Using the wrong filesystem can significantly impact performance and reliability. Read: Filestores
Typically:
Home: small config, scripts, source code (not backed up; treat it accordingly).
Fastdata: working datasets and results (not backed up; treat it accordingly).
Shared project storage: datasets and results you care about; accessible from login nodes only (backed up).
Scratch: temporary per-job I/O (suited to heavy I/O and many small files; copy anything you need before the job finishes).
If you want to transfer files to/from the cluster see Transferring files.
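For example, a hedged sketch using rsync from your own machine (the hostname matches the SSH docs; the destination path is a placeholder):
rsync -av ./mydata/ <your-username>@stanage.shef.ac.uk:/path/on/cluster/mydata/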
Interactive jobs (shell access on a compute node)
If you need to run commands (including loading software and testing code), do it on a compute node (worker node) via an interactive job.
srun --pty bash -i
Specify resources explicitly to request more RAM or CPU:
srun --mem=8000 --cpus-per-task=2 --pty bash -i
Interactive job guidance lives here: Interactive Jobs
Load software (modules)
Most software on Stanage is provided via environment modules.
Basics:
module avail
module spider <software-name>
module load <some/module>
module list
Proper guide: Activating software using Environment Modules
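A typical session might look like this (the module name and version are illustrative; use module spider to see what is actually installed):
module spider Python                       # find available versions
module load Python/3.10.8-GCCcore-12.2.0   # illustrative version string
module list                                # confirm what is loaded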
Your first batch job (copy/paste example)
Most work on Stanage is run as batch jobs: you write a job script, submit it, and the scheduler runs it on compute (worker) nodes.
Create a job script called hello.slurm using either the nano text editor (see the ‘create a text file’ section of the workshop linked above) or vim:
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
#SBATCH --mem=1000M
#SBATCH --cpus-per-task=1
hostname
date
echo "Hello from $SLURM_JOB_ID"
sleep 30
echo "Done"
Submit the job:
sbatch hello.slurm
It will respond with a job id.
Check the job's status in the queue:
squeue --me
Read the output:
cat slurm-<jobid>.out
Read Job Submission and Control for more examples.
Example batch scripts for many software packages are collected here: Software on Stanage.
Stopping a job
scancel <jobid>
Investigating failures
Start with:
your .out file (and any error file you configured)
sacct for exit state / resource usage
the debugging section in the docs
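For example, a quick look at exit state and memory use (these are standard Slurm accounting fields):
sacct -j <jobid> --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS,ReqMem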
Choosing sensible resources
Start with something known, then refine — don’t guess.
Use the runtime and memory you see on a laptop/workstation as a baseline (assume a single CPU core on the cluster may be similar or slightly slower)
Add modest headroom for a first run (e.g. 1.5–2× runtime)
After the job finishes, check what it actually used with seff:
seff <jobid>
Tighten time and memory requests in the next submission based on this output.
Smaller, tighter requests usually start sooner. For more guidance, see: Choosing appropriate compute resources
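As a hedged illustration, if seff showed the job ran for about 10 minutes and used under 2 GB of memory, the next submission might request:
#SBATCH --time=00:20:00   # roughly 2x the observed runtime
#SBATCH --mem=3000M       # observed usage plus headroom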
Need help?
Before emailing support, check the documentation pages linked above.
Then contact research-it@sheffield.ac.uk with:
job id(s)
the job script
the output/error logs
what you expected vs what happened
Experienced users quick start: essentials
You already know what a scheduler is and why login nodes are not for compute. This section highlights the Stanage-specific essentials for experienced HPC users.
If anything below feels unfamiliar, read the full docs page linked in that section.
Access and connection
You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN: Connecting to a cluster using SSH
To reduce how often you need to reauthenticate with MFA (connection reuse, persistent sessions, tmux, PuTTY settings), see: Tips for working with MFA
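One common approach to connection reuse is OpenSSH multiplexing in ~/.ssh/config; a sketch, assuming the standard Stanage hostname (the MFA tips page has the recommended settings):
Host stanage
    HostName stanage.shef.ac.uk
    User <your-username>
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist 2h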
Filesystems and where to run I/O
Read Filestores first, then decide where to put data.
Minimal mental model:
Home: small, convenient, not your archive (not backed up; treat it accordingly)
Fastdata: big, your real work data (not backed up; treat it accordingly)
Shared project storage: datasets and results you care about; accessible from login nodes only (backed up)
Scratch: per-job temporary; stage in/out inside the job
If you’re doing heavy I/O, make sure you’ve read Filestores.
Interactive jobs (debugging / short tests)
Use srun for an interactive allocation:
srun --mem=8000 --cpus-per-task=2 --pty bash -i
Full guidance is within: Interactive Jobs
Software environment
Stanage uses Environment Modules for most centrally-provided software: Activating software using Environment Modules
Handy commands:
module avail
module spider <name>
module show <module>
module purge
module load some/module
module list
Minimal batch job script
#!/bin/bash
#SBATCH --job-name=<name>
#SBATCH --time=HH:MM:SS
#SBATCH --mem=<n>M
#SBATCH --cpus-per-task=<n>
set -euo pipefail
module purge
module load <what-you-need>
# (optional) stage to parscratch / local as per filestore guidance
# run your code
# (optional) stage results back
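A hedged, filled-in version of the template above (the module name, paths, and parscratch location are illustrative; follow the filestore guidance for the correct paths on Stanage):
#!/bin/bash
#SBATCH --job-name=csv_analysis
#SBATCH --time=01:00:00
#SBATCH --mem=8000M
#SBATCH --cpus-per-task=4
set -euo pipefail
module purge
module load Python/3.10.8-GCCcore-12.2.0   # illustrative; check 'module spider Python'
# stage input to fast per-job storage (path illustrative)
WORKDIR=/mnt/parscratch/users/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp "$HOME"/inputs/data.csv "$WORKDIR"/
cd "$WORKDIR"
python "$HOME"/code/analyse.py data.csv
# stage results back before the job ends (destination illustrative)
cp results.csv "$HOME"/results/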
Choosing resources
If you want jobs to start promptly and not get killed for exceeding limits, read: Choosing appropriate compute resources
Key ideas:
request only what you’ll use (fairshare and queueing exist for a reason)
set time/mem/CPU explicitly in scripts
understand --cpus-per-task vs --ntasks (OpenMP vs MPI)
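As a hedged sketch of that distinction (fragments of two separate job scripts; core and task counts and program names are placeholders): an OpenMP job runs one task with several threads, whereas an MPI job runs several tasks.
# OpenMP (shared memory, single node): one task, several CPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program

# MPI (distributed memory): several tasks, one CPU each
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
srun --export=ALL ./my_mpi_program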
Batch job submission and monitoring
Core docs: Job Submission and Control
Submit:
sbatch job.slurm
Monitor:
squeue --me
Cancel:
scancel <jobid>
Check accounting (after completion):
sacct -j <jobid>
Check efficiency using seff (after completion):
seff <jobid>
For job arrays, dependencies, and other workflow features, see: advanced job submission control.
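For example (the array range, job id, and script names are placeholders):
sbatch --array=1-10 job.slurm                            # 10 array tasks; each sees $SLURM_ARRAY_TASK_ID
sbatch --dependency=afterok:<jobid> postprocess.slurm    # starts only if <jobid> completes successfully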
Environment control: see basics and SLURM variables.
When using srun, we recommend --export=ALL (see srun versus mpirun or mpiexec)
Parallel jobs (MPI / OpenMP / arrays)
To choose the right approach for your workload (arrays, OpenMP, MPI, GPUs), see Parallel Computing.
GPUs
If you need GPUs, start with the GPU computing tutorial, then read Using GPUs on Stanage.
GPUs are a high-value resource, so request only what you need. Only request multiple GPUs if you’re sure your code can use them efficiently.
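A minimal single-GPU request might look like this (the partition/QOS names and GPU syntax are assumptions based on common Slurm setups; confirm the exact directives on the Using GPUs on Stanage page):
#SBATCH --partition=gpu
#SBATCH --qos=gpu
#SBATCH --gres=gpu:1   # request a single GPU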
Debugging and profiling
Start with debugging failed jobs.
Then move to advanced job profiling and analysis when you need detailed performance data.
Reference and FAQs
Support
If you contact research-it@sheffield.ac.uk, include:
job id(s)
job script(s)
output/error logs
relevant module list (module list) and key environment variables
what you expected vs what happened