Quick start

Beginners' quick start: run your first job on Stanage

This section outlines the absolute essentials to get you connected and submitting jobs. If a step isn’t clear, refer to the linked pages.

Before you start

You should:

  • Get an account: If you don’t yet have access, start here: Getting an Account

  • Be able to SSH in: You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN. Follow: Connecting to a cluster using SSH

Connect and use the login node appropriately

Open a terminal and SSH to Stanage. Once logged in you should land on a login node (e.g. login1).
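
For example (the hostname stanage.shef.ac.uk is assumed here; confirm it, and your username format, in Connecting to a cluster using SSH):

ssh <your-username>@stanage.shef.ac.uk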

  • Login nodes are for: editing files, preparing jobs, compiling, submitting jobs, checking results.

  • Login nodes are not for: running heavy computations.

If you need a refresher on basic shell commands, work through this workshop; you need to know how to move around and edit files.

Put your code and data in the right place

Know where your files should live: Using the wrong filesystem can significantly impact performance and reliability. Read: Filestores

Typically:

  • Home: small config, scripts, source code (not backed up; treat it accordingly).

  • Fastdata: working datasets and results (not backed up; treat it accordingly).

  • Shared project storage: datasets and results you care about; accessible from login nodes only (backed up).

  • Scratch: temporary per-job I/O (suited to heavy I/O and many small files; copy anything you need before the job finishes).

If you want to transfer files to/from the cluster, see Transferring files.
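
As an illustration, a single file can be copied from your own machine to your home directory on the cluster with scp, run from your own machine (results.csv is an example filename and the hostname is assumed to be the one you SSH to):

scp results.csv <your-username>@stanage.shef.ac.uk:~/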

Interactive jobs (shell access on a compute node)

If you need to run commands (including loading software and testing code), do it on a compute node (worker node) via an interactive job.

srun --pty bash -i

To request more memory or CPU cores, specify the resources explicitly:

srun --mem=8000 --cpus-per-task=2 --pty bash -i
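
You can also set a wall-time limit explicitly with the standard --time flag if the default interactive session length does not suit you (the one hour below is just an example):

srun --time=01:00:00 --mem=8000 --cpus-per-task=2 --pty bash -i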

Interactive job guidance lives here: Interactive Jobs

Load software (modules)

Most software on Stanage is provided via environment modules.

Basics:

module avail
module spider <software-name>
module load <some/module>
module list
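
For example, a typical session might look like the following (the Python module name is only illustrative; use module avail or module spider to find what is actually installed):

module avail python
module load Python/3.10.8-GCCcore-12.2.0   # illustrative name; check module avail
module list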

Proper guide: Activating software using Environment Modules

Your first batch job (copy/paste example)

Most work on Stanage is run as batch jobs: you write a job script, submit it, and the scheduler runs it on compute (worker) nodes.

Create a job script called hello.slurm using either the nano or vim text editor (see the 'create a text file' section of the workshop linked above):

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:05:00
#SBATCH --mem=1000M
#SBATCH --cpus-per-task=1

hostname
date
echo "Hello from $SLURM_JOB_ID"
sleep 30
echo "Done"

Submit the job:

sbatch hello.slurm

sbatch will respond with a job id.

Check the job's status in the queue:

squeue --me

Read the output:

cat slurm-<jobid>.out
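
By default Slurm writes the job's output to a slurm-<jobid>.out file in the directory you submitted from. If you would rather have named output and error files, these standard Slurm directives can be added to the script (the filenames are just an example; %j expands to the job id):

#SBATCH --output=hello-%j.out
#SBATCH --error=hello-%j.err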

Read Job Submission and Control for more examples.

Many software packages have example batch scripts: Software on Stanage.

Stopping a job

scancel <jobid>

Investigating failures

Start with:

  • your .out file (and any error file you configured)

  • sacct for exit state / resource usage (see the example below)

  • the debugging section in the docs
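
For the sacct check mentioned above, a query along these lines is a reasonable starting point (JobID, JobName, State, ExitCode, Elapsed and MaxRSS are standard sacct format fields):

sacct -j <jobid> --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS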

Choosing sensible resources

Start with something known, then refine — don’t guess.

  • Use the runtime and memory you observe on a laptop or workstation as a baseline (a single CPU core on the cluster may be comparable to, or slightly slower than, your own machine)

  • Add modest headroom for a first run (e.g. 1.5–2× runtime)

  • After the job finishes, check what it actually used with seff: seff <jobid>

  • Tighten time and memory requests in the next submission based on this output (see the sketch below)
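
As a purely hypothetical illustration: if seff reported roughly 2.5 GB of peak memory and 40 minutes of elapsed time, a tighter request for the next submission might look like this (the numbers are invented for the example):

#SBATCH --time=01:00:00
#SBATCH --mem=3000M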

Smaller, tighter requests usually start sooner. For more guidance, see: Choosing appropriate compute resources

Need help?

Before emailing support, check the documentation pages linked in the sections above.

Then contact research-it@sheffield.ac.uk with:

  • job id(s)

  • the job script

  • the output/error logs

  • what you expected vs what happened

Experienced users' quick start: essentials

You already know what a scheduler is and why login nodes are not for compute. This section highlights the Stanage-specific essentials for experienced HPC users.

If anything below feels unfamiliar, read the full docs page linked in that section.

Access and connection

  • You will need DUO Multifactor Authentication to authenticate; off campus you will also need the University SSL VPN: Connecting to a cluster using SSH

  • To reduce how often you need to reauthenticate with MFA (connection reuse, persistent sessions, tmux, PuTTY settings), see: Tips for working with MFA

Filesystems and where to run I/O

Read Filestores first, then decide where to put data.

Minimal mental model:

  • Home: small, convenient, not your archive (not backed up; treat it accordingly)

  • Fastdata: big, your real work data (not backed up; treat it accordingly)

  • Shared project storage: datasets and results you care about; accessible from login nodes only (backed up)

  • Scratch: per-job temporary; stage in/out inside the job

If you’re doing heavy I/O, make sure you’ve read Filestores.

Interactive jobs (debugging / short tests)

Use srun for an interactive allocation:

srun --mem=8000 --cpus-per-task=2 --pty bash -i

Full guidance is within: Interactive Jobs

Software environment

Stanage uses Environment Modules for most centrally-provided software: Activating software using Environment Modules

Handy commands:

module avail
module spider <name>
module show <module>
module purge
module load some/module
module list

Minimal batch job script

#!/bin/bash
#SBATCH --job-name=<name>
#SBATCH --time=HH:MM:SS
#SBATCH --mem=<n>M
#SBATCH --cpus-per-task=<n>

set -euo pipefail
module purge
module load <what-you-need>

# (optional) stage to parscratch / local as per filestore guidance
# run your code
# (optional) stage results back
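
A filled-in version might look like the sketch below. The module name, script name, working directory and thread option are all placeholders: use whatever module avail shows and the locations described in Filestores.

#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --time=02:00:00
#SBATCH --mem=8000M
#SBATCH --cpus-per-task=4

set -euo pipefail
module purge
module load Python/3.10.8-GCCcore-12.2.0    # illustrative name; check module avail

cd /mnt/parscratch/users/$USER/myproject    # assumed fastdata location; see Filestores
python analyse.py --threads "$SLURM_CPUS_PER_TASK"   # analyse.py and --threads are placeholders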

Choosing resources

If you want jobs to start promptly and not get killed for exceeding limits, read: Choosing appropriate compute resources

Key ideas:

  • request only what you’ll use (fairshare and queueing exist for a reason)

  • set time/mem/CPU explicitly in scripts

  • understand --cpus-per-task vs --ntasks (OpenMP vs MPI; see the sketch below)
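
A rough sketch of the difference in practice; these are two alternative request patterns (core and task counts are arbitrary examples), not directives to combine as-is:

# OpenMP / multithreaded: one task with several CPUs on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# MPI: many tasks, typically one CPU each, launched with srun
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1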

Batch job submission and monitoring

Core docs: Job Submission and Control

Submit:

sbatch job.slurm

Monitor:

squeue --me

Cancel:

scancel <jobid>

Check accounting (after completion):

sacct -j <jobid>

Check efficiency using seff (after completion):

seff <jobid>

For job arrays, dependencies, and other workflow features, see: advanced job submission control.
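
As a minimal illustration of a job array (the 1-10 range is arbitrary; dependencies and array throttling are covered in the page linked above), add an --array directive to the job script and use the index Slurm provides:

#SBATCH --array=1-10

echo "Processing array index $SLURM_ARRAY_TASK_ID"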

Environment control: see basics and SLURM variables.

When using srun, we recommend --export=ALL (see srun versus mpirun or mpiexec)

Parallel jobs (MPI / OpenMP / arrays)

  • To choose the right approach for your workload (arrays, OpenMP, MPI, GPUs), see Parallel Computing.

GPUs

If you need GPUs, start with the GPU computing tutorial, then read Using GPUs on Stanage.

GPUs are a high-value resource, so request only what you need. Only request multiple GPUs if you’re sure your code can use them efficiently.
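
A generic Slurm GPU request is shown below; note that Stanage may also require a specific partition and/or QOS for GPU jobs, so check Using GPUs on Stanage for the exact directives rather than relying on this line alone:

#SBATCH --gres=gpu:1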

Debugging and profiling

Start with debugging failed jobs.

Then move to advanced job profiling and analysis when you need real answers.

Reference and FAQs

Support

If you contact research-it@sheffield.ac.uk, include:

  • job id(s)

  • job script(s)

  • output/error logs

  • relevant module list (module list) and key env vars

  • what you expected vs what happened