Attention

Advance Notice: Bessemer will be retired at the end of the day on Friday 31st October 2025.

PyTorch

PyTorch is an open source machine learning library for Python, based on Torch. It is used for applications such as natural language processing.

About PyTorch on Stanage

Note

A GPU-enabled worker node must be requested in order to enable GPU acceleration. See Using GPUs on Stanage for more information.

As PyTorch and its dependencies are distributed as Python packages, it can be installed locally in your home directory. The use of Conda (Python) is recommended, as it can create virtual environments in your home directory, allowing you to install new Python packages without admin permissions. However, as official Conda support for PyTorch has ended, PyTorch itself should now be installed using pip within a conda environment.

Note

The H100 GPU nodes in Stanage (see Stanage specs) require torch >= 2.0.0 built using CUDA 11.8 or newer.

Torch >= 2.1.0 can be installed using pip from PyPI or https://download.pytorch.org/whl/cu121.
Torch >= 2.0.0 can be installed using pip from https://download.pytorch.org/whl/cu118.
Torch < 2.0.0 is not compatible with the H100 GPUs.
Official conda support has ended, but conda-forge may still work.

For more information on how to install PyTorch built against CUDA >= 11.8, see the torch documentation.
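The compatibility rule in the note above (torch >= 2.0.0 built against CUDA 11.8 or newer) can be checked programmatically. The following is a minimal sketch, not an official tool: the helper names `parse_version` and `h100_compatible` are hypothetical, and the version strings are assumed to look like those reported by `torch.__version__` (e.g. `2.1.0+cu121`) and `torch.version.cuda` (e.g. `12.1`, or `None` for a CPU-only build).

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu121' or '11.8' into a tuple of ints, padded to 3 parts."""
    public = text.split("+", 1)[0]  # drop a local suffix such as '+cu121'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # so that '2.0' compares equal to '2.0.0'
        parts.append(0)
    return tuple(parts)


def h100_compatible(torch_version: str, cuda_version: Optional[str]) -> bool:
    """Apply the rule from the note: torch >= 2.0.0 built with CUDA >= 11.8."""
    if cuda_version is None:  # CPU-only build: no GPU support at all
        return False
    return (parse_version(torch_version) >= (2, 0, 0)
            and parse_version(cuda_version) >= (11, 8, 0))
```

Inside an activated environment, the check would be called as `h100_compatible(torch.__version__, torch.version.cuda)` after `import torch`.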

Installation in Home Directory

Conda is used to create a virtual Python environment in which to install your local copy of PyTorch.

Warning

Installing Torch requires more than 2 GB of CPU RAM, so you must request more memory with the --mem=8G flag (8G means 8 GB of CPU RAM).

First request an interactive session (see Interactive Jobs), optionally with a GPU (see Interactive use of the GPUs).

# To request 8GB of CPU RAM for the session
srun --mem=8G --pty bash

# Or, to request a session with one GPU
# NB Each NVIDIA A100 (and H100) GPU in Stanage has 80GB of GPU RAM
srun --partition=gpu --qos=gpu --gres=gpu:1 --mem=82G --pty bash

Warning

Usage of the H100 GPUs requires the --partition=gpu-h100 and --gres=gpu:1 arguments to be set in your submission scripts. This ensures usage is “opt in”, as the architecture of these GPUs differs slightly from that of the existing A100 GPUs and may necessitate changes to batch submission scripts and selected software versions.

PyTorch can then be installed as follows:

# Load the conda module
module load Anaconda3/2022.05

# (Only needed if we're using GPU) Load a cuDNN module
# (which in this case implicitly loads CUDA 12.1.1)
module load cuDNN/8.9.2.26-CUDA-12.1.1

# Create a conda virtual environment called 'pytorch'
conda create -n pytorch python=3.10

# Activate the 'pytorch' environment
source activate pytorch

# Install the latest stable PyTorch release
python -m pip install torch torchvision

Every Session Afterwards and in Your Job Scripts

Every time you start a new session, and within your job scripts, the modules must be loaded and the conda environment activated again. Use the following commands to activate the Conda environment with PyTorch installed:

# Load the conda module (the same version used during installation)
module load Anaconda3/2022.05
# (Only needed if we're using GPU) Load a cuDNN module, which implicitly loads CUDA 12.1.1
module load cuDNN/8.9.2.26-CUDA-12.1.1
# Activate the 'pytorch' environment
source activate pytorch
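A job that runs without the environment activated may pick up a different Python and fail later with a confusing import error. As a minimal sketch, a script can check this up front. It assumes that activating a conda environment by name sets the `CONDA_DEFAULT_ENV` environment variable to that name; the helper names `conda_env_ok` and `require_conda_env` are hypothetical.

```python
import os
from typing import Mapping, Optional


def conda_env_ok(expected: str = "pytorch",
                 environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the active conda environment has the expected name."""
    env = os.environ if environ is None else environ
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated
    return env.get("CONDA_DEFAULT_ENV") == expected


def require_conda_env(expected: str = "pytorch",
                      environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise a clear error if the expected environment is not active."""
    if not conda_env_ok(expected, environ):
        raise RuntimeError(
            f"conda environment '{expected}' is not active; "
            "load the Anaconda module and activate it first"
        )
```

Calling `require_conda_env()` at the top of a training script makes a missing activation step fail immediately with a readable message.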

Testing your PyTorch installation

To verify that PyTorch was installed correctly, run some sample PyTorch code, e.g. an example from the official PyTorch getting started guide (replicated below).

Here we construct a randomly-initialized tensor:

import torch
x = torch.rand(5, 3)
print(x)

The output should be something similar to:

tensor([[0.3380, 0.3845, 0.3217],
        [0.8337, 0.9050, 0.2650],
        [0.2979, 0.7141, 0.9069],
        [0.1449, 0.1132, 0.1375],
        [0.4675, 0.3947, 0.1426]])

Additionally, to check whether the GPU driver and CUDA are enabled and accessible by PyTorch, run the following commands:

import torch
print(torch.cuda.is_available())

On a GPU-enabled node, the output should be:

True
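The two checks above can be combined into one diagnostic that also reports which CUDA version the installed build targets. This is a minimal sketch rather than an official test: the function name `torch_report` and the dictionary keys are hypothetical, while `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch attributes and functions.

```python
import importlib.util


def torch_report() -> dict:
    """Collect basic facts about the PyTorch install; safe if torch is absent."""
    report = {
        "installed": False,
        "version": None,
        "cuda_build": None,       # CUDA version torch was built against
        "cuda_available": False,  # whether a usable GPU is visible right now
        "device": "cpu",
        "gpu_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # torch is not installed in this environment

    import torch

    report["installed"] = True
    report["version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = "cuda"
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is set but `cuda_available` is False, the session was most likely started without a GPU. The `device` value can be passed to `torch.device(...)` so the same script also runs on a CPU-only node.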