Python

This page documents the Anaconda installation on Stanage. This is the recommended way of using Python, and the best way to configure custom sets of packages for your use.

“conda”, a Python package manager, allows you to create “environments”, which are sets of packages that you can modify. It does this by installing them in your home area. This page will guide you through loading conda and then creating and modifying environments so you can install and use whatever Python packages you need.

Using Conda Python

Attention

We recommend that you use the following 2022 (sub)version of Anaconda3: Anaconda3/2022.05

The Anaconda3/2022.10 module has shown odd behaviour on conda environment exit for some users. We are investigating this and will advise in due course.

After connecting to Stanage (see Establishing a SSH connection), start an interactive session with the following command:

srun --pty bash -i

Anaconda Python can be loaded with one of the following:

module load Anaconda3/2024.02-1
module load Anaconda3/2022.10
module load Anaconda3/2022.05
module load Anaconda3/2021.11
module load Anaconda3/2020.11
module load Anaconda3/2019.07

The root conda environment (the default) provides Python 3 and no extra modules. It is automatically updated and is not recommended for general use; use it only as a base for your own environments.

Warning

Because Anaconda is installed as a module, you must use the source command (source activate, source deactivate) instead of conda activate or conda deactivate when activating or deactivating environments!

Creating a Conda Environment

Every user can create their own environments. Packages shared with the system-wide environments will not be reinstalled or copied to your file store; they will be symlinked. This reduces the space you need in your /users/$USER directory to install many different Python environments.

To create a clean environment with just Python 3.8 and numpy you can run:

conda create -n mynumpy python=3.8 numpy

This will download the latest release of Python 3.8 and numpy, and create an environment named mynumpy.

Any version of Python or list of packages can be installed:

conda create -n myscience python=3.5 numpy=1.15.2 scipy

If you wish to modify an existing environment, such as one of the anaconda installations, you can clone that environment:

conda create --clone myscience -n myexperiment

This will create an environment called myexperiment which has all the same conda packages as the myscience environment.

How to avoid large conda environments filling up your home directory

Home directories have limited space and can often reach their quota limit. Conda environments can quickly take up a lot of space: if you have, or want to create, one or more large conda environments (e.g. containing bulky deep learning packages such as TensorFlow or PyTorch), there’s a risk you’ll quickly use up your home directory’s storage quota.

To avoid this, build your conda environments in a fastdata area:

  1. Create a .condarc file in your home directory if it does not already exist.

  2. Add an envs_dirs: and pkgs_dirs: section to your .condarc file as shown below:

pkgs_dirs:
- /mnt/parscratch/users/$USER/anaconda/.pkg-cache/

envs_dirs:
- /mnt/parscratch/users/$USER/anaconda/.envs

  3. We recommend that you create your own personal folder in the fastdata area (/mnt/parscratch). As this doesn’t exist by default, you can create it with safe permissions; see fastdata area.

  4. Then create the .envs and .pkg-cache directories in your fastdata area as shown below:

mkdir -p /mnt/parscratch/users/$USER/anaconda/.pkg-cache/  /mnt/parscratch/users/$USER/anaconda/.envs
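The steps above can also be scripted. The following is a minimal sketch, not a site-provided tool: the function name setup_conda_scratch is hypothetical, and it assumes the default base directory /mnt/parscratch/users/$USER/anaconda unless you pass another one. It refuses to edit a .condarc that already sets pkgs_dirs or envs_dirs, and it writes the expanded path instead of $USER so it does not depend on conda expanding variables in .condarc. It does not apply the restricted permissions recommended for your personal folder, so create that folder first as described in fastdata area.

```shell
# Hypothetical helper: point conda's package cache and environments at fastdata.
# Assumes the base directory /mnt/parscratch/users/$USER/anaconda unless one is given.
setup_conda_scratch() {
  local base="${1:-/mnt/parscratch/users/$USER/anaconda}"
  local condarc="$HOME/.condarc"

  # Do not overwrite existing settings; ask the user to merge them by hand.
  if [ -f "$condarc" ] && grep -qE '^(pkgs_dirs|envs_dirs):' "$condarc"; then
    echo "$condarc already sets pkgs_dirs or envs_dirs; please edit it by hand." >&2
    return 1
  fi

  # Create the package cache and environment directories.
  mkdir -p "$base/.pkg-cache" "$base/.envs" || return 1

  # Make sure an existing .condarc ends with a newline before appending to it.
  if [ -s "$condarc" ] && [ -n "$(tail -c 1 "$condarc")" ]; then
    echo >> "$condarc"
  fi

  # Append the settings with the path already expanded, in the same layout as above.
  cat >> "$condarc" <<EOF
pkgs_dirs:
- $base/.pkg-cache/
envs_dirs:
- $base/.envs
EOF
}
```

For example, run setup_conda_scratch once, then check the result with cat ~/.condarc.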

New environments and the package cache should now be created in your fastdata area.
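If you already have environments in your home directory, it helps to see how much space each one uses before deciding what to rebuild in fastdata. A minimal sketch (env_sizes is a hypothetical name); it assumes your existing environments are in ~/.conda/envs, which is where conda usually places user environments when the system installation is read-only, so pass a different directory if yours live elsewhere:

```shell
# Hypothetical helper: print the disk usage of each conda environment, smallest first.
# Defaults to ~/.conda/envs (an assumption); pass your own directory as $1.
env_sizes() {
  local dir="${1:-$HOME/.conda/envs}"
  if [ ! -d "$dir" ]; then
    echo "No such directory: $dir" >&2
    return 1
  fi
  # -s: one total per environment, -h: human-readable sizes;
  # sort -h understands the K/M/G suffixes that du -h prints.
  du -sh "$dir"/* 2>/dev/null | sort -h
}
```

For example, env_sizes /mnt/parscratch/users/$USER/anaconda/.envs shows the environments created after the change above.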

Installing Packages Inside a Conda Environment

Once you have created your own environment, you can install additional packages or different versions of packages into it. There are two methods for doing this: conda and pip. If a package is available through conda, it is strongly recommended that you use conda to install it. You can search for packages using conda:

conda search pandas

then install the package using:

conda install pandas

If you are not in your own environment, you will get a permission denied error when trying to install packages. If this happens, create or activate an environment you own.

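If a package is not available through conda, you can install it using pip from inside your activated environment, e.g.:

pip install colormath

The pip search command no longer works because PyPI has disabled the API it relied on; search for packages on https://pypi.org instead.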
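Because installing into an environment you do not own fails with a permission error, a small check before installing can give a clearer message. This sketch assumes that conda sets CONDA_PREFIX to the path of the active environment when you activate it; the function name require_writable_conda_env is hypothetical.

```shell
# Hypothetical guard: succeed only if a conda environment is active and writable by you.
require_writable_conda_env() {
  # Assumption: CONDA_PREFIX holds the path of the active environment
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "No conda environment is active; run 'source activate <name>' first." >&2
    return 1
  fi
  # System-wide environments are not writable by ordinary users
  if [ ! -w "$CONDA_PREFIX" ]; then
    echo "Active environment $CONDA_PREFIX is not writable; activate one you own." >&2
    return 1
  fi
}

# Usage, for example:
# require_writable_conda_env && conda install pandas
```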

Using conda Environments

Once the Anaconda module is loaded, you need to activate or create the conda environment you want to use. For the documentation on conda environments see the conda documentation.

You can load a conda environment with:

source activate myexperiment

where myexperiment is the name of the environment, and unload one with:

source deactivate

which will return you to the root environment.

It is possible to list all the available environments with:

conda env list
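In scripts, such as a batch job, it can help to confirm that an environment exists before activating it. A minimal sketch, assuming the usual conda env list layout (comment lines starting with #, then one environment per line with its name in the first column); conda_env_exists is a hypothetical name. Environments created by path rather than by name are listed by their path, so they only match their full path.

```shell
# Hypothetical helper: return success if a conda environment with this name exists.
conda_env_exists() {
  # Skip comment and blank lines, keep the first column (the environment name),
  # then look for an exact whole-line match.
  conda env list | awk '!/^#/ && NF { print $1 }' | grep -qxF -- "$1"
}

# Example: stop a batch script early if the environment is missing
# conda_env_exists myexperiment || { echo "myexperiment not found" >&2; exit 1; }
```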

A set of Anaconda environments is provided system-wide. These are installed with the Anaconda version number in the environment name and are never modified, so they provide a static base for derivative environments or can be used directly.

Using Conda and Python in a batch job

Create a batch job submission script called myscript.slurm that is similar to the following:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

export SLURM_EXPORT_ENV=ALL
module load Anaconda3/2022.05

# We assume that the conda environment 'myexperiment' has already been created
source activate myexperiment
srun python mywork.py

Then submit this to Slurm by running:

sbatch myscript.slurm

Further Conda Python Learning Resources

The resources and training courses below may be of interest:


Python Virtual Environments

Please note conda is the recommended way of using Python, and the best way to be able to configure custom sets of packages for your use, allowing greater flexibility and ease when handling complex dependencies. However, for simpler Python-only projects, venv can be a lightweight and effective alternative.

Why Use venv with Python?

  1. Optimised Performance - Python modules available via the module system are compiled specifically for Stanage’s hardware, which means they are optimised for the CPU microarchitecture, leading to improved performance for compute-intensive tasks.

  2. Seamless Integration with Other Module System Software - By using --system-site-packages, your virtual environment can inherit system-level packages like wheel.

  3. Lightweight and Fast - venv environments are lightweight compared to Conda environments, as they do not need to install a separate Python distribution. The virtual environment only needs to store your user-specific packages, while the Python interpreter and standard library are reused from the loaded module, reducing the amount of disk space required.

When Should You Use venv Instead of Conda?

  • Performance-critical applications: When you want to take advantage of the CPU-specific optimisations.

  • Integration with other module system software: Particularly when working with tightly coupled HPC tools like MPI.

  • Minimalist environments: When you prefer lightweight management and the overhead of managing a separate Conda distribution is unnecessary.

Conda offers a wider selection of precompiled packages, but for environments tailored to the cluster hardware, venv is lighter and avoids some of the conflicts that can arise with Conda’s own Python distribution.

This guide explains how to set up Python environments using venv as an alternative to Conda, specifically for users who want to leverage the Python interpreters optimised for the HPC environment. By following this guide, you will learn how to create, activate, and manage a virtual environment that integrates seamlessly with other module system built software.

After connecting to Stanage (see Establishing a SSH connection), start an interactive session with the following command:

srun --pty bash -i

Python can be loaded with one of the following:

module load Python/3.11.3-GCCcore-12.3.0
module load Python/3.10.8-GCCcore-12.2.0
module load Python/3.10.8-GCCcore-12.2.0-bare
module load Python/3.10.4-GCCcore-11.3.0
module load Python/3.10.4-GCCcore-11.3.0-bare
module load Python/3.9.6-GCCcore-11.2.0
module load Python/3.9.6-GCCcore-11.2.0-bare
module load Python/3.9.5-GCCcore-10.3.0
module load Python/3.9.5-GCCcore-10.3.0-bare
module load Python/3.8.6-GCCcore-10.2.0
module load Python/3.8.2-GCCcore-9.3.0
module load Python/2.7.18-GCCcore-12.2.0-bare
module load Python/2.7.18-GCCcore-11.3.0-bare
module load Python/2.7.18-GCCcore-11.2.0-bare
module load Python/2.7.18-GCCcore-10.2.0
module load Python/2.7.18-GCCcore-9.3.0

Each of these modules loads a version of Python built with a particular GCCcore version. Make sure to choose the module whose GCC/GCCcore version matches that of any other modules you need to load.

Note

System-site packages provided by each type of module:
  • Python-bare: setuptools, pip

  • Python: flit_core, packaging, pip, setuptools, setuptools-scm, tomli, typing_extensions, wheel

Create a Virtual Environment with venv

Now that the optimised Python interpreter is loaded, you can create a virtual environment using venv.

To keep your virtual environments organised, you can set a variable in your .bashrc to define a consistent location:

echo export VENV_HOME=$HOME/.venvs >> ~/.bashrc
source ~/.bashrc

Note

For larger environments, or if you are a heavy user, we recommend building your environments in parscratch, because home directory storage is limited:

mkdir -p /mnt/parscratch/users/$USER/
echo export VENV_HOME=/mnt/parscratch/users/$USER/.venvs >> ~/.bashrc
source ~/.bashrc
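Running either echo command more than once adds duplicate lines to ~/.bashrc. A hypothetical helper (set_venv_home is not a site-provided command) that adds the line only if it is not already there, and sets VENV_HOME in the current shell, might look like this. It writes the line in the same unquoted form as the commands above, so it also recognises a line added by them; directory names containing spaces are not supported.

```shell
# Hypothetical helper: add "export VENV_HOME=<dir>" to ~/.bashrc once, and set it now.
set_venv_home() {
  local dir="$1"
  if [ -z "$dir" ]; then
    echo "usage: set_venv_home <directory>" >&2
    return 1
  fi
  local line="export VENV_HOME=$dir"
  # -x: whole-line match, -F: treat the line literally rather than as a regex
  if ! grep -qxF -- "$line" "$HOME/.bashrc" 2>/dev/null; then
    printf '%s\n' "$line" >> "$HOME/.bashrc"
  fi
  export VENV_HOME="$dir"
}

# Usage, for example:
# set_venv_home "$HOME/.venvs"
# set_venv_home "/mnt/parscratch/users/$USER/.venvs"
```

If you later call it with a different directory, both lines remain in ~/.bashrc and the last one wins.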

Let’s create a new venv which will allow you to install user-specific packages while still inheriting system-wide packages.

python -m venv --system-site-packages $VENV_HOME/my_venv
  • --system-site-packages This flag allows the virtual environment to use the site packages of the loaded Python module.

  • my_venv This is the name of your virtual environment. You can replace it with any name you prefer.

After creating the virtual environment, activate it to begin using it:

source $VENV_HOME/my_venv/bin/activate

Once activated, you will see (my_venv) appear at the beginning of your command prompt, indicating that you are now working within the virtual environment.

(my_venv) [te1st@node001 [stanage] ~]$

Within the virtual environment, you can install any additional Python packages you need. For instance, to install numpy and pandas:

pip install numpy pandas

These packages will be installed locally in your virtual environment without affecting the system-wide packages. Alternatively, you could load a SciPy-bundle module, which adds its packages to PYTHONPATH and so makes numpy and pandas importable without installing them (refer to Available System-site packages).

To list your virtual environments:

ls $VENV_HOME
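Because a venv is tied to the Python it was created with, it can also be useful to see which Python version each environment uses. venv records this in a pyvenv.cfg file at the top of each environment, as a line such as version = 3.10.8. The sketch below (list_venvs is a hypothetical name) reads that line for every environment in $VENV_HOME or in a directory you pass; environments whose pyvenv.cfg has no version line are shown as unknown.

```shell
# Hypothetical helper: list venvs with the Python version each was created with.
list_venvs() {
  local dir="${1:-${VENV_HOME:-}}"
  if [ -z "$dir" ]; then
    echo "usage: list_venvs [directory]  (or set VENV_HOME)" >&2
    return 1
  fi
  local cfg name version
  for cfg in "$dir"/*/pyvenv.cfg; do
    [ -f "$cfg" ] || continue              # no match: the glob stays literal
    name=$(basename "$(dirname "$cfg")")
    # pyvenv.cfg lines look like "version = 3.10.8"
    version=$(awk -F' *= *' '$1 == "version" { print $2 }' "$cfg")
    printf '%s\t%s\n' "$name" "${version:-unknown}"
  done
}
```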

To exit your virtual environment once you’re done, simply run the following command:

deactivate

Example Using Module System Packages with venv

If your project requires additional software that is available as a module, you can load those modules after creating or activating your virtual environment, such as:

module load mpi4py/3.1.4-gompi-2022b

This allows your virtual environment to access additional tools and libraries from the module system. Combining Python packages installed in the venv with modules such as mpi4py lets you use software that has been built and configured for the cluster.

Suppose you are working on a project that requires mpi4py for parallel computing. You can use the Python interpreter loaded from the module system together with a venv that inherits the optimised MPI configuration.

Two approaches are possible: a straightforward one, and a slightly safer one that ensures Python version consistency. Choose whichever best suits your project’s needs. The straightforward approach is:

module load Python/3.10.8-GCCcore-12.2.0
python -m venv --system-site-packages $VENV_HOME/my_venv
source $VENV_HOME/my_venv/bin/activate
module load mpi4py/3.1.4-gompi-2022b

In this example, mpi4py is provided by the loaded module, so you can import mpi4py directly in your Python script. Loading the mpi4py module also loads the OpenMPI module it depends on, which provides the MPI library.

Using this Python venv in a batch job:

Create a batch job submission script called batch.sh that is similar to the following:

#!/bin/bash
#SBATCH --ntasks-per-node=4
#SBATCH --time=10:00
#SBATCH --mem=4000

export SLURM_EXPORT_ENV=ALL
module load Python/3.10.8-GCCcore-12.2.0
module load mpi4py/3.1.4-gompi-2022b

# Use the virtual environment 'my_venv' you already created
source $VENV_HOME/my_venv/bin/activate

# Run the Python script using srun, which can leverage MPI
srun python my_mpi_work.py

Here my_mpi_work.py contains import mpi4py, and srun is used so that tasks are properly distributed when using MPI.

Then submit this to Slurm by running:

sbatch batch.sh
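If VENV_HOME is not set in the job's environment, or the environment name is mistyped, the source line in batch.sh fails with an unhelpful message and the job may carry on with the wrong Python. A hypothetical helper that checks first could be defined near the top of the script; activate_venv is not a site-provided command.

```shell
# Hypothetical helper: activate $VENV_HOME/<name>, failing clearly if it is missing.
activate_venv() {
  local name="$1"
  if [ -z "${VENV_HOME:-}" ] || [ -z "$name" ]; then
    echo "usage: activate_venv <name>  (with VENV_HOME set)" >&2
    return 1
  fi
  local activate="$VENV_HOME/$name/bin/activate"
  if [ ! -f "$activate" ]; then
    echo "No virtual environment found at $VENV_HOME/$name" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  . "$activate"
}

# In batch.sh, instead of the plain source line:
# activate_venv my_venv || exit 1
```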
Key Points:
  • Use $VENV_HOME for consistent environment location.

  • Append $EBVERSIONPYTHON to environment names to avoid mixing Python versions, whilst also setting $MY_ENV.
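The version-tagged naming in the last point can be sketched as follows. EasyBuild-generated modules such as the Python modules above set EBVERSIONPYTHON (for example to 3.10.8 for Python/3.10.8-GCCcore-12.2.0). The helper name venv_path, the <name>_<version> layout, and storing the full path in MY_ENV are assumptions for illustration.

```shell
# Hypothetical helper: build a venv path that records the loaded Python version,
# e.g. $VENV_HOME/my_venv_3.10.8, so environments for different Pythons never mix.
venv_path() {
  local name="$1"
  if [ -z "${EBVERSIONPYTHON:-}" ]; then
    echo "EBVERSIONPYTHON is not set; load a Python module first." >&2
    return 1
  fi
  if [ -z "${VENV_HOME:-}" ]; then
    echo "VENV_HOME is not set." >&2
    return 1
  fi
  printf '%s/%s_%s\n' "$VENV_HOME" "$name" "$EBVERSIONPYTHON"
}

# Usage, for example:
# module load Python/3.10.8-GCCcore-12.2.0
# export MY_ENV=$(venv_path my_venv)
# python -m venv --system-site-packages "$MY_ENV"
# source "$MY_ENV/bin/activate"
```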

Available System-site packages

Examples of some module system packages that provide Python modules which can be imported are:

AmberTools, archspec, astropy, Biopython, CDFlib, cppy, dill, flatbuffers, GDAL, GitPython, gmpy2, h5py, hypothesis, IPython, jax, Mako, matplotlib, Meson, mpi4py, networkx, ParaView, Pillow, pkgconfig, poetry, protobuf-python, pybind11, pytest-xdist, PyYAML, scikit-build, snakemake, sympy, TensorFlow, VTK, Xarray, yaff

With a given module loaded, you can list the Python modules that can be imported with the command below. Note that it only lists the first directory in PYTHONPATH:

ls $(echo $PYTHONPATH | awk -F: '{print $1}')

For example:

[te1st@node001 [stanage] ~]$ module load SciPy-bundle/2023.02-gfbf-2022b
[te1st@node001 [stanage] ~]$ ls $(echo $PYTHONPATH | awk -F: '{print $1}')
beniget  bottleneck  deap  gast  mpmath  numexpr  numpy  omp  pandas  ply  pythran  scipy

Here, for clarity, we have removed files such as *.dist-info from the terminal output.
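The one-liner above looks only at the first PYTHONPATH entry and also prints metadata such as *.dist-info. The bash sketch below (list_pythonpath_modules is a hypothetical name) walks every entry and filters that metadata out. It lists package directories and single-file .py modules only, so compiled extension modules (*.so files) are not shown.

```shell
# Hypothetical helper: list importable packages/modules found in every PYTHONPATH entry.
list_pythonpath_modules() {
  local entry item name
  local -a entries=()
  # Split PYTHONPATH on ':' into an array (spaces inside paths are preserved)
  IFS=: read -r -a entries <<< "${PYTHONPATH:-}"
  for entry in "${entries[@]}"; do
    [ -d "$entry" ] || continue
    for item in "$entry"/*; do
      name=$(basename "$item")
      case "$name" in
        # Skip packaging metadata and bytecode caches
        *.dist-info|*.egg-info|__pycache__) continue ;;
      esac
      if [ -d "$item" ]; then
        printf '%s\n' "$name"           # package directory
      elif [[ "$name" == *.py ]]; then
        printf '%s\n' "${name%.py}"     # single-file module
      fi
    done
  done | sort -u
}
```

For example, after module load SciPy-bundle/2023.02-gfbf-2022b, the output should include numpy, pandas and scipy.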

Attention

Always load package modules built with the same toolchain (i.e. GCC version) as the Python module you are using.
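To check this before loading modules, you can read the compiler version off the module names. The bash sketch below assumes the EasyBuild naming used on this page: either a -GCCcore-X.Y.Z (or -GCC-X.Y.Z) suffix, or a toolchain suffix such as -gompi-2022b or -gfbf-2022b. It maps toolchain generations to GCC versions using the standard EasyBuild toolchains (for example, 2022b uses GCC 12.2.0), covering only the generations of the Python modules listed above. The function names are hypothetical.

```shell
# Hypothetical helpers (not site-provided): find the GCCcore version implied by a module name.

# Standard EasyBuild toolchain generation -> GCC(core) version.
gcccore_for_toolchain() {
  case "$1" in
    2020a) echo 9.3.0 ;;
    2020b) echo 10.2.0 ;;
    2021a) echo 10.3.0 ;;
    2021b) echo 11.2.0 ;;
    2022a) echo 11.3.0 ;;
    2022b) echo 12.2.0 ;;
    2023a) echo 12.3.0 ;;
    *) return 1 ;;            # generation not in this table
  esac
}

# Accepts names such as Python/3.10.8-GCCcore-12.2.0, Python/3.10.8-GCCcore-12.2.0-bare,
# mpi4py/3.1.4-gompi-2022b or SciPy-bundle/2023.02-gfbf-2022b.
module_gcccore() {
  local name="$1"
  local re_gcc='-GCC(core)?-([0-9.]+)(-|$)'
  local re_toolchain='-[A-Za-z]+-(20[0-9][0-9][ab])(-|$)'
  if [[ "$name" =~ $re_gcc ]]; then
    echo "${BASH_REMATCH[2]}"
  elif [[ "$name" =~ $re_toolchain ]]; then
    gcccore_for_toolchain "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Example check before loading both modules:
# [ "$(module_gcccore Python/3.10.8-GCCcore-12.2.0)" = "$(module_gcccore mpi4py/3.1.4-gompi-2022b)" ]
```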


Installation notes

Anaconda3

Anaconda was installed using EasyBuild. With a given module loaded, build details can be found in the folder $EBROOTANACONDA3/easybuild.

Python

Python was installed using EasyBuild. With a given module loaded, build details can be found in the folder $EBROOTPYTHON/easybuild.