Activating software using Environment Modules

Note

Additional detailed information on the Environment Modules software can be found on the project’s site.

Overview and rationale

‘Environment Modules’ are the mechanism by which much of the software is made available to the users of our clusters (Stanage and Bessemer).

To make a particular piece of software available, a user loads a module. For example, on Stanage you can load version v2206 of the OpenFOAM application, built with a particular compiler, BLAS library and MPI implementation (collectively, the foss-2022a toolchain), with:

module load OpenFOAM/v2206-foss-2022a

This command manipulates environment variables to make this piece of software available. If you then want to switch to a different version of OpenFOAM (should another be installed on the cluster you are using), first unload the current one:

module unload OpenFOAM/v2206-foss-2022a

Then load the other version.

You may wonder why modules are necessary: why not just install packages provided by the vendor of the operating system installed on the cluster? In shared high-performance computing environments such as our clusters:

  • Users typically want control over the version of applications that is used (e.g. to give greater confidence that results of numerical simulations can be reproduced);

  • Users may want to use applications built using compiler X rather than compiler Y as compiler X might generate faster code and/or more accurate numerical results in certain situations;

  • Users may want a version of an application built with support for particular parallelisation mechanisms, such as MPI for distributing work within and between machines, OpenMP for distributing work between the CPU cores of a single machine, or CUDA for parallelisation on GPUs;

  • Users may want an application built with support for a particular library.

There is therefore a need to maintain multiple versions of the same applications on our clusters. Module files allow users to select and use the versions they need for their research.

If you switch to using a cluster other than Stanage or Bessemer then you will likely find that environment modules are used there too. Modules are not the only way of managing software on clusters, however; increasingly common alternatives include:

  • The Conda package manager (Python-centric but can manage software written in any language);

  • Apptainer/Singularity, a means for deploying software in containers (similar to Docker).


Basic guide

You can list all (loaded and unloaded) modules on our clusters using:

module avail

You can then load a module using e.g.:

module load GEOS/3.9.1-GCC-11.2.0

Note

Modules are not available on Bessemer’s login nodes. You must start an interactive job on a worker node using srun (see Job Submission and Control) before any module commands will work there.

You can then load further modules e.g.:

module load PROJ/8.1.0-GCCcore-11.2.0

Confirm which modules you have loaded using:

module list

To stop using a module (undoing the changes that loading it made to your environment), run:

module unload PROJ/8.1.0-GCCcore-11.2.0

Or to unload all loaded modules:

module purge
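When scripting (for example in a batch job), you may want to check whether a particular module is already loaded. The module_loaded function below is a hypothetical helper, not a module command; it assumes the module system exports LOADEDMODULES, a colon-separated list of loaded module names, as Lmod and the Tcl-based Environment Modules normally do (you can check with echo $LOADEDMODULES).

```shell
# Hypothetical helper (not a built-in module command): succeed if the exact
# module name given as $1 appears in the colon-separated LOADEDMODULES list,
# which we assume the module system keeps up to date.
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;   # found as a complete list entry
        *)        return 1 ;;   # not loaded
    esac
}

# Example usage: only load PROJ if it is not already loaded.
#   module_loaded PROJ/8.1.0-GCCcore-11.2.0 || module load PROJ/8.1.0-GCCcore-11.2.0
```

Because the match is against whole list entries, module_loaded PROJ/8.1.0 would not match a loaded PROJ/8.1.0-GCCcore-11.2.0.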

To learn more about what software is available on the system and discover the names of module files, you can view the online documentation for

The name of a module should tell you:

  • The type of software (application, library, development tool (e.g. compiler), parallel computing software);

  • The name and version of the software;

  • The name and version of the compiler that the software was built with (if applicable; not all installed software was built from source);

  • The names and versions of any libraries that distinguish different installs of a given piece of software (e.g. the version of OpenMPI an application was built with).
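Most module names on this page, such as OpenFOAM/v2206-foss-2022a and FFTW/3.3.10-GCC-12.2.0, follow the pattern name/version-toolchain. Assuming that pattern (not every module follows it), a hypothetical shell function can split a name into its parts using bash parameter expansion:

```shell
# Hypothetical helper: split a module name of the assumed form
# name/version-toolchain into its parts using bash parameter expansion.
parse_module_name() {
    local full=$1
    local name=${full%/*}        # everything before the last '/'
    local tag=${full##*/}        # everything after the last '/'
    local version=${tag%%-*}     # the tag up to its first '-'
    local toolchain=""
    if [[ $tag == *-* ]]; then
        toolchain=${tag#*-}      # the tag after its first '-'
    fi
    printf 'name=%s version=%s toolchain=%s\n' "$name" "$version" "$toolchain"
}

# Example:
#   parse_module_name FFTW/3.3.10-GCC-12.2.0
#   prints: name=FFTW version=3.3.10 toolchain=GCC-12.2.0
```

For a name without a toolchain part, such as dev/NAG/6.1, the toolchain field is simply left empty.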

Some other things to be aware of:

  • You can load and unload modules in both interactive and batch jobs;

  • Modules may themselves load other modules. If this is the case for a given module then it is typically noted in our documentation for the corresponding software;

  • Available applications and application versions may differ between our clusters;

  • The order in which you load modules may be significant (e.g. if module A sets SOME_ENV_VAR=apple and module B sets SOME_ENV_VAR=pear, the value you end up with depends on which module was loaded last);

  • Related module files, e.g. multiple versions of the same application, typically cannot be loaded concurrently.


Searching for Modules

You can search for a module using:

module -t --redirect avail |& grep -i somename

Where you replace somename with the string you wish to search for.

You may wish to set up a bash alias in your $HOME/.bashrc file as a shortcut for this command, e.g.:

alias modulefind="module -t --redirect avail |& grep -i"

After sourcing $HOME/.bashrc this command can then be called like so:

$ source $HOME/.bashrc
$ modulefind fftw
FFTW.MPI/
FFTW.MPI/3.3.10-gompi-2022a
FFTW.MPI/3.3.10-gompi-2022b
FFTW/
FFTW/3.3.8-gompi-2019b
FFTW/3.3.8-gompi-2020a
FFTW/3.3.8-gompi-2020b
FFTW/3.3.10-GCC-11.3.0
FFTW/3.3.10-GCC-12.2.0
imkl-FFTW/
imkl-FFTW/2021.4.0-iimpi-2021b
imkl-FFTW/2022.1.0-iimpi-2022a
imkl-FFTW/2022.2.1-iimpi-2022b
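Bash does not expand aliases in non-interactive shells (such as batch job scripts) by default, so if you want the same shortcut inside scripts, a shell function is a more robust alternative. A minimal sketch, keeping the modulefind name used above; 2>&1 | is the longhand form of bash's |& operator:

```shell
# Search all available modules for a case-insensitive string, e.g. modulefind fftw
# -t prints one module per line; --redirect (an Lmod option) sends that
# listing to stdout instead of stderr.
modulefind() {
    module -t --redirect avail 2>&1 | grep -i -- "$1"
}
```

Placed in $HOME/.bashrc instead of the alias, this behaves the same way at the command line.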

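Another option is Lmod’s spider command, which searches all installed modules, including any that module avail does not show until their prerequisites are loaded: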

module spider somename

Behind the scenes

Let’s look at what happens when you load an environment module. If we inspect the contents of a module file we see something like:

$ module show dev/NAG/6.1
-------------------------------------------------------------------
/usr/local/modulefiles/dev/NAG/6.1:

module-whatis   Makes the NAG Fortran Compiler v6.1 available
conflict        dev/NAG
prepend-path    PATH /usr/local/packages/dev/NAG/6.1/bin
prepend-path    MANPATH /usr/local/packages/dev/NAG/6.1/man
setenv          NAG_KUSARI_FILE /usr/local/packages/dev/NAG/license.lic

Here we see:

  • The full path to the file that contains the definition of this module;

  • A line briefly describing the purpose of the module (which could have been viewed separately using module whatis dev/NAG/6.1);

  • A conflict declaration: no other module whose name starts with dev/NAG may be loaded at the same time as this one;

  • A directory is prepended to the standard PATH variable: this ensures that executables relating to dev/NAG/6.1 are used in preference to unrelated executables with the same filenames in other PATH directories. Note that this directory is specific to this version (6.1) of the application we want to use;

  • A directory is prepended to the standard MANPATH variable to ensure that the documentation (man pages) that the vendor bundled with the application can be found;

  • An application-specific environment variable, NAG_KUSARI_FILE, is set (here to ensure that the application can find a license file).
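The effect of a prepend-path line can be mimicked in plain bash. The prepend_path function below is a simplified, hypothetical sketch (the real module system also records the change so that module unload can reverse it); it shows why the most recently prepended directory is searched first:

```shell
# Simplified stand-in for a module file's "prepend-path VAR DIR" line:
# put DIR at the front of the colon-separated variable VAR and export it.
prepend_path() {
    local var=$1 dir=$2
    local current=${!var:-}                      # current value of the variable named by $1
    export "$var=${dir}${current:+:$current}"    # add ':' only if VAR was non-empty
}

# Example: a directory prepended later is searched earlier.
#   prepend_path PATH /usr/local/packages/dev/NAG/6.1/bin
#   echo "$PATH"    # now starts with /usr/local/packages/dev/NAG/6.1/bin
```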

If you run the ‘env’ command before and after loading a module you can see the effect of these changes.
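One way to make that comparison is to save sorted env output to temporary files and diff them. The env_diff function below is a hypothetical sketch; it runs the given command in the current shell (so that a module load really changes the environment) and prints the variables that were added or changed:

```shell
# Hypothetical helper: run a command in the current shell and print the
# environment variables it added or changed, e.g.
#   env_diff module load GEOS/3.9.1-GCC-11.2.0
env_diff() {
    local before after
    before=$(mktemp) && after=$(mktemp) || return 1
    env | sort > "$before"
    "$@"                                   # run the command (e.g. module load ...)
    env | sort > "$after"
    # diff marks lines only present afterwards with '> '; strip that prefix
    diff "$before" "$after" | sed -n 's/^> //p'
    rm -f "$before" "$after"
}
```

Variables that were removed entirely are not listed; drop the sed filter to see the full diff output.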


Convenient ways to set up your environment for different projects

If you regularly need to activate multiple modules whilst working on a given project it may be tempting to add the necessary module load commands to a shell startup script (e.g. the .bashrc script in your home directory). However, this is a bad idea for several reasons:

  • Over time you will forget what is in your .bashrc and may forget that your workflow is dependent on modules loaded by the script;

  • Your .bashrc script may not be managed using version control (e.g. Git) or, if it is, it is unlikely to be in the same repository as your project scripts/code;

  • If someone asks you in three months’ time what version of an application you used to run a simulation will you be able to tell them?

A better approach is to create a module-loading script inside the directory containing your project’s other scripts, then source it (i.e. run it in your current shell, so that the modules it loads stay loaded once it finishes).

For example, you could have project scripts stored in a directory called /home/te1st/proj1.

You could create a script in that directory called setup_env.sh containing:

module load compilers/pgi/13.1
module load mpi/pgi/openmpi/1.6.4

Then if you want to load these modules in an interactive session or in a batch job you could run:

source /home/te1st/proj1/setup_env.sh

If you want to run the job on both Stanage and Bessemer (which provide different software and module files), you could adapt your script to load different modules depending on which cluster you are using:

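if [[ "$HOSTNAME" == *"stanage"* ]]; then
    # On Stanage:
    module load some/module
    module load another/module
elif [[ "$HOSTNAME" == *"bessemer"* ]]; then
    # On Bessemer:
    module load different/module
else
    # Neither cluster recognised: warn rather than silently loading nothing
    echo "setup_env.sh: unrecognised host '$HOSTNAME'; no modules loaded" >&2
fi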

Managing your environment this way is more likely to result in reproducible research, particularly if changes to the content of /home/te1st/proj1 are tracked using Git or another version control tool.
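To answer the "which version did I use?" question later, you could also record the modules that were loaded when a job ran, next to its output. The record_modules function below is a hypothetical sketch; module -t list prints one module per line, and Lmod writes this listing to stderr, hence the 2>&1:

```shell
# Hypothetical helper: save the currently loaded modules (one per line)
# to a file, by default modules_used.txt in the current directory.
record_modules() {
    local outfile=${1:-modules_used.txt}
    module -t list > "$outfile" 2>&1
}

# Example use in a batch job script, after loading the project's modules:
#   source /home/te1st/proj1/setup_env.sh
#   record_modules "modules_${SLURM_JOB_ID:-interactive}.txt"
```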


Managing your own module files

Modules are a great way of loading/unloading software installed in non-standard places. You may therefore want to use them to manage software installed in:

  • your home directory

  • a directory shared by your research group

If you want your own modules, you typically need to create a hierarchy of directories and files. Within a base directory, the relative path to a given module file determines the name you use to load it (e.g. a file at <base directory>/mytool/1.0 would be loaded as mytool/1.0). Browse the directories listed in the $MODULEPATH variable to:

  • see the files that provide all cluster-wide modules and

  • get an understanding of the (Tcl) syntax and structure of module files.

A tutorial on how to write module files is not provided here (but may be in future).

Once you’ve created a set of module files within a directory you can make the module system aware of them by running:

module use /the/path/to/my/modules

The next time you run module avail you will see that your modules are listed alongside the cluster-wide modules.

If you no longer want to have access to your own module files then you can run:

module unuse /the/path/to/my/modules
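As a starting point only, the sketch below writes a minimal Tcl module file for software you have installed yourself. The make_modulefile function, the example paths and the mytool/1.0 name are hypothetical; compare the result with the cluster-wide module files under $MODULEPATH before relying on it.

```shell
# Hypothetical helper: write a minimal Tcl module file called NAME
# (e.g. mytool/1.0) into MODULE_DIR for software installed under PREFIX.
make_modulefile() {
    local name=$1 prefix=$2 moddir=$3
    mkdir -p "$moddir/$(dirname "$name")"     # e.g. MODULE_DIR/mytool/
    cat > "$moddir/$name" <<EOF
#%Module1.0
module-whatis   "Makes $name available"
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
EOF
}

# Example usage:
#   make_modulefile mytool/1.0 $HOME/software/mytool/1.0 $HOME/modules
#   module use $HOME/modules
#   module load mytool/1.0
```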

Compiling software dependent on modules

In most cases, if you are compiling software with dependencies on modules the only actions you need to take are to load the required modules, run any ./configure or CMake steps and then run the make, make check (if available) and make install commands to build, check and install the software.
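For a package that uses ./configure, those steps might look like the following. This is a hypothetical sketch: the function name, the module (borrowed from the examples earlier on this page) and the install prefix are placeholders to replace with what your software actually needs.

```shell
# Hypothetical sketch: build and install an autotools-style package from
# the current directory into PREFIX, e.g. build_with_modules $HOME/software/mytool/1.0
build_with_modules() {
    local prefix=$1
    module load GEOS/3.9.1-GCC-11.2.0   || return 1   # placeholder dependency
    ./configure --prefix="$prefix"      || return 1
    make -j "${SLURM_CPUS_PER_TASK:-1}" || return 1   # use the cores Slurm allocated, if any
    make check                          || return 1   # drop if the package has no check target
    make install
}
```

For a CMake-based package the ./configure line would be replaced by the corresponding cmake configuration step.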

Once the software is installed, each time you use the software you must first load the modules used to compile it. This is necessary to make the required libraries and other files used during the compilation available to the program.

For more detailed information on the software installation process, please see: Installing software to the clusters.

You may have to write or edit your own makefile, referencing specific libraries and paths, if:

  • There is no configuration step (such as ./configure or CMake) that generates a suitable makefile from the current shell environment after loading modules;

  • Only an example makefile is provided, for you to edit;

  • No makefile is provided at all.

In these cases, you can run module show modulename to see how the module files for your loaded modules modify your shell environment, e.g. which directories they add to $PATH, $LD_LIBRARY_PATH and other environment variables.

You can then navigate to any directories of interest or use the find or grep commands to search them as required.
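For example, to find which of the directories your loaded modules added to LD_LIBRARY_PATH (or PATH) contains a particular file, you could loop over the entries of that variable. The find_in_path function below is a hypothetical helper:

```shell
# Hypothetical helper: list files matching a glob PATTERN in each directory
# of the colon-separated variable VAR, e.g. find_in_path LD_LIBRARY_PATH 'libproj*'
find_in_path() {
    local var=$1 pattern=$2 dir
    while IFS= read -r dir; do
        [ -d "$dir" ] || continue            # skip empty or missing entries
        find "$dir" -maxdepth 1 -name "$pattern" 2>/dev/null
    done < <(tr ':' '\n' <<< "${!var:-}")
}
```

The directories it reports can then be used in -L (library) or -I (header) options in a hand-written makefile.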


Module Command Reference

Here is a list of the most useful module commands. For full details, type man module at the command prompt on one of the clusters.

  • module list – lists currently loaded modules

  • module avail – lists all available modules

  • module load modulename – loads module modulename

  • module unload modulename – unloads module modulename

  • module switch oldmodulename newmodulename – switches between two modules

  • module show modulename – shows how loading modulename will affect your environment

  • module purge – unload all modules

  • module help modulename – may show a longer description of the module, if one is present in the module file

  • man module – detailed explanation of the above commands and others

  • ml --help – outlines module shorthand commands