Attention

The ShARC HPC cluster was decommissioned on the 30th of November 2023 at 17:00. It is no longer possible for users to access that cluster.

4. Jupyter on SHARC: preparing your environment

4.1. Background

Once you have a Jupyter Notebook server running (e.g. on a cluster worker node) you typically want to consider your execution environment, which is your choice of

  • Kernel (code cell language)

  • software packages/libraries

4.1.1. Kernels

Jupyter Notebooks were originally known as IPython Notebooks but now Jupyter supports Notebook code cells written in many different languages. This is achieved by the (per-user) Jupyter Notebook server sending the contents of code cells to a Kernel for evaluation. The most widely used Kernels are:

  • the IPython Kernel, the default Kernel, which can run code cells containing Python >= 3.3 and Python 2.7;

  • IRKernel, a Kernel for R >= 3.2.

4.1.2. Packages

Most notebooks make use of external software packages for e.g. fitting statistical models to data. There are several different ways you might enable/load packages on ShARC (including module files and Apptainer containers) but if using Jupyter it is recommended that you install and activate software using the conda package manager if possible. This can be done using Jupyter’s graphical interface (in most cases) or from the command-line.

4.1.3. Environments

conda packages are installed into environments. An environment is an isolated collection of packages. You might create:

  • One environment for one project containing Python 3, the IPython Kernel and the pandas and matplotlib Python packages (plus dependencies) for data analysis.

  • Another environment for a second project containing R, the IRKernel and the dplyr and gplot2 R packages.

  • A third environment containing Python 2.7 plus pandas but no Kernel for work that doesn’t involve Jupyter Notebooks.

conda allows users to

  • Install and manage packages without a system adminstrator needing to be involved;

  • Isolate and audit the set of packages used per project (good for reproducible research)

  • Share environment definitions with others and with automated test/build systems (i.e. continuous integration)

4.2. Using conda on ShARC via Jupyter

At this time, Conda environments on ShARC can only be inspected, created and modified from the command-line, not directly using JupyterLab’s interface. However, we can use a JupyterLab-provided Terminal to give us a command-line environment, so we don’t need to leave JupyterHub/JupyterLab for this.

4.2.1. Creating a new conda environment

Before we run a Notebook we typically want to create a new conda environment containing the packages we are interested in plus a Jupyter Kernel.

See the general documentation for using conda on ShARC for generic instructions on how to create conda environments from the command-line. Note that if you are using a JupyterLab Terminal then you do not need to load conda using module load .... Make sure you install a package containing a Jupyter Kernel (e.g. ipykernel for Python work or r-ipykernel for R work) into your environment.

When following that documentation you might want to use the following as starting points for creating Jupyter(Hub)-compatible environments:

Python 3:

conda create -n example-python-env python=3.10 ipykernel

R:

conda create -n example-r-env r-irkernel jupyter_client libiconv

Python from the Intel Python Distribution:

conda create -n example-intel-python-env -c intel intelpython3_core
ipykernel jupyter_client

Note

If you want to create one or more large Conda environments then there is a risk this will fill up your 10GB home directory.

You may want to explicitly tell Conda to create environments in a location with a larger quota.

4.2.2. Capturing the state of an environment

It is important to track the versions of packages you used to generate your research outputs, primarily to allow you and others to easily repeat your workflows. Ideally you should manage a file detailing the packages in your environment, plus your Notebook and other project files, in a version control system such as git.

After creating/modifying an environment:

  • Click on the export icon (left-most icon beneath Action) for a given environment to download a conda environment definition file.

  • Alternatively you can generate a definition file from a JupyterLab Terminal:

    source activate my-env-name
    cd /data/$USER/research-project-7
    conda env export > environment.yml
    

4.3. Next

After you have assessed what environments you have available, you can start creating, editing and running Jupyter Notebooks.