
4. Jupyter on ShARC: preparing your environment

4.1. Background

Once you have a Jupyter Notebook server running (e.g. on a cluster worker node), you will typically want to consider your execution environment, i.e. your choice of:

  • Kernel (code cell language);
  • software packages/libraries.

4.1.1. Kernels

Jupyter Notebooks were originally known as IPython Notebooks but now Jupyter supports Notebook code cells written in many different languages. This is achieved by the (per-user) Jupyter Notebook server sending the contents of code cells to a Kernel for evaluation. The most widely used Kernels are:

  • the IPython Kernel, the default Kernel, which can run code cells containing Python >= 3.3 and Python 2.7;
  • IRKernel, a Kernel for R >= 3.2.
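
To see which Kernels a given Jupyter installation can currently find, the jupyter kernelspec list command can be run from a terminal. The sketch below wraps it in a small helper so that a missing jupyter command produces a hint rather than a bare error. The function name list_kernels is purely illustrative, and a bash-compatible shell is assumed.

```shell
# List the Kernels registered with the current Jupyter installation.
# 'list_kernels' is a hypothetical helper name, not part of Jupyter.
list_kernels() {
    if command -v jupyter >/dev/null 2>&1; then
        # Prints one line per Kernel: its name and the directory
        # that holds its kernel.json file.
        jupyter kernelspec list
    else
        echo "jupyter not found on PATH; try again from a Jupyter Terminal" >&2
        return 1
    fi
}

# Usage:
#   list_kernels
```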

4.1.2. Packages

Most notebooks make use of external software packages, e.g. for fitting statistical models to data. There are several ways to enable/load packages on ShARC (including module files and Singularity containers), but if you are using Jupyter it is recommended that you install and activate software using the conda package manager where possible. This can be done using Jupyter’s graphical interface (in most cases) or from the command-line.

4.1.3. Environments

conda packages are installed into environments. An environment is an isolated collection of packages. You might create:

  • One environment for one project containing Python 3, the IPython Kernel and the pandas and matplotlib Python packages (plus dependencies) for data analysis.
  • Another environment for a second project containing R, the IRKernel and the dplyr and ggplot2 R packages.
  • A third environment containing Python 2.7 plus pandas but no Kernel for work that doesn’t involve Jupyter Notebooks.

conda allows users to:

  • Install and manage packages without a system administrator needing to be involved;
  • Isolate and audit the set of packages used per project (good for reproducible research);
  • Share environment definitions with others and with automated test/build systems (i.e. continuous integration).
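
As a rough illustration of what a shareable environment definition looks like, the following sketch writes a minimal, hand-written definition file for the first example environment described above. The file name example-environment.yml, the environment name and the loosely specified package versions are assumptions made for illustration; a real project would normally pin versions or export them from a working environment.

```shell
# Write a minimal conda environment definition to the current directory.
# The file name, environment name and package list are illustrative only.
cat > example-environment.yml <<'EOF'
name: example-analysis-env
channels:
  - defaults
dependencies:
  - python=3
  - ipykernel
  - pandas
  - matplotlib
EOF

# Anyone with conda available could then recreate the environment with:
#   conda env create -f example-environment.yml
```

Keeping the definition in a plain text file is what makes it easy to share and to audit: the file can be reviewed, diffed and passed to an automated build system like any other project file.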

4.2. Using conda on ShARC via Jupyter

From the browser tab containing Jupyter’s file browser, activate the Conda tab within Jupyter’s interface.

You should then see something like:

[Screenshot: the Conda tab in Jupyter (jupyter-conda-view.png)]

Here we have (in anticlockwise order) lists of:

  1. All conda environments that have been found and can be activated from within Jupyter;
  2. Packages (inc. versions) that can be installed into a selected environment; by default only the latest version from the default conda channel (package repository) can be installed this way;
  3. Packages that are installed in the environment selected in the list of environments (item 1), plus the package version and the build version (e.g. the version of Python it was built for).

These three views may take a few seconds to populate after clicking Conda.

Warning

This interface currently does not reliably allow conda environments to be created or modified. Until this issue is resolved you can use this interface to inspect your environments but you should create and modify conda environments from a terminal.

4.2.1. Creating a new conda environment

Before we run a Notebook we typically want to create a new conda environment containing the packages we are interested in plus a Jupyter Kernel.

See the general documentation for using conda on ShARC for generic instructions on how to create conda environments from the command-line. Note that if you are using a Jupyter Terminal then you do not need to load conda using module load .... Make sure you install a package containing a Jupyter Kernel (e.g. ipykernel for Python work) into your environment.

When following that documentation you might want to use the following as starting points for creating Jupyter(Hub)-compatible environments:

Python 3:

conda create -n example-python-env python=3.6 ipykernel

R:

conda create -n example-r-env python=3.6 r-irkernel jupyter_client libiconv

Python from the Intel Python Distribution:

conda create -n example-intel-python-env -c intel intelpython3_core ipykernel jupyter_client
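
After creating an environment it can be worth confirming that the Kernel package really was installed into it. The following is a minimal sketch of one way to check, based on the output of conda list; the helper name env_has_package is hypothetical, and the environment name in the usage comment is the one from the Python 3 example above.

```shell
# Succeed (exit status 0) if the named conda environment contains a package.
# 'env_has_package' is a hypothetical helper, not a conda command.
# Usage: env_has_package ENV_NAME PACKAGE_NAME
env_has_package() {
    # 'conda list -n ENV' prints one installed package per line, with the
    # package name in the first column; header lines start with '#'.
    conda list -n "$1" | awk -v pkg="$2" '$1 == pkg { found = 1 } END { exit !found }'
}

# Example:
#   env_has_package example-python-env ipykernel && echo "Kernel package present"
```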

4.2.2. Capturing the state of an environment

It is important to track the versions of packages you used to generate your research outputs, primarily to allow you and others to easily repeat your workflows. Ideally you should manage a file detailing the packages in your environment, plus your Notebook and other project files, in a version control system such as git.

After creating/modifying an environment:

  • Click on the export icon (left-most icon beneath Action) for a given environment to download a conda environment definition file.

  • Alternatively you can generate a definition file from a Jupyter Terminal:

    source activate my-env-name
    cd /data/$USER/research-project-7
    conda env export > environment.yml
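
To keep the definition file under version control alongside the rest of a project, the export and commit steps can be combined. The sketch below assumes that the environment of interest is already activated, that the project directory is already a git repository, and that git is available; the function name snapshot_environment and the commit message are illustrative only.

```shell
# Export the active conda environment into a project directory and commit it.
# 'snapshot_environment' is a hypothetical helper name.
# Usage: snapshot_environment PROJECT_DIR
snapshot_environment() {
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$1" &&
        conda env export > environment.yml &&
        git add environment.yml &&
        # Note: git commit exits non-zero if environment.yml has not changed.
        git commit -m "Update conda environment definition"
    )
}

# Example:
#   snapshot_environment "/data/$USER/research-project-7"
```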