Jupyter and JupyterHub

Introduction

Jupyter Notebooks are executable documents that combine formatted text, formatted maths and chunks of code with the figures, tables and textual output generated by that code.

Notebooks can be used:

Jupyter itself is a web application that interprets, runs and renders Notebooks. As you interact with it just by connecting from your web browser, the Jupyter server software can be running on your local machine or on a remote server (which may have more memory, CPU cores and/or GPUs than your local machine).

On the university’s ShARC cluster a (beta) JupyterHub service allows a user to:

  1. Log in to the JupyterHub web interface,
  2. Specify what resources (memory, CPU cores, GPUs) they want for a Jupyter session,
  3. Start and run a Jupyter Notebook server on a worker node in the cluster using these resources.

Status of and maintenance of ShARC’s JupyterHub service

This service is currently experimental. If you use this service and encounter a problem, please provide feedback to w.furnass@sheffield.ac.uk.

The server that provides the JupyterHub service is typically rebooted at 03:26 on the 2nd Tuesday of the month to install security updates.
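If you want to plan long-running sessions around that reboot, the date of the next one can be worked out with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the schedule is described above as typical, so treat the result as an estimate rather than a guarantee.

```python
import datetime


def second_tuesday(year: int, month: int) -> datetime.date:
    """Return the date of the 2nd Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)


def next_reboot(now: datetime.datetime) -> datetime.datetime:
    """Return the first 03:26 2nd-Tuesday slot strictly after `now`."""
    reboot_time = datetime.time(3, 26)
    candidate = datetime.datetime.combine(
        second_tuesday(now.year, now.month), reboot_time
    )
    if candidate <= now:
        # This month's slot has passed, so move on to next month.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        candidate = datetime.datetime.combine(
            second_tuesday(year, month), reboot_time
        )
    return candidate


if __name__ == "__main__":
    print(next_reboot(datetime.datetime.now()))
```

The times are naive local times; the sketch assumes your clock is in the same time zone as the server.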

JupyterHub on a Grid Engine cluster: internal workings

The hub of JupyterHub has several components:

  • an authenticator that allows users to log in, possibly using externally-stored identity information;
  • a database of user and state information;
  • a spawner that can start single-user Jupyter Notebook servers on demand.
JupyterHub architecture

There is also a web proxy. It first routes web connections from a given user to the hub for authentication and, possibly, for choosing spawner options. After a single-user Jupyter server has been spawned, certain web connections are forwarded to that Jupyter Notebook server instead. From the user’s perspective it appears that they are interacting with a single web application, even though at times they might be talking to a single-user Jupyter server that is running on a different machine to the hub.
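The routing behaviour described above can be pictured as a table of URL prefixes, where the longest matching prefix decides where a request goes. The following is a simplified sketch of that idea, not the proxy's actual implementation; the host names, ports and the user name alice are all hypothetical.

```python
from typing import Dict


def route(path: str, routes: Dict[str, str]) -> str:
    """Return the target registered for the longest prefix matching `path`."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    # The default "/" route matches every absolute path, so `matches`
    # is never empty as long as that entry is present.
    return routes[max(matches, key=len)]


# Before any server has been spawned, everything goes to the hub.
routes = {"/": "http://hub.example:8081"}  # hypothetical hub address
print(route("/hub/login", routes))

# Once a single-user server is running, a more specific route is added
# and requests under that prefix go to the worker node instead.
routes["/user/alice/"] = "http://node001.example:51234"  # hypothetical
print(route("/user/alice/tree", routes))
print(route("/hub/home", routes))
```

Because the user's browser only ever talks to the proxy, adding or removing the more specific route is all that is needed to switch them between the hub and their own server.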

ShARC uses a custom spawner, sgespawner, that spawns single-user Jupyter servers on one or more worker nodes on ShARC by submitting batch jobs to the Grid Engine job scheduler.

The JupyterHub and sgespawner configuration allows the user to specify the Grid Engine resources required for the Jupyter session in advance via a web form; these resources are then requested as part of the batch job submission.
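To make the link between the web form and the batch job concrete, here is a rough sketch of how form values might be turned into a Grid Engine qsub command line. It does not reproduce sgespawner's actual code. The function name, the job name, the script name start_jupyter.sh, and the resource names (an smp parallel environment, rmem and gpu) are assumptions chosen for illustration; resource names are defined per site, so check your cluster's documentation for the real ones.

```python
from typing import List


def build_qsub_args(cores: int, mem_gb: int, gpus: int, script: str) -> List[str]:
    """Translate hypothetical form values into a qsub argument list."""
    if cores < 1 or mem_gb < 1 or gpus < 0:
        raise ValueError("cores and mem_gb must be >= 1, gpus must be >= 0")
    # -N names the job; -j y merges stderr into stdout.
    args = ["qsub", "-N", "jupyter", "-j", "y"]
    if cores > 1:
        # Assumed name of a shared-memory parallel environment.
        args += ["-pe", "smp", str(cores)]
    # Assumed site-specific name for the memory resource.
    args += ["-l", f"rmem={mem_gb}G"]
    if gpus > 0:
        # Assumed site-specific name for the GPU resource.
        args += ["-l", f"gpu={gpus}"]
    args.append(script)
    return args


if __name__ == "__main__":
    # Only print the command; nothing is submitted to a scheduler here.
    print(" ".join(build_qsub_args(4, 8, 1, "start_jupyter.sh")))
```

Building the command as a list rather than a single string keeps each form value as a separate argument, which avoids shell quoting problems if the list is later passed to something like subprocess.run.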

Further details of how JupyterHub and sgespawner are configured on ShARC can be found in this repository.

Credits

The JupyterHub service on ShARC is currently developed and maintained by the University’s Research Software Engineering team. This work has been funded by OpenDreamKit, a Horizon2020 European Research Infrastructure project (676541) that aims to advance the open source computational mathematics ecosystem.

Using Jupyter on Iceberg

From a web browser navigate to:

https://jupyter.shef.ac.uk

The JupyterHub service for Iceberg predates that of ShARC and is much more basic. There is currently no way to request specific resources (multiple CPU cores, GPU(s), more RAM etc) from the cluster’s job scheduler when using JupyterHub. Your Notebook session will therefore get one CPU core and the default amount of RAM per job for the cluster you’ve connected to.