News - 2026-01-12

Happy New Year, and welcome to the first HPC newsletter of 2026.

In this newsletter:

Have your say in future newsletters

We’re planning to feature more of your work in future HPC newsletters — including research stories, publications, notable results, and interesting use-cases.

In future newsletters, we hope to share research spotlights and outline how others can take part, either by answering a few short questions or by having a relaxed 20–30 minute chat.

The aim is to highlight the breadth of research supported by HPC and give proper visibility to the people behind it.

Access to External GPU-Accelerated HPC Resources

In addition to the GPU nodes in Stanage, University of Sheffield researchers can also access a range of external HPC systems with GPU capability for suitable workloads. Available options include:

  • N8 Bede GPU cluster (V100 & Grace Hopper) – Sheffield is a partner organisation, and project resource requests can be made via the University’s shared facilities route.

  • National and European facilities (AIRR / Isambard-AI, Dawn, EuroHPC, and others) – Access is available through competitive calls, each with specific eligibility and workload requirements. See Current opportunities for details of open calls.

If you are interested in applying for a specific external resource call then please contact IT Services’ Research and Innovation team for support with your application.

Note

Applications must be submitted by University staff. PGRs may access these resources via their PI once an award has been granted.

Scheduled Maintenance

Routine Stanage Maintenance (proposed)

We intend to introduce periodic, pre-announced maintenance windows on Stanage to allow for security updates, firmware upgrades, and general system housekeeping.

A two-day maintenance window is currently planned: from 08:00 on 9th February 2026 until 17:00 on 10th February 2026. Further details have been shared via email.

Network maintenance – 17th January 2026

There will be network infrastructure maintenance on Saturday 17th January, during which connectivity to Stanage may be briefly disrupted. This may affect SSH access and file transfers for up to one hour at some point during the day. Further details have been shared via email.

Upcoming change: salloc behaviour for interactive jobs

The behaviour of salloc will be updated to align with Slurm’s recommended usage, so that salloc behaves much as srun is currently used for interactive jobs.

The main benefit of using salloc for interactive work is that it allows you to launch srun commands within an interactive allocation, which is not possible when using srun alone. This enables, for example, interactive testing of MPI workloads.
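As a sketch of the intended workflow (the resource flags and program name below are illustrative, not Stanage defaults; check the Stanage documentation before use):

```shell
# Request an interactive allocation; flags here are example values
salloc --nodes=1 --ntasks=4 --time=00:30:00

# Within the allocation, launch job steps with srun --
# e.g. to test an MPI program interactively
srun ./my_mpi_program   # my_mpi_program is a placeholder

# Release the allocation when finished
exit
```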

The change is planned for the coming months, and users will be notified once it has been applied.

More H100 NVL GPU Nodes on Stanage

In addition to the two H100 GPU nodes added last month, a further two new H100 GPU nodes are now available on Stanage in the gpu-h100-nvl partition. Each node is equipped with 4x NVIDIA H100 NVL GPUs, bringing the total to 16x H100 NVL GPUs.

These GPUs are designed for high-performance computing and AI workloads, delivering significant performance improvements over the older NVIDIA H100 PCIe GPUs. Please see New H100 NVL GPU Nodes on Stanage for more details, including earlier benchmarking results.

Full hardware specifications are available at GPU nodes, with usage guidance in Using GPUs on Stanage.
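A minimal batch script sketch for targeting these nodes is shown below. The partition name is taken from this newsletter; the GPU request syntax and resource values are assumptions, so please confirm them against the Using GPUs on Stanage documentation.

```shell
#!/bin/bash
# Sketch of a batch script for the new H100 NVL nodes.
# Partition name is from the newsletter; other flags are assumed values.
#SBATCH --partition=gpu-h100-nvl
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Confirm which GPU the job was allocated
nvidia-smi
```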

To support the new NVIDIA H100 NVL GPUs hosted on Intel-based nodes, we have updated the icelake software stack to include the same GPU-related packages as the znver3 stack.

A new Blackwell-based GPU node (1× server with 8× RTX 6000 Pro GPUs) has been ordered. It will be some time before this is operational, and support under EL7 is unlikely.

H100 NVL GPU node gpu31 returned to service

Sub-NUMA clustering, an Intel feature on modern processors, was inadvertently enabled on Stanage node gpu31. The node was taken out of service on 23rd December 2025 and the feature disabled.
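For reference, the NUMA topology the operating system sees on any Linux node can be inspected with lscpu; with Sub-NUMA Clustering enabled, the reported NUMA node count exceeds the physical socket count.

```shell
# Show the NUMA domains the OS reports; compare "NUMA node(s)"
# against "Socket(s)" to spot Sub-NUMA Clustering
lscpu | grep -iE "socket\(s\)|numa node\(s\)"
```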

While the node was drained, we re-ran a small set of NCCL benchmarks. The configuration changes made during maintenance resulted in an improvement of ~18% in basic NCCL tests.

Further testing also showed that moving from CUDA 11.x to CUDA 12.1 / NCCL 2.18.3 provides additional gains, with average NCCL bus bandwidth increasing by ~31–37% and peak bandwidth by ~44–47% in these tests.
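For anyone wanting to reproduce similar measurements on their own allocation, NVIDIA's open-source nccl-tests suite provides the standard bus-bandwidth benchmarks. The sketch below assumes a CUDA-capable environment is already loaded; module names and paths will differ on Stanage.

```shell
# Build and run the NCCL all-reduce benchmark (requires CUDA + NCCL
# in the environment; load the appropriate modules first)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make

# Sweep message sizes from 8 B to 1 GiB across 4 GPUs on one node
./build/all_reduce_perf -b 8 -e 1G -f 2 -g 4
```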

The node is now configured consistently with the other H100-NVL nodes and shows a clear performance improvement.

New software installations

We have recently installed the following new software on Stanage:

Icelake

Znver3

  • No new software added

Lustre filesystem usage

This area is currently quite full (71% utilisation). Please delete any data on this filesystem that you no longer need, or migrate data you wish to keep to, for example, a Shared Research Area. Please also keep in mind that the Lustre filesystem on Stanage should primarily be treated as a temporary file store: it is optimised for performance and is not backed up, so any data of value should not be kept on Lustre long-term.
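As a starting point for identifying stale data, something like the following can be used (the directory variable is a placeholder for your own Lustre area):

```shell
# Directory to scan -- replace with your own Lustre area
SCAN_DIR="${SCAN_DIR:-$HOME}"

# List files not modified in the last 90 days (dry run; nothing is deleted)
find "$SCAN_DIR" -type f -mtime +90 -print

# After reviewing the list, the same criteria can be used to delete:
# find "$SCAN_DIR" -type f -mtime +90 -delete
```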

Upcoming Training

Below are our key research computing training dates for the coming month. You can register for these courses and more at MyDevelopment.

Warning

For our taught postgraduate users who don’t have access to MyDevelopment, please email us at researchcomputing@sheffield.ac.uk with the course you want to register for, and we should be able to help you.

  • 13/01/2026 - HPC Training Course.

  • 14/01/2026 - Supervised Machine Learning.

  • 20/01/2026 - Introducing AI into Research.

  • 23/01/2026 - Python Programming 1.

  • 27/01/2026 - Temporal Analysis in Python.

  • 29/01/2026 - Introduction to Linux and Shell Scripting.

  • 30/01/2026 - Python Programming 2.

  • 05/02/2026 - HPC Training Course.

  • 06/02/2026 - Python Programming 3.

  • 12/02/2026 - R Programming 1.

  • 19/02/2026 - R Programming 2.

The following training sessions are offered by our third-party collaborators:

EPCC (providers of the ARCHER2 HPC service) are running the following training sessions: