CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing, an approach known as General Purpose GPU (GPGPU) computing.
You need to first request one or more GPUs within an interactive session or batch job on a worker node. For example, to request a single GPU for an interactive session on a worker node:
qrshx -l gpu=1
See Using GPUs on ShARC for more information on how to request a GPU-enabled node for an interactive session or job submission.
You then need to load a version of the CUDA library (and compiler). There are several versions of the CUDA library available. As with much software installed on the cluster, versions of CUDA are activated via the ‘module load’ command:
module load libs/CUDA/11.0.2/binary
module load libs/CUDA/10.2.89/binary
module load libs/CUDA/10.1.243/binary
module load libs/CUDA/10.0.130/binary
module load libs/CUDA/9.1.85/binary
module load libs/CUDA/9.0.176/binary
module load libs/CUDA/8.0.44/binary
module load libs/CUDA/7.5.18/binary
To then confirm which version of CUDA you are using, check the version reported by nvcc. Example output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
Important: To compile CUDA programs you also need a compatible version of the GCC compiler:
CUDA 7.x and 8.x: GCC >= 4.7.0 (to allow for the use of C++11 features) and < 5.0.0
CUDA 9.x: GCC < 7.0.0
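The version rules above can be checked mechanically before compiling. The following is a minimal sketch, not part of the cluster's tooling: the function names version_to_int and gcc_ok_for_cuda are hypothetical, and CUDA releases not covered by the rules above are reported as "unknown" rather than guessed.

```shell
# Hypothetical helpers (not provided by the cluster): check a GCC version
# against the CUDA/GCC compatibility rules listed above.

# Convert a dotted version such as 4.8.5 into a comparable integer (40805).
# Missing minor or patch fields are treated as 0.
version_to_int() {
    echo "$1" | awk -F. '{ printf "%d\n", $1 * 10000 + $2 * 100 + $3 }'
}

# Usage: gcc_ok_for_cuda <cuda_version> <gcc_version>
# Prints "compatible", "incompatible", or "unknown".
gcc_ok_for_cuda() {
    cuda_major=${1%%.*}
    gcc_int=$(version_to_int "$2")
    case "$cuda_major" in
        7|8)
            # CUDA 7.x and 8.x: 4.7.0 <= GCC < 5.0.0
            if [ "$gcc_int" -ge 40700 ] && [ "$gcc_int" -lt 50000 ]; then
                echo compatible
            else
                echo incompatible
            fi ;;
        9)
            # CUDA 9.x: GCC < 7.0.0
            if [ "$gcc_int" -lt 70000 ]; then
                echo compatible
            else
                echo incompatible
            fi ;;
        *)
            # No rule is stated above for this CUDA release.
            echo unknown ;;
    esac
}

gcc_ok_for_cuda 8.0.44 4.8.5   # prints: compatible
```

With a GCC module loaded, the second argument could be taken from the compiler itself, for example gcc_ok_for_cuda 9.1.85 "$(gcc -dumpversion)".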
An example of the use of nvcc (the CUDA compiler):
nvcc filename.cu
This compiles the CUDA program contained in the file filename.cu.
You do not need to be using a GPU-enabled node to compile the sample programs but you do need a GPU to run them.
# Load modules
module load libs/CUDA/11.0.2/binary

# Copy CUDA samples to a local directory
# It will create a directory called NVIDIA_CUDA-11.0_Samples/
mkdir cuda_samples
cd cuda_samples
cp -r $CUDA_SDK .

# Compile (this will take a while)
cd NVIDIA_CUDA-11.0_Samples/
make
The make command then runs the nvcc CUDA compiler and generates binary executables that you can then run on a node with an NVIDIA GPU installed.
A basic test is to run one of the resulting binaries.
To achieve the best possible performance whilst being portable, GPU code should be generated for the architecture(s) it will be executed upon.
This is controlled by specifying -gencode arguments to NVCC, which allow for ‘fatbinary’ executables that are optimised for multiple device architectures. Each -gencode argument requires two values, the virtual architecture and the real architecture, for use in NVCC’s two-stage compilation.
For example, -gencode=arch=compute_60,code=sm_60 specifies a virtual architecture of compute_60 and a real architecture of sm_60.
To support future hardware of higher compute capability, an additional -gencode argument can be used to enable Just in Time (JIT) compilation of embedded intermediate PTX code. This argument should use the highest virtual architecture specified in the other -gencode arguments for both the arch and code values, for example -gencode=arch=compute_70,code=compute_70.
The minimum specified virtual architecture must be less than or equal to the Compute Capability of the GPU used to execute the code.
Public GPU nodes in ShARC contain Tesla K80 GPUs, which are compute capability 3.7 (written as 37 in -gencode arguments).
To build a CUDA application which targets the public GPU nodes, use the following -gencode arguments:
nvcc filename.cu \
    -gencode=arch=compute_37,code=sm_37 \
    -gencode=arch=compute_37,code=compute_37
ShARC also contains Tesla P100 GPUs and Tesla V100 GPUs in private nodes, which are compute capability 6.0 and 7.0 respectively (written as 60 and 70 in -gencode arguments).
To build a CUDA application which targets any GPU on ShARC (either public or private), use the following -gencode arguments (for CUDA 9.0 and above):
nvcc filename.cu \
    -gencode=arch=compute_37,code=sm_37 \
    -gencode=arch=compute_60,code=sm_60 \
    -gencode=arch=compute_70,code=sm_70 \
    -gencode=arch=compute_70,code=compute_70
SM 60 for Pascal GPUs is only available for CUDA 8.0 and above.
SM 70 for Volta GPUs is only available for CUDA 9.0 and above.
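The -gencode pattern described in this section (one real-architecture target per GPU generation, plus embedded PTX for the highest virtual architecture) can be generated from a list of compute capabilities. The following is a minimal sketch; the function name gencode_flags is hypothetical and is not something provided by CUDA or the cluster.

```shell
# Hypothetical helper: build -gencode flags from a list of compute
# capabilities written without the dot (e.g. 37 60 70).
gencode_flags() {
    flags=""
    highest=0
    for cc in "$@"; do
        # One real-architecture target per requested compute capability.
        flags="$flags -gencode=arch=compute_${cc},code=sm_${cc}"
        if [ "$cc" -gt "$highest" ]; then
            highest=$cc
        fi
    done
    if [ "$highest" -gt 0 ]; then
        # Embed PTX for the highest virtual architecture so that GPUs of
        # higher compute capability can JIT-compile it.
        flags="$flags -gencode=arch=compute_${highest},code=compute_${highest}"
    fi
    # Strip the leading space before printing.
    echo "${flags# }"
}

gencode_flags 37 60 70
```

The result could then be passed to the compiler, for example nvcc filename.cu $(gencode_flags 37 60 70), leaving the command substitution unquoted so that each flag becomes a separate argument.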
nvprof, NVIDIA’s CUDA profiler, cannot write output to the Lustre filesystem. This is because the profiler’s output is a SQLite database and SQLite requires a filesystem that supports file locking, but file locking is not enabled on the (Lustre) filesystem (for performance reasons).
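One possible way to work around this limitation is to have nvprof write its output to temporary storage that supports file locking and to copy the finished file afterwards. The following is a minimal sketch under assumptions not stated above: the function name profile_to_local is hypothetical, and ${TMPDIR:-/tmp} is assumed to be on a filesystem with file locking enabled. The -o option of nvprof names the output file.

```shell
# Hypothetical wrapper: run nvprof with its output in a temporary directory,
# then copy the finished profile to the requested destination.
# Usage: profile_to_local <destination_file> <program> [program arguments...]
profile_to_local() {
    dest=$1
    shift
    # Assumption: ${TMPDIR:-/tmp} is on a filesystem that supports file locking.
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/nvprof.XXXXXX") || return 1
    if nvprof -o "$scratch/profile.nvvp" "$@"; then
        # A plain copy of the finished database does not need file locking.
        cp "$scratch/profile.nvvp" "$dest"
        status=$?
    else
        status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

For example, profile_to_local results.nvvp ./my_app would profile a program called my_app (a placeholder name) and leave the profile in results.nvvp.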
Run the command:
Example output is:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.51.05 Sun Jun 28 10:33:40 UTC 2020
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
The following notes are primarily for system administrators.
The NVIDIA device driver is installed and configured using the gpu-nvidia-driver systemd service (managed by Puppet). This service runs /usr/local/scripts/gpu-nvidia-driver.sh at boot time to:
Check the device driver version and, if required, uninstall it and reinstall the target version;
Load the nvidia kernel module;
Create several device nodes in