Attention

WARNING: From 9am on 19th August until 5pm on 2nd September there will be no access to the Stanage HPC cluster.

We will send an email to notify you when Stanage is back online and available for job submission.

DRMAA

The Distributed Resource Management Application API (DRMAA) is a specification for programatically submitting/controlling jobs to/on job scheduling software such as the Grid Engine software used on the ShARC cluster.

Support for DRMAA is built in to Grid Engine and other scheduling software such as Torque, PBS, HTCondor and SLURM.

There are DRMAA bindings (i.e. wrappers) available for several programming languages.

Why is it useful?

Using DRMAA you may find it easier to automate the submission and control of jobs.

Also, you might want to use a Computational Pipeline manager (such as Ruffus) to run sets of related (possibly dependent) jobs on a cluster; these may use DRMAA beind the scenes to create and manage jobs.

Using DRMAA from Python

Install DRMAA bindings

The Python bindings for DRMAA are compatible with Python 2 and Python 3. The simplest way to install them is to use a conda environment e.g.

# Connect to ShARC
ssh sharc
# Start an interactive session
qrshx
# Ensure conda is available
module load apps/python/conda
# Create a conda environment
conda create -n drmaa-test-sharc python=3.6
# Activate this environment
source activate drmaa-test-sharc
# Install the python bindings for DRMAA
pip install drmaa

Submitting a job

You can then submit a job using a script like this:

drmaa_submit_job.py

#!/usr/bin/env python

from __future__ import print_function
import drmaa
import os


def main():
    """
    Create a DRMAA session then submit a job.
    Note, need file called myjob.sh in the current directory.
    """
    with drmaa.Session() as s:
        print('Creating a job template')
        jt = s.createJobTemplate()
        # The job is to run an executable in the current working directory
        jt.remoteCommand = os.path.join(os.getcwd(), 'myjob.sh')
        # Arguments to the remote command
        jt.args = ['alpha', 'beta']
        # Join the standard output and error logs
        jt.joinFiles = True

        job_id = s.runJob(jt)
        print('Job {} submitted'.format(job_id))

        print('Cleaning up')
        s.deleteJobTemplate(jt)


if __name__ == '__main__':
    main()

where myjob.sh is:

#!/bin/bash
echo "Hello from ${JOB_ID}. I received these arguments: $1, $2"

To actually submit the job:

$ # Make your job script executable
$ chmod +x myjob.sh

$ python drmaa_submit_job.py
Creating a job template
Job 401022 submitted
Cleaning up

$ cat myjob.sh.o401022
Hello from 401022. I received these arguments: alpha, beta

You can submit multiple jobs iteratively, reusing your DRMAA job template for efficiency.

If you want to submit a job array rather than a single job then you can call the runBulkJob method instead of runJob.

Warning

Starting a single DRMAA session then attempting to submit jobs from different Python threads currently does not work.

Email notifications

You can enable email notifications for jobs finishing or aborting by setting two attributes of your JobTemplate object (jt in this example):

jt.blockEmail = False
jt.email = ['a.person@sheffield.ac.uk']

This is equivalent to having the following in a Grid Engine batch job submission script:

#$ -m ea
#$ -M a.person@sheffield.ac.uk

Other useful job template attributes

Requesting arbitrary resources from Grid Engine such as real memory (rmem), parallel environments (e.g. -pe mpi), run time (h_rt), Grid Engine Projects e.g.
```
jt.nativeSpecification = '-l h_rt=00:30:00 -l rmem=2G -pe smp 2'
```

Set the paths to the job standard output and error files:

jt.outputPath = os.path.join(os.getcwd(), 'output.log')
jt.errorPath = os.path.join(os.getcwd(), 'error.log')

Send standard output and error to the same file:
```
jt.joinFiles = True
```
Name your job (equivalent to -N my_job in a Grid Engine batch job submission script):
```
jt.jobName = 'my_job'
```

Set one or more environment variables in your job using a Python dictionary:

jt.jobEnvironment = {'SOMEKEY': 'somevalue', 'ANOTHERKEY', 'anothervalue'}

Warning

You cannot presently request one or more GPUs using DRMAA without explicitly using the gpu Grid Engine Project i.e. jobs created using a Job Template jt where jt.nativeSpecification = '-l gpu=1' will be rejected by the job scheduler but jobs with jt.nativeSpecification = '-P gpu -l gpu=1' will be accepted. This is due to a bug in the scheduler.

Waiting on jobs

Within a DRMAA Session you can wait indefinitely or forever for one, several or all submitted jobs to complete. See the documentation for more information including an example. Jobs are identified by (Grid Engine) job ID, so they do not need to have been submitted by DRMAA.

Controlling jobs

Within a DRMAA Session you can also terminate, suspend, resume, hold and release a job. See the documentation for more information including an example. Again, jobs are identified by (Grid Engine) job ID, so they do not need to have been submitted by DRMAA.

Checking the status of a job

Within a DRMAA Session you can check to see if any job is queuing, running, has completed successfully, has failed or is in some other state. See the documentation for more information including an example.

Futher information

See the documentation for the DRMAA Python bindings; you may find the enclosed reference information useful. Note that not all features of these bindings are described in this documentation!

Java bindings

Java bindings for DRMAA are not currently available on ShARC.

Administrator notes

Grid Engine’s implementation of the DRMAA shared library lives at $SGE_ROOT/lib/lx-amd64/libdrmaa.so.

It is discovered by bindings using the DRMAA_LIBRARY_PATH environment variable, which is set to the above for all users using a shell script in /etc/profile.d/.