Attention

The ShARC HPC cluster was decommissioned on the 30th of November 2023 at 17:00. It is no longer possible for users to access that cluster.

GATK

The Genome Analysis Toolkit (GATK) is a software package for the analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Interactive Usage

After connecting to ShARC (see Establishing a SSH connection), start an interactive session with the qsh, qrshx or qrsh command.

The latest version of GATK (currently 4.1.4) is made available with the command

module load apps/gatk/4.1.4/binary

Loading version 4.1.4 of GATK also loads the Java 1.8 module.

The module command also creates an environment variable called GATKHOME, which contains the path to the requested version of GATK.
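As a quick sanity check, a small shell function can confirm that the module has set GATKHOME before any tools are run. This is only a sketch: the function name check_gatk_home is hypothetical and not part of the ShARC installation.

```shell
# Hypothetical helper: report where the loaded GATK module lives.
check_gatk_home() {
    if [ -z "${GATKHOME:-}" ]; then
        # GATKHOME is only defined after the module has been loaded.
        echo "GATKHOME is not set; run 'module load apps/gatk/4.1.4/binary' first" >&2
        return 1
    fi
    echo "GATK is installed under: $GATKHOME"
}
```

Calling check_gatk_home after the module load command should print the installation path; calling it in a fresh shell returns a non-zero status instead.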

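Once the module is loaded, you can run any GATK tool with a command of the following form, where Anytool is the name of the tool and toolArgs are its arguments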

gatk Anytool toolArgs
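To make the generic form concrete, the sketch below wraps a single tool, HaplotypeCaller, in a shell function. The function name run_haplotypecaller, the three file arguments and the 4 GB Java heap size are illustrative assumptions, not settings taken from the ShARC installation.

```shell
# Hypothetical wrapper around one GATK tool (HaplotypeCaller).
# Arguments: reference FASTA, input BAM, output VCF.
run_haplotypecaller() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_haplotypecaller REF.fasta IN.bam OUT.vcf.gz" >&2
        return 2
    fi
    # --java-options passes JVM settings through the gatk launcher;
    # the 4 GB heap is an assumed value, adjust it to your session's memory.
    gatk --java-options "-Xmx4g" HaplotypeCaller -R "$1" -I "$2" -O "$3"
}
```

For example, run_haplotypecaller reference.fasta sample.bam sample.vcf.gz would call the tool on placeholder files of those names.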

You can obtain a full list of tools using

gatk --list

Installation notes

GATK 4.1.4 was installed using the install_gatk.sh script. The module file is /usr/local/modulefiles/apps/gatk/4.1.4/binary.