Attention
The ShARC HPC cluster was decommissioned on the 30th of November 2023 at 17:00. It is no longer possible for users to access that cluster.
GATK
The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Interactive Usage
After connecting to ShARC (see Establishing a SSH connection), start an interactive session with the qsh, qrshx or qrsh command.
The latest version of GATK (currently 4.1.4) is made available with the command
module load apps/gatk/4.1.4/binary
Loading version 4.1.4 of GATK also loads the Java 1.8 module.
The module command also creates an environment variable called GATKHOME that contains the path to the requested version of GATK.
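As a quick check that the module has been loaded, the GATKHOME variable can be inspected from the shell. The following is a minimal sketch; the function name check_gatk_home is hypothetical and not part of GATK or the module system.

```shell
# Hypothetical helper: report the GATK install path, or explain how to set it.
check_gatk_home() {
    if [ -n "${GATKHOME:-}" ]; then
        # GATKHOME is set by the module file when the GATK module is loaded.
        echo "GATK is installed in: ${GATKHOME}"
    else
        echo "GATKHOME is not set; try: module load apps/gatk/4.1.4/binary" >&2
        return 1
    fi
}
```

Calling check_gatk_home after the module load command should print the installation path; before loading the module it prints a hint on standard error and returns a non-zero status.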
Thus, you can run the program with the command
gatk Anytool toolArgs
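As an illustration of the general form above, the sketch below wraps a call to HaplotypeCaller, one of the tools shipped with GATK 4. The function name run_haplotypecaller and the file names in the usage comment are hypothetical placeholders; substitute your own reference, alignment and output files.

```shell
# Hypothetical wrapper around one GATK tool invocation.
# Arguments: reference FASTA, input BAM, output VCF.
run_haplotypecaller() {
    ref="$1"
    bam="$2"
    out="$3"
    # -R: reference sequence, -I: input alignments, -O: output variant file.
    gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$out"
}

# Example usage (placeholder file names, assumed to exist in the working directory):
# run_haplotypecaller reference.fasta sample.bam sample.vcf.gz
```

The same pattern applies to any other tool name: the tool follows the gatk command, and the tool-specific arguments follow the tool name.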
You can obtain a full list of tools using
gatk --list
Installation notes
GATK 4.1.4 was installed using the install_gatk.sh script; the module file is /usr/local/modulefiles/apps/gatk/4.1.4/binary.