The Genome Analysis Toolkit (GATK) is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
After connecting to ShARC (see Establishing a SSH connection), start an interactive session with the
The latest version of GATK (currently 4.1.4) is made available with the command
module load apps/gatk/4.1.4/binary
Loading GATK 4.1.4 also loads the Java 1.8 module.
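As a quick sanity check after loading the module, a minimal sketch like the one below confirms that both the gatk launcher and a Java runtime are on your PATH. The function name check_gatk_env is hypothetical. It does not call module itself, so run it after the module load command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that the GATK module environment looks usable.
# Run it after: module load apps/gatk/4.1.4/binary
check_gatk_env() {
  local missing=0
  # The module should have put the gatk launcher on PATH.
  if ! command -v gatk >/dev/null 2>&1; then
    echo "gatk not found on PATH" >&2
    missing=1
  fi
  # The GATK module also loads a Java 1.8 module, so java should be on PATH too.
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    missing=1
  fi
  # Exit status 0 means both commands were found.
  return "$missing"
}
```

For example, check_gatk_env && echo "GATK environment OK" prints the message only when both commands are available.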
The module command also creates an environment variable called GATKHOME that contains the path to the requested version of GATK.
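Scripts can use GATKHOME to refer to the installation without hard-coding its location. The sketch below is a minimal, hypothetical example (the function name gatk_launcher is illustrative). It assumes that the gatk launcher script sits directly inside the directory GATKHOME points to, as in the GATK binary release archive. Confirm this on ShARC with ls "$GATKHOME" before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the gatk launcher inside $GATKHOME.
# ASSUMPTION: the launcher lives at $GATKHOME/gatk; check with: ls "$GATKHOME"
gatk_launcher() {
  if [ -z "${GATKHOME:-}" ]; then
    echo "GATKHOME is not set; load the GATK module first" >&2
    return 1
  fi
  # Strip one trailing slash so the result has no doubled separator.
  printf '%s/gatk\n' "${GATKHOME%/}"
}
```

For example, ls -l "$(gatk_launcher)" shows whether the assumed launcher path exists for the loaded version.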
You can then run any GATK tool with the command
gatk AnyTool toolArgs
where AnyTool is the name of the tool and toolArgs are the arguments for that tool.
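As a concrete instance of the generic gatk command form, the sketch below wraps a single HaplotypeCaller call. HaplotypeCaller and its -R (reference), -I (input BAM) and -O (output VCF) arguments are standard GATK4 options, and --java-options passes flags such as a heap limit to the JVM. The function name run_haplotypecaller, the file names and the 4 GB heap are illustrative assumptions only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around one GATK4 tool invocation: gatk <Tool> <toolArgs>.
# The 4 GB Java heap (-Xmx4g) is an illustrative assumption; size it to the
# memory you requested for your session or job.
run_haplotypecaller() {
  if [ "$#" -ne 3 ]; then
    echo "usage: run_haplotypecaller REF.fasta INPUT.bam OUTPUT.vcf.gz" >&2
    return 2
  fi
  local ref="$1" bam="$2" out="$3"
  # JVM options go before the tool name; tool arguments go after it.
  gatk --java-options "-Xmx4g" HaplotypeCaller \
    -R "$ref" \
    -I "$bam" \
    -O "$out"
}
```

For example, run_haplotypecaller ref.fasta sample.bam sample.vcf.gz calls variants for one sample. Keeping the JVM options separate from the tool arguments mirrors the gatk AnyTool toolArgs layout, so swapping in a different tool only changes the tool name and its arguments.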
You can obtain a full list of tools using