SAMtools
SAM (Sequence Alignment/Map) is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that:
is flexible enough to store all the alignment information generated by various alignment programs
is simple enough to be easily generated by alignment programs or converted from existing alignment formats
is compact in file size
allows most operations on the alignment to work on a stream without loading the whole alignment into memory
allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus
SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
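As a rough illustration of the utilities listed above, the following shell function chains sorting, merging, indexing and per-position output. It is a sketch under assumptions: the function name merge_and_pileup and all file names are hypothetical placeholders, and the two inputs are assumed to be BAM files aligned against the same reference.

```shell
# Sketch only: merge_and_pileup and all file names are hypothetical.
# Usage: merge_and_pileup a.bam b.bam
merge_and_pileup() {
    first="$1"
    second="$2"

    # Sort each input by coordinate; merge expects sorted inputs.
    samtools sort -o first.sorted.bam "$first"
    samtools sort -o second.sorted.bam "$second"

    # Combine the two sorted files into one BAM file.
    samtools merge merged.bam first.sorted.bam second.sorted.bam

    # Build the index (merged.bam.bai) needed for lookups by position.
    samtools index merged.bam

    # Print the first few lines of the per-position (pileup) output.
    samtools mpileup merged.bam | head -n 5
}
```

Once merged.bam has been indexed, reads overlapping a locus can be retrieved with a command such as samtools view merged.bam 1:861000-862000, where the region string is only an example.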
Usage
SAMtools can be activated using the module file:
module load SAMtools/1.9-foss-2018b
Note: The module file also loads the EasyBuild foss-2018b compiler toolchain (including GCC 7.3.0).
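To confirm that the module has taken effect, a quick check along the following lines can be used. This is a sketch: the function name check_samtools is made up for illustration, and the exact version banner depends on the installation.

```shell
# Sketch only: check_samtools is a hypothetical helper name.
check_samtools() {
    if command -v samtools >/dev/null 2>&1; then
        # Print the first line of the version banner.
        samtools --version | head -n 1
    else
        echo "samtools not found; has the module been loaded?" >&2
        return 1
    fi
}
```

Running check_samtools after the module load command should print a single version line; a non-zero exit status suggests the module is not loaded in the current shell.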
Test
Using the tutorial provided at http://quinlanlab.org/tutorials/samtools/samtools.html :
$ cd ~
$ mkdir samtools-demo
$ cd samtools-demo
$ curl https://s3.amazonaws.com/samtools-tutorial/sample.sam.gz > sample.sam.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  371M  100  371M    0     0  29.3M      0  0:00:12  0:00:12 --:--:-- 33.2M
$ gzip -d sample.sam.gz
$ samtools view -S -b sample.sam > sample.bam
$ samtools view sample.bam | head
HWI-ST354R:351:C0UPMACXX:5:1115:20112:49057 99 1 861268 60 100M = 861543
375 TCCCTCACAGGGTCTGCCTCGGCTCTGCTCGCAGGGAAAAGTCTGAAGACGCTTATGTCCAAGGGGATCCTGCAGGTGCATCCTCCGATCTGCGACTGCC
CCCFFFFFHHHGFHJIIJJJJJIJJJJJJJJIIJJIIJJIGCHCHGGIGIIJIJGHGFFFFFFFDD@BDCCCDDDDDCDDECC@C9<@BBDDDDDDD59>
MC:Z:100M MD:Z:100 RG:Z:1719PC0017_51 NM:i:0 MQ:i:60 AS:i:100 XS:i:0
$ # Further steps truncated.
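Each record printed by samtools view is a single tab-separated line (wrapped over several lines in the output above) whose first eleven columns are mandatory: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, followed by optional tags. As a small illustration, the snippet below pulls a few of these columns out of the first record shown above using awk. To keep it short, SEQ and QUAL are replaced by "*" and only the NM tag is kept; that abbreviation is made for this example and is not part of the tutorial output.

```shell
# First record from the output above; SEQ and QUAL are abbreviated to "*"
# and only the NM tag is kept (a shortening made for this example).
record=$(printf '%s\t' 'HWI-ST354R:351:C0UPMACXX:5:1115:20112:49057' \
    99 1 861268 60 100M = 861543 375 '*' '*'; printf '%s' 'NM:i:0')

# SAM columns are tab-separated; print a few mandatory fields by name.
summary=$(printf '%s\n' "$record" | awk -F'\t' \
    '{ printf "flag=%s chrom=%s pos=%s mapq=%s cigar=%s tlen=%s", $2, $3, $4, $5, $6, $9 }')
echo "$summary"
# prints: flag=99 chrom=1 pos=861268 mapq=60 cigar=100M tlen=375

# FLAG is a bit field; bit 0x10 (16) is set when the read is on the reverse strand.
flag=$(printf '%s\n' "$record" | cut -f 2)
if [ $(( flag & 16 )) -ne 0 ]; then strand="reverse"; else strand="forward"; fi
echo "strand=$strand"
# prints: strand=forward
```

On real data the same awk expression can be fed directly from samtools view sample.bam instead of a hand-built record.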
Installation notes
SAMtools was compiled using EasyBuild. The module file generated is
/usr/local/modulefiles/live/eb/all/SAMtools/1.9-foss-2018b
and was tested as per the tutorial above.