SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that is
flexible enough to store all the alignment information generated by various alignment programs
simple enough to be easily generated by alignment programs or converted from existing alignment formats
compact in file size
allows most of operations on the alignment to work on a stream without loading the whole alignment into memory
allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
SAMtools can be activated using the module file:
module load apps/SAMtools/1.7/gcc-4.9.4
Note: The module file also loads the compiler gcc/4.9.4.
Download the following example files using:
cd some_directory cp -r /usr/local/packages/apps/SAMtools/1.7/gcc-4.9.4/examples . cd examples # sample data # ex1.fa - contains 2 sequences cut from the human genome # ex1.sam.gz - contains MAQ alignments # 00README.txt contains instructions and description of results samtools #gives a list of options samtools faidx ex1.fa #index the reference FASTA samtools view -S -b -t ex1.fa.fai -o ex1.bam ex1.sam.gz #convert SAM to BAM samtools index ex1.bam #build index for BAM samtools view ex1.bam seq2:450-550 #view a portion of BAM file samtools tview -p seq2:450 ex1.bam exl.fa #visually inspect alignments at same location samtools mpileup -f ex1.fa ex1.bam #view data in pileup format samtools mpileup -vu -f ex1.fa ex1.bam > ex1.vcf #generate uncompressed VCF file of variants samtools mpileup -g -f ex1.fa ex1.bam > ex1.bcf #generate compressed VCF file of variants
SAMtools was compiled using the
install_SAMtools.sh script, the module