STAR

STAR (Spliced Transcripts Alignment to a Reference) is a software package for aligning RNA-seq reads to a reference genome. It performs the alignment using uncompressed suffix arrays.

The latest STAR manual, which details the many available command-line arguments, can be found at: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf


Interactive usage

After connecting to Bessemer (see Establishing a SSH connection), start an interactive session with the srun --pty bash -i command.

STAR can be loaded with the command:

module load STAR/2.7.6a-GCC-9.3.0

After this, STAR can be run from the terminal prompt. The available options can be listed using:

STAR --help

Batch usage

Download and unpack the input files from https://www.gencodegenes.org/human/. We will use the comprehensive gene annotation (regions: PRI). For the purposes of this example, only the sequence data and annotations for chromosome 19 are used as the mapping target.

#!/bin/bash
# download the "Genome sequence, primary assembly (GRCh38)" fasta file
wget https://ndownloader.figshare.com/files/14669702 && \
mv 14669702 GRCh38.primary_assembly.genome.chr19.fa.gz && \
gunzip GRCh38.primary_assembly.genome.chr19.fa.gz

# download the annotations that correspond to it
wget https://ndownloader.figshare.com/files/14658830 && \
mv 14658830 gencode.v29.primary_assembly.annotation.chr19.gtf.gz && \
gunzip gencode.v29.primary_assembly.annotation.chr19.gtf.gz
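
STAR expects the sequence names in the annotation file to match those in the FASTA file, so it can be worth comparing the two before building an index. The following is a minimal sketch of such a check. The helper functions fasta_names, gtf_names and check_seqnames are hypothetical names introduced for illustration and are not part of STAR.

```shell
# Hypothetical helpers (not part of STAR) for comparing sequence names.

# Print the unique sequence names in a FASTA file:
# the text after '>' up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print the unique sequence names in a GTF file:
# column 1, skipping '#' comment lines and empty lines.
gtf_names() {
    awk -F'\t' '!/^#/ && NF > 0 { print $1 }' "$1" | sort -u
}

# Report every GTF sequence name that is absent from the FASTA file.
check_seqnames() {
    fa_names=$(fasta_names "$1")
    gtf_names "$2" | while read -r name; do
        printf '%s\n' "$fa_names" | grep -qxF -- "$name" || echo "missing from FASTA: $name"
    done
}

# Run the check only if the files from the download step are present.
if [ -f GRCh38.primary_assembly.genome.chr19.fa ] && \
   [ -f gencode.v29.primary_assembly.annotation.chr19.gtf ]; then
    check_seqnames GRCh38.primary_assembly.genome.chr19.fa \
                   gencode.v29.primary_assembly.annotation.chr19.gtf
fi
```

If the check prints nothing, every sequence name used in the annotation file is also present in the FASTA file.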

The following is an example batch submission script, my_job.sh, which runs STAR to generate a genome index from the above sequence data and annotations. The script requests 4 cores (for multi-threading using the OpenMP library), a runtime of 5 minutes and 2000 MB of real memory in total for the job.

#!/bin/bash
#SBATCH --job-name=STAR_test
#SBATCH --cpus-per-task=4
#SBATCH --mem=2000
#SBATCH --output=output_STAR_4.%j.out
#SBATCH --time=00:05:00
#SBATCH --mail-user=a.person@sheffield.ac.uk
#SBATCH --mail-type=ALL

module load STAR/2.7.6a-GCC-9.3.0

# Use the cores requested with --cpus-per-task (SLURM_NTASKS counts tasks, not cores)
STAR --runThreadN $SLURM_CPUS_PER_TASK --runMode genomeGenerate --genomeSAindexNbases 11 \
--genomeDir ./STAR --genomeFastaFiles GRCh38.primary_assembly.genome.chr19.fa \
--sjdbGTFfile gencode.v29.primary_assembly.annotation.chr19.gtf
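
The value 11 passed to --genomeSAindexNbases in the script above is consistent with the STAR manual's recommendation for small genomes, min(14, log2(GenomeLength)/2 - 1). The sketch below evaluates that formula. The function names sa_index_nbases and genome_length are hypothetical, and the chromosome 19 length of 58,617,616 bases is an assumption taken from the GRCh38 assembly rather than from this page.

```shell
# Hypothetical helper: evaluate min(14, log2(GenomeLength)/2 - 1), rounded down.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)   # awk log() is the natural logarithm
        if (n > 14) n = 14
        print n
    }'
}

# Hypothetical helper: count the bases in a FASTA file,
# ignoring header lines and newlines.
genome_length() {
    grep -v '^>' "$1" | tr -d '\n' | wc -c
}

sa_index_nbases 58617616      # assumed length of GRCh38 chromosome 19; prints 11
sa_index_nbases 3100000000    # roughly the size of the full human genome; prints 14
```

To derive the value from the downloaded file instead of an assumed length, the two helpers can be combined as sa_index_nbases "$(genome_length GRCh38.primary_assembly.genome.chr19.fa)".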

The job is submitted to the queue by typing:

$ sbatch my_job.sh

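The genome index files will be written to the subdirectory STAR. The job's standard output is written to output_STAR_4.<jobid>.out in the directory from which the job was submitted.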
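
Once the job has finished, a quick check that the index directory is populated can catch a run that was cut short by the time or memory limit. This is a minimal sketch, assuming the index contains the files Genome, SA and SAindex (the three files the STAR log reports writing to disk). The function name index_complete is hypothetical.

```shell
# Hypothetical helper: succeed only if the core index files exist and are non-empty.
index_complete() {
    for f in Genome SA SAindex; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f"
            return 1
        fi
    done
    echo "index looks complete: $1"
}

# Check the directory passed to --genomeDir, if it exists.
if [ -d ./STAR ]; then
    index_complete ./STAR || true
fi
```

A complete index contains further files besides these three, so this check is a sanity test and does not validate the index.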


Installation notes

Installation method

This section is primarily for administrators of the system. STAR has been installed using the default EasyBuild config files.

Build logs and test reports can be found in $EBROOTSTAR/easybuild once the STAR module has been loaded.


Testing

Testing has been conducted by running the genome index generation job detailed in the batch usage section above.

The output file should resemble:

$ cat output_STAR_4.3711295.out
Dec 12 13:59:01 ..... started STAR run
Dec 12 13:59:01 ... starting to generate Genome files
Dec 12 13:59:02 ..... processing annotations GTF
Dec 12 13:59:02 ... starting to sort Suffix Array. This may take a long time...
Dec 12 13:59:03 ... sorting Suffix Array chunks and saving them to disk...
Dec 12 13:59:55 ... loading chunks from disk, packing SA...
Dec 12 13:59:59 ... finished generating suffix array
Dec 12 13:59:59 ... generating Suffix Array index
Dec 12 14:00:02 ... completed Suffix Array index
Dec 12 14:00:02 ..... inserting junctions into the genome indices
Dec 12 14:00:19 ... writing Genome to disk ...
Dec 12 14:00:20 ... writing Suffix Array to disk ...
Dec 12 14:00:20 ... writing SAindex to disk
Dec 12 14:00:20 ..... finished successfully
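
For repeated testing, the final line of the job output can be checked by a script instead of by eye. The following is a minimal sketch. The function name star_log_ok is hypothetical, and the match string is taken from the sample output above.

```shell
# Hypothetical helper: succeed if the last non-empty line of a STAR log reports success.
star_log_ok() {
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 | grep -q 'finished successfully'
}

# Check every matching job output file in the current directory.
for log in output_STAR_4.*.out; do
    [ -e "$log" ] || continue   # the glob matched nothing
    if star_log_ok "$log"; then
        echo "$log: OK"
    else
        echo "$log: FAILED"
    fi
done
```

A job that was killed for exceeding its time or memory request will not reach the final log line and is therefore reported as FAILED.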