STAR

STAR (Spliced Transcripts Alignment to a Reference) is a software for RNA sequence alignment. STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.

The latest STAR manual can be found at: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf which will detail the many available command arguments.

Interactive usage

After connecting to Stanage (see Establishing a SSH connection), start an interactive session with the following command:

srun --pty bash -i

STAR can be loaded with the command:

module load STAR/2.7.10b-GCC-11.3.0

After this any of the STAR commands can be run from the terminal prompt. The available commands can be obtained using:

STAR --help

Batch usage

Download and unpack the input files from https://www.gencodegenes.org/human/. We will use the comprehensive gene annotation (regions: PRI), for the purpose for this example we use only the data/annotations for chromosome 19 to map to.

#!/bin/bash
# download the "Genome sequence, primary assembly (GRCh38)" fasta file
wget https://ndownloader.figshare.com/files/14669702 && \
mv 14669702 GRCh38.primary_assembly.genome.chr19.fa.gz && \
gunzip GRCh38.primary_assembly.genome.chr19.fa.gz

# download the annotations that correspond to it
wget https://ndownloader.figshare.com/files/14658830 && \
mv 14658830 gencode.v29.primary_assembly.annotation.chr19.gtf.gz && \
gunzip gencode.v29.primary_assembly.annotation.chr19.gtf.gz

The following is an example batch submission script, my_job.sh, to run the executable STAR. The script requests 4 cores using the OpenMP library and multi-threading with a runtime of 5 minutes and 1 GB of memory to generate a genome index using the above data/annotations.

#!/bin/bash
#SBATCH --job-name=STAR_test
#SBATCH --cpus-per-task=4
#SBATCH --mem=1000
#SBATCH --output=output_STAR_4.%j.out
#SBATCH --time=00:05:00
#SBATCH --mail-user=a.person@sheffield.ac.uk
#SBATCH --mail-type=ALL

module load STAR/2.7.10b-GCC-11.3.0

STAR --runThreadN $SLURM_CPUS_PER_TASK --runMode genomeGenerate --genomeSAindexNbases 11 --genomeDir ./STAR --genomeFastaFiles GRCh38.primary_assembly.genome.chr19.fa \
--sjdbGTFfile gencode.v29.primary_assembly.annotation.chr19.gtf

The job is submitted to the queue by typing:

$ sbatch my_job.sh

The output file will be written in the subdirectory STAR.

Installation notes

Installation method

This section is primarily for administrators of the system. STAR has been installed using the default Easybuild config files.

Build logs and test reports can be found in $EBROOTSTAR/easybuild with a given module loaded.

Testing

Testing has been conducted by running the genome indices generation job as detailed in the batch job above.

The output file should resemble:

$ cat output_STAR_4.1239773.out
STAR --runThreadN 4 --runMode genomeGenerate --genomeSAindexNbases 11 --genomeDir ./STAR --genomeFastaFiles GRCh38.primary_assembly.genome.chr19.fa --sjdbGTFfile gencode.v29.primary_assembly.annotation.chr19.gtf
STAR version: 2.7.10b   compiled: 2023-10-10T17:29:00+0100 node128:/dev/shm/STAR/2.7.10b/GCC-11.3.0/STAR-2.7.10b/source
Jan 23 16:13:23 ..... started STAR run
Jan 23 16:13:23 ... starting to generate Genome files
Jan 23 16:13:23 ..... processing annotations GTF
Jan 23 16:13:25 ... starting to sort Suffix Array. This may take a long time...
Jan 23 16:13:25 ... sorting Suffix Array chunks and saving them to disk...
Jan 23 16:13:42 ... loading chunks from disk, packing SA...
Jan 23 16:13:43 ... finished generating suffix array
Jan 23 16:13:43 ... generating Suffix Array index
Jan 23 16:13:45 ... completed Suffix Array index
Jan 23 16:13:45 ..... inserting junctions into the genome indices
Jan 23 16:13:52 ... writing Genome to disk ...
Jan 23 16:13:52 ... writing Suffix Array to disk ...
Jan 23 16:13:53 ... writing SAindex to disk
Jan 23 16:13:53 ..... finished successfully