Attention

Advance Notice: Bessemer will be retired at the end of the day on Friday 31st October 2025.

STAR

STAR (Spliced Transcripts Alignment to a Reference) is a software package for RNA sequence alignment. It aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.

The latest STAR manual, which details the many available command-line arguments, can be found at: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Interactive usage

After connecting to Bessemer (see Establishing a SSH connection), start an interactive session with the srun --pty bash -i command.

STAR can be loaded with the command:

module load STAR/2.7.6a-GCC-9.3.0

After this, any of the STAR commands can be run from the terminal prompt. The available options can be listed using:

STAR --help

Batch usage

Download and unpack the input files, which are GENCODE data (see https://www.gencodegenes.org/human/). We will use the comprehensive gene annotation (regions: PRI). For the purpose of this example, only the sequence and annotations for chromosome 19 are used as the reference to map to.

#!/bin/bash
# download the "Genome sequence, primary assembly (GRCh38)" fasta file
wget https://ndownloader.figshare.com/files/14669702 && \
mv 14669702 GRCh38.primary_assembly.genome.chr19.fa.gz && \
gunzip GRCh38.primary_assembly.genome.chr19.fa.gz

# download the annotations that correspond to it
wget https://ndownloader.figshare.com/files/14658830 && \
mv 14658830 gencode.v29.primary_assembly.annotation.chr19.gtf.gz && \
gunzip gencode.v29.primary_assembly.annotation.chr19.gtf.gz
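Before building an index it can be worth confirming that the sequence names in the annotation match those in the FASTA file, since an annotation whose names do not match the genome cannot be applied to it. The following is a minimal sketch of such a check; missing_gtf_seqnames is a hypothetical helper name, not part of STAR, and it assumes a standard GTF layout with the sequence name in the first column.

```shell
#!/bin/bash
# Sketch: list sequence names that appear in a GTF but not in a FASTA file.
# missing_gtf_seqnames is a hypothetical helper, not a STAR command.
missing_gtf_seqnames() {
    local fasta="$1" gtf="$2"
    # First file (FASTA): remember the first word of each '>' header line.
    # Second file (GTF): skip comments and blank lines, then print each
    # column-1 name that was not seen in the FASTA (once per name).
    awk 'NR == FNR {
             if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }
             next
         }
         /^#/ || NF == 0 { next }
         !($1 in seen) && !reported[$1]++ { print $1 }' "$fasta" "$gtf"
}
```

Calling missing_gtf_seqnames GRCh38.primary_assembly.genome.chr19.fa gencode.v29.primary_assembly.annotation.chr19.gtf should print nothing if every annotated sequence is present in the FASTA file.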

The following is an example batch submission script, my_job.sh, which runs STAR to generate a genome index from the above data/annotations. The script requests 4 cores for multi-threaded (OpenMP) execution, a runtime of 5 minutes and 2 GB of real memory for the job as a whole (--mem sets the memory per node, not per core).

#!/bin/bash
#SBATCH --job-name=STAR_test
#SBATCH --cpus-per-task=4
#SBATCH --mem=2000
#SBATCH --output=output_STAR_4.%j.out
#SBATCH --time=00:05:00
#SBATCH --mail-user=a.person@sheffield.ac.uk
#SBATCH --mail-type=ALL

module load STAR/2.7.6a-GCC-9.3.0

# Use the cores requested with --cpus-per-task (SLURM_NTASKS counts tasks, not cores)
STAR --runThreadN $SLURM_CPUS_PER_TASK --runMode genomeGenerate --genomeSAindexNbases 11 --genomeDir ./STAR \
--genomeFastaFiles GRCh38.primary_assembly.genome.chr19.fa \
--sjdbGTFfile gencode.v29.primary_assembly.annotation.chr19.gtf
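The value 11 passed to --genomeSAindexNbases in the script above follows the STAR manual's guidance for small genomes, which scales the parameter down to min(14, log2(GenomeLength)/2 - 1). The sketch below estimates that value from a FASTA file. sa_index_nbases is a hypothetical helper name, not part of STAR, and rounding the result down to an integer is an assumption made here.

```shell
#!/bin/bash
# Sketch: suggest a --genomeSAindexNbases value for a FASTA file using
# min(14, log2(GenomeLength)/2 - 1), rounded down (rounding is an assumption).
# sa_index_nbases is a hypothetical helper, not a STAR command.
sa_index_nbases() {
    # Sum the lengths of all sequence lines (header lines start with '>').
    awk '!/^>/ { len += length($0) }
         END {
             if (len == 0) { print "no sequence found" > "/dev/stderr"; exit 1 }
             n = int(log(len) / log(2) / 2 - 1)
             if (n > 14) n = 14
             if (n < 1) n = 1
             print n
         }' "$1"
}
```

For a single chromosome of roughly 59 million bases, log2 of the length is about 25.8, so the formula gives 11.9, which rounds down to the 11 used in the script.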

The job is submitted to the queue by typing:

$ sbatch my_job.sh

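The generated genome index files will be written to the subdirectory STAR. The job's output log, output_STAR_4.<jobid>.out, is written to the directory from which the job was submitted.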
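Once the job has finished, a quick check of the index directory can confirm that the main files were produced. The following is a minimal sketch; check_star_index is a hypothetical helper name, and the three file names checked (Genome, SA and SAindex) are those named in STAR's "writing ... to disk" log messages. A complete index directory contains further files that are not checked here.

```shell
#!/bin/bash
# Sketch: verify that the core STAR index files exist and are non-empty.
# check_star_index is a hypothetical helper, not a STAR command.
check_star_index() {
    local dir="$1" f status=0
    for f in Genome SA SAindex; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the example above this would be called as check_star_index ./STAR, returning a non-zero exit status if any of the three files is absent.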

Installation notes

Installation method

This section is primarily for administrators of the system. STAR has been installed using the default EasyBuild config files.

Build logs and test reports can be found in $EBROOTSTAR/easybuild with a given module loaded.

Testing

Testing has been conducted by running the genome index generation job detailed in the batch usage section above.

The output file should resemble:

$ cat output_STAR_4.3711295.out
Dec 12 13:59:01 ..... started STAR run
Dec 12 13:59:01 ... starting to generate Genome files
Dec 12 13:59:02 ..... processing annotations GTF
Dec 12 13:59:02 ... starting to sort Suffix Array. This may take a long time...
Dec 12 13:59:03 ... sorting Suffix Array chunks and saving them to disk...
Dec 12 13:59:55 ... loading chunks from disk, packing SA...
Dec 12 13:59:59 ... finished generating suffix array
Dec 12 13:59:59 ... generating Suffix Array index
Dec 12 14:00:02 ... completed Suffix Array index
Dec 12 14:00:02 ..... inserting junctions into the genome indices
Dec 12 14:00:19 ... writing Genome to disk ...
Dec 12 14:00:20 ... writing Suffix Array to disk ...
Dec 12 14:00:20 ... writing SAindex to disk
Dec 12 14:00:20 ..... finished successfully
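The comparison with the expected output can be scripted by checking that the last line of the job's output file reports success. The following is a minimal sketch; star_log_ok is a hypothetical helper name, and the check assumes the final line ends with "finished successfully" as in the output shown above.

```shell
#!/bin/bash
# Sketch: succeed only if the last line of a STAR job output file
# ends with "finished successfully" (as in the example output above).
# star_log_ok is a hypothetical helper, not a STAR command.
star_log_ok() {
    tail -n 1 "$1" | grep -q 'finished successfully$'
}
```

It could be used as star_log_ok output_STAR_4.3711295.out && echo "index build OK", substituting the job ID of the run being checked.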