STAR
STAR (Spliced Transcripts Alignment to a Reference) is a software for RNA sequence alignment. STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays. The latest STAR manual can be found at: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf which will detail the many available command arguments.
A limited collection of STAR genomes is available from http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/
Interactive usage
After connecting to Bessemer (see Establishing a SSH connection), start an interactive session with the
srun --pty bash -i
command.
The latest version of STAR (currently version 2.7.6a) is made available with the command:
$ module load STAR/2.7.6a-GCC-9.3.0
After this any of the STAR commands can be run from the terminal prompt. The available commands can be obtained using:
$ STAR --help
Batch usage
The following is an example batch submission script, my_job.sh
, to run the executable STAR
with input
files from https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/.
The script requests 4 cores using the OpenMP parallel environment smp
with a runtime of 30 minutes and 6 GB of real memory per core to
generate a genome index.
#!/bin/bash
#SBATCH --job-name=STAR_smp_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=24000
#SBATCH --output=output_STAR_smp_4
#SBATCH --time=00:30:00
#SBATCH --mail-user=a.person@sheffield.ac.uk
#SBATCH --mail-type=ALL
module load STAR/2.7.6a-GCC-9.3.0
STAR --runThreadN $SLURM_NTASKS --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir ./ \
--genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbOverhang 99 \
--sjdbGTFfile Homo_sapiens.GRCh38.99.gtf --limitGenomeGenerateRAM 15000000000 --genomeSAsparseD 3 \
--limitIObufferSize 50000000 --limitSjdbInsertNsj 383200
The job is submitted to the queue by typing:
$ sbatch my_job.sh
Installation notes
Installation method
STAR was installed using Easybuild 4.4.0, build details can be found
in /usr/local/packages/live/eb/STAR/2.7.6a-GCC-9.3.0/easybuild/
Testing
Testing has been conducted by running the genome indices generation job as detailed in the batch job above.
The output logs should resemble: https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/log
Modulefiles
The module file is on the system at
/usr/local/modulefiles/live/eb/all/STAR/2.7.6a-GCC-9.3.0
.