Alignment for Circular RNA Fusion Junction Reads
CIRCexplorer2 supports TopHat2/TopHat-Fusion and other aligners (STAR, segemehl, BWA and MapSplice). Although different aligners show slight differences in circular RNA identification, TopHat2/TopHat-Fusion output is fully compatible with Cufflinks. As a result, TopHat2/TopHat-Fusion is recommended for the alignment step, especially in the circular RNA characterization pipeline.
TopHat2/TopHat-Fusion
Because TopHat2 needs a gene annotation file for better alignment, you could select one GTF file from hg19_ref.gtf, hg19_kg.gtf and hg19_ens.gtf. In addition, TopHat2 needs genome index files for bowtie2, and TopHat-Fusion requires indices for bowtie1, so you could index the genome sequence in advance or let CIRCexplorer2 align build the indices from scratch. (See Setup)
- From index files (bowtie1_index is the prefix for bowtie1 index files, and bowtie2_index is the prefix for bowtie2 index files):
CIRCexplorer2 align -G hg19_kg.gtf -i bowtie1_index -j bowtie2_index RNA_seq.fastq > CIRCexplorer2_align.log
- Or from genome sequence:
CIRCexplorer2 align -G hg19_kg.gtf -g hg19.fa RNA_seq.fastq > CIRCexplorer2_align.log
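The choice between the two invocations above can be scripted. The following is a minimal sketch and not part of CIRCexplorer2: build_align_cmd is a hypothetical helper that only prints the command rather than running it. It assumes the usual Bowtie index file names (<prefix>.1.ebwt for bowtie1, <prefix>.1.bt2 for bowtie2); large indices with .ebwtl/.bt2l suffixes and paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of CIRCexplorer2): print, but do not run,
# a CIRCexplorer2 align command. Prebuilt indices are used only when both
# the bowtie1 and the bowtie2 index are found; otherwise the genome FASTA
# is passed so that the indices are built from scratch.
# Arguments: GTF, bowtie1 prefix, bowtie2 prefix, genome FASTA, reads
build_align_cmd() {
    gtf=$1; bt1=$2; bt2=$3; genome=$4; reads=$5
    # Assumption: small-index naming, <prefix>.1.ebwt and <prefix>.1.bt2
    if [ -e "$bt1.1.ebwt" ] && [ -e "$bt2.1.bt2" ]; then
        printf 'CIRCexplorer2 align -G %s -i %s -j %s %s\n' \
            "$gtf" "$bt1" "$bt2" "$reads"
    else
        printf 'CIRCexplorer2 align -G %s -g %s %s\n' \
            "$gtf" "$genome" "$reads"
    fi
}
```

For example, build_align_cmd hg19_kg.gtf bowtie1_index bowtie2_index hg19.fa RNA_seq.fastq prints whichever of the two commands matches the files present, so the result can be reviewed before it is run.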
Note
- Because Cufflinks is highly compatible with TopHat2/TopHat-Fusion, TopHat2/TopHat-Fusion alignment is recommended for the characterization pipeline.
- CIRCexplorer2 align will create a directory circ_out by default, and the BED file fusion_junction.bed under this directory is required for the following analysis. You could also check the tophat.log and tophat_fusion.log files for detailed logs of the TopHat2 and TopHat-Fusion alignments.
- See Align for detailed information about CIRCexplorer2 align.
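Since later steps depend on fusion_junction.bed, it can help to confirm that the file exists before continuing. The sketch below is only an illustration: check_fusion_bed is a hypothetical helper, and it assumes the default output directory circ_out unless another directory is given.

```shell
# Hypothetical helper (not part of CIRCexplorer2): check that the alignment
# step left a non-empty fusion_junction.bed in the output directory.
# Usage: check_fusion_bed [output_dir]   (output_dir defaults to circ_out)
check_fusion_bed() {
    bed="${1:-circ_out}/fusion_junction.bed"
    if [ ! -s "$bed" ]; then
        echo "missing or empty: $bed" >&2
        return 1
    fi
    # Report the number of non-empty lines as a quick sanity check.
    n=$(grep -c . "$bed")
    echo "$bed: $n lines"
}
```

A non-zero return status signals that the alignment logs are worth inspecting before moving on.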
To align manually
If you prefer other aligners or want to align paired-end reads, you could align sequencing reads manually. Commands for detecting fusion junction reads with different aligners are listed below, and you could modify them according to your requirements.
For single-end reads
- TopHat2/TopHat-Fusion
tophat2 -a 6 --microexon-search -m 2 -p 10 -G knownGene.gtf -o tophat hg19_bowtie2_index RNA_seq.fastq
bamToFastq -i tophat/unmapped.bam -fq tophat/unmapped.fastq
tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search hg19_bowtie1_index tophat/unmapped.fastq
- STAR (See STAR manual for more information)
STAR --chimSegmentMin 10 --runThreadN 10 --genomeDir hg19_STAR_index --readFilesIn RNA_seq.fastq
- MapSplice (See MapSplice for more information)
mapsplice.py -p 10 -k 1 --non-canonical --fusion-non-canonical --min-fusion-distance 200 -c hg19_dir -x bowtie1_index --gene-gtf hg19_kg.gtf -1 RNA_seq.fastq
- BWA (See BWA for more information)
bwa mem -T 19 -t 10 hg19_bwa_index RNA_seq.fastq > RNA_seq_bwa.sam
- segemehl (See segemehl manual for more information)
segemehl.x -q RNA_seq.fastq -d hg19.fa -i hg19_segemehl.idx -S -M 1 -t 10 -o RNA_seq.sam
testrealign.x -d hg19.fa -q RNA_seq.sam -n
Note: You could align unmapped reads from TopHat2 alignment (circ_out/tophat/unmapped.fastq) via other aligners rather than TopHat-Fusion.
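Before passing the unmapped reads to TopHat-Fusion or another aligner, it may be useful to see how many reads are left. The following is a minimal sketch with a hypothetical helper, count_fastq_reads, which assumes an uncompressed FASTQ file with exactly four lines per read.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumption: every record occupies exactly four lines
# (header, sequence, '+', qualities).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}
```

For instance, count_fastq_reads tophat/unmapped.fastq would report the number of reads left over from the first TopHat2 pass in the manual commands above.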
For paired-end reads
- TopHat-Fusion
tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search hg19_bowtie1_index RNA_seq.fastq
Note:
- For paired-end data analysis, only TopHat-Fusion alignment results are currently supported.
- For alignment of paired-end data, you should choose an appropriate library type (the --library-type option of TopHat2), which is not included in the command above.
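As an illustration of the library-type note, the sketch below prints a paired-end TopHat-Fusion command with --library-type set explicitly. It is a hedged example and not taken from CIRCexplorer2: tophat_fusion_pe_cmd is a hypothetical helper, the mate file names in the usage line are placeholders, and passing the two mate files as separate arguments is an assumption based on TopHat2's general paired-end usage. The accepted values are the three library types documented for TopHat2.

```shell
# Hypothetical helper: print, but do not run, a paired-end TopHat-Fusion
# command with an explicit library type.
# Arguments: library type, mate 1 FASTQ, mate 2 FASTQ
tophat_fusion_pe_cmd() {
    libtype=$1; mate1=$2; mate2=$3
    # Library types documented for TopHat2.
    case "$libtype" in
        fr-unstranded|fr-firststrand|fr-secondstrand) ;;
        *) echo "unknown library type: $libtype" >&2; return 1 ;;
    esac
    printf 'tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search --library-type %s hg19_bowtie1_index %s %s\n' \
        "$libtype" "$mate1" "$mate2"
}
```

Calling tophat_fusion_pe_cmd fr-firststrand RNA_seq_1.fastq RNA_seq_2.fastq prints the command for review; which library type is correct depends on how the sequencing library was prepared.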