denovo

CIRCexplorer2 denovo parses circular RNA de novo assembly results to identify novel circRNAs and characterize various alternative splicing events.

Usage and option summary

Usage:

CIRCexplorer2 denovo [options] -r REF -g GENOME <circ_dir>

Options:

-h --help                      Show help message.
--version                      Show version.
-r REF --ref=REF               Gene annotation.
--as                           Detect alternative splicing.
--as-type=AS_TYPE              Only check certain type (CE/RI/ASS) of AS events.
-a PLUS_OUT --pAplus=PLUS_OUT  TopHat mapping directory for p(A)+ RNA-seq.
-g GENOME --genome=GENOME      Genome FASTA file.
--tophat-dir=TOPHAT_DIR        TopHat mapping directory for p(A)- RNA-seq.
--no-fix                       No-fix mode (useful for species with poor gene annotations).
--rpkm                         Calculate RPKM for cassette exons.
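
As an illustration of how the options above combine, the following Python sketch assembles an argument list for CIRCexplorer2 denovo. The helper name build_denovo_command, its parameter names, and the example file names are hypothetical; only the flags themselves are taken from the option summary.

```python
from typing import List, Optional

# Values accepted by --as-type according to the option summary.
VALID_AS_TYPES = {"CE", "RI", "ASS"}


def build_denovo_command(
    circ_dir: str,
    ref: str,
    genome: str,
    detect_as: bool = False,
    as_type: Optional[str] = None,
    pa_plus_dir: Optional[str] = None,
    tophat_dir: Optional[str] = None,
    no_fix: bool = False,
    rpkm: bool = False,
) -> List[str]:
    """Build an argv list using only the documented flags (hypothetical helper)."""
    if as_type is not None and as_type not in VALID_AS_TYPES:
        raise ValueError(f"as_type must be one of {sorted(VALID_AS_TYPES)}")

    # Required arguments: gene annotation (-r) and genome FASTA (-g).
    cmd = ["CIRCexplorer2", "denovo", "-r", ref, "-g", genome]
    if detect_as:
        cmd.append("--as")
    if as_type is not None:
        cmd.append(f"--as-type={as_type}")
    if pa_plus_dir is not None:
        cmd += ["-a", pa_plus_dir]
    if tophat_dir is not None:
        cmd.append(f"--tophat-dir={tophat_dir}")
    if no_fix:
        cmd.append("--no-fix")
    if rpkm:
        cmd.append("--rpkm")
    # The positional <circ_dir> argument goes last.
    cmd.append(circ_dir)
    return cmd


# Example with made-up file names.
example = build_denovo_command(
    "circ_out", "ref.txt", "genome.fa", detect_as=True, pa_plus_dir="pAplus_tophat"
)
print(" ".join(example))
```

The returned list can be handed to subprocess.run if CIRCexplorer2 is installed and available on PATH.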

Notes about options

  1. If there is no cufflinks directory under <circ_dir> (that is, the de novo assembly step for circular RNAs was skipped; see Assemble), CIRCexplorer2 denovo uses only existing gene annotations to parse alternative splicing when the --as option is set. This is not recommended, so please run CIRCexplorer2 assemble before CIRCexplorer2 denovo.
  2. If you set the --as option, CIRCexplorer2 denovo characterizes the alternative splicing of circular RNAs, including 'cassette exons', 'retained introns', 'A5SS' and 'A3SS'. In this mode, you should also provide the path to the TopHat mapping directory for p(A)+ RNA-seq via the -a option. By default, --as checks all types of alternative splicing events. You can restrict the check to one type through the --as-type option ('CE': 'cassette exons', 'RI': 'retained introns', 'ASS': 'A5SS' and 'A3SS').
  3. If the TopHat mapping directory for p(A)- RNA-seq is not under <circ_dir>, you can set it via the --tophat-dir option.
  4. If you set the --no-fix option, the realignment step for fusion junction reads is skipped. This is useful for species with poor gene annotations, but the accuracy of circular RNA prediction decreases.
  5. If the --rpkm option is set, the RPKM of cassette exons is calculated.
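
Notes 1 and 2 can be checked before launching a run. The sketch below is a hypothetical pre-flight helper (the function name preflight_warnings and its parameters are not part of CIRCexplorer2); it only looks for the cufflinks directory mentioned in note 1 and for a p(A)+ directory when --as is requested.

```python
from pathlib import Path
from typing import List, Optional


def preflight_warnings(
    circ_dir: str,
    detect_as: bool = False,
    pa_plus_dir: Optional[str] = None,
) -> List[str]:
    """Return human-readable warnings derived from notes 1 and 2 (hypothetical helper)."""
    warnings = []

    # Note 1: without <circ_dir>/cufflinks, only existing annotations are used.
    if not (Path(circ_dir) / "cufflinks").is_dir():
        warnings.append(
            "no cufflinks directory under circ_dir; "
            "consider running CIRCexplorer2 assemble first"
        )

    # Note 2: --as should be accompanied by the p(A)+ TopHat directory (-a).
    if detect_as and pa_plus_dir is None:
        warnings.append("--as is set but no p(A)+ TopHat directory was given via -a")

    return warnings
```

An empty list means neither condition was triggered; it does not guarantee that the run itself will succeed.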

Input

The input files are similar to those of the CIRCexplorer2 annotate command. See Annotate for more details.

Output

CIRCexplorer2 denovo will create one denovo folder under the <circ_dir> folder.

denovo
├── combined_ref.txt
├── annotated_fusion.txt
├── circ_fusion.txt
├── annotated_circ.txt
├── novel_circ.txt
├── all_exon_info.txt
├── all_intron_info.txt
├── all_A5SS_info.txt
└── all_A3SS_info.txt
  • combined_ref.txt: Combined gene annotation file (de novo assembled gene annotations and existing gene annotations).
  • annotated_fusion.txt: Annotated fusion junction information file (same as the relevant file in Annotate).
  • circ_fusion.txt: Circular RNA annotation file (same as the relevant file in Annotate).
  • annotated_circ.txt: CircRNAs with annotated back-splice sites.
  • novel_circ.txt: CircRNAs with one or two novel back-splice site(s).
  • all_exon_info.txt: Cassette exon information file.
  • all_intron_info.txt: Retained intron information file.
  • all_A5SS_info.txt: Alternative 5' splice site information file.
  • all_A3SS_info.txt: Alternative 3' splice site information file.
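
To see which of the files listed above are present after a run, a small check such as the following can help. It is a sketch only: the helper name check_denovo_outputs is hypothetical, and a missing file is not necessarily an error, since this page does not state which files are written when --as is not set.

```python
from pathlib import Path
from typing import Dict

# File names as listed in the output tree above.
DENOVO_OUTPUTS = [
    "combined_ref.txt",
    "annotated_fusion.txt",
    "circ_fusion.txt",
    "annotated_circ.txt",
    "novel_circ.txt",
    "all_exon_info.txt",
    "all_intron_info.txt",
    "all_A5SS_info.txt",
    "all_A3SS_info.txt",
]


def check_denovo_outputs(circ_dir: str) -> Dict[str, bool]:
    """Map each documented output file name to whether it exists under <circ_dir>/denovo."""
    denovo_dir = Path(circ_dir) / "denovo"
    return {name: (denovo_dir / name).is_file() for name in DENOVO_OUTPUTS}
```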

Format of annotated_circ.txt:

Field     Description
chrom     Chromosome
start     Start of circular RNA
end       End of circular RNA
name      Circular RNA/Junction reads
score     Flag of fusion junction realignment
strand    + or - for strand
geneName  Name of gene
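
The field list above can be mapped onto rows programmatically. The following sketch assumes the file is tab-separated, has no header line, and lists its columns in the documented order; none of these points is stated on this page. Reading the junction read count as the text after the last "/" in the name column is likewise an assumption, and the sample row in the usage lines is invented.

```python
from typing import Dict, Iterable, Iterator

# Column names in the order documented for annotated_circ.txt.
ANNOTATED_CIRC_FIELDS = ["chrom", "start", "end", "name", "score", "strand", "geneName"]


def parse_annotated_circ(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per non-empty line (assumes tab-separated, header-less input)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # zip stops at the shorter sequence, so extra columns are ignored.
        record: Dict[str, object] = dict(zip(ANNOTATED_CIRC_FIELDS, line.split("\t")))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        # Assumption: "Circular RNA/Junction reads" means "<label>/<read count>".
        _, _, reads = str(record["name"]).rpartition("/")
        record["junctionReads"] = int(reads) if reads.isdigit() else None
        yield record


# Usage with an invented row:
sample = ["chr1\t100\t200\tcircular_RNA/5\t0\t+\tGENE1\n"]
for rec in parse_annotated_circ(sample):
    print(rec["chrom"], rec["start"], rec["end"], rec["junctionReads"])
```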

Format of novel_circ.txt:

Field       Description
chrom       Chromosome
start       Start of circular RNA
end         End of circular RNA
name        Circular RNA/Junction reads
score       Flag of fusion junction realignment
strand      + or - for strand
leftLabel   Label of circRNA start
rightLabel  Label of circRNA end

Format of all_exon_info.txt:

Field        Description
chrom        Chromosome
start        Start of cassette exon
end          End of cassette exon
name         Exon label
score        No meaning
strand       + or - for strand
geneName     Name of gene
isoformName  Name of isoform
read         CircRNA fusion junction reads
psiCirc      PSI in circular RNAs
psiLinear    PSI in linear RNAs
pValue1      P value (circular RNAs vs linear RNAs)
pValue2      P value (linear RNAs vs circular RNAs)
inCirc       Inclusion reads in circular RNAs
exCirc       Exclusion reads in circular RNAs
inLinear     Inclusion reads in linear RNAs
exLinear     Exclusion reads in linear RNAs
rpkmCirc     RPKM of cassette exon in circular RNAs
rpkmLinear   RPKM of cassette exon in linear RNAs
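
PSI (percent spliced in) is conventionally the share of reads supporting exon inclusion among all reads that support either inclusion or exclusion. The sketch below shows that conventional ratio only; this page does not say whether CIRCexplorer2 normalizes the counts or reports the value as a fraction or a percentage, so the psiCirc and psiLinear values in the file may not match it exactly. The function name is hypothetical.

```python
from typing import Optional


def percent_spliced_in(inclusion: float, exclusion: float) -> Optional[float]:
    """Conventional PSI as a fraction in [0, 1]; None when there are no reads."""
    total = inclusion + exclusion
    if total == 0:
        # PSI is undefined without any supporting reads.
        return None
    return inclusion / total


# Example: 30 inclusion reads and 10 exclusion reads.
print(percent_spliced_in(30, 10))
```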

Format of all_intron_info.txt:

Field         Description
chrom         Chromosome
start         Start of retained intron
end           End of retained intron
name          Intron label
score         No meaning
strand        + or - for strand
geneName      Name of gene
isoformName   Name of isoform
read          CircRNA fusion junction reads
pirCirc       PIR in circular RNAs
pirLinear     PIR in linear RNAs
pValue1       P value (circular RNAs vs linear RNAs)
pValue2       P value (linear RNAs vs circular RNAs)
riCirc        Retained intron reads in circular RNAs
juncCirc      Junction reads in circular RNAs
intronCirc    Intron reads in circular RNAs
riLinear      Retained intron reads in linear RNAs
juncLinear    Junction reads in linear RNAs
intronLinear  Intron reads in linear RNAs

Format of all_A5SS_info.txt:

Field        Description
chrom        Chromosome
start        Start of circular RNA
end          End of circular RNA
strand       + or - for strand
readCirc     5' splice reads in circular RNAs
totalCirc    Total splice reads in circular RNAs
psuCirc      PSU in circular RNAs
readLinear   5' splice reads in linear RNAs
totalLinear  Total splice reads in linear RNAs
psuLinear    PSU in linear RNAs

Format of all_A3SS_info.txt:

Field        Description
chrom        Chromosome
start        Start of circular RNA
end          End of circular RNA
strand       + or - for strand
readCirc     3' splice reads in circular RNAs
totalCirc    Total splice reads in circular RNAs
psuCirc      PSU in circular RNAs
readLinear   3' splice reads in linear RNAs
totalLinear  Total splice reads in linear RNAs
psuLinear    PSU in linear RNAs