Installation and Setup
This section walks you through installing CIRCexplorer2 and setting up everything it requires, step by step.
Software and Packages
- TopHat & TopHat-Fusion
- Cufflinks (>=2.1.1)
- UCSC Utilities
- bedGraphToBigWig (optional)
- bedToBigBed (optional)
- STAR or MapSplice or BWA or segemehl (optional)
- Python packages (Python 2.7+)
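Before installing, it can help to check which of the external tools listed above are already reachable on your PATH. The following is a minimal Python 3 sketch, not part of CIRCexplorer2. The executable names are assumptions (the list above names the software, not the binaries), so adjust them to match your installation.

```python
import shutil

# Executable names are assumptions; adjust them to your installation.
REQUIRED_TOOLS = ["tophat2", "cufflinks"]
OPTIONAL_TOOLS = ["bedGraphToBigWig", "bedToBigBed", "STAR"]


def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED_TOOLS),
                         ("optional", OPTIONAL_TOOLS)):
        missing = missing_tools(tools)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print("{0} tools: {1}".format(label, status))
```

A tool reported as missing may still be installed but located outside PATH, so treat the result as a hint rather than a verdict.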
Install latest release using pip
CIRCexplorer2 is on PyPI, so you can install via pip like most Python packages. Depending on your Python installation, this may require admin rights:
pip install circexplorer2
Install latest release via conda
CIRCexplorer2 has an installation recipe on Bioconda, so you can install it via conda from the Bioconda channel:
conda install circexplorer2 --channel bioconda
Install latest release from source code
First, install the software required for fusion junction read alignment and de novo assembly, and add the relevant paths to your PATH.
Second, install the required Python packages and CIRCexplorer2:
git clone https://github.com/YangLab/CIRCexplorer2.git
cd CIRCexplorer2
pip install -r requirements.txt
### install scipy according to http://www.scipy.org/install.html
python setup.py install
CIRCexplorer2 requires a gene annotation file and a reference genome sequence file to annotate circular RNAs. The gene annotation file should be in the 'Gene Predictions and RefSeq Genes with Gene Names' format, and the reference genome sequence file should contain all the genome sequences, each with its chromosome ID. Every chromosome ID in the gene annotation file must also be present in the reference genome sequence file; otherwise the inconsistency between the two files may cause errors that are hard to detect when running CIRCexplorer2.
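The consistency requirement between the two files can be checked before running anything. The sketch below is a hypothetical helper, not part of CIRCexplorer2. It assumes Python 3, a tab-separated annotation file whose third column holds the chromosome ID (as in the UCSC layout of this format), and a FASTA file whose sequence ID is the first word after `>`.

```python
import io


def annotation_chroms(annotation_handle, chrom_col=2):
    """Collect chromosome IDs from a tab-separated annotation file.

    chrom_col=2 (third column) is an assumption based on the UCSC
    'Gene Predictions and RefSeq Genes with Gene Names' layout.
    """
    chroms = set()
    for line in annotation_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > chrom_col:
            chroms.add(fields[chrom_col])
    return chroms


def fasta_chroms(fasta_handle):
    """Collect sequence IDs (first word after '>') from a FASTA file."""
    ids = set()
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                ids.add(header[0])
    return ids


def chroms_missing_from_genome(annotation_handle, fasta_handle):
    """Return annotation chromosome IDs that the genome file lacks."""
    return sorted(annotation_chroms(annotation_handle)
                  - fasta_chroms(fasta_handle))


# Tiny in-memory example with made-up records
annotation = io.StringIO(
    "GENE1\tNM_0001\tchr1\t+\t100\t500\t150\t450\t1\t100,\t500,\n"
    "GENE2\tNM_0002\tchrUn\t-\t10\t90\t20\t80\t1\t10,\t90,\n"
)
genome = io.StringIO(">chr1\nACGT\n>chr2\nTTGA\n")
print(chroms_missing_from_genome(annotation, genome))  # prints ['chrUn']
```

With real data you would pass open file handles instead of the in-memory examples; both files are read line by line, so a large genome file is not loaded into memory at once.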
Format of 'Gene Predictions and RefSeq Genes with Gene Names':
|geneName||Name of gene|
|isoformName||Name of isoform|
|chrom||Chromosome ID|
|strand||+ or - for strand|
|txStart||Transcription start position|
|txEnd||Transcription end position|
|cdsStart||Coding region start|
|cdsEnd||Coding region end|
|exonCount||Number of exons|
|exonStarts||Exon start positions|
|exonEnds||Exon end positions|
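To make the column layout concrete, here is a minimal Python 3 sketch that parses one line of such a file into a named record. It is illustrative only and not CIRCexplorer2's own parser. It assumes tab-separated columns, a `chrom` column in third position, and exon coordinates stored as comma-separated lists with a trailing comma, as in UCSC tables. The example record is made up.

```python
from collections import namedtuple

# Column order follows the table above; "chrom" in third position is an
# assumption taken from the UCSC table layout.
FIELDS = ("geneName", "isoformName", "chrom", "strand", "txStart", "txEnd",
          "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds")
Transcript = namedtuple("Transcript", FIELDS)


def parse_coords(text):
    """Turn a comma-separated list such as '100,300,' into [100, 300]."""
    return [int(x) for x in text.split(",") if x]


def parse_annotation_line(line):
    """Parse one tab-separated annotation line into a Transcript."""
    f = line.rstrip("\n").split("\t")
    if len(f) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(f)))
    record = Transcript(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]),
                        int(f[6]), int(f[7]), int(f[8]),
                        parse_coords(f[9]), parse_coords(f[10]))
    if not (record.exonCount == len(record.exonStarts) == len(record.exonEnds)):
        raise ValueError("exonCount does not match the exon coordinate lists")
    return record


# Made-up record for illustration only
example = "GENE1\tNM_0001\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,\n"
tx = parse_annotation_line(example)
print(tx.geneName, tx.chrom, list(zip(tx.exonStarts, tx.exonEnds)))
```

Checking `exonCount` against the two coordinate lists is a cheap way to catch truncated or mis-delimited lines early.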
You can use the fetch_ucsc.py script to download all the essential gene annotation and reference genome sequence files for circular RNA identification.
fetch_ucsc.py is a small Python script included in CIRCexplorer2 that helps users prepare the files CIRCexplorer2 needs. It can download and format the gene annotation file (RefSeq, KnownGenes or Ensembl) and the reference genome sequence file for two species (human: hg19, hg38; mouse: mm9, mm10). All of these files are fetched from the latest UCSC release.
Command line of fetch_ucsc.py:
fetch_ucsc.py hg19/hg38/mm9/mm10 ref/kg/ens/fa out
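If you script several downloads, it may be convenient to validate the three arguments before calling the tool. The sketch below is a hypothetical Python 3 wrapper, not part of CIRCexplorer2. It only builds the argument list from the usage line above, and it rejects the hg38/Ensembl combination, which this section notes is not supported.

```python
GENOMES = ("hg19", "hg38", "mm9", "mm10")
KINDS = ("ref", "kg", "ens", "fa")


def fetch_ucsc_argv(genome, kind, out):
    """Build the argument list for one fetch_ucsc.py call."""
    if genome not in GENOMES:
        raise ValueError("unsupported genome: %r" % genome)
    if kind not in KINDS:
        raise ValueError("unsupported file type: %r" % kind)
    if genome == "hg38" and kind == "ens":
        # hg38 has no Ensembl gene annotation (see the notes in this section)
        raise ValueError("hg38 does not provide Ensembl gene annotations")
    return ["fetch_ucsc.py", genome, kind, out]


# Print the commands instead of running them
for kind, out in (("ref", "hg19_ref.txt"), ("fa", "hg19.fa")):
    print(" ".join(fetch_ucsc_argv("hg19", kind, out)))
```

To actually run a download, the returned list could be passed to `subprocess.check_call`, assuming fetch_ucsc.py is on your PATH.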
1. Download the human RefSeq gene annotation file
fetch_ucsc.py hg19 ref hg19_ref.txt
2. Download the human KnownGenes gene annotation file
fetch_ucsc.py hg19 kg hg19_kg.txt
3. Download the human Ensembl gene annotation file
fetch_ucsc.py hg19 ens hg19_ens.txt
4. Download the human reference genome sequence file
fetch_ucsc.py hg19 fa hg19.fa
5. Convert the gene annotation file to GTF format (requires genePredToGtf)
cut -f2-11 hg19_ref.txt|genePredToGtf file stdin hg19_ref.gtf
# or
cut -f2-11 hg19_kg.txt|genePredToGtf file stdin hg19_kg.gtf
# or
cut -f2-11 hg19_ens.txt|genePredToGtf file stdin hg19_ens.gtf
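The `cut -f2-11` step in the conversion above drops the first column (the gene name), so that the remaining ten columns match the genePred layout that genePredToGtf reads. As a rough illustration, the same column selection could be written in Python 3 as follows. This is a sketch for tab-separated lines only, and the example record is made up.

```python
def to_genepred_line(line):
    """Keep columns 2-11 of one tab-separated line (like `cut -f2-11`)."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[1:11])


# Made-up record for illustration only
example = "GENE1\tNM_0001\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,\n"
print(to_genepred_line(example))
```

The output line starts with the isoform name (`NM_0001` here) and no longer carries the gene name.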
1. For hg38, only RefSeq and KnownGenes (GENCODE) gene annotations are available; Ensembl gene annotations are not supported.
2. You can choose any one of the gene annotation files (hg19_ref.txt, hg19_kg.txt or hg19_ens.txt). Alternatively, you can concatenate all of these gene annotation files into a single file for CIRCexplorer2.
cat hg19_ref.txt hg19_kg.txt hg19_ens.txt > hg19_ref_all.txt
3. The CIRCexplorer2 TopHat2/TopHat-Fusion pipeline requires Bowtie and Bowtie2 index files for the reference genome. You can use bowtie-build and bowtie2-build to index the relevant genome, or you can use CIRCexplorer2 align to index the genome file automatically (see Alignment).
# index genome for Bowtie
bowtie-build hg19.fa bowtie1_index
# index genome for Bowtie2
bowtie2-build hg19.fa bowtie2_index
4. To analyze circular RNAs in mouse, download the corresponding mouse files (mm10 is used here as an example).
# mouse RefSeq gene annotation file
fetch_ucsc.py mm10 ref mm10_ref.txt
# mouse KnownGenes gene annotation file
fetch_ucsc.py mm10 kg mm10_kg.txt
# mouse Ensembl gene annotation file
fetch_ucsc.py mm10 ens mm10_ens.txt
# mouse reference genome sequence file
fetch_ucsc.py mm10 fa mm10.fa