Method

Data collection and processing

The human transcriptome or microarray data were retrieved from the GEO resource by searching keywords of "[(cancer OR carcinoma) AND Homo sapiens]". All datasets were manually curated based on these criteria: 1) the transcriptome data were generated using non-polyA enriched library construction method, while the microarray data were generated using circRNA-related microarray platforms; 2) for solid tumors, only samples with corresponding paracancerous normal tissue were included. After manual curation based on these criteria, 81 transcriptome and 53 microarray datasets were selected and downloaded. Besides, we also included 10 datasets that generated in our previous studies. Reads were trimmed of adapter sequences and low-quality bases with Trimmomatic (v0.36).


CircRNA identification and transcriptome analysis

We applied the aligner STAR (v2.5.3a) to align clean reads to the GRCh37.p13 reference genome (GRCh37.p13) with following settings: chimSegmentMin 10. Then, CIRCexplorer2 (v2.3.8) was used to identify circRNA with the guidance of reference annotion (GENCODE v19). The expression of circRNAs were normalized as the number of back-spliced RPMs (reads per million mapped reads).

The clean reads were mapped to the human genome (GRCh37.p13) using HISAT2 (v2.1.0). We employed StringTie (v2.0.4) to estimate gene abundance. Gene expression levels measured as fragments per kilobase per million (FPKM).


Differential expression analysis

We applied Wilcoxon rank sum test to compute the p-value between tumor and normal samples. The fold-change was calculated using the average expression of circRNA/gene in tumor and normal samples. Genes and circRNAs with fold-change greater than 2 and p-value less than 0.05 were determined to be significantly differentially expressed.


Prediction of miRNA binding sites on circRNA

The putative spliced sequences of circRNAs are extracted from reference genome according to the genomic coordinate of circRNAs. The mature sequences of miRNAs (miRBase 22 release) are downloaded from miRBase database (http://www.mirbase.org/). The miRNA binding sites on circRNA are predicted by miRanda (v3.3a) with following parameters: miRanda miRNA.fa circRNA.fa -go -8 -ge -2 -sc 120.


Analysis of RNA binding proteins (RBPs) binding sites on circRNA

1053 CLIP-Seq datasets of 291 RBPs were collected from GEO(https://www.ncbi.nlm.nih.gov/geo/), ENCODE(https://www.encodeproject.org) and starBase(http://starbase.sysu.edu.cn/index.php) databases.

The RBP binding sites on circRNAs were determined using the RBP binding sites that identified by CLIP-Seq datasets. Briefly, these identified RBP binding sites were firstly clustered and merged based on genomic coordinate to obtain the overlapped part of stacked binding sites among datasets. The resultant stacked binding sites were mapped to circRNAs to obtain the relative position of RBPs on circRNA by bedtools intersect tool (v2.28.0) using following parameters:
bedtools intersect -a RBP_binding_sites.bed -b circRNA_coordinate.bed -wb -s

We collected RBPs binding motif from beRBP (http://bioinfo.vanderbilt.edu/beRBP/predict.html) and RBPmap (http://rbpmap.technion.ac.il/) databases. Then FIMO were utilized to predict RBP binding sites on circRNAs based on collected RBP motif with following parameters:
fimo --thresh 1e-4 --oc rbp rbp.motif circRNA.fa

Prediction of coding potential of circRNA

We applied an in-house script to predict all putative open reading frames (ORFs) within circRNA and kept the ORFs with a minimum length of 20 amino acids (aa). Considering that circRNA translation is driven by internal ribosome entry site (IRES) element or m6A modification, we firstly retrieved experimentally verified IRES sequences from the IRESbase database (http://reprod.njmu.edu.cn/cgi-bin/iresbase/) and m6A modification sites detected by m6A-seq and related techniques from the m6A-Atlas (http://180.208.58.19/m6A-Atlas/). Then, circRNA sequences were aligned to IRES sequences to identify putative IRES element within circRNA using blastn (v2.11.0) with at least 90% sequence identity and a cutoff 30 nucleotides alignment length. Besides, the relative position of m6A modification sites on circRNA were determined by mapping circRNA to the collected m6A sites using bedtools intersect tool. We kept the circRNAs with putative IRES element or m6A modification sites as possible translatable circRNAs. We then annotated these potential translatable circRNAs using publicly available translatome datasets to provide direct in vivo translation evidence for these circRNAs.


Analysis of epigenetic modification sites on circRNA

The epigenetic modification sites are retrieved from m6A-Atlas database (http://180.208.58.66/m6A-Atlas). The relative positions of m6A modification sites on circRNA are determined by bedtools intersect tool (v2.28.0).


Analysis of genome mutation sites on circRNA

The genome mutation sites were downloaded from COSMIC (https://cancer.sanger.ac.uk/cosmic) database. The relative positions of genome mutation sites on circRNA were determined by bedtools intersect tool (v2.28.0).


Prediction of secondary structure of circRNA

The putative spliced sequences of circRNAs are submitted to RNAfold (v2.4.14) (https://www.tbi.univie.ac.at/RNA/index.html) software to predict the minimum free energy (MFE) structure using following parameters: RNAfold -p -d2 --noLP --circ.