Welcome to circPlot, an interactive web server for circRNA exploration, analysis and visualization

Circular RNAs (circRNAs) are a type of single-stranded, covalently closed endogenous RNA molecule generated by back-splicing. CircRNAs have attracted great attention in recent years and have been reported to regulate various physiological and pathological processes by sponging miRNAs, acting as RNA binding protein (RBP) decoys, and encoding functional peptides. However, a graphical user interface-based web tool for interactive circRNA analysis and visualization is still lacking.

circPlot provides a user-friendly interface and a suite of analytic and plotting functions for rapid and intuitive exploration, analysis and visualization of circRNA:

1) comprehensive annotations of circRNA such as basic information, genomic position, circularization diagram, somatic mutation, secondary structure, epigenetic modification and divergent primer;

2) expression landscape of circRNA across diverse cancer types, non-tumor tissues, cancer cell lines and non-tumor cells;

3) putative mechanisms of circRNA, including miRNA sponging, RBP decoy activity and translation into peptides;

4) expression correlation between circRNA and gene or miRNA in any tumor and non-tumor tissues, cancer cell lines and non-tumor cells.

Overall, circPlot is a user-friendly and powerful web application for exploring, analyzing and visualizing circRNAs without any programming skills.

ID converter

Select an option to convert circRNA ID

circRNA ID

circRNA coordinate - hg19/GRCh37

circRNA coordinate - hg38/GRCh38

circRNA ID

Notes: Users can enter a circRNA ID from any of the 14 supported circRNA databases in the box

Examples: hsa_circ_0001946 hsa-LIFR_0005 hsa_circLIFR_002 Reset: reset

circRNA coordinate

Notes: The coordinate system must be 0-based, and the coordinate must be in the following
format: chromosome:start-end|strand

Example: chr5:38523520-38530768|- Reset: reset

circRNA coordinate

Notes: The coordinate system must be 0-based, and the coordinate must be in the following
format: chromosome:start-end|strand

Example: chr5:38523418-38530666|- Reset: reset
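The coordinate format described in the notes above can be checked before submission. The following is a minimal sketch, not circPlot's own validation code: the function and class names are hypothetical, the regular expression assumes chromosome names begin with "chr", and the 1-based conversion assumes the 0-based coordinate is a half-open interval (as in BED files), which the notes do not state explicitly.

```python
import re
from typing import NamedTuple

# Pattern for "chromosome:start-end|strand", e.g. "chr5:38523520-38530768|-".
# Requiring a "chr" prefix is an assumption based on the examples above.
COORD_RE = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)\|([+-])$")


class CircCoordinate(NamedTuple):
    chrom: str
    start: int  # 0-based start, as the notes require
    end: int
    strand: str


def parse_circ_coordinate(text: str) -> CircCoordinate:
    """Parse a coordinate string; raise ValueError if it is malformed."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in chromosome:start-end|strand format: {text!r}")
    chrom, start, end, strand = match.groups()
    start_i, end_i = int(start), int(end)
    if start_i >= end_i:
        raise ValueError("start must be smaller than end")
    return CircCoordinate(chrom, start_i, end_i, strand)


def to_one_based(coord: CircCoordinate) -> str:
    """Render as a 1-based closed interval (assumes half-open 0-based input)."""
    return f"{coord.chrom}:{coord.start + 1}-{coord.end}|{coord.strand}"
```

For instance, `parse_circ_coordinate("chr5:38523520-38530768|-")` yields the chromosome, both positions and the strand as separate fields, and a string that lacks the `|strand` suffix is rejected with a `ValueError`.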

ID convert result

Notes: Click a row to display the details of the circRNA ID conversion result

Details

Location (0-based):

IDs in database:

IDs in microarray platform:

circPlot: a web server for circRNA exploration, analysis and visualization

circPlot is a user-friendly interactive web server for circRNA exploration, analysis and visualization. With meticulous attention to detail, we manually collected

It integrates sequencing data from 3180 samples covering 26 types of cancer, 107 types of normal tissue, 111 types of cancer cell line and 254 types of normal cell from our previous studies and public datasets. In addition, other publicly available datasets, such as RBP CLIP-seq data, somatic mutation sites and epigenetic modification sites, are also included.

The circPlot web application contains 5 modules, namely “Home”, “Exploration”, “Tool”, “Tutorial” and “Contact” (Figure 1). 1) the “Home” module provides a brief introduction to circPlot; 2) the “Exploration” module is dedicated to exploring, analyzing and visualizing circRNA; 3) the “Tool” module provides a suite of analytic tools for users to convert circRNA IDs, perform circRNA differential expression analysis, design circRNA divergent primers, and query gene and miRNA expression profiles in tumors, non-tumor tissues, cancer cell lines, non-tumor cells, and the TCGA and CCLE projects; 4) the “Tutorial” module provides detailed instructions on how to use circPlot; 5) the “Contact” module is designed for sending questions and comments about the website to the developer.

Figure 1. The graphical interface of the circPlot web application

A suite of plotting and analysis functions is provided to facilitate the interactive analysis and visualization of pre-analyzed or user provided circRNA (Figure 3).

general information module
- detailed annotations of circRNA and its host gene, including genomic coordinate, aliases, sequence and description
- analysis and visualization of circRNA back-splicing schematic diagram, genomic position and sequence conservation, base modification and somatic mutation sites, secondary structure, and divergent primer
expression profile module
- visualization of expression landscape of circRNA in cancers, normal tissues, cancer cell lines and normal cells
putative mechanism module
- analysis and visualization of circRNA-miRNA, circRNA-RBP interactions, and circRNA coding potential

Figure 3. Schematic overview of the circPlot web application

Herein we provide the detailed instructions on how to explore, analyze and visualize circRNA in the circPlot web application.

Analyze a circRNA

In the “Exploration” module of the circPlot web server, users can analyze a circRNA of interest using its circRNA ID, coordinate or sequence. Alternatively, users can retrieve previous analysis results using the Job ID.

Analyze a circRNA by circRNA ID

Users should choose the “circRNA ID” search option and then enter the circRNA ID in the box to analyze a circRNA of interest (Figure 1). Currently, circRNA IDs from 14 circRNA databases (circAtlas3, circBank, circBase, CircNet2, CIRCpedia2, CircRic, circRNADb, CSCD2, DeepBase3, exoRBase2, MiOncoCirc, riboCIRC, TransCirc, TSCD) are supported in circPlot.

Figure 1. Users can analyze a circRNA of interest by circRNA ID

Analyze a circRNA by circRNA coordinate

Users should choose the “circRNA coordinate” option, select the “hg19” or “hg38” assembly, and then enter the circRNA coordinate for that assembly in the box to analyze a circRNA of interest (Figure 2). Note that the circRNA name and coordinate must be in the proper format.

Figure 2. Users can analyze a circRNA of interest by circRNA coordinate

Analyze a circRNA of interest by circRNA sequence

Users should choose the “circRNA sequence” option and then select either the “Input” or “Upload” option to enter or upload the circRNA sequence (Figure 3). Note that the circRNA name and sequence must be in the proper format.

Figure 3. Users can analyze a circRNA of interest by entering or uploading circRNA sequence

Retrieve previous results by job ID

Users can retrieve previous analysis results by simply selecting the “Job ID” option and typing job ID in the box (Figure 4).

Figure 4. Users can retrieve previous analysis results using job ID

Overview of analysis result

Once users have entered the input in the corresponding box, they can click the “Submit!” button to trigger the analysis, and the results are displayed once the server has finished the analysis. As shown in Figure 5, the circRNA candidates that match the user input are displayed in an interactive table. To explore and visualize the detailed analysis results of a specific circRNA, users can click its row in the table.

Figure 5. Overview of analysis results of selected circRNA

Once a circRNA is selected, the analysis and visualization results are presented in 4 sections: “General information”, “Expression profile”, “Putative mechanism” and “Expression correlation” (Figure 5). Users can click a tab to view the detailed results.

General information of circRNA
- Comprehensive basic information of circRNA and its host gene
- Detailed information of circRNA back-splicing, genomic position, somatic mutation, epigenetic modification, secondary structure and divergent primer set
Expression profile of circRNA in tumors, non-tumor tissues, cancer cell lines and non-tumor cells
Putative mechanism of circRNA
- Detailed information of miRNA and RBP binding sites on circRNA
- Detailed information of circRNA-encoded peptide
Expression correlation between circRNA and gene and miRNA

circPlot provides enhanced interactivity between users and the user interface, which is summarized below:

1. Users can select to display the analysis and visualization results of interest through simply clicking the row in the result table

2. A tooltip or detailed information appears when the mouse hovers over an input box or an element within the plot

3. Users can use the modebar and figure legend to manipulate the plot, for example to show or hide elements, or to zoom in or out

4. Users can customize the plot by clicking the “Visualization parameters” dropdown menu and then setting the visualization parameters. When any parameter is changed, the plot updates immediately

5. The plot is available for download in various formats by clicking the “Download parameters” dropdown menu in corresponding section. Besides, all plotting data and analysis results are publicly accessible and can be downloaded

6. Users can click the “Select dataset” dropdown menu to select any sample of interest and then visualize circRNA expression profile and expression correlation results in selected sample

7. Users can prepare their own data and upload to the circPlot for customized visualization by selecting “Visualize your data?” option

General information of circRNA

The “General information” section provides basic information of circRNA, and has 8 tabs: “circRNA overview”, “Host gene overview”, “Genomic position”, “Circularization diagram”, “Base modification”, “Secondary structure”, “Somatic mutation” and “Divergent primer”.

circRNA overview

This tab provides the basic information of circRNA, including host gene, linear transcript, genomic coordinate, spliced sequence, aliases of circRNA in 14 circRNA databases and 17 microarray platforms, and the conserved mouse circRNA reported in other circRNA databases (Figure 6). Besides, users can click the hyperlink to navigate to original database for details.

Figure 6. Basic information of circRNA

Host gene overview

This tab provides the basic information of circRNA host gene such as gene description, gene type, gene summary, and aliases in other databases (Figure 7). Users can click the hyperlink to redirect to original database for details.

Figure 7. Basic information of circRNA host gene

Genomic position

This tab is designed to visualize the schematic diagram of circRNA genomic position (Figure 8), emphasizing all possible genes and transcripts that might produce this circRNA. The “Visualization parameters” dropdown menu enables users to customize the plot, such as setting the color for exon, intron and arrow individually. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data.

Figure 8. Schematic diagram of circRNA genomic position

Circularization diagram

This tab displays the schematic diagram of circRNA back-splicing and the detailed information about each block of circRNA (Figure 9). The plot shows the schematic diagram of circRNA back-splicing. Users can click the “Visualization parameters” dropdown menu to customize the plot, such as setting the relative position of circRNA divergent and convergent primer sets so that they are shown on the schematic diagram. The “circRNA block color” dropdown menu is used to change the color of the circRNA block. The “Download parameters” dropdown menu allows users to customize the download settings to save the plot, and to download the plotting data. The details table provides the information for each circRNA block, including source transcript, genomic coordinate, block index and size.

Figure 9. Results of circRNA back-splicing diagram

Base modification

This tab shows the schematic diagram and the detailed information of the epigenetic modification sites on circRNA (Figure 10). The statistics table lists the type and number of epigenetic modification sites on circRNA. The plot shows the schematic diagram of the epigenetic modification sites on circRNA. Users can click the “Visualization parameters” dropdown menu to customize the plot, such as assigning a color to each type of base modification. Users can change the color of the circRNA block by using the “circRNA block color” dropdown menu. The “Download parameters” dropdown menu allows users to customize the download parameters to save the plot, and to download the plotting data. The details table shows the relative position, genomic coordinate and external reference for each modification site. Users can click the hyperlink to go to the original database for details.

Figure 10. Results of base modification sites on circRNA

Secondary structure

This tab displays the schematic diagram and the detailed information of the minimum free energy secondary structure of circRNA (Figure 11). The statistics table shows the minimum free energy and the number of base pairs, hairpins, internal loops and multi-branched loops in the circRNA secondary structure. The plot shows the schematic diagram of the minimum free energy secondary structure of circRNA. Users can customize the plot via the “Visualization parameters” dropdown menu, such as controlling the color of the base-pairing probability colorbar. The “circRNA block color” dropdown menu enables users to set the circRNA block color. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data and the circRNA minimum free energy structure sequence. The details table displays the position, sequence, base-pairing information and structure of each base in the circRNA.

Figure 11. Results of the minimum free energy secondary structure of circRNA
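The per-base pairing information described above can be derived from a structure string. The sketch below assumes the downloaded minimum free energy structure is written in dot-bracket notation, where "(" and ")" mark paired bases and "." marks an unpaired base; the page does not state the file format, so this is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional


def pair_table(structure: str) -> List[Optional[int]]:
    """For each position, return the 0-based index of its partner, or None."""
    partners: List[Optional[int]] = [None] * len(structure)
    open_positions: List[int] = []  # stack of unmatched "(" positions
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()
            partners[i], partners[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return partners


def count_base_pairs(structure: str) -> int:
    """Number of base pairs, i.e. half the number of paired positions."""
    return sum(p is not None for p in pair_table(structure)) // 2
```

A stack is used because well-formed dot-bracket strings nest like parentheses, so each ")" pairs with the most recent unmatched "(".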

Somatic mutation

This tab shows the schematic diagram and the detailed information of the somatic mutation sites on circRNA (Figure 12). The statistics table lists the number of somatic mutation sites on circRNA. Users can click the “Visualization parameters” dropdown menu to customize the plot, such as defining the color for somatic mutation sites. Users can change the circRNA block color via the “circRNA block color” dropdown menu. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The details table shows the genomic coordinate, relative position, sequence alteration and external reference for each mutation site. Users can click the hyperlink to go to the original database for details.

Figure 12. Results of somatic mutation sites on circRNA

Divergent primer

This tab displays the schematic diagram and the detailed information of the designed circRNA divergent primer sets (Figure 13). The details of the designed primer sets, including primer sequence, length, melting temperature, GC content and relative position, are shown in an interactive table. Users should first click a row in the table to visualize the schematic diagram and amplification product of the selected primer set. The plot displays the schematic diagram of the circRNA divergent primer sets, providing an intuitive view of the circRNA junction site sequence and the primers. By clicking the “Visualization parameters” dropdown menu, users can customize the plot, such as assigning a color to the primer sets. The “circRNA block color” dropdown menu enables users to set the circRNA block color. The “Download parameters” dropdown menu allows users to customize download settings to save the plot, and to download the plotting data. The amplification product panel displays the amplification product of the selected primer set.

Figure 13. Results of circRNA divergent primer
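The primer properties listed in the table (length, GC content, melting temperature) and the junction-spanning template can be illustrated with a few lines of Python. This is a rough sketch with hypothetical function names, not the primer design method used by circPlot: the melting temperature uses the simple Wallace rule of thumb for short oligonucleotides, and the junction template assumes the spliced circRNA sequence is given in linear form starting at the back-splice site.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer (0.0 to 1.0)."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def junction_template(circ_seq: str, flank: int) -> str:
    """Join the last `flank` bases to the first `flank` bases of the spliced
    sequence, recreating the back-splice junction in linear form."""
    if not 0 < flank <= len(circ_seq):
        raise ValueError("flank must be between 1 and the sequence length")
    return circ_seq[-flank:] + circ_seq[:flank]
```

Divergent primers face away from each other on the linear transcript, so they only yield a product when the two ends are joined; placing the primers on either side of the junction in such a template is what makes the amplicon specific to the circular form.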

Expression landscape of circRNA

The “Expression profile” section contains 4 tabs, namely “Tumor tissue”, “Non-tumor tissue”, “Cancer cell line” and “Non-tumor cell”. These tabs provide comprehensive circRNA expression profiles across diverse tumors, normal tissues, cancer cell lines, and normal cells.

Expression profile of circRNA in tumors

The “Tumor tissue” tab provides the expression landscape, differential expression results, and detailed expression data of the circRNA in both tumor and adjacent normal tissues across 33 cancer types (Figure 14), helping users quickly determine whether the circRNA exhibits a cancer-specific differential expression pattern or shares a common differential expression signature across different cancer types. As shown in Figure 14, the plot shows the expression profiles of circRNA in the selected tumors, depicting the expression levels in both tumor and paired normal tissues. By clicking the “Select dataset” dropdown menu, users can select any cancer type of interest to display circRNA expression profiles in the selected tumors. Users can customize the plot through the “Visualization parameters” dropdown menu, such as defining the color for tumor and normal tissue individually. The “Plot type” dropdown menu enables users to select from a box plot, dot plot or bar plot to visualize circRNA expression profiles. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The differential expression result table displays the differential expression results of circRNA across cancers, providing the average expression level, fold change value and p-value of the circRNA in each cancer type. The expression profile table displays the circRNA expression details for each sample.

Figure 14. The expression profile of circRNA in tumors

Expression profile of circRNA in non-tumor tissues

The “Non-tumor tissue” tab is designed to display circRNA expression profile across 21 types of normal tissue (Figure 15), helping users identify tissue-specific or ubiquitously expressed circRNAs. As shown in Figure 15, the plot shows the expression profile of circRNA across the selected tissues. The “Select dataset” dropdown menu enables users to choose any tissue type of interest to display circRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, such as setting the color for normal tissue. The “Plot type” dropdown menu enables users to select from a box plot, dot plot or bar plot to visualize circRNA expression profile. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The expression profile table displays the circRNA expression details for each sample.

Figure 15. The expression profile of circRNA in non-tumor tissues

Expression profile of circRNA in cancer cell lines

The “Cancer cell line” tab provides circRNA expression profile across cancer cell lines of 19 cancer types (Figure 16), helping users select the appropriate cancer cell line model for research. As shown in Figure 16, the plot shows the expression profile of circRNA across the selected cancer cell lines. By clicking the “Select dataset” dropdown menu, users can select any cancer type of interest to display circRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, such as controlling the color for cancer cell line. The “Plot type” dropdown menu enables users to select from a box plot, dot plot or bar plot to visualize circRNA expression profile. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The expression profile table displays the circRNA expression details for each sample.

Figure 16. The expression profile of circRNA in cancer cell lines

Expression profile of circRNA in non-tumor cells

The “Non-tumor cell” tab is designed to display circRNA expression profile across 25 types of normal cell (Figure 17), helping users identify cell-specific or ubiquitously expressed circRNAs. As shown in Figure 17, the plot shows the expression profile of circRNA across the selected cells. Users can click the “Select dataset” dropdown menu to select any cell type of interest to display circRNA expression profile in selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, such as defining the color for normal cell. The “Plot type” dropdown menu enables users to select from a box plot, dot plot or bar plot to visualize circRNA expression profile. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The expression profile table displays the circRNA expression details for each sample.

Figure 17. The expression profile of circRNA in non-tumor cells

Putative mechanism of circRNA

The “Putative mechanism” section contains 3 tabs: “circRNA-miRNA interaction”, “circRNA-RBP interaction” and “Coding potential”. This section provides comprehensive information on the putative mechanisms of circRNA.

miRNA binding sites on circRNA

The “circRNA-miRNA interaction” tab displays the schematic diagram of miRNA binding sites on circRNA and the sequence alignment between circRNA and miRNA (Figure 18). The statistics on the circRNA-miRNA interaction, such as the number of miRNA binding sites on circRNA, and the free energy and score of the sequence alignment between circRNA and miRNA, are shown in an interactive table. Users should first click rows in the statistics table to select miRNAs of interest and display the schematic diagram and detailed information of the circRNA-miRNA interaction. The left plot displays the schematic diagram of miRNA binding sites on circRNA, while the right plot shows the sequence alignment between circRNA and miRNA. Users can customize the plot using the “Visualization parameters” dropdown menu. The “circRNA and miRNA color” dropdown menu allows users to change the color of the circRNA block and each miRNA. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data and the sequence alignment between circRNA and miRNA.

Figure 18. The schematic diagram and details of miRNA binding sites on circRNA

RBP binding sites on circRNA

The “circRNA-RBP interaction” tab shows the schematic diagram of RBP binding sites on circRNA (Figure 19). The statistics on the circRNA-RBP interaction, such as the number of RBP binding sites on circRNA identified by CLIP-seq or predicted by RBPmap, are shown in an interactive table. Users should first click rows in the statistics table to select RBPs of interest and display the schematic diagram and detailed information of the circRNA-RBP interaction. The plot shows the schematic diagram of RBP binding sites on circRNA. Users can customize the plot using the “Visualization parameters” dropdown menu. The “circRNA and RBP color” dropdown menu allows users to change the color of the circRNA block and each RBP. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The details table shows the RBP name, genomic position, relative position, identification method, external reference and binding sequence for each RBP binding site. Users can click the hyperlink to go to the original database for details.

Figure 19. The schematic diagram and details of RBP binding sites on circRNA

circRNA coding potential

The “Coding potential” tab displays the schematic diagram and the detailed information of the circRNA-encoded peptide (Figure 20). The statistics on the circRNA-encoded peptide, such as ORF length, relative position and orientation, the number of translatome datasets and junction reads that support circRNA translation, and the IRES elements and m⁶A modification sites that drive circRNA translation, are displayed in an interactive table. Users should first click a row in the statistics table to select an ORF of interest and display the schematic diagram and detailed information of the circRNA-encoded peptide. The plot shows the schematic diagram of the IRES element, m⁶A modification site and ORF in circRNA. Users can customize the plot using the “Visualization parameters” dropdown menu, such as setting the color for the ORF, IRES element and m⁶A modification site. The “circRNA block color” dropdown menu allows users to change the color of the circRNA block. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The property panel shows the basic properties of the selected ORF, such as the number of amino acids and the molecular weight. The amino acid composition table displays the count of each of the 20 common amino acids in the ORF. The modification information table provides the details of predicted N-glycosylation, mucin-type O-glycosylation, and S, T and Y phosphorylation sites in the ORF.

Figure 20. The schematic diagram and details of circRNA coding potential
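Because a circRNA has no free ends, an ORF can run through the back-splice junction, which is why junction reads are informative for translation. The sketch below illustrates one simple way to scan for such ORFs; it is not the prediction method used by circPlot. It assumes a standard ATG start codon, the three standard stop codons, and an ORF that ends within one pass around the circle; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_circular_orf(circ_seq: str):
    """Return (start, length_nt, crosses_junction) for the longest
    ATG-to-stop ORF on a circular sequence, or None if there is none."""
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)
    doubled = seq + seq  # lets a reading frame continue past the junction
    best = None
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, stopping after roughly one pass of the circle.
        for pos in range(start, start + n - 2, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start  # includes the stop codon
                if best is None or length > best[1]:
                    best = (start, length, pos + 3 > n)
                break
    return best
```

In the sequence `TAAGGGATGCCC`, for example, the start codon lies near the 3' end and the stop codon is only reached after wrapping around to the 5' end, so the ORF is reported as crossing the junction.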

Expression correlation between circRNA and gene and miRNA

The “Expression correlation” section contains 2 tabs: “circRNA-gene expression correlation” and “circRNA-miRNA expression correlation”. This section provides the portal to analyze expression correlation between circRNA and gene and miRNA in any sample.

Expression correlation between circRNA and gene

The “circRNA-gene expression correlation” tab performs the expression correlation analysis between circRNA and gene, and visualizes the analysis results (Figure 21). The details of genes, such as gene ID, type and aliases, are shown in an interactive table that users can sort and search. Users should first click a row in the gene list table to select a gene of interest, which runs the expression correlation analysis between circRNA and the selected gene and displays the results. Once a gene has been selected, the gene overview tab shows the detailed information of the selected gene, such as gene aliases and summary. The plot illustrates the expression correlation between circRNA and the selected gene, offering an intuitive depiction of the expression levels of both circRNA and gene in any sample. Users can click the “Select dataset” dropdown menu to select any sample of interest. Once the samples are selected, the expression correlation between circRNA and gene in the selected samples is reanalyzed, and the results are updated accordingly in the interface. Users can customize the plot using the “Visualization parameters” dropdown menu, such as setting the color for each tissue type. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The expression profile table provides the details of circRNA and gene expression profiling in samples. The expression correlation result panel provides the results for three types of correlation: Pearson, Spearman and Kendall.

Figure 21. The results of expression correlation between circRNA and gene
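The three correlation types reported in the result panel differ in what they measure: Pearson captures linear association, while Spearman and Kendall are rank-based. A minimal pure-Python sketch is given below for illustration; it is not circPlot's implementation, the function names are hypothetical, and the Kendall function computes the tau-a variant (the page does not state which variant, or which tie handling, the server uses).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values starting at 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)  # tied pairs add 0
    return score / (n * (n - 1) / 2)
```

Here `x` and `y` would be the circRNA and gene expression values across the selected samples. For a monotonic but non-linear relationship, Spearman and Kendall both return 1 while Pearson does not.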

Expression correlation between circRNA and miRNA

The “circRNA-miRNA expression correlation” tab performs the expression correlation analysis between circRNA and miRNA, and shows the analysis results (Figure 22). The details of miRNAs, such as miRNA accession and aliases, are shown in an interactive table that users can sort and search. Users should first click a row in the miRNA list table to select a miRNA of interest, which runs the expression correlation analysis between circRNA and the selected miRNA and displays the results. Once a miRNA has been selected, the miRNA overview tab shows the detailed information of the selected miRNA, such as miRNA sequence, precursor and predicted targets. The plot illustrates the expression correlation between circRNA and the selected miRNA, providing an intuitive view of the expression levels of both circRNA and miRNA in any sample. The “Select dataset” dropdown menu enables users to select any sample of interest. Once the samples are selected, the expression correlation between circRNA and miRNA in the selected samples is reanalyzed, and the results are updated accordingly in the interface. By clicking the “Visualization parameters” dropdown menu, users can customize the plot, such as setting the color for each tissue type. The “Download parameters” dropdown menu allows users to customize download parameters to save the plot, and to download the plotting data. The expression profile table provides the details of circRNA and miRNA expression profiling in samples. The expression correlation result panel provides the results for three types of correlation: Pearson, Spearman and Kendall.

Figure 22. The results of expression correlation between circRNA and miRNA

This tutorial will provide a detailed guide to the interface and results of the analytical tools within circPlot. Currently, the circPlot web server provides 5 analytic tools in the “Tool” module: “circRNA ID converter”, “circRNA differential expression analysis”, “circRNA divergent primer design”, “query gene expression profile” and “query miRNA expression profile”.

circRNA ID converter

The “circRNA ID converter” tool maps the user's input to IDs across 14 circRNA databases and 17 microarray platforms. Users can enter the circRNA to be converted through the “circRNA ID”, “circRNA coordinate - hg19/GRCh37” or “circRNA coordinate - hg38/GRCh38” options. After entering the input, they should click the “Submit!” button to start the conversion. As shown in Figure 1, the conversion results are summarized in an interactive table, and users can click the row of interest to view the details. The details panel provides the aliases of the circRNA across 14 circRNA databases and 17 microarray platforms. In addition, users can click the hyperlink to go to the original database for details.

Figure 1. The interface and results of circRNA ID converter tool

circRNA differential expression analysis

The “circRNA differential expression analysis” tool performs circRNA differential expression analysis on the user's selected dataset and visualizes the results. As shown in Figure 2, users should first choose a cancer type from the “Select a cancer type” dropdown menu. Once a cancer type has been selected, statistics for the datasets of the chosen cancer type, including the project ID and the number of normal and tumor tissue samples, are presented in an interactive table. Users should click a row in the table to select the dataset of interest and display the analysis results. The volcano plot visualizes the significance and fold change of circRNAs, helping users quickly identify significantly differentially expressed circRNAs. The “Visualization parameters” dropdown menu allows users to set the criteria for displaying differentially expressed circRNAs, such as the log2(fold change) and p-value thresholds. Users can also customize the plot using the “Visualization parameters” dropdown menu, such as defining the color for normal and tumor tissues. The heatmap shows the normalized expression of differentially expressed circRNAs across samples, with colors indicating expression levels. By clicking the “Visualization parameters” dropdown menu of the heatmap, users can customize the plot by choosing the method for sample clustering, controlling the color for normal and tumor tissues, and defining the color for the colorbar. The differential expression result table displays the differential expression results, providing the average expression level, fold change value and p-value of each circRNA. Users can click a row in the table to display the expression plot of the selected circRNA, depicting the expression levels in both normal and tumor samples.

Figure 2. The interface and results of circRNA differential expression analysis tool
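The filtering behind a volcano plot can be sketched in a few lines: each circRNA is labeled by comparing its log2(fold change) and p-value against the chosen thresholds. The code below is an illustration only. The function names, the record layout and the default thresholds (an absolute log2 fold change of 1 and a p-value of 0.05) are assumptions, not values taken from circPlot, and the fold change is assumed to be tumor relative to normal.

```python
from math import log2


def classify(fold_change, p_value, min_abs_log2fc=1.0, max_p=0.05):
    """Label one circRNA as 'up', 'down' or 'ns' (not significant)."""
    lfc = log2(fold_change)
    if p_value < max_p and lfc >= min_abs_log2fc:
        return "up"
    if p_value < max_p and lfc <= -min_abs_log2fc:
        return "down"
    return "ns"


def filter_significant(records, min_abs_log2fc=1.0, max_p=0.05):
    """Keep (name, fold_change, p_value) records that pass both thresholds."""
    return [
        r for r in records
        if classify(r[1], r[2], min_abs_log2fc, max_p) != "ns"
    ]
```

Tightening either threshold shrinks the set of highlighted circRNAs, which mirrors what happens in the plot when the corresponding visualization parameter is adjusted.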

circRNA divergent primer design

The “circRNA divergent primer design” tool designs divergent primers for the circRNA sequence provided by users. As shown in Figure 3, users should first select either the “Input” or “Upload” option to enter or upload the circRNA sequence. After entering the circRNA name and sequence, users should click the “Submit!” button to start the analysis. The details of the designed circRNA divergent primer sets, including primer sequence, length, melting temperature, GC content and relative position, are presented in an interactive table. Users should click a row in the table to visualize the schematic diagram and amplification product of the selected primer set. The plot displays the schematic diagram of the circRNA divergent primer sets, providing an intuitive view of the circRNA junction site sequence and the primers. Users can customize the plot through the “Visualization parameters” dropdown menu, such as setting the color of the circRNA and the primer sets. The “Download parameters” dropdown menu allows users to configure settings for downloading the plot. The amplification product panel displays the amplification product of the selected primer set.

Figure 3. The interface and results of circRNA divergent primer design tool
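
To make the idea of a divergent primer pair concrete, the Python sketch below derives the amplification product that spans the back-splice junction. It is an illustration under stated assumptions, not the algorithm used by circPlot: the function names are hypothetical, the circRNA is given as its linear spliced sequence (the last base joins the first at the junction), and the toy sequence and primers are invented for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def gc_content(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def divergent_amplicon(circ_seq, forward, reverse):
    """Return the product of a divergent primer pair on a circular template.

    The forward primer matches the sense strand near the 3' end of the
    linear representation; the reverse primer is the reverse complement
    of a site near the 5' end, so extension crosses the junction.
    """
    fwd_start = circ_seq.find(forward)
    rev_site = reverse_complement(reverse)
    rev_start = circ_seq.find(rev_site)
    if fwd_start < 0 or rev_start < 0:
        raise ValueError("primer not found in circRNA sequence")
    rev_end = rev_start + len(rev_site)
    if rev_end > fwd_start:
        raise ValueError("primers are not divergent on this sequence")
    # Read from the forward primer to the end, then across the junction.
    return circ_seq[fwd_start:] + circ_seq[:rev_end]


toy_circ = "ACGTTGCAAGGCTTACCGATGGATCCTAGA"  # invented 30-nt sequence
product = divergent_amplicon(toy_circ, "GGATCC", "CTTGCA")
print(product, round(gc_content("GGATCC"), 1))
```

On a linear transcript the same two primers would point away from each other and yield no product, which is why divergent primers are specific to the circular form.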

query gene expression profile

The “query gene expression profile” tool queries and visualizes gene expression profiles in tumors, non-tumor tissues, cancer cell lines, non-tumor cells, and the TCGA and CCLE projects. As shown in Figure 4, gene details such as gene ID, type and aliases are shown in an interactive table that users can sort and search. Users should first click a row in the gene list table to select the gene of interest. Once a gene has been selected, the gene overview tab shows detailed information about it, such as gene aliases and summary. The expression profile section contains six tabs: “Tumor tissue”, “Non-tumor tissue”, “Cancer cell line”, “Non-tumor cell”, “TCGA” and “CCLE”. Users should click a tab to view the detailed results.

Figure 4. The interface and results of query gene expression profile tool

Expression profile of gene in tumors

The “Tumor tissue” tab provides the expression landscape, differential expression results, and detailed expression data of the selected gene in tumor and adjacent normal tissues across 33 cancer types (Figure 5), helping users quickly determine whether the gene shows a cancer-specific differential expression pattern or shares a common differential expression signature across cancer types. As shown in Figure 5, the plot shows the expression profile of the gene in tumors, depicting its expression levels in tumor and paired normal tissues. Users can customize the plot through the “Visualization parameters” dropdown menu, for example by setting the colors for tumor and normal tissue individually. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The differential expression result table lists the average expression level, fold change and p-value of the gene in each cancer type. The expression profile table displays the gene expression details for each sample.

Figure 5. The expression profile of gene in tumors

Expression profile of gene in non-tumor tissues

The “Non-tumor tissue” tab displays the gene expression profile across 21 types of normal tissue (Figure 6), helping users identify tissue-specific or ubiquitously expressed genes. As shown in Figure 6, the plot shows the expression profile of the gene across the selected tissues. The “Select dataset” dropdown menu lets users choose any tissue type of interest to display the gene expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for normal tissue. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the gene expression details for each sample.

Figure 6. The expression profile of gene in non-tumor tissues

Expression profile of gene in cancer cell lines

The “Cancer cell line” tab provides the gene expression profile across cancer cell lines of 19 cancer types (Figure 7), helping users select an appropriate cancer cell line model for research. As shown in Figure 7, the plot shows the expression profile of the gene across the selected cancer cell lines. Through the “Select dataset” dropdown menu, users can select any cancer type of interest to display the gene expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for cancer cell lines. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the gene expression details for each sample.

Figure 7. The expression profile of gene in cancer cell lines

Expression profile of gene in non-tumor cells

The “Non-tumor cell” tab displays the gene expression profile across 25 types of normal cell (Figure 8), helping users identify cell-specific or ubiquitously expressed genes. As shown in Figure 8, the plot shows the expression profile of the gene across the selected cells. Through the “Select dataset” dropdown menu, users can select any cell type of interest to display the gene expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by defining the color for normal cells. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the gene expression details for each sample.

Figure 8. The expression profile of gene in non-tumor cells

Expression profile of gene in tumors of the TCGA project

The “TCGA” tab provides the expression landscape, differential expression results, and detailed expression data of the selected gene in tumor and adjacent normal tissues across the 33 TCGA cancer types (Figure 9), helping users quickly determine whether the gene shows a cancer-specific differential expression pattern or shares a common differential expression signature across cancer types. As shown in Figure 9, the plot shows the expression profile of the gene in tumors, depicting its expression levels in tumor and paired normal tissues. Users can customize the plot through the “Visualization parameters” dropdown menu, for example by setting the colors for tumor and normal tissue individually. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The differential expression result table lists the average expression level, fold change and p-value of the gene in each cancer type. The expression profile table displays the gene expression details for each sample.

Figure 9. The expression profile of gene in tumors of the TCGA project

Expression profile of gene in cancer cell lines of the CCLE project

The “CCLE” tab provides the gene expression profile across CCLE cancer cell lines of 28 cancer types (Figure 10), helping users select an appropriate cancer cell line model for research. As shown in Figure 10, the plot shows the expression profile of the gene across the selected cancer cell lines. Through the “Select dataset” dropdown menu, users can select any cancer type of interest to display the gene expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for cancer cell lines. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the gene expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the gene expression details for each sample.

Figure 10. The expression profile of gene in cancer cell lines of the CCLE project

query miRNA expression profile

The “query miRNA expression profile” tool queries and visualizes miRNA expression profiles in tumors, non-tumor tissues, cancer cell lines, non-tumor cells, and the TCGA and CCLE projects. As shown in Figure 11, miRNA details such as miRNA accession and aliases are shown in an interactive table that users can sort and search. Users should first click a row in the miRNA list table to select the miRNA of interest. Once a miRNA has been selected, the miRNA overview tab shows detailed information about it, such as miRNA accession, sequence, aliases, precursor and predicted targets. The expression profile section contains six tabs: “Tumor tissue”, “Non-tumor tissue”, “Cancer cell line”, “Non-tumor cell”, “TCGA” and “CCLE”. Users should click a tab to view the detailed results.

Figure 11. The interface and results of query miRNA expression profile tool

Expression profile of miRNA in tumors

The “Tumor tissue” tab provides the expression landscape, differential expression results, and detailed expression data of the selected miRNA in tumor and adjacent normal tissues across 10 cancer types (Figure 12), helping users quickly determine whether the miRNA shows a cancer-specific differential expression pattern or shares a common differential expression signature across cancer types. As shown in Figure 12, the plot shows the expression profile of the miRNA in tumors, depicting its expression levels in tumor and paired normal tissues. Users can customize the plot through the “Visualization parameters” dropdown menu, for example by setting the colors for tumor and normal tissue individually. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The differential expression result table lists the average expression level, fold change and p-value of the miRNA in each cancer type. The expression profile table displays the miRNA expression details for each sample.

Figure 12. The expression profile of miRNA in tumors

Expression profile of miRNA in non-tumor tissues

The “Non-tumor tissue” tab displays the miRNA expression profile across 17 types of normal tissue (Figure 13), helping users identify tissue-specific or ubiquitously expressed miRNAs. As shown in Figure 13, the plot shows the expression profile of the miRNA across the selected tissues. The “Select dataset” dropdown menu lets users choose any tissue type of interest to display the miRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for normal tissue. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the miRNA expression details for each sample.

Figure 13. The expression profile of miRNA in non-tumor tissues

Expression profile of miRNA in cancer cell lines

The “Cancer cell line” tab provides the miRNA expression profile across cancer cell lines of 14 cancer types (Figure 14), helping users select an appropriate cancer cell line model for research. As shown in Figure 14, the plot shows the expression profile of the miRNA across the selected cancer cell lines. Through the “Select dataset” dropdown menu, users can select any cancer type of interest to display the miRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for cancer cell lines. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the miRNA expression details for each sample.

Figure 14. The expression profile of miRNA in cancer cell lines

Expression profile of miRNA in non-tumor cells

The “Non-tumor cell” tab displays the miRNA expression profile across 25 types of normal cell (Figure 15), helping users identify cell-specific or ubiquitously expressed miRNAs. As shown in Figure 15, the plot shows the expression profile of the miRNA across the selected cells. Through the “Select dataset” dropdown menu, users can select any cell type of interest to display the miRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by defining the color for normal cells. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the miRNA expression details for each sample.

Figure 15. The expression profile of miRNA in non-tumor cells

Expression profile of miRNA in tumors of the TCGA project

The “TCGA” tab provides the expression landscape, differential expression results, and detailed expression data of the selected miRNA in tumor and adjacent normal tissues across the 33 TCGA cancer types (Figure 16), helping users quickly determine whether the miRNA shows a cancer-specific differential expression pattern or shares a common differential expression signature across cancer types. As shown in Figure 16, the plot shows the expression profile of the miRNA in tumors, depicting its expression levels in tumor and paired normal tissues. Users can customize the plot through the “Visualization parameters” dropdown menu, for example by setting the colors for tumor and normal tissue individually. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The differential expression result table lists the average expression level, fold change and p-value of the miRNA in each cancer type. The expression profile table displays the miRNA expression details for each sample.

Figure 16. The expression profile of miRNA in tumors of the TCGA project

Expression profile of miRNA in cancer cell lines of the CCLE project

The “CCLE” tab provides the miRNA expression profile across CCLE cancer cell lines of 27 cancer types (Figure 17), helping users select an appropriate cancer cell line model for research. As shown in Figure 17, the plot shows the expression profile of the miRNA across the selected cancer cell lines. Through the “Select dataset” dropdown menu, users can select any cancer type of interest to display the miRNA expression profile in the selected samples. Users can customize the plot using the “Visualization parameters” dropdown menu, for example by setting the color for cancer cell lines. The “Plot type” dropdown menu lets users choose a box plot, dot plot or bar plot to visualize the miRNA expression profile. The “Download parameters” dropdown menu lets users customize the settings used to save the plot. The expression profile table displays the miRNA expression details for each sample.

Figure 17. The expression profile of miRNA in cancer cell lines of the CCLE project

Data analysis

Obtain circRNA candidates

The circRNA candidates downloaded from the circBase database are filtered to retain exonic circRNAs. Annotations of the exonic circRNAs, such as genomic coordinates, host gene and exon index, are also retrieved from circBase.

# download circRNA candidates and annotations
wget http://www.circbase.org/download/hsa_hg19_circRNA.bed

Prediction of miRNA binding sites on circRNA

The putative spliced sequences of the retained circRNAs are obtained from the circBase database. The mature miRNA sequences (miRBase release 22) are downloaded from the miRBase database.

# download putative spliced sequences of circRNAs
wget http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz
# unpack
gunzip human_hg19_circRNAs_putative_spliced_sequence.fa.gz

# download mature sequences of miRNAs
wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
# unpack
gunzip mature.fa.gz

The miRNA binding sites on exonic circRNAs are predicted by miRanda (v3.3a) with the following parameters:

# prediction of miRNA binding sites on circRNA
miRanda miRNA.fa circRNA.fa -go -8 -ge -2 -sc 120 > circRNA_miRanda.result
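
A result file like the one produced above can be reduced to a table of hits with a few lines of Python. This is a sketch under assumptions rather than part of the original pipeline: it assumes that each hit line starts with a single ">" and is tab-separated with the miRNA, the target, the score and the energy as its first four fields, and that summary lines start with ">>". The function name and the sample text are hypothetical.

```python
def parse_miranda_hits(text, min_score=120.0):
    """Collect per-site hits from miRanda-style output (assumed layout)."""
    hits = []
    for line in text.splitlines():
        # Keep lines starting with one '>' and skip '>>' summary lines.
        if not line.startswith(">") or line.startswith(">>"):
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue
        score = float(fields[2])
        if score >= min_score:
            hits.append({
                "miRNA": fields[0],
                "circRNA": fields[1],
                "score": score,
                "energy": float(fields[3]),
            })
    return hits


# Invented lines that mimic the assumed layout; not real predictions.
sample = (
    "Read Sequence:hsa_circ_example\n"
    ">hsa-miR-x\thsa_circ_example\t153.00\t-17.50\t2 21\t345 368\n"
    ">hsa-miR-y\thsa_circ_example\t110.00\t-9.80\t2 20\t100 121\n"
    ">>hsa-miR-x\thsa_circ_example\t153.00\t-17.50\n"
)
for hit in parse_miranda_hits(sample):
    print(hit)
```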

Analysis of RNA binding proteins (RBPs) binding sites on circRNA

The RBP binding sites on circRNAs, inferred from RBP CLIP-seq experiments, are retrieved from the starBase database using the provided Web API.

# example command line
# get data of all RBPs for TP53 (in human)
curl 'http://starbase.sysu.edu.cn/api/RBPTarget/?assembly=hg19&geneType=circRNA&RBP=TP53&clipExpNum=1&pancancerNum=0&target=all&cellType=all' > hg19_TP53_circRNA_interaction
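
When many RBPs have to be queried, the request URL can be assembled programmatically. The Python sketch below only builds the URL and performs no network request; the parameter names and values are copied from the example command above, while the function name and the idea of looping over RBPs are assumptions made for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://starbase.sysu.edu.cn/api/RBPTarget/"


def rbp_target_url(rbp, assembly="hg19"):
    """Build a starBase RBPTarget query URL for one RBP (hypothetical helper)."""
    params = {
        "assembly": assembly,
        "geneType": "circRNA",
        "RBP": rbp,
        "clipExpNum": 1,
        "pancancerNum": 0,
        "target": "all",
        "cellType": "all",
    }
    return BASE_URL + "?" + urlencode(params)


print(rbp_target_url("TP53"))
```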

The relative positions of RBP binding sites on circRNAs are determined by bedtools (v2.28.0) using the following command:

bedtools intersect -a RBP_binding_sites.bed -b circRNA_exon_coordinate.bed -wb -s > RBP_relative_position_circRNA
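
Once a binding site has been assigned to a circRNA exon, its position can be expressed relative to the spliced circRNA sequence. The Python sketch below shows one possible conversion and is not necessarily how circPlot computes it: the function name is hypothetical, and exons are assumed to be sorted, non-overlapping, 0-based half-open intervals as in BED files.

```python
def genomic_to_spliced(exons, pos, strand):
    """Map a 0-based genomic position to a 0-based spliced-sequence index.

    exons: list of (start, end) tuples sorted by start, half-open.
    Returns None when the position falls outside every exon.
    """
    total = sum(end - start for start, end in exons)
    offset = 0  # number of exonic bases upstream on the plus strand
    for start, end in exons:
        if start <= pos < end:
            plus_index = offset + (pos - start)
            # On the minus strand the spliced sequence runs in reverse.
            return plus_index if strand == "+" else total - 1 - plus_index
        offset += end - start
    return None


toy_exons = [(100, 200), (300, 350)]  # invented coordinates
print(genomic_to_spliced(toy_exons, 310, "+"))
print(genomic_to_spliced(toy_exons, 310, "-"))
```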

Prediction of coding potential of circRNA

The putative spliced sequences of circRNAs are submitted to the ORFfinder software to search for putative open reading frames (ORFs) in each circRNA sequence. Only ORFs at least 25 aa long are kept.
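
The length filter can be illustrated with a minimal ORF scan. The Python sketch below is not ORFfinder and does not reproduce its options: it only scans the three forward frames of the linear sequence for ATG-to-stop stretches and keeps those of at least 25 codons, and it ignores ORFs that would span the back-splice junction. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=25):
    """Return (start, end, aa_length) for forward-strand ORFs >= min_aa."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                aa_length = (i - start) // 3  # stop codon not counted
                if aa_length >= min_aa:
                    orfs.append((start, i + 3, aa_length))
                start = None
    return orfs


toy_seq = "ATG" + "GCT" * 30 + "TAA"  # invented sequence encoding 31 aa
print(find_orfs(toy_seq))
```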

Internal ribosome entry site (IRES) sequences have been extensively demonstrated to initiate cap-independent translation of circRNAs. The IRESfinder (v1.1.0) software is used to predict IRES elements on circRNAs that harbor putative ORFs.

python IRESfinder.py -f circRNA.fa -o circRNA.result -m 2 -w 174 -s 50

Visualization of circRNA

The circPlot package

The circPlot package is a wrapper around the ggplot2 and ggforce packages that facilitates the visualization of circRNAs. The circPlot_circularization function shows the back-splicing event of a circRNA. The circPlot_miRNA_binding function visualizes the miRNA binding sites on a circRNA, while the circPlot_miRNA_alignment function shows the details of sequence complementarity between circRNA and miRNA. The circPlot_RBP function displays the RBP binding sites on a circRNA. The circPlot_coding_potential function visualizes the relative positions of predicted ORF and IRES elements on a circRNA. All R scripts of the circPlot package are freely available at https://github.com/zimuliving/circPlot.

The circPlot web portal

The circPlot interactive web server is built with the shiny and shinydashboard packages. Data visualization is done with the circPlot package. The interactive user interface is implemented with the shiny framework and the DT package. The source code and data used to develop the circPlot interactive web portal are freely available at https://github.com/zimuliving/circPlot.

ggforce manual: https://ggforce.data-imaginist.com/index.html

https://rviews.rstudio.com/2019/09/19/intro-to-ggforce/

In this post, I will walk you through some examples that show off the major features of the ggforce package. The main goal is to share a few ideas about customizing visualizations that you may find useful in your everyday work.

The ggforce package is an extension to ggplot2 developed by Thomas Pedersen. Thanks to ggforce, you can enhance almost any ggplot by highlighting data groupings, and focusing attention on interesting features of the plot. The package contains geoms, stats, facets, and other ggplot functions. Among such functions, there are some for marking the convex hull of a set of points, jittering data, and creating Voronoi plots.

Base ggplot

The examples in this article will use data from the nycflights13 package. Most of the examples will build on the same basic ggplot that visualizes airports by geographical location. I am using this data set because it makes it easy to plot x/y coordinates without having to remember what they “mean”. This basic plot will be saved to a variable, and then that variable will be used as the base of the examples of enhancing the visualization using ggforce.

library(tidyverse)
library(ggforce)
library(nycflights13)

p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) + 
  geom_point(show.legend = FALSE)  

p

Make your mark with ggforce

I have long been waiting for an easy way to draw an outline around groups of data. The geom_mark_*() family of functions does exactly that. There are four mark functions in ggforce, each named for the shape it draws around the group: geom_mark_circle(), geom_mark_ellipse(), geom_mark_hull() and geom_mark_rect().

Let’s start with geom_mark_rect(); it will draw a rounded rectangle around each time zone group.

p +
  geom_mark_rect()

Like magic! The rectangles look amazing, even without modifying any arguments. Of course, more customization is possible via setting arguments. In this post, I will review some of the many great arguments available in ggforce functions, but I don’t want to rob you of the fun of trying it yourself and discovering all of the different options.

Label, and an arrow!

This next addition to our plot deserves its own subheading. Adding a label and an arrow pointing to a group would typically be a major undertaking. Without ggforce, this would require manually adding both the text and the arrow to the ggplot. But with geom_mark it is as simple as setting the label argument. So, without further ado, here is the label argument in action:

p + 
  geom_mark_rect(aes(label = tzone))

The labels and arrows are not only drawn, but they are also placed in an optimized location. In addition, the position will recalculate if the plot is re-sized! There are too many little details about this label argument to mention. The backdrop is automatically white, the indicator is not really an arrow, it is a simple line that also underlines the text, so it is easy for the eye to know which group belongs to which label.

It is now easy to finalize the plot by resetting the theme, and again suppressing the legend using show.legend.

p + 
  geom_mark_rect(aes(label = tzone), show.legend = FALSE) +
  theme_void()

Hull-k, enhance!

There are many cases where drawing a rectangle or circle around the groups is not ideal, or even preferable. The geom_mark_hull() essentially traces a more complex polygon around the shape of the outline of the group.

p + 
  geom_mark_hull(aes(label = tzone)) +
  theme_void()

Again, without adding any arguments to the function, the traced outline already looks wonderful. Another option to add now is fill. And since the legend table is now redundant, it can be suppressed by setting show.legend to FALSE.

p + 
  geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE) +
  theme_void()

Notice that the fill color is not totally opaque; by default, ggforce has set the translucency lower to make sure that the dots are visible. This is something that I would have done anyway, usually by adding the alpha argument. In this case, it saves having to remember to add that argument.

Another adjustment that I thought was important for this plot was to modify the size of the hull, to change the padding around the outline of the group. The expand argument controls this; it is possible to change it using the unit() function.

p + 
  geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
  theme_void()

Axe theme_void()

To finalize plots such as this one, it is necessary to remove most components from the default theme. Usually, theme_void() does the trick. For printed or online articles with white backgrounds, which is essentially all of them, it is often hard to determine the margins of the plot. theme_no_axes() provides a great compromise by removing all but the one element.

p + 
  geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
  theme_no_axes()

Another facet of ggforce, and it’s magnify-cent

It is common to produce two plots, one to show the full picture, and another to magnify or focus on a specific area. With facet_zoom(), it is incredibly easy to show “macro” and “micro” in one plot by using the same xlim and ylim arguments to focus on an area of a plot.

p +
  facet_zoom(xlim = c(-155, -160.5), ylim = c(19, 22.3))

Skip the coordinates

Another cool feature of facet_zoom() is the ability to set the zoom region based on a row selection. To do this, simply pass an expression that you would use in a function such as filter() to the facet. So instead of using coordinates, I just tell the facet to zoom on anything that has a Pacific/Honolulu time zone.

p +
  facet_zoom(xy = tzone == "Pacific/Honolulu")

Putting it all together, with three lines of code

Using what has been covered so far, it is easy to go from a very simple point plot to a sophisticated and nice-looking visualization with just three lines of code, thanks to ggforce.

p +
  geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
  theme_no_axes() +
  facet_zoom(x = tzone == "America/Los_Angeles")

“What is a Voronoi?”

This section title is based on my first reaction when I heard the word “Voronoi”. I have since learned about it, and can see why the Voronoi Diagram can be useful for very specific use cases. The good news is that if you encounter one of those use cases, you know that it is easy to draw it up in ggplot using geom_voronoi_segment().

The idea behind a Voronoi diagram is to split the area of the plot into as many sections as there are points. Unlike a grid or heat map, Voronoi draws custom shapes for each point based on the proximity to other points. It returns a plot that looks like stained glass. This can be good to determine the closest point inside each area. For example, a retailer can use it to see the area their store locations cover, and can help them make decisions to optimize their location based on the size of each Voronoi shape.

The following example will focus on airports in Alaska. The ggplot will zoom into that state’s general location, and then trace a hull shape. The hull will provide a quasi-map overlay. The final step is to add the Voronoi diagram layer by calling the function: geom_voronoi_segment()

p +
  geom_mark_hull(aes(fill = tzone), expand = unit(3, "mm")) +
  coord_cartesian(xlim = c(-130, -180), ylim = c(50, 75))  +
  geom_voronoi_segment()

Parallel to alluvial

The geom_parallel… functions allow visualizing interactions between categorical variables. The implementation is generic enough to create Sankey or alluvial charts.

For this, I will use the Manufacturer and Engine data from the planes table inside nycflights13. In this case, some simple data preparation is needed first.

prep_planes <- planes %>%
  filter(year > 1998, year < 2005) %>%
  filter(engine != "Turbo-shaft") %>%
  select(manufacturer, engine) %>%
  head(500)

prep_planes
## # A tibble: 500 x 2
##    manufacturer     engine   
##    <chr>            <chr>    
##  1 EMBRAER          Turbo-fan
##  2 AIRBUS INDUSTRIE Turbo-fan
##  3 AIRBUS INDUSTRIE Turbo-fan
##  4 EMBRAER          Turbo-fan
##  5 AIRBUS INDUSTRIE Turbo-fan
##  6 AIRBUS INDUSTRIE Turbo-fan
##  7 AIRBUS INDUSTRIE Turbo-fan
##  8 AIRBUS INDUSTRIE Turbo-fan
##  9 AIRBUS INDUSTRIE Turbo-fan
## 10 EMBRAER          Turbo-fan
## # … with 490 more rows

Prep for plotting with one line

The gather_set_data() is a convenience function that, just like gather(), creates a single line for each combination of categorical variables. The table contains three new columns - id, x, and y - which contain the combinations that each new row represents, and the row ID number from the original table.

prep_planes %>%
  gather_set_data(1:2)
## # A tibble: 1,000 x 5
##    manufacturer     engine       id x            y               
##    <chr>            <chr>     <int> <chr>        <chr>           
##  1 EMBRAER          Turbo-fan     1 manufacturer EMBRAER         
##  2 AIRBUS INDUSTRIE Turbo-fan     2 manufacturer AIRBUS INDUSTRIE
##  3 AIRBUS INDUSTRIE Turbo-fan     3 manufacturer AIRBUS INDUSTRIE
##  4 EMBRAER          Turbo-fan     4 manufacturer EMBRAER         
##  5 AIRBUS INDUSTRIE Turbo-fan     5 manufacturer AIRBUS INDUSTRIE
##  6 AIRBUS INDUSTRIE Turbo-fan     6 manufacturer AIRBUS INDUSTRIE
##  7 AIRBUS INDUSTRIE Turbo-fan     7 manufacturer AIRBUS INDUSTRIE
##  8 AIRBUS INDUSTRIE Turbo-fan     8 manufacturer AIRBUS INDUSTRIE
##  9 AIRBUS INDUSTRIE Turbo-fan     9 manufacturer AIRBUS INDUSTRIE
## 10 EMBRAER          Turbo-fan    10 manufacturer EMBRAER         
## # … with 990 more rows

The ggplot is primed with x for x, and then new aesthetics: id, split and value. For id, we pass the id column, split takes y, and finally, value is fixed to 1. The value is used to express the amount of “thickness” to add to that particular relationship; using 1 means that all combinations are weighted the same. At this point, the only argument to pass geom_parallel_sets() will be the color fill; in this case we will use engine.

Plotting with parallel

prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1))  +
  geom_parallel_sets(aes(fill = engine))

The plot shows how a specific plane’s engine relates to each of the manufacturers. Next geom_parallel_sets_axes() provides a terminal box; the axis.width argument is the only one necessary to use at this stage, and we will set it to 0.1. The labels are added by using geom_parallel_sets_labels(), and they are automatically rotated.

prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1))  +
  geom_parallel_sets(aes(fill = engine)) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels()

The following is done to finalize the plot:

geom_parallel_sets() - Hide the legend and lower the alpha
geom_parallel_sets_axes() - Change the fill color and font color
geom_parallel_sets_labels() - Remove the rotation of the label

prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1))  +
  geom_parallel_sets(aes(fill = engine), show.legend = FALSE, alpha = 0.3) +
  geom_parallel_sets_axes(axis.width = 0.1, color = "lightgrey", fill = "white") +
  geom_parallel_sets_labels(angle = 0) +
  theme_no_axes()

Danger zone!

When visualizing the combination of a continuous and a categorical variable, it is common practice to resort to a bar or column plot. Cases that require representing this in a single circle shape usually involve modifying a polar bar in ggplot. But this is much easier now with ggforce. I start with the total number of planes by engine type:

planes %>%
  count(engine) 
## # A tibble: 6 x 2
##   engine            n
##   <chr>         <int>
## 1 4 Cycle           2
## 2 Reciprocating    28
## 3 Turbo-fan      2750
## 4 Turbo-jet       535
## 5 Turbo-prop        2
## 6 Turbo-shaft       5

and then pipe those results into ggplot using geom_arc_bar() to create the circle-shaped plot. The new aesthetics employed here are: x0, y0, r0, r, amount, and explode. The x0, y0, r0 and r aesthetics refer to the position and the radii of the circle. Since only one plot is needed, I fix x0 and y0 to 0. For the radius, r0 refers to the inside of the circle, and r to the outside. Setting r0 to 0.7 and r to 1 will create a sort of doughnut with a 0.3 thickness. Finally, I use “pie” as the stat.

planes %>%
  count(engine) %>%
  ggplot() +
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine), alpha = 0.3, stat = "pie")

Another cool thing this geom does is to make it easy to “break away” one or several segments of the plot. The explode aesthetic controls that. To break away the “Turbo-jet” results, I create a new column called focus, setting it to 0.2 if it is part of that engine group, and to 0 if it is not, then finish up with theme_no_axes().

planes %>%
  count(engine) %>%
  mutate(focus = ifelse(engine == "Turbo-jet", 0.2, 0)) %>%
  ggplot() +
  geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine, explode = focus), alpha = 0.3, stat = "pie") +
  theme_no_axes()

This section is titled “Danger Zone” because changing the r0 in geom_arc_bar() may change the look of the plot to one that has fallen out of favor. That plot type happens to share its name with the stat that we are using.
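
For illustration only, here is a minimal sketch of what that looks like: the same call with r0 set to 0, which closes the hole in the doughnut. To keep it self-contained, I rebuild the counts by hand from the count(engine) output shown earlier; the engine_counts and pie_plot names are my own.

```r
library(ggplot2)
library(ggforce)

# Engine counts copied from the count(engine) output shown earlier
engine_counts <- data.frame(
  engine = c("4 Cycle", "Reciprocating", "Turbo-fan",
             "Turbo-jet", "Turbo-prop", "Turbo-shaft"),
  n = c(2, 28, 2750, 535, 2, 5)
)

# r0 = 0 removes the inner radius, so the doughnut becomes a full disc
pie_plot <- ggplot(engine_counts) +
  geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = n, fill = engine),
    alpha = 0.3, stat = "pie"
  ) +
  coord_fixed() +   # keep the circle round
  theme_no_axes()

pie_plot
```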

Closing remarks

ggforce is a great package that does a lot more than what I covered today. My hope is to have shared one or two things that will encourage you to try ggforce in your everyday work.

Special thanks to Thomas Pedersen, the author of the package and a co-worker of mine. His contributions to the R community also include the tidygraph and ggraph packages, which I wrote about in this blog post a few months back.

Contact us

If you find a bug or have any comments, questions, or suggestions, please don't hesitate to email liuzimu1992@gmail.com. Alternatively, you are encouraged to create an issue at the project repository, available at https://github.com/liuwenrong1992/circPlot.

Exploration

Select an option to explore circRNA

circRNA ID

Notes: Users can enter a circRNA ID from any of 14 circRNA databases in the box

Examples: circBase circAtlas3 circBank CSCD2 circRNADb TransCirc Reset: reset

Notes: The running time ranges from two to five minutes

circRNA name

Notes: The circRNA name cannot contain spaces, slashes, backslashes, or other special characters and symbols

Select an option to choose the reference genome assembly

circRNA coordinate

Notes: The coordinate system must be 0-based, and the coordinate must be in the following format: chromosome:start-end|strand

Example: hg19 hg38 Reset: reset

Notes: The running time depends on the size of the coordinate range and ranges from two to five minutes
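
The coordinate format described above (chromosome:start-end|strand) can be checked before submission with a small R helper. This is a minimal sketch, not the validation circPlot itself performs: the function name parse_circ_coordinate is hypothetical, and the requirements that chromosome names start with "chr" and that start is smaller than end are assumptions based on the example chr5:38523520-38530768|-.

```r
# Hypothetical helper: split "chromosome:start-end|strand" into its parts.
# Assumes chromosome names begin with "chr" (as in the documented example).
parse_circ_coordinate <- function(coordinate) {
  pattern <- "^(chr[0-9A-Za-z_]+):([0-9]+)-([0-9]+)\\|([+-])$"
  parts <- regmatches(coordinate, regexec(pattern, coordinate))[[1]]
  if (length(parts) == 0) {
    stop("Expected format chromosome:start-end|strand, got: ", coordinate)
  }
  start <- as.numeric(parts[3])
  end <- as.numeric(parts[4])
  # Assumption: a valid range has start strictly below end
  if (start >= end) {
    stop("start must be smaller than end")
  }
  list(chromosome = parts[2], start = start, end = end, strand = parts[5])
}

parsed <- parse_circ_coordinate("chr5:38523520-38530768|-")
str(parsed)
```

A string that lacks the strand, such as chr5:38523520-38530768, makes the helper stop with an error instead of returning a partial result.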

Select an option to provide circRNA sequence

circRNA name

Notes: The circRNA name cannot contain spaces, slashes, backslashes, or other special characters and symbols

circRNA sequence

Example: example Reset: reset

circRNA name

Notes: The circRNA name cannot contain spaces, slashes, backslashes, or other special characters and symbols

circRNA sequence

Notes: The running time depends on the length of the input circRNA sequence and ranges from two to ten minutes

Job ID

Notes: The job ID cannot contain spaces, slashes, backslashes, or other special characters and symbols

Example: example Reset: reset

Notes: The running time ranges from one to five minutes

Analysis result

Notes: Click a row to display the results for the selected circRNA

Result visualization

Host gene:

Transcript:

Location (0-based):

Length (bp):

IDs in database:

IDs in microarray platform:

Sequence:

Conserved mouse circRNA:

Gene name:

Description:

Location (0-based):

Gene type:

Aliases:

Other ID:

Summary:

transcript coordinate

circRNA coordinate

Notes: Prepare your upload file based on example data.

Schematic diagram of circRNA genomic position

Customize plot

Font size

Font type

Exon color

Intron color

Arrow color

Download plot and data

Width (inch)

Height (inch)

Figure format

Schematic diagram of circRNA genomic position

Customize plot

Font size

Font type

Exon color

Intron color

Arrow color

Download plot

Width (inch)

Height (inch)

Figure format

circRNA circularization

Notes: Prepare your upload file based on example data.

Schematic diagram of circRNA back-splicing

Customize plot

Inner radius size

Outer radius size

Font size

Font type

Divergent forward primer position

Divergent reverse primer position

Notes: The coordinate system must be 0-based, and the coordinate must be in the following format: chromosome:start-end|strand