Circular RNAs (circRNAs) are a type of single-stranded, covalently closed endogenous RNA molecule generated by back-splicing. CircRNAs have attracted great attention in recent years and have been reported to regulate various physiological and pathological processes by sponging miRNAs, acting as RNA binding protein (RBP) decoys, and encoding functional peptides. However, a graphical user interface-based web tool for interactive circRNA analysis and visualization is still lacking.
circPlot provides a user-friendly interface and a suite of plotting and analysis functions for rapid and intuitive exploration, visualization and analysis of pre-analyzed or user-provided data:
1) comprehensive annotations of circRNA such as basic information, genomic position, sequence conservation, somatic mutation, secondary structure, epigenetic modification and divergent primer;
2) expression landscape of circRNA across multiple cancer types, non-tumor tissues, cancer cell lines and non-tumor cells;
3) putative mechanisms of circRNA including miRNA sponging, RBP decoy activity and translation into peptides;
4) expression correlation analysis between circRNA and gene or miRNA in various tumor and non-tumor tissues, cell lines and cells.
Overall, circPlot is a user-friendly and powerful web application for visualizing, exploring and analyzing circRNAs without requiring any programming skills.
Circular RNAs (circRNAs) are a class of endogenous RNA molecules with a single-stranded, covalently closed structure. They are generated by back-splicing, in which a downstream splice donor is joined to an upstream splice acceptor. CircRNAs have attracted great attention in recent years and have emerged as versatile regulators that are extensively implicated in various physiological and pathological processes, including cancer. CircRNAs have been demonstrated to exert their functions through multiple mechanisms, such as regulating gene transcription, sponging miRNAs, acting as RNA binding protein (RBP) decoys, and encoding functional peptides (Figure 1).
Figure 1. Mechanisms of circRNA function
circPlot is a user-friendly interactive web server for comprehensively annotating circRNAs and providing publication-quality plots to assist circRNA research. It integrates sequencing data from 3180 samples covering 26 types of cancer, 107 types of normal tissue, 111 types of cancer cell line and 254 types of normal cell from our previous studies and public datasets. In addition, other publicly available datasets, such as RBP CLIP-seq, somatic mutation sites and epigenetic modification sites, are also included.
The circPlot web application contains 5 menus, namely Home, Search, Exploration, Tutorial and Contact (Figure 2). The Home page provides a brief introduction to circPlot; the Search and Exploration pages are dedicated to querying and analyzing circRNAs, respectively; the Tutorial page contains detailed instructions on how to use circPlot; and the Contact page is for sending questions and comments about the website to the developer.
Figure 2. The graphical interface of the circPlot web application
A suite of plotting and analysis functions is provided to facilitate the interactive analysis and visualization of pre-analyzed or user-provided circRNAs (Figure 3).
Figure 3. Schematic overview of the circPlot web application
Here we provide detailed instructions on how to use the circPlot web application.
Currently, circPlot integrates the basic information, expression profiles and putative mechanisms of 815624 circRNAs. The Search page provides 5 search options, namely circRNA ID, circRNA sequence, gene, miRNA and RBP, for users to query a circRNA of interest. The search results are shown after clicking the “Submit” button. Users can also try the provided examples in each section directly.
As shown in Figure 1, users can choose the “circRNA ID” search option and then enter a circRNA ID or circRNA coordinate in the search box to query a circRNA. circPlot supports circRNA IDs from 12 databases (circBase, circBank, circAtlas, DeepBase3, CIRCpedia2, circRNADb, exoRBase2, TSCD, CSCD2, MiOncoCirc, TransCirc, riboCIRC) and circRNA coordinates in the hg19 or hg38 assembly.
Figure 1. Query a circRNA of interest by searching circRNA ID
Users can choose the “circRNA sequence” search option and then paste the circRNA sequence in the box to query circRNA (Figure 2).
Figure 2. Query a circRNA of interest by searching circRNA sequence
Users can choose the “gene” search option and then paste a gene name, gene alias or gene ID in the box to query circRNAs (Figure 3). Currently, circPlot supports gene IDs from NCBI, Ensembl, geneAliases, MIM, HGNC, AllianceGenome and IMGT.
Figure 3. Query a circRNA of interest by searching gene
Users can choose the “miRNA” search option, paste a miRNA name, miRNA accession or miRNA alias in the box, and set the number of miRNA binding sites to query circRNAs (Figure 4). Currently, only miRNA names, accessions and aliases from the miRBase (Release 22.1) database are supported.
Figure 4. Query a circRNA of interest by searching miRNA
Users can choose the “RBP” search option, paste a gene name, gene alias or gene ID in the box, and set the number of RBP binding sites to query circRNAs (Figure 5). Currently, circPlot supports gene IDs from NCBI, Ensembl, geneAliases, MIM, HGNC, AllianceGenome and IMGT.
Figure 5. Query a circRNA of interest by searching RBP
If a circRNA is not included in our pre-analyzed results, users can analyze it by selecting the Exploration page and typing the circRNA coordinate or circRNA sequence in the corresponding input box. Users should then click the “Submit” button to trigger the analysis; the results are displayed once the server has finished the analysis. In addition, users can try the predefined examples.
Users can choose the “circRNA coordinate” option and then paste the circRNA name and circRNA coordinate in the box to analyze the circRNA (Figure 6). Note that the circRNA name and coordinate must be in the proper format.
Figure 6. Analyze a circRNA by circRNA coordinate
In addition, users can analyze a circRNA by its sequence. Users should first choose the “circRNA sequence” option and then provide the circRNA sequence by typing (Figure 7) or uploading (Figure 8) it. Note that the circRNA name and sequence must be in the proper format.
Figure 7. Users can type circRNA sequence and then analyze circRNA
Figure 8. Users can upload circRNA sequence and then analyze circRNA
To show the analysis and visualization results of a circRNA, users should first select a circRNA of interest by clicking its row in the result table. Once a circRNA is selected, a suite of analysis and visualization results is displayed for this circRNA (Figure 9), covering the “General information”, “Expression profile” and “Putative mechanism” sections described below.
Figure 9. Overview of results of selected circRNA
circPlot supports interaction between users and the analysis and visualization results, as summarized below:
The “General information” section provides basic information on a circRNA and has 8 menus, namely circRNA overview, host gene overview, genomic position, circularization diagram, base modification, secondary structure, somatic mutation and divergent primer.
This section provides the basic information of a circRNA, including its host gene, linear transcript, genomic coordinates, aliases in circRNA databases and microarray platforms, spliced sequence, and the conserved mouse circRNA reported in the circBase database. In addition, users can click the hyperlinks to view the detailed information recorded in the corresponding database (Figure 10).
Figure 10. Basic information of circRNA
The basic information of circRNA host gene such as linear transcript, aliases in other databases, and gene summary are shown in this section. Users can click the hyperlink to view detailed information recorded in the corresponding database (Figure 11).
Figure 11. Basic information of circRNA host gene
This module is designed to visualize the genomic position and sequence conservation of circRNA using pre-analyzed result (Figure 12) or user's own data (Figure 13).
Figure 12. Visualization of circRNA genomic position and sequence conservation of pre-analyzed results
Figure 13. Visualization of circRNA genomic position of user provided data
In this section, users can select the “Pre-analyzed result” (Figure 14) or “Visualize your data?” (Figure 15) option to display the schematic diagram of circRNA back-splicing. Besides, users can set the relative position of circRNA divergent and convergent primer sets to show the schematic diagram of primer sets.
Figure 14. Visualization of schematic diagram of circRNA back-splicing of pre-analyzed results
Figure 15. Visualization of schematic diagram of circRNA back-splicing of user provided data
In this section, users can select the “Pre-analyzed result” (Figure 16) or “Visualize your data?” (Figure 17) option to visualize the base modification sites on circRNA.
Figure 16. Visualization of base modification sites on circRNA of pre-analyzed results
Figure 17. Visualization of base modification sites on circRNA of user provided data
This module is designed to visualize the secondary structure of a circRNA using pre-analyzed results (Figure 18) or the user's own data (Figure 19). The statistics on circRNA secondary structure illustrate the number and relative position of 3 kinds of structure (namely hairpin, internal loop and multi-branched loop). In addition, users can download the sequence of the circRNA secondary structure, which can be analyzed or visualized with other web tools such as RNAfold and forna.
Figure 18. Visualization of secondary structure of circRNA of pre-analyzed results
Figure 19. Visualization of secondary structure of circRNA of user provided data
In this section, users can select the “Pre-analyzed result” (Figure 20) or “Visualize your data?” (Figure 21) option to visualize the somatic mutation sites on circRNA.
Figure 20. Visualization of somatic mutation sites on circRNA of pre-analyzed results
Figure 21. Visualization of somatic mutation sites on circRNA of user provided data
This module designs and visualizes divergent primers for a circRNA using pre-analyzed results (Figure 22) or the user's own data (Figure 23). Users can click a row of the result table to view the details and visualization results of the selected primer set.
Figure 22. Visualization of divergent primer of circRNA of pre-analyzed results
Figure 23. Visualization of divergent primer of circRNA of user provided data
In addition, users can design divergent primers for their own circRNA sequence by selecting the “Design circRNA divergent primer?” option. The circRNA sequence can be provided by typing (Figure 24) or uploading (Figure 25).
Figure 24. Users can type circRNA sequence and then design and visualize circRNA divergent primer
Figure 25. Users can upload circRNA sequence and then design and visualize circRNA divergent primer
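Divergent primers are meant to amplify across the back-splice junction, so the template around the junction is the 3' end of the spliced sequence followed by its 5' end. The following is a minimal shell sketch of building such a junction template; the sequence and the flank length of 4 nt are hypothetical values chosen for illustration and do not come from circPlot.

```shell
# Hypothetical spliced circRNA sequence (5'->3'), for illustration only.
spliced="ACGTACGTTTGGCCAA"
flank=4   # hypothetical number of bases taken from each side of the junction

# Last $flank bases (upstream of the junction) and first $flank bases (downstream).
tail_part=$(printf '%s' "$spliced" | tail -c "$flank")
head_part=$(printf '%s' "$spliced" | cut -c "1-$flank")

# The back-splice junction joins the 3' end of the sequence to its 5' end.
junction="${tail_part}${head_part}"
printf '%s\n' "$junction"
```

In practice the flank would be long enough to place one primer on each side of the junction; the primer design itself is not covered by this sketch.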
The “Expression profile” section provides circRNA expression profiles in cancers, normal tissues, cancer cell lines and normal cells. Currently, 3 plot types, namely “Box plot”, “Dot plot” and “Bar plot” (Figure 26), are provided for the visualization results in this section, and users can choose among them according to their needs. Collectively, this module contains valuable resources to benefit the circRNA research community.
Figure 26. Three plot types are provided for the visualization results
The “Tumor tissue” module provides the details and visualization results of circRNA expression profiles in tumors and their adjacent normal tissues (Figure 27). Differential expression results are also provided, helping users screen for cancer-specific or ubiquitously altered circRNAs in cancer. Alternatively, users can upload and visualize their own plotting data (Figure 28).
Figure 27. The visualization, details and differential expression results of circRNA in tumor and its adjacent normal tissues
Figure 28. Users can upload and visualize circRNA expression profiles in tumor and its adjacent normal tissues
The “Non-tumor tissue” section displays circRNA expression profiles in normal tissues from the pre-analyzed results (Figure 29) or user-provided data (Figure 30).
Figure 29. The visualization results and details of circRNA in normal tissues
Figure 30. Users can upload and visualize circRNA expression profiles in normal tissues
The “Cancer cell line” module provides the visualization results and details of circRNA expression in cancer cell lines (Figure 31), helping users select appropriate cancer cell lines for circRNA research in cancer. This module also allows users to visualize their own plotting data (Figure 32).
Figure 31. The visualization results and details of circRNA in cancer cell lines
Figure 32. Users can upload and visualize circRNA expression profiles in cancer cell lines
The expression profiles of circRNAs in normal cells are provided in the “Non-tumor cell” module (Figure 33). In addition, users can upload and visualize their own data by following the instructions (Figure 34).
Figure 33. The visualization results and details of circRNA in normal cells
Figure 34. Users can upload and visualize circRNA expression profiles in normal cells
In the “Putative mechanism” section, we provide the visualization and analysis results of circRNA-miRNA interaction, circRNA-RBP interaction and circRNA coding potential. This section offers clues for research on circRNA mechanisms.
The “circRNA-miRNA interaction” module shows miRNA binding sites on a circRNA and sequence alignments between the circRNA and miRNAs. Users should click rows of the result table to select miRNAs of interest and set the color for each miRNA (Figure 35). Once rows have been selected, the visualization results are shown, and they update whenever the selection changes. The sequence alignments of the circRNA-miRNA interaction can be exported in plain text format. Alternatively, users can upload and visualize their own data (Figure 36).
Figure 35. The visualization results of circRNA-miRNA interaction
Figure 36. Users can upload and visualize circRNA-miRNA interaction
The “circRNA-RBP interaction” module shows RBP binding sites on a circRNA. Users should click rows of the result table to select RBPs of interest and set the color for each RBP (Figure 37). Once rows have been selected, the visualization results are shown, and they update whenever the selection changes. The sequences of RBP binding sites can be exported in plain text format. Users can also upload and visualize their own data (Figure 38).
Figure 37. The visualization result of RBP binding sites on circRNA
Figure 38. Users can upload and visualize circRNA-RBP interaction
The “circRNA coding potential” module displays ORFs, IRES elements and m6A modification sites on a circRNA. Users should click rows of the result table to select ORFs, after which the visualization results are shown (Figure 39). Users can download the sequences of the ORF and IRES element, as well as basic properties of the selected ORF such as amino acid composition and molecular weight. In addition, the details of N-glycosylation, mucin-type O-glycosylation, and S, T and Y phosphorylation sites are provided and can be exported in plain text format. When the selected rows change, the analysis and visualization results update immediately. Users can also upload and visualize their own data (Figure 40).
Figure 39. The visualization result of circRNA coding potential
Figure 40. Users can upload and visualize circRNA coding potential
The circRNA candidates downloaded from the circBase database are filtered to obtain exonic circRNAs. Annotations such as genomic coordinates, host gene and exon index of the exonic circRNAs are retrieved from circBase.
# download circRNA candidates and annotations
wget http://www.circbase.org/download/hsa_hg19_circRNA.bed
The putative spliced sequences of retained circRNAs are obtained from circBase database. The mature sequences of miRNAs (miRBase 22 release) are downloaded from miRBase database.
# download putative spliced sequences of circRNAs
wget http://www.circbase.org/download/human_hg19_circRNAs_putative_spliced_sequence.fa.gz
# unpack
gunzip human_hg19_circRNAs_putative_spliced_sequence.fa.gz
# download mature sequences of miRNAs
wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
# unpack
gunzip mature.fa.gz
The miRNA binding sites on exonic circRNAs are predicted by miRanda (v3.3a) with the following parameters:
# prediction of miRNA binding sites on circRNA
miRanda miRNA.fa circRNA.fa -go -8 -ge -2 -sc 120 > circRNA_miRanda.result
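Because the miRNA search option filters on the number of binding sites, the miRanda output has to be tallied per circRNA-miRNA pair at some point. The sketch below shows one way this could be done with awk. It assumes that each hit line starts with a single `>` and is tab-separated with the miRNA name in field 1 and the circRNA name in field 2, and that per-pair summary lines start with `>>`; the mock records are hypothetical and this is not necessarily how circPlot processes the results.

```shell
# Mock miRanda-style output (hypothetical records; real output has more fields and lines).
workdir=$(mktemp -d)
printf '>hsa-miR-1\tcircA\t150.00\t-20.10\n'   > "$workdir/mock_miRanda.result"
printf '>hsa-miR-1\tcircA\t142.00\t-18.30\n'  >> "$workdir/mock_miRanda.result"
printf '>>hsa-miR-1\tcircA\t292.00\t-38.40\n' >> "$workdir/mock_miRanda.result"
printf '>hsa-miR-2\tcircA\t131.00\t-15.70\n'  >> "$workdir/mock_miRanda.result"

# Count binding sites per circRNA-miRNA pair.
site_counts=$(awk -F'\t' '
  /^>[^>]/ {                 # hit lines only, skipping ">>" summary lines
    mirna = substr($1, 2)    # drop the leading ">"
    n[$2 "\t" mirna]++
  }
  END { for (k in n) print k "\t" n[k] }
' "$workdir/mock_miRanda.result" | sort)

# Columns: circRNA, miRNA, number of predicted binding sites.
printf '%s\n' "$site_counts"
```

A threshold on the third column would then give the circRNAs with at least a chosen number of sites for a miRNA.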
The RBP binding sites on circRNAs inferred from RBP CLIP-seq experiments are retrieved from the starBase database using its Web API.
# example command line
# get data of all RBPs for TP53 (in human)
curl 'http://starbase.sysu.edu.cn/api/RBPTarget/?assembly=hg19&geneType=circRNA&RBP=TP53&clipExpNum=1&pancancerNum=0&target=all&cellType=all' > hg19_TP53_circRNA_interaction
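To retrieve data for several RBPs, the same request can be repeated in a loop. The sketch below only builds and prints the URLs, reusing the query parameters from the example command above; the RBP names are examples, and whether starBase holds data for them is an assumption. The download line is left commented out because it needs network access.

```shell
# Base URL and fixed query parameters copied from the example command above.
base='http://starbase.sysu.edu.cn/api/RBPTarget/'
params='assembly=hg19&geneType=circRNA&clipExpNum=1&pancancerNum=0&target=all&cellType=all'

# Example RBP names; availability in starBase is an assumption.
urls=""
for rbp in AGO2 ELAVL1; do
  url="${base}?${params}&RBP=${rbp}"
  urls="${urls}${url}
"
  # Uncomment to download (requires network access):
  # curl "$url" > "hg19_${rbp}_circRNA_interaction"
done

printf '%s' "$urls"
```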
The relative positions of RBP binding sites on circRNAs are determined with bedtools (v2.28.0) using the following command line:
bedtools intersect -a RBP_binding_sites.bed -b circRNA_exon_coordinate.bed -wb -s > RBP_relative_position_circRNA
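The intersect output still holds genomic coordinates, so a further step is needed to express each site relative to the circRNA exon. The sketch below assumes both input files are 6-column BED, so that with `-wb` the overlapping part of the binding site sits in columns 1-6 and the exon in columns 7-12. The mock records are hypothetical, and the offset is a 0-based distance from the 5' end of the exon on its own strand.

```shell
# Mock `bedtools intersect -wb -s` output (hypothetical records, 6-column BED on both sides).
workdir=$(mktemp -d)
printf 'chr1\t1050\t1070\tRBP1\t0\t+\tchr1\t1000\t1200\tcircA_exon1\t0\t+\n'  > "$workdir/intersect.tsv"
printf 'chr2\t5100\t5120\tRBP2\t0\t-\tchr2\t5000\t5300\tcircB_exon1\t0\t-\n' >> "$workdir/intersect.tsv"

# Offset of each site from the 5' end of the exon it overlaps:
#   plus strand:  site start - exon start
#   minus strand: exon end   - site end
rel_pos=$(awk -F'\t' -v OFS='\t' '
  { off = ($12 == "+") ? $2 - $8 : $9 - $3
    print $10, $4, off }
' "$workdir/intersect.tsv")

# Columns: exon name, RBP name, offset within the exon.
printf '%s\n' "$rel_pos"
```

For a multi-exon circRNA, the lengths of the preceding exons would have to be added to obtain the position in the spliced sequence; that step is left out here.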
The putative spliced sequences of circRNAs are submitted to the ORFfinder software to search for putative open reading frames (ORFs) in each circRNA sequence. Only ORFs of at least 25 aa in length are kept.
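The length filter can be sketched with awk, assuming the predicted ORFs are available as a protein FASTA file. The file name, record names and sequences below are hypothetical, and the exact output format used by the authors is not stated in this document.

```shell
# Mock protein FASTA (hypothetical): ORF1 has 30 aa split over two lines, ORF2 has 10 aa.
workdir=$(mktemp -d)
printf '>ORF1\nMKTAYIAKQRQISFV\nKSHFSRQLEERLGLI\n>ORF2\nMKTAYIAKQR\n' > "$workdir/orf.fa"

# Keep only records whose sequence is at least `min` residues long.
kept=$(awk -v min=25 '
  function emit() { if (name != "" && length(seq) >= min) print name "\n" seq }
  /^>/ { emit(); name = $0; seq = ""; next }   # new record: output the previous one if long enough
  { seq = seq $0 }                              # join wrapped sequence lines
  END { emit() }
' "$workdir/orf.fa")

printf '%s\n' "$kept"
```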
The Internal Ribosome Entry Site (IRES) sequence has been extensively demonstrated to initiate the cap-independent translation of circRNA. The IRESfinder (v1.1.0) software is used to predict the IRES element on circRNAs that harbor putative ORFs.
python IRESfinder.py -f circRNA.fa -o circRNA.result -m 2 -w 174 -s 50
The circPlot package is a wrapper around the ggplot2 and ggforce packages that facilitates the visualization of circRNAs. The circPlot_circularization function shows the back-splicing event of a circRNA. The circPlot_miRNA_binding function visualizes the miRNA binding sites on a circRNA, while the circPlot_miRNA_alignment function shows the details of sequence complementarity between the circRNA and a miRNA. The circPlot_RBP function displays the RBP binding sites on a circRNA. The circPlot_coding_potential function visualizes the relative positions of predicted ORFs and IRES elements on a circRNA. All R scripts of the circPlot package are freely available at https://github.com/zimuliving/circPlot.
The circPlot interactive web server is built with the shiny and shinydashboard packages. Data visualization is done with the circPlot package. The interactive user interface is implemented with the shiny framework and the DT package. The source code and data used to develop the circPlot interactive web portal are freely available at https://github.com/zimuliving/circPlot.
ggforce manual: https://ggforce.data-imaginist.com/index.html
https://rviews.rstudio.com/2019/09/19/intro-to-ggforce/
In this post, I will walk you through some examples that show off the major features of the ggforce package. The main goal is to share a few ideas about customizing visualizations that you may find useful in your everyday work.
The ggforce package is an extension to ggplot2 developed by Thomas Pedersen. Thanks to ggforce, you can enhance almost any ggplot by highlighting data groupings, and focusing attention on interesting features of the plot. The package contains geoms, stats, facets, and other ggplot functions. Among such functions, there are some for marking the convex hull of a set of points, jittering data, and creating Voronoi plots.
The examples in this article will use data from the nycflights13 package. Most of the examples will build on the same basic ggplot that visualizes airports by geographical location. I am using this data set because it makes it easy to plot x/y coordinates without having to remember what they “mean”. This basic plot will be saved to a variable, and then that variable will be used as the base of the examples of enhancing the visualization using ggforce.
library(tidyverse)
library(ggforce)
library(nycflights13)
p <- airports %>%
filter(lon < 0, tzone != "\\N") %>%
ggplot(aes(lon, lat, color = tzone)) +
geom_point(show.legend = FALSE)
p
I have long been waiting for an easy way to draw an outline around groups of data. The geom_mark_*() family of functions does exactly that. There are four mark functions in ggforce, all different based on the shape they draw around the group:
1. geom_mark_circle()
2. geom_mark_ellipse()
3. geom_mark_hull()
4. geom_mark_rect()
Let’s start with geom_mark_rect(); it will draw a rounded rectangle around each time zone group.
p +
geom_mark_rect()
Like magic! The rectangles look amazing, even without modifying any arguments. Of course, more customization is possible via setting arguments. In this post, I will review some of the many great arguments available in ggforce functions, but I don’t want to rob you of the fun of trying it yourself and discovering all of the different options.
This next addition to our plot deserves its own subheading. Adding a label and an arrow pointing to a group would typically be a major undertaking. Without ggforce, this would require manually adding both the text and the arrow to the ggplot. But, with geom_mark_*() it is as simple as setting the label argument. So, without further ado, here is the label argument in action:
p +
geom_mark_rect(aes(label = tzone))
The labels and arrows are not only drawn, but they are also placed in an optimized location. In addition, the position will recalculate if the plot is re-sized! There are too many little details about this label argument to mention. The backdrop is automatically white, the indicator is not really an arrow, it is a simple line that also underlines the text, so it is easy for the eye to know which group belongs to which label.
It is now easy to finalize the plot by resetting the theme, and again suppressing the legend using show.legend.
p +
geom_mark_rect(aes(label = tzone), show.legend = FALSE) +
theme_void()
There are many cases where drawing a rectangle or circle around the groups is not ideal, or even preferable. The geom_mark_hull() essentially traces a more complex polygon around the shape of the outline of the group.
p +
geom_mark_hull(aes(label = tzone)) +
theme_void()
Again, without adding any arguments to the function, the traced outline already looks wonderful. Another option to add now is fill. And since the legend table is now redundant, it can be suppressed by setting show.legend to FALSE.
p +
geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE) +
theme_void()
Notice that the fill color is not totally opaque; by default, ggforce has set the translucency lower to make sure that the dots are visible. This is something that I would have done anyway, usually by adding the alpha argument. In this case, it saves having to remember to add that argument.
Another adjustment that I thought was important for this plot was to modify the size of the hull, to change the padding around the outline of the group. The expand argument controls this aesthetic; it is possible to change it using the unit() function.
p +
geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
theme_void()
To finalize plots such as this one, it is necessary to remove most components from the default theme. Usually, theme_void() does the trick. For printed or online articles with white backgrounds, which is essentially all of them, it is often hard to determine the margins of the plot. theme_no_axes() provides a great compromise by removing all but the one element.
p +
geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
theme_no_axes()
It is common to produce two plots, one to show the full picture, and another to magnify or focus on a specific area. With facet_zoom(), it is incredibly easy to show “macro” and “micro” in one plot by using the same xlim and ylim arguments to focus on an area of a plot.
p +
facet_zoom(xlim = c(-155, -160.5), ylim = c(19, 22.3))
Another cool feature of facet_zoom() is the ability to set the zoom region based on a row selection. To do this, simply pass an expression that you would use in a function such as filter() to the facet. So instead of using coordinates, I just tell the facet to zoom on anything that has a Pacific/Honolulu time zone.
p +
facet_zoom(xy = tzone == "Pacific/Honolulu")
Using what has been covered so far, it is easy to go from a very simple point plot to a sophisticated and nice-looking visualization with just three lines of code, thanks to ggforce.
p +
geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE, expand = unit(3, "mm")) +
theme_no_axes() +
facet_zoom(x = tzone == "America/Los_Angeles")
This section title is based on my first reaction when I heard the word “Voronoi”. I have since learned about it, and can see why the Voronoi Diagram can be useful for very specific use cases. The good news is that if you encounter one of those use cases, you know that it is easy to draw it up in ggplot using geom_voronoi_segment().
The idea behind a Voronoi diagram is to split the area of the plot into as many sections as there are points. Unlike a grid or heat map, Voronoi draws custom shapes for each point based on the proximity to other points. It returns a plot that looks like stained glass. This can be good to determine the closest point inside each area. For example, a retailer can use it to see the area their store locations cover, and can help them make decisions to optimize their location based on the size of each Voronoi shape.
The following example will focus on airports in Alaska. The ggplot will zoom into that state’s general location, and then trace a hull shape. The hull will provide a quasi-map overlay. The final step is to add the Voronoi diagram layer by calling the function: geom_voronoi_segment()
p +
geom_mark_hull(aes(fill = tzone), expand = unit(3, "mm")) +
coord_cartesian(xlim = c(-130, -180), ylim = c(50, 75)) +
geom_voronoi_segment()
The geom_parallel_sets() family of functions allows visualizing interactions between categorical variables. The implementation is generic enough to create Sankey or alluvial charts.
For this, I will use the Manufacturer and Engine data from the planes table inside nycflights13. In this case, some simple data preparation is needed first.
prep_planes <- planes %>%
filter(year > 1998, year < 2005) %>%
filter(engine != "Turbo-shaft") %>%
select(manufacturer, engine) %>%
head(500)
prep_planes
## # A tibble: 500 x 2
## manufacturer engine
## <chr> <chr>
## 1 EMBRAER Turbo-fan
## 2 AIRBUS INDUSTRIE Turbo-fan
## 3 AIRBUS INDUSTRIE Turbo-fan
## 4 EMBRAER Turbo-fan
## 5 AIRBUS INDUSTRIE Turbo-fan
## 6 AIRBUS INDUSTRIE Turbo-fan
## 7 AIRBUS INDUSTRIE Turbo-fan
## 8 AIRBUS INDUSTRIE Turbo-fan
## 9 AIRBUS INDUSTRIE Turbo-fan
## 10 EMBRAER Turbo-fan
## # … with 490 more rows
The gather_set_data() is a convenience function that, just like gather(), creates a single line for each combination of categorical variables. The table contains three new columns - id, x, and y - which contain the combinations that each new row represents, and the row ID number from the original table.
prep_planes %>%
gather_set_data(1:2)
## # A tibble: 1,000 x 5
## manufacturer engine id x y
## <chr> <chr> <int> <chr> <chr>
## 1 EMBRAER Turbo-fan 1 manufacturer EMBRAER
## 2 AIRBUS INDUSTRIE Turbo-fan 2 manufacturer AIRBUS INDUSTRIE
## 3 AIRBUS INDUSTRIE Turbo-fan 3 manufacturer AIRBUS INDUSTRIE
## 4 EMBRAER Turbo-fan 4 manufacturer EMBRAER
## 5 AIRBUS INDUSTRIE Turbo-fan 5 manufacturer AIRBUS INDUSTRIE
## 6 AIRBUS INDUSTRIE Turbo-fan 6 manufacturer AIRBUS INDUSTRIE
## 7 AIRBUS INDUSTRIE Turbo-fan 7 manufacturer AIRBUS INDUSTRIE
## 8 AIRBUS INDUSTRIE Turbo-fan 8 manufacturer AIRBUS INDUSTRIE
## 9 AIRBUS INDUSTRIE Turbo-fan 9 manufacturer AIRBUS INDUSTRIE
## 10 EMBRAER Turbo-fan 10 manufacturer EMBRAER
## # … with 990 more rows
The ggplot is primed with x for x, and then new aesthetics: id, split and value. For id, we pass the id column, split takes y, and finally, value is fixed to 1. The value is used to express the amount of “thickness” to add to that particular relationship; using 1 means that all combinations are weighted the same. At this point, the only argument to pass geom_parallel_sets() will be the color fill; in this case we will use engine.
prep_planes %>%
gather_set_data(1:2) %>%
ggplot(aes(x, id = id, split = y, value = 1)) +
geom_parallel_sets(aes(fill = engine))
The plot shows how a specific plane’s engine relates to each of the manufacturers. Next geom_parallel_sets_axes() provides a terminal box; the axis.width argument is the only one necessary to use at this stage, and we will set it to 0.1. The labels are added by using geom_parallel_sets_labels(), and they are automatically rotated.
prep_planes %>%
gather_set_data(1:2) %>%
ggplot(aes(x, id = id, split = y, value = 1)) +
geom_parallel_sets(aes(fill = engine)) +
geom_parallel_sets_axes(axis.width = 0.1) +
geom_parallel_sets_labels()
The following is done to finalize the plot:
- geom_parallel_sets(): hide the legend and lower the alpha
- geom_parallel_sets_axes(): change the fill color and font color
- geom_parallel_sets_labels(): remove the rotation of the label
prep_planes %>%
gather_set_data(1:2) %>%
ggplot(aes(x, id = id, split = y, value = 1)) +
geom_parallel_sets(aes(fill = engine), show.legend = FALSE, alpha = 0.3) +
geom_parallel_sets_axes(axis.width = 0.1, color = "lightgrey", fill = "white") +
geom_parallel_sets_labels(angle = 0) +
theme_no_axes()
When visualizing the combination of a continuous and a categorical variable, it is common practice to resort to a bar or column plot. Cases that require representing this in a single circle shape usually involve modifying a polar bar in ggplot. But, this is much easier now with ggforce. I start with the total number of planes by engine:
planes %>%
count(engine)
## # A tibble: 6 x 2
## engine n
## <chr> <int>
## 1 4 Cycle 2
## 2 Reciprocating 28
## 3 Turbo-fan 2750
## 4 Turbo-jet 535
## 5 Turbo-prop 2
## 6 Turbo-shaft 5
and then pipe those results into ggplot using geom_arc_bar() to create the circle-shaped plot. The new aesthetics employed here are: x0, y0, r0, r, amount, and explode. The x, y, and r aesthetics refer to the position and the radius of the circle. Since only one plot is needed, I fix x and y to 0. For radius, the r0 refers to the inside of the circle, and r to the outside. Setting r0 to 0.7 and r to 1 will create a sort of doughnut with a 0.3 thickness. Finally, I use “pie” as the stat.
planes %>%
count(engine) %>%
ggplot() +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine), alpha = 0.3, stat = "pie")
Another cool thing this geom does is to make it easy to “break away” one or several segments of the plot. The explode aesthetic controls that. To break away the “Turbo-jet” results, I create a new column called focus, setting it to 0.2 if it is part of that engine group, and to 0 if it is not, then finish up with theme_no_axes().
planes %>%
count(engine) %>%
mutate(focus = ifelse(engine == "Turbo-jet", 0.2, 0)) %>%
ggplot() +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine, explode = focus), alpha = 0.3, stat = "pie") +
theme_no_axes()
This section is titled “Danger Zone”, because changing the r0 in geom_arc_bar() may change the look of the plot to one that has fallen out of favor. That plot type happens to have the same name as the stat that we are using.
ggforce is a great package that does a lot more than what I covered today. My hope is to have shared one or two things that will encourage you to try ggforce in your everyday work.
Special thanks to Thomas Pedersen, the author of the package and a co-worker of mine. His contributions to the R community also include the tidygraph and ggraph packages, which I wrote about in this blog post a few months back.
If you find any bugs, or have any comments, questions or suggestions, please don't hesitate to send an email to liuzimu1992@gmail.com. Alternatively, you are encouraged to create an issue at the project repository available at https://github.com/zimuliving/circPlot.