Preface

What can I find in this geom encyclopedia?

Let’s begin by looking at what we mean by geom in data visualization. In this context, geoms (short for geometries in the ggplot2 package) are the visual elements used to represent data in a plot. They define the type of chart, such as a point chart, line chart, or bar chart.

To illustrate the idea, I use the following dataset, which contains information on 82 countries. The variables are described in Table 1:

Table 1: Variable Description
Variable Description
Country Country name
Income Income category in 2021: low income (L), lower middle income (LM)
Electricity Percentage of people with access to electricity

The R code to load the dataset and its first eight rows are given below:

library(drone)      # provides the worldbankdata dataset
library(tidyverse)
data("worldbankdata")
# Keep the 2021 records for low (L) and lower middle (LM) income countries
worldbankdata2021 <- worldbankdata |>
  filter(Year == 2021) |>
  filter(Income == "LM" | Income == "L") |>
  select(Country, Income, Electricity)
worldbankdata2021 |> head(8)
# A tibble: 8 × 3
  Country      Income Electricity
  <fct>        <fct>        <dbl>
1 Afghanistan  L             97.7
2 Angola       LM            48.2
3 Burundi      L             10.2
4 Benin        LM            42.0
5 Burkina Faso L             19.0
6 Bangladesh   LM            99.0
7 Bolivia      LM            98.6
8 Bhutan       LM           100  

Now, I want to visualize the relationship between the Electricity and Income variables on the Cartesian coordinate plane shown in Figure 1.

Figure 1: Cartesian coordinate plane showing Income (x-axis) and Electricity (y-axis).

Figure 2 shows eight plot types that I created to visualise the relationship between the variables Electricity and Income on the Cartesian plane in Figure 1.

Figure 2: Eight plot types used to visualize the relationship between income level (low income [L], lower-middle income [LM]) and the percentage of people with access to electricity in 2021.

In all panels of Figure 2 (a–h), the same dataset, variables, and Cartesian coordinate system are used. However, the chart types differ.

Could you list the differences you observe in the charts?

  1. The type of chart or shape used to depict data.

  2. The statistics computed on the data to visualize on the chart.

When creating a chart, we first decide on the statistic we want to visualize. Next, we compute this statistic from the data, and finally, we use a suitable geometry to display the computed values of the statistic. Table 2 summarizes the statistical operations (or statistical transformations) performed on the data and the corresponding geometries used to visualize the computed statistics.

Table 2: Statistical transformations and geom/stat layers used in the plots.

| Plot | Statistical operation | Visualisation method | geom layer | stat layer |
|---|---|---|---|---|
| A | Use individual data points | Point geometry | point | Data is plotted as-is |
| B | Use individual data points | Quasirandom geometry | quasirandom | Data is plotted as-is, but points are given a quasirandom placement, which helps avoid overplotting |
| C | Use individual data points | Jitter geometry | jitter | Data is plotted as-is, but a small amount of random noise is added to the position of points to reduce overlap and make individual points more visible |
| D | Compute quartiles, Q1 − 1.5·IQR, and Q3 + 1.5·IQR | Box-and-whisker plot | boxplot | Five statistical summary measures used in the boxplot |
| E | Use individual data points; compute quartiles, Q1 − 1.5·IQR, and Q3 + 1.5·IQR | Jitter geometry and box-and-whisker plot | jitter and boxplot | Individual points and the summary measures corresponding to the boxplot |
| F | Compute kernel density | Violin geometry | violin | Calculates and plots the kernel density estimate (KDE) for continuous data |
| G | Compute minimum and maximum of data | Line geometry | linerange | Range of data |
| H | Compute minimum, mean, and maximum | Line geometry (range) and point geometry (mean) | linerange and point | Range and mean of data |
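The three steps described above (decide on a statistic, compute it, display it with a geometry) can be sketched for panels D and H. This is a minimal illustration, not the code used to draw Figure 2: it assumes only ggplot2 is installed, rebuilds the eight rows printed earlier as a small data frame named `wb` (a name chosen here for illustration), and uses the hypothetical column names `elec_min`, `elec_mean`, and `elec_max`.

```r
library(ggplot2)

# First eight rows of worldbankdata2021, copied from the printed tibble above
wb <- data.frame(
  Country = c("Afghanistan", "Angola", "Burundi", "Benin",
              "Burkina Faso", "Bangladesh", "Bolivia", "Bhutan"),
  Income = factor(c("L", "LM", "L", "LM", "L", "LM", "LM", "LM")),
  Electricity = c(97.7, 48.2, 10.2, 42.0, 19.0, 99.0, 98.6, 100)
)

# Panel H, step 1: compute the statistics (minimum, mean, maximum) per group
elec_summary <- do.call(rbind, lapply(split(wb, wb$Income), function(d) {
  data.frame(
    Income = d$Income[1],
    elec_min = min(d$Electricity),
    elec_mean = mean(d$Electricity),
    elec_max = max(d$Electricity)
  )
}))

# Panel H, step 2: display the computed values with suitable geometries
p_h <- ggplot(elec_summary, aes(x = Income)) +
  geom_linerange(aes(ymin = elec_min, ymax = elec_max)) +  # range
  geom_point(aes(y = elec_mean))                           # mean

# Panel D: geom_boxplot() computes its summary measures itself;
# layer_data() returns the values the stat produced for each group
p_d <- ggplot(wb, aes(x = Income, y = Electricity)) +
  geom_boxplot()
box_stats <- layer_data(p_d)[, c("ymin", "lower", "middle", "upper", "ymax")]

# Use print(p_h) or print(p_d) to draw the plots
```

In panel H the statistic is computed by hand before plotting, whereas in panel D the stat layer attached to the geom does the computation. Both routes follow the same statistic-then-geometry logic.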

This encyclopedia is a collection of geoms, that is, the plot types you can create using ggplot2 or its extension packages under the grammar of graphics framework. In other words, it is an Encyclopedia of Plots.

Motivation to write this book

The motivation behind writing this encyclopedia is that there is no centralized resource where all geoms can be viewed in one place. Additionally, no comprehensive book exists that catalogs the different types of plots available for data visualization. Having them in one place helps data visualizers craft more effective analyses and create new geoms. It also helps avoid duplicated effort.

What you will learn?

In this geom encyclopedia you will learn about different types of geoms and their applications. Each geom has a set of aesthetics that it understands. These aesthetics can be divided into two groups: i) required aesthetics and ii) optional aesthetics. Further, every geom has a default stat, and every stat has a default geom.
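The required aesthetics, optional aesthetics, and default stat of a geom can be inspected directly from ggplot2's exported objects. The sketch below assumes only that ggplot2 is installed; it looks at the internal ggproto objects `GeomPoint` and the layers returned by `geom_*()` and `stat_*()`, which is one convenient way to check these defaults rather than the only one.

```r
library(ggplot2)

# Required aesthetics of the point geom
required_point <- GeomPoint$required_aes        # "x" "y"

# Optional aesthetics are those that have default values
optional_point <- names(GeomPoint$default_aes)

# Every geom has a default stat ...
point_stat <- class(geom_point()$stat)[1]       # "StatIdentity"
boxplot_stat <- class(geom_boxplot()$stat)[1]   # "StatBoxplot"

# ... and every stat has a default geom
boxplot_geom <- class(stat_boxplot()$geom)[1]   # "GeomBoxplot"

print(required_point)
print(optional_point)
print(c(point_stat, boxplot_stat, boxplot_geom))
```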

To give you an idea about the associated aesthetics and stat, let’s look at the plot shown in Figure 3. In this case, x and y are required aesthetics, and color, size, and alpha are optional aesthetics. The stat is identity, since the data points are plotted as they are.

The aesthetic mappings, defined with aes(), describe how variables in the dataset are mapped to aesthetics (the visual properties of the plot). Writing aes(x=Income, y=Electricity, color=Income) maps variables to visual properties of the chart. Writing alpha=0.5, size=2 outside aes() sets visual properties to fixed values.

ggplot(worldbankdata2021, aes(x=Income, y=Electricity, color=Income)) + 
  geom_point(alpha=0.5, size=2) +
  scale_color_brewer(palette = "Dark2")
Figure 3: Distribution of Electricity by Income Category
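The difference between mapping and setting can be checked by looking at the data ggplot2 builds for each layer. The following is a small sketch under stated assumptions: it uses only ggplot2, a data frame `wb` rebuilt from the eight rows printed earlier, and the colour "steelblue", which is an arbitrary choice made here for illustration.

```r
library(ggplot2)

# First eight rows of worldbankdata2021, copied from the printed tibble above
wb <- data.frame(
  Country = c("Afghanistan", "Angola", "Burundi", "Benin",
              "Burkina Faso", "Bangladesh", "Bolivia", "Bhutan"),
  Income = factor(c("L", "LM", "L", "LM", "L", "LM", "LM", "LM")),
  Electricity = c(97.7, 48.2, 10.2, 42.0, 19.0, 99.0, 98.6, 100)
)

# Mapping: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(wb, aes(x = Income, y = Electricity, colour = Income)) +
  geom_point()

# Setting: colour, alpha, and size are constants, so they go outside aes()
p_set <- ggplot(wb, aes(x = Income, y = Electricity)) +
  geom_point(colour = "steelblue", alpha = 0.5, size = 2)

# layer_data() shows the colour each point ends up with
mapped_colours <- unique(layer_data(p_mapped)$colour)  # one colour per income group
set_colours <- unique(layer_data(p_set)$colour)        # a single fixed colour
```

A mapped aesthetic also produces a legend, because the colour carries information about the data. A set aesthetic does not.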

In this geom encyclopedia, you will also learn about the required and optional aesthetics that each geom supports. Additionally, an example is provided demonstrating the application of each geom with reproducible R code.

In summary you will learn:

  1. What each geom does: The purpose of different geoms (e.g., points, lines, bars, boxes, tiles) and the types of data and questions they are best suited for.

  2. Required and optional aesthetics: Which aesthetics (such as x, y, colour, fill, size, group) are required for each geom and how optional aesthetics change the appearance and meaning of a plot.

  3. Typical use cases: When to use each geom, for example, comparing distributions, showing relationships, visualizing trends over time, or displaying uncertainty.

  4. Common variations and set parameters: Key arguments (such as stat, position, alpha, width, binwidth) and how they affect the visual output.

  5. Strategies to improve clarity of plots: Strategies to overcome frequent challenges such as overlapping points, missing values, and highlighting trends.

  6. Connections between geoms and statistics: How geoms interact with statistical transformations (e.g., geom_histogram() with binning, geom_smooth() with model fitting).

  7. Connections between geoms: Similar geoms and different combinations of geoms that can be used to visualise data.
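As a small illustration of items 4 and 6, the sketch below shows how geom_histogram() relies on a binning stat and how the binwidth argument controls the result. It assumes only ggplot2 and a data frame `wb` rebuilt from the eight rows printed earlier; the bin width of 10 is an arbitrary value chosen for this example.

```r
library(ggplot2)

# Electricity values from the first eight rows printed earlier
wb <- data.frame(
  Electricity = c(97.7, 48.2, 10.2, 42.0, 19.0, 99.0, 98.6, 100)
)

# geom_histogram() uses a binning stat by default
hist_stat <- class(geom_histogram()$stat)[1]   # "StatBin"

# binwidth is passed on to the stat and controls the width of each bin
p_hist <- ggplot(wb, aes(x = Electricity)) +
  geom_histogram(binwidth = 10)

# The bars drawn are the bins and counts computed by the stat
bins <- layer_data(p_hist)[, c("xmin", "xmax", "count")]
print(bins)
```

Changing binwidth changes the computed table, and therefore the bars, without touching the data itself.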

What you won’t learn?

This book is not focused on teaching R programming fundamentals or providing a comprehensive guide to data visualization principles. It assumes you already have a basic understanding of R and ggplot2, and it will not cover how to start from scratch in these areas.

How this geom encyclopedia is organized?

As this is an encyclopedia, the chapters are organised in alphabetical order. However, within a chapter the geoms are not listed alphabetically. At the beginning of each chapter, I have tabulated the geoms listed under that letter.

Audience

Figure 4 shows my target audience for the book. For data enthusiasts in general, this can be considered an Encyclopedia of Plots. For R, tidyverse, and ggplot2 users, this can be considered a Data Visualisation Geometries Encyclopedia.

Figure 4: Target Audience

What prior knowledge is needed to use this geom encyclopedia?

  1. For a complete newcomer who wants to get an idea of the possible charts that can be drawn for a dataset:

    None

  2. To get hands-on experience with the examples provided, you need the following knowledge:

    1. R and RStudio Integrated Development Environment (IDE)

    2. Basics of R programming

    3. tidyverse (Wickham et al. 2019) style of coding

    4. Data wrangling with tidyr (Wickham, Vaughan, and Girlich 2024) and dplyr (Wickham et al. 2023)

    5. Knowledge of using the pipe operator: |>

    6. Knowledge of creating data visualisations using the grammar of graphics framework implemented in ggplot2 (Wickham 2016)

If you want to learn more about these topics, the following books are recommended:

Acknowledgments

I would like to thank all the package authors and contributors who have developed and shared geoms as R packages.

I would like to thank all the students who took my ASP 460 2.0/STA 492 2.0 Data Visualization course in 2024 for their valuable contributions in exploring geoms with me.

This book was built with Quarto.

Declaration of generative AI in scientific writing

During the preparation of this work the author used ChatGPT in order to improve readability and language. After using this tool/service, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.

Cite

Please use the text and BibTeX entry below to cite the book.

T. S. Talagala, Data Visualisation Geometries Encyclopedia: Geoms in the Grammar of Graphics - All Types of Plots. [Online]. Available: https://thiyangt.github.io/geom.encyclopedia/. [Accessed: Jan. 18, 2025]

@online{talagala2024geoms,
  author = {Thiyanga S. Talagala},
  title = {Data Visualisation Geometries Encyclopedia: Geoms in the Grammar of Graphics - All Types of Plots},
  year = {2024},
  url = {https://thiyangt.github.io/geom.encyclopedia/},
  note = {Accessed: 2025-01-18}
}

Please use the text and BibTeX entry below to cite the associated R package drone.

Talagala T (2025). _drone: Data for Data Visualisation Geometries Encyclopedia_. R package version 2.0.0, <https://github.com/thiyangt/geom.encyclopedia>.

@Manual{drone,
  title = {drone: Data for Data Visualisation Geometries Encyclopedia},
  author = {Thiyanga S. Talagala},
  year = {2025},
  note = {R package version 1.0.0, commit 86d4fc19bb3a03da3eeb8f6748cb0bfc21dfdf72},
  url = {https://github.com/thiyangt/drone},
}

Colophon

The field of data visualization is dynamic, and new techniques and visualizations may emerge over time. Hence, I will be regularly updating this encyclopedia to ensure it remains a relevant and comprehensive resource for users.