2  Introduction

2.1 What is data visualisation?

Data visualization is the graphical representation of data, used to reveal the patterns, trends, relationships, outliers and complex structures hidden inside the data more easily. In other words, data visualization allows data to speak for itself in a way that is easily understandable to humans. It is like giving a voice to the data, enabling us to listen to and understand its story more effectively.

2.2 What are Geometries (geoms) in Data Visualisation?

In data visualization, geometries (or geoms) are the visual elements that represent data in a plot. They define the type of chart or shape used to depict data points or relationships. For example, the figure below represents the relationship between income level and the percentage of people with access to electricity in lower income (L), lower middle income (LM), upper middle income (UM) and higher income (H) countries using six different graph types.

Relationship between income level and the percentage of people with access to electricity in lower income (L), lower middle income (LM), upper middle income (UM) and higher income (H) countries, shown using six different graphs labelled A to F.

To create the plots above, I use i) the same dataset and ii) the same Cartesian plane (shown below). The differences lie in the geometries used and the statistics visualised on each plot. For example, in chart A, all values are represented using individual points. In chart H, the range is denoted using a blue line and the mean is represented using a red dot.
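The idea that one dataset and one coordinate system can carry different geoms can be sketched in ggplot2 as follows. This is a minimal illustration rather than the code behind the figure: the `toy` data frame and its values are invented for the example, and the object names `base`, `p_points` and `p_summary` are hypothetical.

```r
library(ggplot2)

# Invented toy data standing in for the World Bank data (values are not real)
toy <- data.frame(
  Income = factor(rep(c("L", "LM", "UM", "H"), each = 3),
                  levels = c("L", "LM", "UM", "H")),
  Electricity = c(20, 35, 50, 55, 70, 85, 80, 90, 98, 99, 100, 100)
)

# Same data and same Cartesian axes for every plot
base <- ggplot(toy, aes(x = Income, y = Electricity))

# Geom 1: every observation drawn as an individual point
p_points <- base + geom_point()

# Geom 2: summary statistics per income group,
# range as a blue line and mean as a red dot
p_summary <- base +
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               geom = "linerange", colour = "blue") +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)
```

Printing `p_points` or `p_summary` draws the corresponding plot. Only the layers differ; the data and the aesthetic mapping are shared through `base`.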

2.3 Data used in the Encyclopedia

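library(drone)       # package that ships the worldbankdata dataset
library(tibble)      # tibble printing
data(worldbankdata)  # load the dataset into the session
worldbankdata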
# A tibble: 7,937 × 7
   Country Code  Region                     Year Cooking Electricity Income
   <fct>   <fct> <fct>                     <dbl>   <dbl>       <dbl> <fct> 
 1 Aruba   ABW   Latin America & Caribbean  1990      NA       100   H     
 2 Aruba   ABW   Latin America & Caribbean  2000      NA        91.7 H     
 3 Aruba   ABW   Latin America & Caribbean  2013      NA       100   H     
 4 Aruba   ABW   Latin America & Caribbean  2014      NA       100   H     
 5 Aruba   ABW   Latin America & Caribbean  2015      NA       100   H     
 6 Aruba   ABW   Latin America & Caribbean  2016      NA       100   H     
 7 Aruba   ABW   Latin America & Caribbean  2017      NA       100   H     
 8 Aruba   ABW   Latin America & Caribbean  2018      NA       100   H     
 9 Aruba   ABW   Latin America & Caribbean  2019      NA       100   H     
10 Aruba   ABW   Latin America & Caribbean  2020      NA       100   H     
# ℹ 7,927 more rows

2.4 Data description

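library(tidyverse)
library(visdat)
# Overview of column types and missing values, one cell per observation
vis_dat(worldbankdata) + 
  scale_fill_brewer(palette = "Dark2")

library(naniar)
# UpSet plot of which variables are missing together
gg_miss_upset(worldbankdata)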
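The plots above describe missingness graphically. A numeric counterpart can be sketched in base R; to keep the example self-contained it retypes only the first three rows of the tibble printed in section 2.3, so the counts below describe those three rows and not the full dataset. The names `first_rows`, `na_counts` and `share_incomplete` are introduced here for illustration.

```r
# First three rows of the printed tibble, retyped by hand
first_rows <- data.frame(
  Country = "Aruba",
  Code = "ABW",
  Region = "Latin America & Caribbean",
  Year = c(1990, 2000, 2013),
  Cooking = NA_real_,
  Electricity = c(100, 91.7, 100),
  Income = "H"
)

# Number of missing values in each column
na_counts <- colSums(is.na(first_rows))
na_counts

# Share of rows with at least one missing value
share_incomplete <- mean(!complete.cases(first_rows))
share_incomplete
```

Applied to the full `worldbankdata` object, the same two calls would give per-column missing counts and the proportion of incomplete rows.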

2.5 Packages used for data wrangling and the |> operator
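As a brief illustration of the native pipe, `x |> f()` passes `x` as the first argument of `f()`, so a chain of steps reads left to right. The sketch below assumes R 4.1 or later and uses dplyr as one possible wrangling package; the `toy` tibble and its values are invented for the example.

```r
library(dplyr)

# The native pipe with base R functions: same as sum(sqrt(c(1, 4, 9)))
total <- c(1, 4, 9) |> sqrt() |> sum()

# Invented toy data (values are not real)
toy <- tibble::tibble(
  Income = c("L", "L", "H", "H"),
  Electricity = c(20, 40, 100, NA)
)

# A small wrangling chain: drop missing values, then average per group
mean_by_income <- toy |>
  filter(!is.na(Electricity)) |>
  group_by(Income) |>
  summarise(mean_electricity = mean(Electricity))
```

Each step receives the result of the previous one, which avoids nested calls and intermediate variables.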

2.6 R packages with geom implementation

  1. ggplot2 (Wickham 2016)

  2. ggpattern (FC, Davis, and ggplot2 authors 2023)

  3. ggforce (Pedersen 2022)

  4. ggalluvial

  5. ggbump (Sjoberg 2020)

  6. ggridges (Wilke 2023)

  7. ggalt (Rudis, Bolker, and Schulz 2017)
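Geoms from these extension packages are added to a plot in the same way as ggplot2's own. As a minimal sketch, the following uses `geom_density_ridges()` from ggridges with R's built-in `iris` data, which is chosen here only for convenience and is not the dataset used in this book.

```r
library(ggplot2)
library(ggridges)

# One density ridge per species, stacked along the y axis
p_ridges <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges()
```

Printing `p_ridges` draws the plot; the extension geom is simply another layer on a regular ggplot object.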

FC, Mike, Trevor L Davis, and ggplot2 authors. 2023. ggpattern: ’ggplot2’ Pattern Geoms.
Pedersen, Thomas Lin. 2022. ggforce: Accelerating ’ggplot2’. https://CRAN.R-project.org/package=ggforce.
Rudis, Bob, Ben Bolker, and Jan Schulz. 2017. ggalt: Extra Coordinate Systems, ’Geoms’, Statistical Transformations, Scales and Fonts for ’ggplot2’. https://CRAN.R-project.org/package=ggalt.
Sjoberg, David. 2020. ggbump: Bump Chart and Sigmoid Curves. https://CRAN.R-project.org/package=ggbump.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilke, Claus O. 2023. ggridges: Ridgeline Plots in ’ggplot2’. https://CRAN.R-project.org/package=ggridges.