2  Introduction

2.1 What is data visualisation?

Data visualisation is the graphical representation of data, used to reveal the patterns, trends, relationships, outliers and complex structures hidden in data more easily. To demonstrate the concept, I use a simple dataset with 30 rows and two variables. Looking at the raw numbers makes it difficult to find patterns in the data. Figure 2 is a visual representation of the same dataset using an individual value plot, which shows each observation as a single point. The plot presents insights at a glance. We can immediately see the outlying behaviour of one observation in “L2”. Moreover, it is immediately apparent that the dispersion of the data in L2 and L3 is slightly higher than in L1. In other words, data visualisation allows data to speak for itself in a way that is easily understandable to humans. It is like giving a voice to the data, enabling us to listen to and understand its story more effectively.

2.2 What do we do when making a graph/plot?

In making a graph (sometimes called a plot), we map variables to graphical properties on a Cartesian plane and then represent the data using a suitable geometry. I break the process down into the following steps:

Step 1: Obtain the ingredients to make a plot. There are two main ingredients: i) a canvas on which to draw the plot, and ii) data to plot on the canvas.

Let’s first obtain a canvas using the ggplot2 package.

library(ggplot2)
ggplot()

Second, we need the data from which to draw the plot. Here, I use worldbankdata, available in the package drone.

# A tibble: 7,937 × 7
   Country Code  Region                     Year Cooking Electricity Income
   <fct>   <fct> <fct>                     <dbl>   <dbl>       <dbl> <fct> 
 1 Aruba   ABW   Latin America & Caribbean  1990      NA       100   H     
 2 Aruba   ABW   Latin America & Caribbean  2000      NA        91.7 H     
 3 Aruba   ABW   Latin America & Caribbean  2013      NA       100   H     
 4 Aruba   ABW   Latin America & Caribbean  2014      NA       100   H     
 5 Aruba   ABW   Latin America & Caribbean  2015      NA       100   H     
 6 Aruba   ABW   Latin America & Caribbean  2016      NA       100   H     
 7 Aruba   ABW   Latin America & Caribbean  2017      NA       100   H     
 8 Aruba   ABW   Latin America & Caribbean  2018      NA       100   H     
 9 Aruba   ABW   Latin America & Caribbean  2019      NA       100   H     
10 Aruba   ABW   Latin America & Caribbean  2020      NA       100   H     
# ℹ 7,927 more rows

Step 2: Map the variables to the graphical properties.
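
As a minimal sketch of this mapping step, the code below types the first five rows of the output printed above into a small data frame, so that it runs without the drone package. The name aruba and the choice of Year and Electricity as the two mapped variables are assumptions made for illustration only.

```r
library(ggplot2)

# First five rows of the printed output above, entered by hand (illustrative subset)
aruba <- data.frame(
  Year = c(1990, 2000, 2013, 2014, 2015),
  Electricity = c(100, 91.7, 100, 100, 100)
)

# aes() maps Year to the x position and Electricity to the y position
mapped_canvas <- ggplot(data = aruba, mapping = aes(x = Year, y = Electricity))

# No geometry has been added yet, so printing shows only the axes and grid
mapped_canvas
```

At this point the canvas knows which variable goes on which axis, but nothing is drawn on it because no geometry has been chosen.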

Step 3: Plot the data.
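
A minimal sketch of this final step, assuming the same hand-entered subset of the printed output (the data frame name aruba and the choice of a point geometry are illustrative assumptions, not necessarily the plot intended here):

```r
library(ggplot2)

# First five rows of the printed output above, entered by hand (illustrative subset)
aruba <- data.frame(
  Year = c(1990, 2000, 2013, 2014, 2015),
  Electricity = c(100, 91.7, 100, 100, 100)
)

# Canvas + mapping + geometry: each row is drawn as one point
electricity_plot <- ggplot(aruba, aes(x = Year, y = Electricity)) +
  geom_point()

electricity_plot
```

Adding geom_point() to the mapped canvas is what places the observations on the plot; swapping in another geometry changes how the same mapping is drawn.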

2.3 What is geom?

Below are four methods that I used to visualise the distribution of var1 by var2.
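
To illustrate the idea of a geom, the sketch below draws one mapping with four different geometries. The data are simulated, the data frame name toy is hypothetical, and the four geoms shown (points, jittered points, box plot, violin plot) are common choices picked for illustration; they are not necessarily the four methods referred to above.

```r
library(ggplot2)

set.seed(1)  # make the simulated values reproducible

# Simulated data: var1 is numeric, var2 is a grouping factor with three levels
toy <- data.frame(
  var2 = factor(rep(c("A", "B", "C"), each = 20)),
  var1 = c(rnorm(20, mean = 10), rnorm(20, mean = 12), rnorm(20, mean = 11))
)

# One mapping, shared by all four plots
base_plot <- ggplot(toy, aes(x = var2, y = var1))

# The same mapping drawn with four different geometries
plots <- list(
  point   = base_plot + geom_point(),
  jitter  = base_plot + geom_jitter(width = 0.1),
  boxplot = base_plot + geom_boxplot(),
  violin  = base_plot + geom_violin()
)

# Print each plot in turn
for (p in plots) print(p)
```

The data and the mapping stay fixed throughout; only the geom, that is, the geometric object used to represent the data, changes from one plot to the next.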

2.4 Data

library(drone)
library(tibble)
data(worldbankdata)
worldbankdata
# A tibble: 7,937 × 7
   Country Code  Region                     Year Cooking Electricity Income
   <fct>   <fct> <fct>                     <dbl>   <dbl>       <dbl> <fct> 
 1 Aruba   ABW   Latin America & Caribbean  1990      NA       100   H     
 2 Aruba   ABW   Latin America & Caribbean  2000      NA        91.7 H     
 3 Aruba   ABW   Latin America & Caribbean  2013      NA       100   H     
 4 Aruba   ABW   Latin America & Caribbean  2014      NA       100   H     
 5 Aruba   ABW   Latin America & Caribbean  2015      NA       100   H     
 6 Aruba   ABW   Latin America & Caribbean  2016      NA       100   H     
 7 Aruba   ABW   Latin America & Caribbean  2017      NA       100   H     
 8 Aruba   ABW   Latin America & Caribbean  2018      NA       100   H     
 9 Aruba   ABW   Latin America & Caribbean  2019      NA       100   H     
10 Aruba   ABW   Latin America & Caribbean  2020      NA       100   H     
# ℹ 7,927 more rows

2.5 Data description

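library(tidyverse)
library(visdat)
# Overview of the class of each variable and where values are missing
vis_dat(worldbankdata) + 
  scale_fill_brewer(palette = "Dark2")

library(naniar)
# Upset plot of the combinations of variables that are missing together
gg_miss_upset(worldbankdata)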

2.6 Packages used for data wrangling and the |> operator
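
The native pipe |> (available from R 4.1.0) passes the value on its left as the first argument of the function call on its right. The sketch below is a minimal illustration that assumes dplyr as the wrangling package and uses a hand-entered subset of the worldbankdata output; the names aruba, nested_mean and piped_mean are hypothetical.

```r
library(dplyr)

# First five rows of the printed worldbankdata output, entered by hand
aruba <- data.frame(
  Year = c(1990, 2000, 2013, 2014, 2015),
  Electricity = c(100, 91.7, 100, 100, 100)
)

# Nested calls: must be read from the inside out
nested_mean <- summarise(
  filter(aruba, Year >= 2013),
  mean_electricity = mean(Electricity)
)

# The same steps with the native pipe: read from top to bottom
piped_mean <- aruba |>
  filter(Year >= 2013) |>
  summarise(mean_electricity = mean(Electricity))

# Both forms give the same result
identical(nested_mean, piped_mean)
```

The piped version lists the wrangling steps in the order in which they are carried out, which is the main reason for preferring it in longer chains.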

2.7 R packages with geom implementation

  1. ggplot2 (Wickham 2016)

  2. ggpattern (FC, Davis, and ggplot2 authors 2023)

  3. ggforce (Pedersen 2022)

  4. ggalluvial

  5. ggbump (Sjoberg 2020)

  6. ggridges (Wilke 2023)

  7. ggalt (Rudis, Bolker, and Schulz 2017)

FC, Mike, Trevor L Davis, and ggplot2 authors. 2023. ggpattern: ‘ggplot2’ Pattern Geoms.
Pedersen, Thomas Lin. 2022. ggforce: Accelerating ‘ggplot2’. https://CRAN.R-project.org/package=ggforce.
Rudis, Bob, Ben Bolker, and Jan Schulz. 2017. ggalt: Extra Coordinate Systems, ‘Geoms’, Statistical Transformations, Scales and Fonts for ‘ggplot2’. https://CRAN.R-project.org/package=ggalt.
Sjoberg, David. 2020. ggbump: Bump Chart and Sigmoid Curves. https://CRAN.R-project.org/package=ggbump.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilke, Claus O. 2023. ggridges: Ridgeline Plots in ‘ggplot2’. https://CRAN.R-project.org/package=ggridges.