library(ggplot2)
ggplot()
2 Introduction
2.1 What is data visualisation?
Data visualization is the graphical representation of data, which makes the patterns, trends, relationships, outliers and complex structures hidden inside the data easier to understand. To demonstrate the concept, I use a simple dataset with 30 rows and two variables. Looking at the raw numbers alone makes it difficult to find patterns in the data. Figure 2 is a visual representation of the same dataset using an individual value plot, which shows each observation as a single point. The individual value plot presents insights at a glance. We can immediately see the outlying behaviour of one observation in “L2”. Moreover, it is immediately apparent that the dispersion of data in L2 and L3 is slightly higher than in L1. In other words, data visualization allows data to speak for itself in a way that is easily understandable to humans. It is like giving a voice to the data, enabling us to listen and understand its story more effectively.
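An individual value plot of this kind can be sketched in a few lines. The data below are simulated stand-ins, not the dataset behind Figure 2: the column names `group` and `value`, the means and spreads, and the planted outlier are all assumptions made only to mimic the shape described above (30 rows, two variables, three levels L1 to L3).

```r
library(ggplot2)

set.seed(1)  # reproducible simulated values; NOT the author's dataset

# 30 rows, two variables: a grouping level and a numeric value.
# L2 and L3 are simulated with a larger spread than L1.
toy <- data.frame(
  group = rep(c("L1", "L2", "L3"), each = 10),
  value = c(rnorm(10, mean = 50, sd = 2),
            rnorm(10, mean = 50, sd = 4),
            rnorm(10, mean = 50, sd = 4))
)

# Plant one clearly outlying observation in L2 (row 15 belongs to L2).
toy$value[15] <- 80

# Individual value plot: every observation is drawn as a single point.
p <- ggplot(toy, aes(x = group, y = value)) +
  geom_point()
```

Printing `p` draws the plot. The outlier in L2 stands well apart from the other points, something that is much harder to notice when scanning 30 numbers in a table.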
2.2 What do we do when making a graph/plot?
When making a graph (sometimes called a plot), we map variables to graphical properties on a Cartesian plane and then represent the data using a suitable geometry. I break the process down into the following steps:
Step 1: Obtain the ingredients to make a plot. There are two main ingredients: i) a canvas to draw the plot on, and ii) data to plot on the canvas.
Let’s first obtain a canvas using the ggplot2 package.
Second, we need the data to plot. Here, I use the worldbankdata dataset available in the drone package.
# A tibble: 7,937 × 7
Country Code Region Year Cooking Electricity Income
<fct> <fct> <fct> <dbl> <dbl> <dbl> <fct>
1 Aruba ABW Latin America & Caribbean 1990 NA 100 H
2 Aruba ABW Latin America & Caribbean 2000 NA 91.7 H
3 Aruba ABW Latin America & Caribbean 2013 NA 100 H
4 Aruba ABW Latin America & Caribbean 2014 NA 100 H
5 Aruba ABW Latin America & Caribbean 2015 NA 100 H
6 Aruba ABW Latin America & Caribbean 2016 NA 100 H
7 Aruba ABW Latin America & Caribbean 2017 NA 100 H
8 Aruba ABW Latin America & Caribbean 2018 NA 100 H
9 Aruba ABW Latin America & Caribbean 2019 NA 100 H
10 Aruba ABW Latin America & Caribbean 2020 NA 100 H
# ℹ 7,927 more rows
Step 2: Map the variables to the graphical properties.
Step 3: Plot the data.
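The three steps can be sketched as separate lines of ggplot2 code. To keep the sketch runnable without the drone package, it retypes the `Year` and `Electricity` values from the ten Aruba rows printed above into a small data frame called `aruba` (a name introduced here). Choosing `Year` for the x-axis, `Electricity` for the y-axis and points as the geometry is an assumption made for illustration.

```r
library(ggplot2)

# First ten rows of worldbankdata as printed above (Aruba only),
# typed in by hand so the sketch runs without the drone package.
aruba <- data.frame(
  Year        = c(1990, 2000, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020),
  Electricity = c(100, 91.7, 100, 100, 100, 100, 100, 100, 100, 100)
)

# Step 1: obtain the ingredients, a canvas and the data to draw on it.
canvas <- ggplot(data = aruba)

# Step 2: map the variables to graphical properties (aesthetics).
mapped <- canvas + aes(x = Year, y = Electricity)

# Step 3: plot the data using a geometry, here points.
p <- mapped + geom_point()
```

Printing `canvas` shows an empty panel, `mapped` adds the axes for the two variables, and `p` finally draws one point per row. In practice the three steps are usually written as a single expression, `ggplot(aruba, aes(x = Year, y = Electricity)) + geom_point()`.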
2.3 What is geom?
Below are four methods that I use to visualise the distribution of var1 by var2.
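The idea that a geom is only the last, interchangeable step can be sketched by keeping the canvas, data and mapping fixed and swapping the geometry. The text does not say which variables or which four methods are meant, so everything specific here is an assumption: the built-in `iris` data stands in for var1 and var2, and the four geoms are my own picks for showing a numeric variable by a categorical one.

```r
library(ggplot2)

# Shared canvas, data and mapping: a numeric variable by a categorical one.
base <- ggplot(iris, aes(x = Species, y = Sepal.Length))

# Same mapping, four different geometries.
plots <- list(
  point   = base + geom_point(),                         # raw observations
  jitter  = base + geom_jitter(width = 0.2, height = 0), # points spread sideways
  boxplot = base + geom_boxplot(),                       # five-number summary
  violin  = base + geom_violin()                         # mirrored density estimate
)
```

Printing `plots$point`, `plots$jitter`, `plots$boxplot` or `plots$violin` draws each version in turn. Only the `geom_*()` call differs between them.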
2.4 Data
library(drone)
library(tibble)
data(worldbankdata)
worldbankdata
# A tibble: 7,937 × 7
Country Code Region Year Cooking Electricity Income
<fct> <fct> <fct> <dbl> <dbl> <dbl> <fct>
1 Aruba ABW Latin America & Caribbean 1990 NA 100 H
2 Aruba ABW Latin America & Caribbean 2000 NA 91.7 H
3 Aruba ABW Latin America & Caribbean 2013 NA 100 H
4 Aruba ABW Latin America & Caribbean 2014 NA 100 H
5 Aruba ABW Latin America & Caribbean 2015 NA 100 H
6 Aruba ABW Latin America & Caribbean 2016 NA 100 H
7 Aruba ABW Latin America & Caribbean 2017 NA 100 H
8 Aruba ABW Latin America & Caribbean 2018 NA 100 H
9 Aruba ABW Latin America & Caribbean 2019 NA 100 H
10 Aruba ABW Latin America & Caribbean 2020 NA 100 H
# ℹ 7,927 more rows
2.5 Data description
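library(tidyverse)
library(visdat)
# Overview of column types and missing values, recoloured with a ColorBrewer palette
vis_dat(worldbankdata) +
  scale_fill_brewer(palette = "Dark2")
library(naniar)
# UpSet plot showing which variables are missing together
gg_miss_upset(worldbankdata)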
2.6 Packages used for data wrangling and the |> operator
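As a minimal sketch of the native pipe, the example below writes the same two wrangling steps once as a nested call and once with `|>`, which requires R 4.1 or later. It assumes dplyr is one of the wrangling packages meant here. The `aruba` data frame is a hand-typed copy of the `Year` and `Electricity` values from the ten Aruba rows of worldbankdata shown in this chapter, and the 2013 cut-off is chosen only for illustration.

```r
library(dplyr)

# Hand-typed copy of the ten Aruba rows printed in this chapter.
aruba <- data.frame(
  Year        = c(1990, 2000, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020),
  Electricity = c(100, 91.7, 100, 100, 100, 100, 100, 100, 100, 100)
)

# Nested call: has to be read from the inside out.
nested <- summarise(
  filter(aruba, Year >= 2013),
  mean_electricity = mean(Electricity)
)

# Native pipe: x |> f(y) is evaluated as f(x, y), so the steps read top to bottom.
piped <- aruba |>
  filter(Year >= 2013) |>
  summarise(mean_electricity = mean(Electricity))
```

Both versions return the same one-row data frame. The piped form keeps the order of the code in line with the order of the operations.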
2.7 R packages with geom implementations
ggplot2 (Wickham 2016)
ggpattern (FC, Davis, and ggplot2 authors 2023)
ggforce (Pedersen 2022)
ggalluvial (ggalluvial?)
ggbump (Sjoberg 2020)
ggridges (Wilke 2023)
ggalt (Rudis, Bolker, and Schulz 2017)