1  Introduction to R and RStudio

1.1 Chapter Roadmap

  1. What is R?

  2. Why learn R?

  3. R and RStudio

  4. Downloading R and RStudio

  5. Installing R and RStudio

  6. Downloading and Installing Rtools

  7. Getting familiar with the RStudio interface

  8. Creating and saving an RStudio project

1.2 What is R?

R is a popular programming language and environment designed for statistical computing, data analysis, and data visualisation. It was created by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, and is derived from the S language. R is primarily a functional programming language.
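As a small illustration of the functional style mentioned above, the sketch below treats functions as ordinary values that can be stored in variables and passed to other functions. The names `square` and `apply_twice` are made up for this example and are not part of R.

```r
# A function is an ordinary object that can be stored in a variable.
square <- function(x) x^2

# Functions can be passed to other functions: sapply() calls
# square() on each element of 1:5 and collects the results.
squares <- sapply(1:5, square)
print(squares)

# A function that takes another function as an argument
# (apply_twice is a hypothetical helper written for this sketch).
apply_twice <- function(f, x) f(f(x))
result <- apply_twice(square, 3)
print(result)
```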

1.3 Why learn R?

1. R is a free and open-source software package.

2. R is a powerful software for data analysis and statistical computing.

The following are some of the steps you need to perform in a data analysis project.

  1. Import/export data
  2. Data cleaning
  3. Data visualisation
  4. Data modelling
  5. Model deployment
  6. Ensuring validity and interpretability of models
  7. Presentation and communication of results

Additionally, ensuring the reproducibility of data analysis and modelling workflows is crucial for trustworthy results. R has packages that cover all of these data analysis and modelling needs.
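To make these steps concrete, here is a minimal sketch that walks through a few of them using only functions that ship with R. It assumes the built-in `mtcars` data set stands in for imported data; the object names `cars`, `train_rows`, and `fit` are chosen for illustration only, and a real project would typically use additional packages.

```r
# Fix the random seed so that the random row selection below is reproducible.
set.seed(123)

# Import: mtcars ships with R; a real project might call read.csv() here.
cars <- mtcars

# Clean: keep complete rows and turn the 0/1 transmission code into a factor.
cars <- cars[complete.cases(cars), ]
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Visualise: fuel efficiency against weight, coloured by transmission type.
plot(mpg ~ wt, data = cars, col = cars$am, pch = 16)

# Model: fit a simple linear regression on a random subset of 24 rows.
train_rows <- sample(nrow(cars), 24)
fit <- lm(mpg ~ wt, data = cars[train_rows, ])

# Communicate: print the estimated coefficients.
print(coef(fit))
```

Rerunning the script with the same seed selects the same rows, which is one simple ingredient of a reproducible workflow.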

3. R can be used for tasks beyond traditional data analysis, modelling, and statistical computing.

  1. Scientific Writing Tools: R can be used for scientific writing, particularly through packages and tools such as knitr, rmarkdown, and Quarto. These allow you to integrate R code directly into documents alongside text and figures, which is highly useful for reproducible research and automated report generation. They are suitable for thesis writing, book writing, or any other documentation work. This book, “Programming and Data Analysis with R”, is written using Quarto.

  2. Website Development: R can be used to develop websites, particularly through packages and tools such as knitr, rmarkdown, blogdown, and Quarto. For example, the website https://hellor.netlify.app/ is built with blogdown and https://thiyangt.github.io/rprogramming/ is built with Quarto.

  3. Creating Presentations: R can generate dynamic and visually appealing presentations using rmarkdown, xaringan, Quarto, etc. These tools enable you to embed R code, plots, and interactive elements directly into presentation slides. Here is an example presentation developed using xaringan: https://thiyangt.github.io/whyR2021keynote/#1

  4. Creating Posters: R can be utilized to design scientific posters using packages such as posterdown. These packages provide templates for creating professional-looking posters directly from R Markdown documents. You can include plots, tables, formatted text and graphics in your poster design.

  5. Web Application Development: R can be used to build interactive web applications, for example with the shiny package. This capability is particularly useful for developing data-driven tools, simulations, and dashboards that can be accessed through web browsers without the need for users to install additional software.

4. There is a large community of users.

Learning a programming language with a large community of users is particularly important for several reasons:

  1. Support and learning resources: A large community means abundant resources are available online, including tutorials, documentation, forums, and open-source libraries. When you encounter issues or need guidance, you are more likely to find solutions quickly thanks to the active community.

  2. Regular updates and improvements: Popular languages receive regular updates and improvements driven by community feedback and contributions.

  3. Collaboration and career opportunities: Since there are many users, you are likely to find good job and collaboration opportunities.

  4. Networking: Being part of a large programming community allows you to participate in hackathons and attend meetups or conferences. One such R community is “R-Ladies”.

1.4 R and RStudio

What is the difference between R and RStudio? R is the programming language that provides the statistical computing capabilities. RStudio is an Integrated Development Environment (IDE) for R. Dr Julia Lowndes illustrates the distinction between R and RStudio with the following analogy:

“If R were an airplane, RStudio would be the airport, providing many, many supporting services that make it easier for you, the pilot, to take off and go to awesome places. Sure, you can fly an airplane without an airport, but having those runways and supporting infrastructure is a game-changer.”

Julie Lowndes (https://jules32.github.io/2016-07-12-Oxford/R_RStudio/, accessed on May 1, 2024)

1.5 Downloading R and RStudio

To download R:

Visit the link https://cran.r-project.org/ to download R. Choose the appropriate version depending on the operating system you are using.

To download RStudio (the same page also links to the R download):

Visit the link https://posit.co/download/rstudio-desktop/ to download the RStudio IDE. Make sure that you download the version that matches your computer’s operating system. You can also download R from the same link. Furthermore, RStudio requires a minimum version of R, for example “RStudio requires R 3.3.0+”. Make sure you have a suitable R version. The version of R is written in the format “x.x.x”, where the three numbers indicate the major version, minor version, and patch level respectively.
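Once R is installed, you can check its version from the R console. The sketch below uses base R only; the threshold "3.3.0" is taken from the example requirement quoted above, and the name `meets_requirement` is chosen for illustration.

```r
# Full version string of the running R installation.
print(R.version.string)

# Major version, and minor version together with the patch level.
print(R.version$major)
print(R.version$minor)

# getRversion() returns a version object that can be compared
# against a minimum requirement such as "3.3.0".
meets_requirement <- getRversion() >= "3.3.0"
print(meets_requirement)
```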

1.6 Installing R and RStudio

First, you should install R. After installing R, you can install RStudio.

1.6.1 Installing R

Double-click on the downloaded R installer file. This will start the installation process. The default options are usually fine. If you want to watch a step-by-step tutorial on how to install R for Windows, you can watch the video here.

1.6.2 Installing RStudio

You must have R installed before installing RStudio. Double-click on the downloaded RStudio installer file. This will start the installation process. The default options are usually fine. If you want to watch a step-by-step tutorial on how to install R for Windows, you can watch the video here.

1.7 Downloading and Installing Rtools

Windows users also require additional software known as Rtools, particularly for installing certain packages. Go to the website https://cran.r-project.org/bin/windows/Rtools/ and download the Rtools version that is compatible with your R version.

If you want to watch a step-by-step tutorial on how to install R for Windows, you can watch the video here.
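After installing Rtools, one rough way to check whether its build tools are visible to R is to look for the `make` program from the R console. This is only a sketch under the assumption that Rtools exposes `make` on the search path; the exact setup can differ between Rtools versions, so treat a negative result as a hint rather than proof.

```r
# Sys.which() returns the full path of a program,
# or an empty string if it cannot be found on the search path.
make_path <- Sys.which("make")

if (nzchar(make_path)) {
  message("make found at: ", make_path)
} else {
  message("make not found; the build tools may not be on the search path.")
}
```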

1.8 Getting familiar with the RStudio interface

Step 1: Double-click the RStudio icon. This will open an RStudio window as follows:

Step 2: To obtain the source or script editor use the following steps:

File > New File > R Script

Now we have four panes. They are as follows:

  1. Source pane: Here you can write your R code.
  2. Console: This is where your commands are executed and their outputs are shown.
  3. Environment, History, Connections, Tutorial: Of the different tabs you see here, the most important ones are the Environment and History tabs.
    • The Environment tab shows all the objects (data frames, variables, functions, etc.) that are currently in your R session.

    • The History tab keeps a record of all the commands you have run in the Console.

  4. Files, Plots, Packages, Help, Viewer, Presentations:
    • Files: Allows you to navigate your files.

    • Plots: Graphical outputs are displayed here.

    • Packages: Allows you to install, update, load, and unload packages.

    • Help: The help files for functions are shown here. Help files provide function descriptions, examples, and references so that you can learn on your own. To access the help file of a function, type “?” followed by the function name, for example `?ls`.

    • Viewer/Presentations: Useful when working with R Markdown or Quarto documents. We will look at this in Chapter 8.
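Much of what these tabs display can also be queried from the Console. The following sketch shows console counterparts of the Environment and Packages tabs; the object names `scores` and `greet` are invented for this example.

```r
# Create two objects; they would appear in the Environment tab.
scores <- c(2, 4, 6)
greet <- function() "hello"

# ls() lists the names of the objects in the current environment.
print(ls())

# rm() removes an object, which also removes it from the Environment tab.
rm(scores)
scores_exists <- exists("scores")
print(scores_exists)

# library() loads an installed package (the Packages tab does this via a
# tick box); search() lists what is currently attached.
library(stats)
stats_attached <- "package:stats" %in% search()
print(stats_attached)

# Help pages open in the Help tab, for example by typing ?ls in the Console.
```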

1.9 Change the appearance of RStudio pane

This is an optional step. To change the appearance, font size, or RStudio theme colour, follow the steps below:

Step 1: Go to Tools > Global Options. You will get the window below.

Step 2: Select the Appearance tab.

Here, I set the theme to “Cobalt”. The appearance of the window then changes as shown below:

1.10 Creating an RStudio project

To create an RStudio project, follow the steps below.

Step 1: File > New Project

Step 2: Click on “New Directory” in the following window.

Step 3: Click on “New Project”

Step 4: Give a directory name and a path to save it in.

1.11 Saving an RStudio project

Category 1: If you have created a project using the steps shown in Section 1.10, you can save your R Script files by clicking on the floppy disk icon, as illustrated in the figure below:

Category 2: If you started coding without creating a project and want to save your work, go to File > Save As and follow the steps.

1.12 Exercise

The goal of this exercise is to help you become familiar with the RStudio environment and to create and save projects.

  1. Create a new project in the RStudio IDE. Name your project as lesson1.

  2. Select a suitable theme for your RStudio IDE’s user interface.

Help: Navigate to Tools > Global Options > Appearance.

  3. Change the RStudio pane layout as follows:

  4. Create a folder called data inside your lesson1 project folder.

  5. Create another folder called src inside your lesson1 project folder.

  6. Open a script file and save it as exercise1.R inside the src folder.

  7. Type the following commands in exercise1.R and run them in the console. See the changes happening under the “Environment” tab and the “History” tab.

100 + 200
rnorm(100)
grades <- c("A+", "A-", "A", "B", "F")
random.numbers <- rnorm(100)
random.numbers*100
ls()
  8. Close the project by saving the workspace.

  9. Reopen your project by clicking the lesson1.Rproj file inside your lesson1 folder. Open the .RData file and the .Rhistory file and observe them.

  10. Type the following command in exercise1.R and run it in the console.

marks <- c(100, 70, 80, 60)
  11. Close the project without saving the workspace.

  12. Reopen lesson1.Rproj, type ls() in the console, and observe the output. (marks is not listed, but the other objects are available. Why?)

  13. Type the following commands in the console and observe the changes in the Console, Environment, History, and Viewer panes. Observe the outputs of the code and gain an understanding of the purpose of each line.

data("iris")
View(iris)
summary(iris)
hist(iris$Sepal.Length)
plot(x=iris$Sepal.Width, y=iris$Sepal.Length) # Method 1
plot(Sepal.Length ~ Sepal.Width, data=iris) # Method 2
plot(x=iris$Sepal.Width, y=iris$Sepal.Length, col=iris$Species)
plot(Sepal.Length ~ Sepal.Width, pch=16, cex=0.6, data=iris)
plot(Sepal.Length ~ Sepal.Width, col="forestgreen", pch=16, cex=0.6, data=iris)
  14. Type the following code to obtain the list of predefined colours.
colours()
  15. Explore how the following code changes the last plot you produced.

code chunk 15.1

plot(Sepal.Length ~ Sepal.Width, col="forestgreen", pch=16, cex=0.6, data=iris, main = "Scatterplot of Sepal Length Against Sepal Width",
     xlab = "Sepal Width (cm)",
     ylab = "Sepal Length (cm)")

code chunk 15.2

model <- lm(Sepal.Length ~ Sepal.Width, data=iris)
plot(Sepal.Length ~ Sepal.Width, col="forestgreen", pch=16, cex=0.6, data=iris, main = "Scatterplot of Sepal Length Against Sepal Width",
     xlab = "Sepal Width (cm)",
     ylab = "Sepal Length (cm)")
abline(model, col="tomato1")
  16. Type the following commands and understand what each line of code is doing. Interpret the outputs.

code chunk 16.1

plot(iris)

code chunk 16.2

plot(~ Petal.Length + Petal.Width + Sepal.Width, data=iris)
  17. Type the following command, then open your data folder and see the changes that have occurred.
write.csv(iris, file="data/iris.csv")
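As a follow-up to the last step, the sketch below reads the saved file back into R. It assumes the working directory is the project folder, and it creates the data folder if it is missing so that the code runs on its own. The object names `iris_copy` and `iris_clean` are chosen for illustration.

```r
# Create the data folder if it does not exist yet.
if (!dir.exists("data")) dir.create("data")

# write.csv() stores the row names as an extra first column by default.
write.csv(iris, file = "data/iris.csv")
iris_copy <- read.csv("data/iris.csv")
print(names(iris_copy))

# Setting row.names = FALSE leaves that column out.
write.csv(iris, file = "data/iris.csv", row.names = FALSE)
iris_clean <- read.csv("data/iris.csv")
print(dim(iris_clean))
```

Comparing the two sets of column names shows why `row.names = FALSE` is often convenient when the row names carry no information.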