There are five main data structures in R. They are:
vector
matrix
array
data frame
list
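The sketch below creates one small example of each of the five structures. Its only purpose is to show how they differ in shape. The object names (v, m, arr, df, lst) and their contents are illustrative assumptions. dim() reports the dimensions of each object and returns NULL for objects that have no dim attribute, such as a plain vector or a list.

```r
# One illustrative object per data structure (names and values are hypothetical)
v   <- c(1, 2, 3)                         # vector: one dimension, one type
m   <- matrix(1:6, nrow = 2)              # matrix: two dimensions, one type
arr <- array(1:24, dim = c(2, 3, 4))      # array: any number of dimensions, one type
df  <- data.frame(year  = c(1996, 1998),  # data frame: columns may differ in type
                  label = c("a", "b"))
lst <- list(1, "GPA", TRUE)               # list: elements may be of any type

dim(v)    # NULL (no dim attribute)
dim(m)    # 2 3
dim(arr)  # 2 3 4
dim(df)   # 2 2 (rows, columns)
dim(lst)  # NULL
```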
A vector is a one-dimensional data object.
A vector is a homogeneous data structure: all of its elements must be of the same type (mode), such as numeric, character, or logical. If you try to mix different types of data, R automatically converts them to a single type.
Vectors can be created in four main ways:
i. using the c() function
ii. using the : operator
iii. using the seq() function
iv. using the rep() function
Methods ii–iv are shortcuts that are useful when the data follow a regular pattern.
c()
Syntax: c(...), where ... is a comma-separated list of values.
Example:
The following creates a vector without assigning it to a name.
c(1996, 1998, 2000, 2005)
[1] 1996 1998 2000 2005
Assigning the vector to a name:
The advantage of assigning a name is that the same set of values can be reused later by referring to that name.
a <- c(1996, 1998, 2000, 2005)
a
[1] 1996 1998 2000 2005
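Once the values are stored in a, they can be reused without retyping them. This minimal sketch uses only base R: length(), indexing with [ ], element-wise arithmetic, and rev(). The choice of 1990 as the value to subtract is arbitrary.

```r
a <- c(1996, 1998, 2000, 2005)

length(a)   # number of elements: 4
a[2]        # second element: 1998
a - 1990    # arithmetic is applied element by element: 6 8 10 15
rev(a)      # elements in reverse order: 2005 2000 1998 1996
```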
:
The : operator creates a regular increasing or decreasing sequence with a step of 1.
Examples:
1:10
[1] 1 2 3 4 5 6 7 8 9 10
10:1
[1] 10 9 8 7 6 5 4 3 2 1
-0.5:10
[1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
-0.3:10
[1] -0.3 0.7 1.7 2.7 3.7 4.7 5.7 6.7 7.7 8.7 9.7
In all of the above sequences, consecutive values differ by 1. The sequence stops at the last value that does not go past the end point, which is why -0.5:10 ends at 9.5 rather than 10.
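The sketch below shows a few details of : that are easy to miss. The result is stored as integers when the start value is a whole number, and as doubles otherwise. The sequence also counts down whenever the start is larger than the end. For this reason, 1:0 is not an empty vector. seq_len() is included as a base R alternative that does return an empty vector for length 0.

```r
typeof(1:5)      # "integer": whole-number start gives integers
typeof(-0.5:3)   # "double": fractional start gives doubles
5:1              # counts down because start > end: 5 4 3 2 1
1:0              # pitfall: gives 1 0, not an empty vector
seq_len(0)       # integer(0): empty, as expected for length 0
```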
seq
The seq() function can also be used to create regular sequences. With seq() you can control the increment (by) and the length of the output (length.out).
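The examples below go a little beyond those that follow. They show a negative and a fractional by, a sequence defined by its start, step, and length with no end point, and seq_along(), which returns one index per element of another vector. The input values are arbitrary.

```r
seq(10, 1, by = -3)             # negative step counts down: 10 7 4 1
seq(0, 1, by = 0.25)            # fractional step: 0.00 0.25 0.50 0.75 1.00
seq(5, by = 2, length.out = 4)  # start, step and length only: 5 7 9 11
seq_along(c("x", "y", "z"))     # one index per element: 1 2 3
```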
Example 1
seq(1, 19)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Example 2
seq(1, 19, length.out=8)
[1] 1.000000 3.571429 6.142857 8.714286 11.285714 13.857143 16.428571
[8] 19.000000
Example 3
seq(1, 19, by = 3)
[1] 1 4 7 10 13 16 19
rep
The rep() function is useful when the data contain a repeating pattern.
Example 1
The number 8 is repeated five times.
rep(8, 5)
[1] 8 8 8 8 8
Example 2
The sequence 1, 2, 3 is repeated five times.
rep(1:3, times=5)
[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
Example 3
Same as Example 2: when times is not named, rep() takes it from the second argument.
rep(1:3, 5)
[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
Example 4
Each element in the sequence is repeated five times.
rep(1:3, each=5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Example 5
First, each element is repeated five times. After that, the whole sequence is repeated three times.
rep(1:3, each=5, times=3)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2
[39] 2 2 3 3 3 3 3
Example 6
Same as before. Changing the order of the each and times arguments does not change the output.
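This sketch checks the claim above with identical(). R applies each first and then times, whatever order the arguments are written in. The example also shows length.out, another rep() argument, which recycles the input until the requested number of elements is reached. The input 1:3 and the counts are arbitrary.

```r
x1 <- rep(1:3, each = 2, times = 2)
x2 <- rep(1:3, times = 2, each = 2)
identical(x1, x2)          # TRUE: each is applied first, then times
x1                         # 1 1 2 2 3 3 1 1 2 2 3 3

rep(1:3, length.out = 7)   # recycle until 7 elements: 1 2 3 1 2 3 1
```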
rep(1:3, times=3, each=5)
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2
[39] 2 2 3 3 3 3 3
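When you combine elements of different types in one vector, R implicitly coerces them to the most flexible type. From least to most flexible, the order is logical, integer, double, complex, character.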
a <- c(1, 3, "GPA", TRUE, 1L)
typeof(a)
[1] "character"
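The sketch below walks through that coercion order one step at a time, with typeof() reporting the resulting type. It then prints the mixed vector from the example above, where every element has become a string. The last line shows that logical values count as 1 (TRUE) and 0 (FALSE) when used in arithmetic.

```r
# Each line adds one more flexible type; typeof() reports the result
typeof(c(TRUE, 1L))        # "integer"
typeof(c(1L, 2.5))         # "double"
typeof(c(2.5, 1+1i))       # "complex"
typeof(c(1+1i, "GPA"))     # "character"

# The mixed vector from above: every element is now a character string
a <- c(1, 3, "GPA", TRUE, 1L)
a                          # "1" "3" "GPA" "TRUE" "1"

# Logicals become 1 (TRUE) and 0 (FALSE) in arithmetic
sum(c(TRUE, FALSE, TRUE))  # 2
```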
Explicit coercion means intentionally converting data from one type to another with a function such as as.integer(). For example,
b <- c(3.1, 3.2, 3.7, 5.9)
b
[1] 3.1 3.2 3.7 5.9
as.integer(b)
[1] 3 3 3 5
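As the output above shows, as.integer() truncates the decimal part rather than rounding. The sketch below compares this with round() and tries a few other base R as.*() functions. Some inputs cannot be converted, and as.*() then returns NA. as.numeric() also issues a warning in that case, and the sketch hides it with suppressWarnings() to keep the output short.

```r
b <- c(3.1, 3.2, 3.7, 5.9)
as.integer(b)                        # drops the decimal part: 3 3 3 5
round(b)                             # rounds instead: 3 3 4 6
as.numeric("3.14")                   # character -> double: 3.14
as.logical(c("TRUE", "yes"))         # TRUE NA: "yes" is not a recognised spelling
suppressWarnings(as.numeric("GPA"))  # NA: the text is not a number
```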
Consider the vector below:
example.vec <- c(1, 2, 3, 4, 5, 6, 7, 8)
typeof(example.vec)
[1] "double"
class(example.vec)
[1] "numeric"
is.character(example.vec)
[1] FALSE
is.integer(example.vec)
[1] FALSE
is.logical(example.vec)
[1] FALSE
is.double(example.vec)
[1] TRUE
sum(example.vec)
[1] 36
mean(example.vec)
[1] 4.5
summary(example.vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 2.75 4.50 4.50 6.25 8.00
is.na(example.vec)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
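example.vec has no missing values, so every call above returns a number. The sketch below uses a different, hypothetical vector v that contains one NA. It shows how is.na() can count missing values, and how a single NA makes mean() return NA unless na.rm = TRUE is given.

```r
v <- c(1, 2, NA, 4)        # hypothetical vector with one missing value
is.na(v)                   # FALSE FALSE TRUE FALSE
sum(is.na(v))              # number of missing values: 1
mean(v)                    # NA: the missing value propagates
mean(v, na.rm = TRUE)      # NA removed first: (1 + 2 + 4) / 3 = 2.333333
```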
There are many more functions that you can use with vectors. We will learn about them in the upcoming chapters.
[1] 1990 1992 1934 1957 1970 2000 2005
[1] 3 6 9 3 6 9 3 6 9 3 6 9 3 6 9
[1] 3 3 3 3 3 6 6 6 6 6 9 9 9 9 9
[1] 3 3 3 3 3 6 6 6 6 6 9 9 9 9 9 3 3 3 3 3 6 6 6 6 6 9 9 9 9 9
[1] 1 4 7 10 13 16 19 22 25 28 31 34
[1] 0.1000000 0.1020202 0.1040404 0.1060606 0.1080808 0.1101010 0.1121212
[8] 0.1141414 0.1161616 0.1181818 0.1202020 0.1222222 0.1242424 0.1262626
[15] 0.1282828 0.1303030 0.1323232 0.1343434 0.1363636 0.1383838 0.1404040
[22] 0.1424242 0.1444444 0.1464646 0.1484848 0.1505051 0.1525253 0.1545455
[29] 0.1565657 0.1585859 0.1606061 0.1626263 0.1646465 0.1666667 0.1686869
[36] 0.1707071 0.1727273 0.1747475 0.1767677 0.1787879 0.1808081 0.1828283
[43] 0.1848485 0.1868687 0.1888889 0.1909091 0.1929293 0.1949495 0.1969697
[50] 0.1989899 0.2010101 0.2030303 0.2050505 0.2070707 0.2090909 0.2111111
[57] 0.2131313 0.2151515 0.2171717 0.2191919 0.2212121 0.2232323 0.2252525
[64] 0.2272727 0.2292929 0.2313131 0.2333333 0.2353535 0.2373737 0.2393939
[71] 0.2414141 0.2434343 0.2454545 0.2474747 0.2494949 0.2515152 0.2535354
[78] 0.2555556 0.2575758 0.2595960 0.2616162 0.2636364 0.2656566 0.2676768
[85] 0.2696970 0.2717172 0.2737374 0.2757576 0.2777778 0.2797980 0.2818182
[92] 0.2838384 0.2858586 0.2878788 0.2898990 0.2919192 0.2939394 0.2959596
[99] 0.2979798 0.3000000
[1] -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
[26] 52 54 56 58 60 62 64 66 68 70 72
Use the typeof() function to check the R storage mode of each of the following vectors, and class() to check its class.
logical_vector <- c(TRUE, FALSE, TRUE, FALSE)
integer_vector <- c(1L, 2L, 3L, 4L)
double_vector <- c(1.1, 2.2, 3.3, 4.4)
complex_vector <- c(1+1i, 2+2i, 3+3i, 4+4i)
character_vector <- c("a", "b", "c", "d")
null_vector <- NULL
time_data <- 1996:2006
time_series_data <- ts(1996:2006)
Create the vector (3, 3, ..., 3, 6, 6, ..., 6, 9, 9, ..., 9), where there are 10 occurrences of 3, 20 occurrences of 6, and 30 occurrences of 9.
Find the value of each of the following expressions.
\(\sum_{i=1}^{100}i\)
\(\sum_{i=1}^{100}i^2\)
Generate a sequence using the code seq(from=1, to=10, by=1). In what other ways can you generate the same sequence?
Create a vector to hold population values, and label each element with the corresponding province name. The plot will display population values when hovered over.