[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] "apple" "orange" "banana"
[[3]]
[1] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
You likely use spreadsheet software (Excel, Google Sheets, LibreOffice) to work with tables.
You define some named columns, and enter data in each row.
| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| 7 | 8 | 9 |
For this, R provides data frames.
To start, a data frame is a list of vectors, one per column.
Additionally, the vectors must all have the same length.
Data frames also have some important attributes:
names() or colnames(): the column names
row.names(): the row names
Use nrow() and ncol() to get the number of rows and columns, respectively, as with matrices.
Use data.frame() to create a data frame.
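As a minimal sketch of the points above, the following builds a small data frame and inspects it. The name and category values are borrowed from the examples later in this section; the variable name df is an assumption made for illustration.

```r
# Each argument to data.frame() becomes one column; all columns
# must have the same length.
df <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit")
)

is.list(df)    # a data frame is a list of vectors
names(df)      # column names; colnames(df) gives the same result
row.names(df)  # row names, which default to "1", "2", "3"
nrow(df)       # number of rows
ncol(df)       # number of columns
```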
Access is similar to matrices. Pass in vectors of indices, logical values, or names (strings).
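The three kinds of index can be sketched as follows. The data frame df and the result names are illustrative assumptions, not part of the original slides.

```r
df <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit")
)

# Rows go before the comma, columns after it, as with matrices.
by_index   <- df[c(1, 3), ]                  # rows 1 and 3, by position
by_logical <- df[c(TRUE, FALSE, TRUE), ]     # the same rows, by logical values
by_name    <- df[, c("name", "category")]    # columns, by name

by_index
by_logical
by_name
```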
Using the data frame…
How do we access the object at the 1st row, 2nd column?
Using the data frame…
How do we access the first and second rows?
How do we access the third column?
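One possible set of answers, sketched against a stand-in data frame because the one used on the slides is not shown here. The variable df and the last num_available value are assumptions.

```r
# Stand-in data frame; the last num_available value is made up.
df <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit"),
  num_available = c(35, 20, 12)
)

df[1, 2]   # the object at the 1st row, 2nd column
df[1:2, ]  # the first and second rows
df[, 3]    # the third column
```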
More examples:
        name  category
1     orange     fruit
2   bok choy vegetable
3 strawberry     fruit
      name  category num_available
1   orange     fruit            35
2 bok choy vegetable            20
[1] "fruit" "vegetable" "fruit"
Did you notice something?
If we access a single column, we get a vector. Otherwise, we get a data frame.
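A short sketch of this behaviour, assuming R 4.0 or later (where strings are not converted to factors by default). The variable names are illustrative. The drop argument of the `[` operator, which keeps a single column as a data frame, is shown as an aside.

```r
df <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit")
)

one_col  <- df[, "category"]                # single column: a vector
two_cols <- df[, c("name", "category")]     # several columns: a data frame
two_rows <- df[1:2, ]                       # rows: a data frame
kept     <- df[, "category", drop = FALSE]  # single column kept as a data frame

is.data.frame(one_col)   # FALSE
is.data.frame(two_cols)  # TRUE
is.data.frame(two_rows)  # TRUE
is.data.frame(kept)      # TRUE
```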
$ Operator
As data frames are lists, you can access their vectors with the $ operator.
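For instance, with an illustrative data frame df (an assumed name), the $ operator returns the same vector as indexing the column by name:

```r
df <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit")
)

df$category                               # the category column as a vector
identical(df$category, df[, "category"])  # TRUE
```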
head, tail
To see the first or last n rows, use head() or tail(), respectively. n defaults to 6.
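A minimal sketch on a made-up ten-row data frame (the columns id and square are illustrative assumptions):

```r
df <- data.frame(id = 1:10, square = (1:10)^2)

head(df)         # first 6 rows (the default n)
tail(df)         # last 6 rows
head(df, n = 3)  # first 3 rows
tail(df, n = 2)  # last 2 rows
```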
As our vectors have to be the same length, how do we represent empty cells in a table?
We use a special value, NA (“Not Available”).
data.frame(
  names = c("Bob the Builder", "Spongebob", "Darth Vader"),
  major = c("Civil Engineering", "Culinary Studies", NA),
  age = c(32, NA, 44)
)
            names             major age
1 Bob the Builder Civil Engineering  32
2       Spongebob  Culinary Studies  NA
3     Darth Vader              <NA>  44
Note that NA can appear in a vector of any type; the vector keeps the type of its other elements.
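As a quick check of this, the sketch below puts NA into a numeric and a character vector (values taken from the example above; the variable names are assumptions) and uses is.na() to locate the missing entries.

```r
ages   <- c(32, NA, 44)
majors <- c("Civil Engineering", "Culinary Studies", NA)

class(ages)    # "numeric": NA does not change the vector's type
class(majors)  # "character"
is.na(ages)    # FALSE TRUE FALSE
is.na(majors)  # FALSE FALSE TRUE
```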