Lab 04: Data Frames

Tables

You likely use spreadsheet software (Excel, Google Sheets, LibreOffice) to deal with tables.

You define some named columns and enter data in each row.

A B C
1 2 3
4 5 6
7 8 9

For this, R provides data frames.

Data Frames: Lists of Vectors

To start, a data frame is a list of vectors.

list(
  1:10,
  c("apple", "orange", "banana"),
  rep(c(TRUE, FALSE, FALSE), 4)
)
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]
[1] "apple"  "orange" "banana"

[[3]]
 [1]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

Is this a data frame?

Data Frames: Additional Constraints

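Additionally, the vectors must all be the same length. The list on the previous slide fails this test (its vectors have lengths 10, 3, and 12), whereas every vector in the list below has length 10.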

list(
  1:10,
  rep("apple", 10),
  rep(c(TRUE, FALSE), 5)
)
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]
 [1] "apple" "apple" "apple" "apple" "apple" "apple" "apple" "apple" "apple"
[10] "apple"

[[3]]
 [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
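
One way to check both conditions is to ask R directly. The following is a minimal sketch, not part of the original lab: the names cols, as_df, and result are hypothetical and used only for illustration.

```r
# A plain list of equal-length vectors is still just a list.
cols <- list(
  id    = 1:10,
  fruit = rep("apple", 10),
  flag  = rep(c(TRUE, FALSE), 5)
)
is.data.frame(cols)   # FALSE

# Converting it yields a data frame, which is still a list underneath.
as_df <- as.data.frame(cols)
is.data.frame(as_df)  # TRUE
is.list(as_df)        # TRUE

# Vectors whose lengths cannot be recycled to a common length are rejected.
result <- tryCatch(
  data.frame(id = 1:10, fruit = c("apple", "orange", "banana")),
  error = function(e) "error"
)
result                # "error"
```

Note that data.frame() recycles shorter vectors when their length divides the longest one evenly, so only genuinely incompatible lengths (such as 10 and 3) raise an error.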

Data Frames: Attributes

Data frames also have some important attributes:

  • column names, queried with names() or colnames()
  • row names, queried with row.names()

Use nrow() and ncol() to get the number of rows and columns, respectively, as with matrices.
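
The attribute functions above can be tried on any small data frame. The sketch below assumes a hypothetical data frame called pets, invented here purely for illustration.

```r
# Hypothetical example data, used only to illustrate the attribute functions.
pets <- data.frame(
  name = c("Rex", "Tom", "Nemo"),
  legs = c(4, 4, 0)
)

names(pets)      # "name" "legs"
colnames(pets)   # same result as names() for a data frame
row.names(pets)  # "1" "2" "3" (default row names, returned as strings)
nrow(pets)       # 3
ncol(pets)       # 2
dim(pets)        # 3 2
```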

Data Frames: Creation

Use data.frame() to create a data frame.

food <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit"),
  num_available = c(35, 20, 12)
)
food
        name  category num_available
1     orange     fruit            35
2   bok choy vegetable            20
3 strawberry     fruit            12
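
After creating a data frame, it can help to confirm that each column has the type you expect. The sketch below rebuilds food and inspects it; the variable col_classes is a hypothetical name. Text columns are stored as character vectors in R 4.0 and later (earlier versions converted them to factors by default), so the exact str() output depends on your R version.

```r
food <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit"),
  num_available = c(35, 20, 12)
)

# str() prints a compact summary: dimensions plus each column's type.
str(food)

# Each column is a vector with its own class.
col_classes <- sapply(food, class)
col_classes[["num_available"]]  # "numeric"
```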

Data Frames: Access

Access works much as it does for matrices: inside [rows, columns], pass vectors of indices, logical values, or names (strings).

Data Frames: Access - Single Elements

Using the data frame…

food
        name  category num_available
1     orange     fruit            35
2   bok choy vegetable            20
3 strawberry     fruit            12

How do we access the object at the 1st row, 2nd column?

food[1, 2]
[1] "fruit"

Data Frames: Access - Rows, Columns

Using the data frame…

food
        name  category num_available
1     orange     fruit            35
2   bok choy vegetable            20
3 strawberry     fruit            12

How do we access the first and second rows?

food[1:2,]
      name  category num_available
1   orange     fruit            35
2 bok choy vegetable            20

How do we access the third column?

food[,3]
[1] 35 20 12

Data Frames: More Access Examples

More examples:

food[,c("name", "category")]
        name  category
1     orange     fruit
2   bok choy vegetable
3 strawberry     fruit
food[food$num_available >= 20,]
      name  category num_available
1   orange     fruit            35
2 bok choy vegetable            20
food[, "category"]
[1] "fruit"     "vegetable" "fruit"    

Did you notice something?

If we select a single column with [ , ], we get a vector by default. If we select several columns, or one or more whole rows, we get a data frame.
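
If you want a single column to stay a data frame, base R offers two options: the drop = FALSE argument, or list-style indexing with a single index. A minimal sketch, with hypothetical variable names one_col_vec, one_col_df, and one_col_lst:

```r
food <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit"),
  num_available = c(35, 20, 12)
)

one_col_vec <- food[, "category"]                # plain vector
one_col_df  <- food[, "category", drop = FALSE]  # one-column data frame
one_col_lst <- food["category"]                  # list-style indexing keeps the data frame

is.data.frame(one_col_vec)  # FALSE
is.data.frame(one_col_df)   # TRUE
is.data.frame(one_col_lst)  # TRUE
```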

Data Frames: $ Operator

Because data frames are lists, you can access their columns with the $ operator.

food$name
[1] "orange"     "bok choy"   "strawberry"
food$category
[1] "fruit"     "vegetable" "fruit"    
food$num_available
[1] 35 20 12

We can add new columns using the $ operator.

food$for_sale <- c(TRUE, FALSE, FALSE)
food
        name  category num_available for_sale
1     orange     fruit            35     TRUE
2   bok choy vegetable            20    FALSE
3 strawberry     fruit            12    FALSE
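
New columns do not have to be typed in by hand; they can be computed from existing ones, and a column can be removed by assigning NULL to it. The sketch below is illustrative: the column low_stock and the threshold of 20 are assumptions made up for this example.

```r
food <- data.frame(
  name = c("orange", "bok choy", "strawberry"),
  category = c("fruit", "vegetable", "fruit"),
  num_available = c(35, 20, 12)
)

# A new column can be computed from existing ones.
food$low_stock <- food$num_available < 20
low_stock_values <- food$low_stock
low_stock_values  # FALSE FALSE TRUE

# Assigning NULL removes the column again.
food$low_stock <- NULL
names(food)       # "name" "category" "num_available"
```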

Helpful Functions: head, tail

To see the first or last n rows, use head() or tail(), respectively. n defaults to 6.

df <- data.frame(x = 1:100, y = 1:100)
head(df)
  x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(df, n = 3)
  x y
1 1 1
2 2 2
3 3 3
tail(df)
      x   y
95   95  95
96   96  96
97   97  97
98   98  98
99   99  99
100 100 100
tail(df, n = 3)
      x   y
98   98  98
99   99  99
100 100 100
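
head() and tail() also accept a negative n, which means "everything except" rather than "only". The sketch below assumes a 100-row data frame like the one in the output above; first_five and last_five are hypothetical names.

```r
# Assumed example data: 100 rows, matching the output shown above.
df <- data.frame(x = 1:100, y = 1:100)

first_five <- head(df, n = -95)  # all but the last 95 rows, i.e. rows 1 to 5
last_five  <- tail(df, n = -95)  # all but the first 95 rows, i.e. rows 96 to 100

nrow(first_five)  # 5
first_five$x      # 1 2 3 4 5
last_five$x       # 96 97 98 99 100
```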

Missing Values

As our vectors have to be the same length, how do we represent empty cells in a table?

We use a special value, NA (“Not Available”).

data.frame(
  names = c("Bob the Builder", "Spongebob", "Darth Vader"),
  major = c("Civil Engineering", "Culinary Studies", NA),
  age   = c(32, NA, 44)
)
            names             major age
1 Bob the Builder Civil Engineering  32
2       Spongebob  Culinary Studies  NA
3     Darth Vader              <NA>  44

Note that NA can appear in a vector of any type: R coerces it to match the type of the other elements.
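
Missing values affect how you compute with a column. A minimal sketch using the same ages and majors as the table above (the variable names ages and majors are hypothetical):

```r
ages   <- c(32, NA, 44)
majors <- c("Civil Engineering", "Culinary Studies", NA)

# On its own NA is logical, but it takes on the type of the vector it sits in.
class(NA)      # "logical"
class(ages)    # "numeric"
class(majors)  # "character"

# is.na() locates the missing entries.
is.na(ages)    # FALSE TRUE FALSE

# Most summaries return NA unless missing values are explicitly dropped.
mean(ages)                # NA
mean(ages, na.rm = TRUE)  # 38
```

Test for missing values with is.na() rather than ==, because comparing anything to NA with == yields NA.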