Lab 04: Tabular Data Files

Files?

We can use R in isolation…

  • But it’s much more convenient if we can save our results (data) and share them with other applications.
  • Or use our results (data) from other applications, and manipulate them in R.

For this, we need files.

File Formats

What is a file?

For our purposes, just think of a file as a blob of saved text.

Suppose we wanted to represent a complex structure in a file.

An Agreed-Upon Format

To do this, we need to agree on some convention (format) of representing the structure in plain text.

File Formats: Table Representation

What if we wanted to represent a table?

In markdown:

| Some Column | Another Column | Yet Another |
|-------------|----------------|-------------|
|     3       |    "hello"     |    TRUE     |
|     17      |   "goodbye"    |   FALSE     |

Any issues with this?

Reading Tables (General)

The general function for importing tables from files is read.table().

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = FALSE,
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
  • file: The file to read from
  • header: Does the file have a header for the column names?
  • sep: What separates the data (e.g., ',' for .csv files)?
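
As a minimal sketch of how these three arguments work together, the snippet below writes a small made-up table to a temporary file and reads it back. The file contents and the names `path` and `tbl` are illustrative assumptions, not part of the lab materials.

```r
# Write a small pipe-separated table to a temporary file (made-up data).
path <- tempfile(fileext = ".txt")
writeLines(c("id|label|flag",
             "3|hello|TRUE",
             "17|goodbye|FALSE"), path)

# header = TRUE: the first line holds the column names.
# sep = "|":     fields are separated by a pipe character.
tbl <- read.table(file = path, header = TRUE, sep = "|")

str(tbl)      # inspect the column types that read.table() inferred
unlink(path)  # remove the temporary file
```

With header = FALSE (the default), the first line would instead be read as a data row and the columns would get the default names V1, V2, V3.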

Reading CSV Files

To avoid passing all the read.table() arguments needed to parse .csv files correctly, we have a convenience function:

read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

And for other common formats:

read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)

read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)

read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
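
The difference between these variants comes down to the sep and dec defaults. A small sketch, using made-up inline data passed through the text argument of read.table() instead of a file (the variable names are hypothetical):

```r
# The same two (made-up) rows written in two conventions.
comma_text     <- "name,height\nA,1.72\nB,1.67"   # "," separator, "." decimal
semicolon_text <- "name;height\nA;1,72\nB;1,67"   # ";" separator, "," decimal

a <- read.csv(text = comma_text)       # defaults: sep = ",", dec = "."
b <- read.csv2(text = semicolon_text)  # defaults: sep = ";", dec = ","

# Both calls should parse to the same data frame.
all.equal(a, b)
```

Reading the semicolon-separated text with read.csv() would instead produce a single column, because no "," separator is found.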

Making Sense of read.table()

Suppose we have a file starwars.csv with the following contents.

"name","height","mass"
"Luke Skywalker",172,77
"C-3PO",167,75
"R2-D2",96,32
"Darth Vader",202,136
...
read.table(
  # What arguments do we pass?
)

read.table() Solution

Suppose we have a file starwars.csv with the following contents.

"name","height","mass"
"Luke Skywalker",172,77
"C-3PO",167,75
"R2-D2",96,32
"Darth Vader",202,136
...
read.table(
  file = "starwars.csv",
  header = TRUE,
  sep = ",",
  colClasses = c("character", "numeric", "numeric")
)

What if our values were separated with ';' instead?

Writing to Files

We have write.table() for writing tables to files.

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")

And for convenience,

write.csv(...)
write.csv2(...)
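
A short round-trip sketch: write a data frame with write.csv() and read it back with read.csv(). The data frame reuses two rows from the starwars.csv example; the temporary path and variable names are assumptions for illustration.

```r
# A small data frame to save.
df <- data.frame(name   = c("Luke Skywalker", "C-3PO"),
                 height = c(172, 167),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")

# row.names = FALSE: do not write the row names as an extra first column.
write.csv(df, file = path, row.names = FALSE)

readLines(path)          # the raw text that was written
back <- read.csv(path)   # read the file back into a data frame
unlink(path)             # remove the temporary file
```

Leaving row.names at its default of TRUE would add an unnamed first column holding the row names, which then shows up as an extra column when the file is read back.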

readr

Alternatively, the readr package provides similar functionality and, among other benefits,

  • is reported to be 10x-100x faster than the base R functions.
  • has more consistent naming conventions.

It provides read_csv(), read_tsv(), read_delim(), read_fwf(), read_table(), and read_log().
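
As a rough sketch of what the readr equivalent of the earlier read.table() call might look like, assuming readr is installed. The temporary file stands in for starwars.csv, and the compact column specification "cdd" (character, double, double) is one of several ways to declare column types.

```r
library(readr)  # assumes readr is installed: install.packages("readr")

# Stand-in for starwars.csv, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"name","height","mass"',
             '"Luke Skywalker",172,77',
             '"C-3PO",167,75'), path)

# col_types = "cdd": one letter per column (character, double, double).
sw <- read_csv(path, col_types = "cdd")
sw
```

Unlike read.table(), read_csv() returns a tibble, which is a data frame with a more compact print method.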

Saving R Output

If we want to save the output of our program to a file, we use sink().

sink("my-output.txt")  # We specify the file to redirect our output to
print("Hello, World!")  
3 + 2
sink() # After this call, our output will be printed as usual

The output of our program was not printed to the console, but redirected to the file my-output.txt, which now contains:

[1] "Hello, World!"
[1] 5
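
To check the redirection ourselves, we can read the file back with readLines(). This sketch writes to a temporary file rather than my-output.txt, and calls print() explicitly so that it behaves the same when run through source(), which does not auto-print values by default.

```r
path <- tempfile(fileext = ".txt")

sink(path)               # start redirecting output to the file
print("Hello, World!")
print(3 + 2)
sink()                   # stop redirecting; output goes to the console again

captured <- readLines(path)  # one element per line written to the file
captured
unlink(path)
```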

Global State is Bad… Try withr

Avoid manually changing global state by using the withr package.

library(withr) # Make sure to install first

# Calls sink("my-output.txt"), executes the code block, then sink()
withr::with_output_sink("my-output.txt", {
    print("Hello, World!")
    print(3 + 2)  # We now need to specify print explicitly 
})

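Note that inside the code block we must call print() explicitly, unlike with a plain sink() at top level. R auto-prints values only for top-level expressions, and the expressions here are evaluated inside a block passed to with_output_sink().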