Lab 08: Functions and Conditionals

Functions?

We’ve already used them extensively. Some examples:

v <- c(1:5, 10:20)                   # create a vector
v_mean <- mean(v)                    # take its mean 
df <- data.frame(x = 1:10, y = 10:1) # create a data frame
df <- filter(df, x > 5)              # subset it
plot <- ggplot(df, aes(x, y)) +      # plot it
    geom_point()                     

They all:

  • Take in a (possibly empty) set of arguments.
  • Output something, or at the very least perform an operation.

Function Syntax

We’ll eventually encounter a problem that cannot be solved with functions others have written. In this case, we can write our own.

The syntax of a function definition:

function(a0, a1, ..., aN) <body>

Here, a0, a1, ..., aN denotes an arbitrary number of named arguments.

Functions are Objects!

Functions are objects like any other in R. To use a function you’ve created by name, you’ll need to assign it. An example:

say_hi <- function(name) paste("Hi", name)
say_hi("Mr. Sandman") # usage
[1] "Hi Mr. Sandman"
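Because a function is an ordinary object, the assignment is optional. As a small illustrative sketch (not part of the lab's own examples), a function can be defined and called in a single expression without ever being given a name:

```r
# An anonymous function: defined and immediately called.
# The outer parentheses group the definition so it can be called.
greeting <- (function(name) paste("Hi", name))("Mr. Sandman")
greeting
```

This is rarely how you would greet someone, but it shows that `say_hi` is only a name attached to the function object.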

Function Syntax

Observe that we specify a set of named arguments then refer to these values by name in the body of the function.

example <- function(a, b, c, d) a + b + c + d

With this call to example,

example(1, 2, 3, 4)

we have

  • a = 1,
  • b = 2,
  • c = 3,
  • d = 4,
  • and the result is 1 + 2 + 3 + 4.

Recall that we can reorder the arguments if we specify their names, i.e., the following is equivalent to the above.

example(d = 4, a = 1, b = 2, c = 3)
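Since addition gives the same answer in any order, the effect of argument matching is easier to see with an operation where order matters. A minimal sketch, using a hypothetical helper named subtract:

```r
# Hypothetical helper: order of arguments matters here.
subtract <- function(a, b) a - b

r1 <- subtract(10, 3)          # positional: a = 10, b = 3
r2 <- subtract(b = 3, a = 10)  # named and reordered: same result
r3 <- subtract(b = 3, 10)      # named arguments are matched first,
                               # then 10 fills the remaining argument a
c(r1, r2, r3)
```

All three calls compute 10 - 3.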

Return Values

What happened to a + b + c + d?

The result was returned by the function.

x <- example(1, 2, 3, 4)

Here the result, 10, is returned. We call this the return value. In this case, we’ve assigned it to x.

Return Values are Expressions

Importantly, all expressions in R evaluate to (return) some value. When you enter an expression in the console, R prints the value.

3 + 2
[1] 5
filter(data.frame(x = 1:10, y = 1:10), x > 5)
   x  y
1  6  6
2  7  7
3  8  8
4  9  9
5 10 10

A function simply returns the value of the expression written in its body.

The Equivalent

Suppose I enter this code into the console.

a <- 1
b <- 2
c <- 3
d <- 4
a + b + c + d
[1] 10

The last line is the expression, which evaluates to (returns) 10.

This is equivalent to,

example <- function(a, b, c, d) a + b + c + d
example(1, 2, 3, 4)
[1] 10

It’s ultimately the same expression being evaluated.

What’s the point?

If the result is the same, why write a function?

To reduce the amount of code you need to write.

Here’s a function that returns the unique values from a sorted vector.

unique <- function(vec, binary_pred = `==`) {
    prev <- NA; u <- c()
    for (obj in vec) {
        if (is.na(prev) || !binary_pred(obj, prev)) {
            u <- append(u, obj)
        }
        prev <- obj
    }
    u
}

Without the function, you’d need to rewrite this each time you’d like the unique values of a sorted vector!
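To see the payoff, here is a sketch of the same logic in use. The name unique_sorted is an assumption made for this example (it avoids masking R's built-in unique), as is the close_enough predicate. The sketch also assumes the input contains no NA values, since NA marks "no previous element" inside the loop.

```r
# Same logic as above, under a hypothetical name that does not mask base::unique().
unique_sorted <- function(vec, binary_pred = `==`) {
  prev <- NA; u <- c()
  for (obj in vec) {
    # Keep obj if it is the first element or differs from the previous one.
    if (is.na(prev) || !binary_pred(obj, prev)) {
      u <- append(u, obj)
    }
    prev <- obj
  }
  u
}

# Default predicate: exact equality.
exact <- unique_sorted(c(1, 1, 2, 2, 2, 3))

# Custom predicate (hypothetical): values closer than 0.1 count as duplicates.
close_enough <- function(a, b) abs(a - b) < 0.1
approx <- unique_sorted(c(1.0, 1.05, 2.0, 2.01), close_enough)
```

One definition serves both calls; only the predicate changes.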

DRY!

There’s an important saying in software engineering, Don’t Repeat Yourself! (DRY)

Less code is only one reason of many for DRY. Here are a few others:

  • What if you’d like to change the operation (perhaps to fix a mistake)? Without a common function, you’ll need to pinpoint and fix each use of the operation individually. Ah, but after hunting for each of the 17 instances, you happily run your fixed code… only to recoil in terror as the program outputs garbage. You glue your eyes to each of the 17 instances, one at a time, carefully checking what might be wrong. After only a brief 4 hours, you realize that you set a variable to v when it should have been w!

  • What if your program doesn’t behave as expected? You glance at your hundreds (thousands?) of lines of code, clueless as to the origin of the problem, and so scrutinize each line one by one. If you wrote functions instead, you could view each in isolation and verify its correctness (write tests!), starting with the functions that don’t depend on other functions and moving upwards.

  • I could go on, but you probably get the point.

What’s the point?

If the result is the same, why write a function?

It makes your code easier to read and write as it makes it more expressive. That is, it helps your code indicate what it is doing, not just how it is doing it.

What does this do?

sw_males <- starwars[
    starwars$sex == "male",
    !(names(starwars) %in% c("sex", "gender"))
]

Here’s a more expressive version using dplyr.

sw_males <- starwars |>
    filter(sex == "male") |>
    select(!c(sex, gender))

Expressiveness

What does this do?

heights <- starwars$height
length(heights[heights > 150]) > 0
[1] TRUE

What about this? Are they equivalent?

sum(as.numeric(heights > 150), na.rm = TRUE) > 0
[1] TRUE

These are both awful. The burden of reading and writing this is too great for a task so trivial.

Importantly, the code doesn’t convey what it does. Instead, it’s the reader’s responsibility to make sense of the computations.

Expressiveness

Ideally, we’d prefer for our code to say what it does directly. This makes it easier to read and write.

any(heights > 100)
[1] TRUE
all(heights > 20, na.rm = TRUE)
[1] TRUE

In both cases, you know what it does immediately.

There’s no need to make sense of confusing computations.
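Writing our own function can push this one step further by giving the question itself a name. A minimal sketch, where the function any_taller_than and the sample data are made up for illustration:

```r
# Hypothetical helper: the name states the question being asked.
any_taller_than <- function(heights, cutoff) {
  any(heights > cutoff, na.rm = TRUE)
}

sample_heights <- c(172, 96, NA, 202)  # made-up data with a missing value

tall_150 <- any_taller_than(sample_heights, 150)
tall_250 <- any_taller_than(sample_heights, 250)
c(tall_150, tall_250)
```

The call site now reads almost like the sentence it implements.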

What’s the point?

If the result is the same, why write a function?

It helps you reason about – and eventually solve – your problem by breaking it down into smaller problems.

Now back to functions in R!

Blocks

Recall the form of a function:

function(a0, a1, ..., aN) <body>

Recall our simple example:

example <- function(a, b, c, d) a + b + c + d

What if we’d like to write a function consisting of multiple lines?

For this, we need a block of code (i.e., a number of lines grouped together). We’ll assign this block to <body> in the function form.

Brackets

In R, we group code (create blocks) with curly brackets ({ ... }).

This is similar to C/C++, Java, JavaScript, and many other languages.

This is unlike Python, which uses indenting to distinguish blocks.

This is unlike MATLAB, which uses the end keyword to distinguish blocks.

shout_hey <- function(name) {
  name <- toupper(name)
  paste0("HEY ", name, "!")
}
shout_hey("Mr. Tambourine Man")
[1] "HEY MR. TAMBOURINE MAN!"

Brackets

If we have multiple lines of code in brackets, the value of the entire block is the value of its last expression (typically, the last line).

For this reason, we can write,

x <- 10
y <- {
    a <- x * x
    z <- a - 10
    z * 2
}

What is y?
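Running the block settles it. The sketch below also checks a detail worth knowing: outside of a function, curly brackets only group code, so the intermediate variables a and z are still defined afterwards.

```r
x <- 10
y <- {
    a <- x * x   # 100
    z <- a - 10  # 90
    z * 2        # the value of the block
}
y

# The block did not create a new scope: a and z still exist here.
c(a, z)
```

Here y is 180, the value of the final expression z * 2.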

Brackets: What is Returned?

The value of the last expression in a function’s body is therefore its return value.

square_add_five <- function(x) {
  x <- x * x
  x + 5
}
square_add_five(5)
[1] 30

Explicitly Return

You may find this unclear. If so, we can make the return value explicit with the return function.

square_add_five <- function(x) {
  x_squared <- x * x
  return(x_squared + 5)
}
square_add_five(5)
[1] 30

Case by Case

What can we currently do if we want our code to handle different cases?

For example, suppose we would like to get the season (as a string) given a month (as an integer).

which_season <- function(month) {
   # What should we write here?
}

Case by Case

As of now, any options we have are too cumbersome.

which_season <- function(month) {
  c(rep("Winter", 2), 
    rep("Spring", 3), 
    rep("Summer", 3), 
    rep("Fall", 3), 
    "Winter")[month]
}
which_season(10)
[1] "Fall"

Fizzing, Buzzing, and FizzBuzzing

Let’s consider another case: the famous FizzBuzz problem.

Given a number n, return:

  • "Fizz" if it is divisible by 3.
  • "Buzz" if it is divisible by 5.
  • "FizzBuzz" if it is divisible by 3 and 5.
  • The number n as a string otherwise.

fizzbuzz <- function(n) {
  # What should we write here? 
}

Fizzing, Buzzing, and FizzBuzzing

Here’s an idea:

fizzbuzz <- function(n) {
  c(as.character(n),
    "Fizz",
    "Buzz",
    "FizzBuzz")[((n %% 3 == 0) + (2 * (n %% 5 == 0))) + 1]
}

Oh my, can you make sense of this?

Fizzing, Buzzing, and FizzBuzzing

Maybe this?

fizzbuzz <- function(n) {
  c("FizzBuzz",
    "Fizz",
    "Buzz",
    as.character(n))[
      c(n %% 3 == 0 && n %% 5 == 0, n %% 3 == 0, n %% 5 == 0, TRUE)
    ][1]
}

Are your eyes hurting yet?

Fizzing, Buzzing, and FizzBuzzing

What about this?

fizzbuzz <- function(n) {
  c("FizzBuzz",
    "Fizz",
    "Buzz",
    as.character(n))[
      c(n %% 3 == 0 && n %% 5 == 0, n %% 3 == 0, n %% 5 == 0, TRUE) |> 
        which() |> 
        min()
      ]
}

… Isn’t that terrible? Can you explain how it works?

Clearly we need some better tools for handling cases.

Oh, if Only!

Recall that we prefer for our code to be expressive.

Our description of fizzbuzz included “IF”. Wouldn’t it be nice to write this directly in our code?

We can!

x <- 10
if (x >= 10)
  x <- 5
x
[1] 5

The if Statement

The if statement follows the intuitive structure:

if (<logical>) <body>

where <body> is executed if <logical> is true.

Note that the parentheses are required.
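The condition should be a single TRUE or FALSE. A comparison on a vector produces one logical value per element, and recent versions of R (4.2.0 onward, to the best of my knowledge) raise an error if such a vector is used as an if condition. A sketch of one common remedy, using made-up temperature readings:

```r
temps <- c(12, 25, 31)  # hypothetical readings

# temps > 30 is a logical vector of length 3;
# any() collapses it to a single TRUE/FALSE.
too_hot <- FALSE
if (any(temps > 30)) too_hot <- TRUE
too_hot
```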

Not This, That!

Maybe we’d like to execute some code if a condition is true, and execute some other code otherwise. We could write,

if (conditional) {
  # body A
}
if (!conditional) {
  # body B
}

But this:

  • wastefully evaluates conditional twice. If the first if failed, we know conditional is FALSE.
  • is not very expressive.
  • won’t work as intended if body A modifies conditional.

The if else Statement

Instead, we use the if else statement:

if (<logical>) <body A> else <body B>

where,

  • <body A> is executed if <logical> is true.
  • <body B> is executed if <logical> is false.

can_i_drive <- function(age) {
  if (age < 16) 
    "No, you're too young."
  else
    "If you have a license."
}
can_i_drive(21)
[1] "If you have a license."
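This works because if else, like every expression in R, evaluates to a value: the value of whichever body was executed. That is why can_i_drive needs no explicit return. As a small sketch, the same idea outside of a function:

```r
x <- -3

# The whole if else expression evaluates to one of the two strings,
# which is then assigned to label.
label <- if (x > 0) "positive" else "not positive"
label
```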

The else if Statement

When we need to account for more cases than just two (true or false), we can use an else if statement:

if (<logical 1>) <body 1>
else if (<logical 2>) <body 2>
else if (<logical 3>) <body 3>
...
else <body N>

  • Only one body will be evaluated.
  • The initial if statement is required, but the final else is not.
  • You may include as many else ifs as you’d like.
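A short sketch of these rules in action, using a hypothetical letter_grade function with made-up cutoffs. The conditions are checked from top to bottom, and only the body of the first true condition is evaluated:

```r
# Hypothetical example: cutoffs are arbitrary.
letter_grade <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 80) {
    "B"
  } else if (score >= 70) {
    "C"
  } else {
    "F"
  }
}

grades <- c(letter_grade(95), letter_grade(85), letter_grade(42))
grades
```

A score of 95 satisfies every condition in the chain, yet only "A" is returned, because evaluation stops at the first match.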

A Note on Style

The curly brackets are optional if the body is a single expression.

  • Some prefer you always use curly brackets.
  • Regardless, it is recommended you use curly brackets for all bodies if at least one uses curly brackets.

Don’t do this.

year <- 2
is_sophomore <- NA
if (year == 2) {
  print("You're a sophomore!")
  is_sophomore <- TRUE
} else
  is_sophomore <- FALSE

Do this instead.

year <- 2
is_sophomore <- NA
if (year == 2) {
  print("You're a sophomore!")
  is_sophomore <- TRUE
} else {
  is_sophomore <- FALSE
}

A Note on Style

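Note that at the top level (i.e., outside of any enclosing curly brackets), R requires else/else if to be placed on the same line as the closing curly bracket (}). Otherwise, R treats the if statement as complete at the end of that line, and the else on the following line is a syntax error.

This is not allowed at the top level.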

year <- 2
is_sophomore <- NA
if (year == 2) {
  print("You're a sophomore!")
  is_sophomore <- TRUE
} 
else {
  is_sophomore <- FALSE
}

This format is required.

year <- 2
is_sophomore <- NA
if (year == 2) {
  print("You're a sophomore!")
  is_sophomore <- TRUE
} else {
  is_sophomore <- FALSE
}

Control Flow, Branching

You may see these statements referred to as,

  • Control Flow Statements. They control which lines of code are evaluated, which are ignored, and the order in which they’re evaluated.
  • Branching Statements. They may cause the program to “branch” to another line of code (instead of the one immediately after it).

Back to which_season

Let’s rewrite our which_season function:

which_season <- function(month) {
  c(rep("Winter", 2), 
    rep("Spring", 3), 
    rep("Summer", 3), 
    rep("Fall", 3), 
    "Winter")[month]
}

Back to which_season

Now with control flow statements.

which_season <- function(month) {
  if (month < 1 || month > 12)
    NA
  else if (month == 12 || month %in% 1:2)
    "Winter"
  else if (month %in% 3:5)
    "Spring"
  else if (month %in% 6:8)
    "Summer"
  else
    "Fall"
}

Back to fizzbuzz

Let’s rewrite our fizzbuzz function:

fizzbuzz <- function(n) {
  c(as.character(n),
    "Fizz",
    "Buzz",
    "FizzBuzz")[((n %% 3 == 0) + (2 * (n %% 5 == 0))) + 1]
}

Back to fizzbuzz

Now with control flow statements. Pick your poison!

fizzbuzz <- function(n) {
  # This one is what I prefer. 
  s <- character()
  if (n %% 3 == 0) {
    s <- "Fizz"
  }
  if (n %% 5 == 0) {
    s <- paste0(s, "Buzz")
  }
  if (!length(s)) {
    s <- as.character(n)
  }
  s
}

fizzbuzz <- function(n) {
  if (n %% 3 == 0 && n %% 5 == 0) {
    "FizzBuzz"
  } else if (n %% 3 == 0) {
    "Fizz"
  } else if (n %% 5 == 0) {
    "Buzz"
  } else {
    as.character(n)
  }
}

fizzbuzz <- function(n) {
  if (n %% 3 == 0) {
    # To avoid redundant computation. 
    if (n %% 5 == 0)
      "FizzBuzz"
    else
      "Fizz"
  } else if (n %% 5 == 0) {
    "Buzz"
  } else {
    as.character(n)
  }
}

fizzbuzz <- function(n) {
  # To avoid redundant computation. 
  div_3 <- n %% 3 == 0
  div_5 <- n %% 5 == 0
  if (div_3 && div_5) {
    "FizzBuzz"
  } else if (div_3) {
    "Fizz"
  } else if (div_5) {
    "Buzz"
  } else {
    as.character(n)
  }
}
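Whichever version you pick, it handles one number at a time, because && and if expect a single logical value. As a sketch of one way to run it over a range (assuming the last definition above, repeated here so the block runs on its own), vapply calls the function once per element:

```r
fizzbuzz <- function(n) {
  div_3 <- n %% 3 == 0
  div_5 <- n %% 5 == 0
  if (div_3 && div_5) {
    "FizzBuzz"
  } else if (div_3) {
    "Fizz"
  } else if (div_5) {
    "Buzz"
  } else {
    as.character(n)
  }
}

# Call fizzbuzz on each of 1, 2, ..., 15; character(1) declares that
# every call must return exactly one string.
results <- vapply(1:15, fizzbuzz, character(1))
results
```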

The Vectorized if else, ifelse()

We can use the function ifelse to apply the if else operation to a vector. Its function signature is

ifelse(test, yes, no)

where,

  • test is a vector of logical values.
  • yes is the value to be placed in the result vector if the corresponding logical value is true.
  • no is the value to be placed in the result vector if the corresponding logical value is false.

Simple ifelse() Example

x <- 1:10
ifelse(x %% 2 == 0, "Divisible by 2", "Not divisible by 2")
 [1] "Not divisible by 2" "Divisible by 2"     "Not divisible by 2"
 [4] "Divisible by 2"     "Not divisible by 2" "Divisible by 2"    
 [7] "Not divisible by 2" "Divisible by 2"     "Not divisible by 2"
[10] "Divisible by 2"    
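Calls to ifelse can also be nested to cover more than two cases, much like an else if chain. A minimal sketch with made-up data:

```r
x <- c(-2, 0, 3)

# The inner ifelse() supplies the "no" values for the outer one.
signs <- ifelse(x < 0, "negative",
                ifelse(x == 0, "zero", "positive"))
signs
```

Nesting more than two or three levels quickly becomes hard to read, so this is best kept for short chains.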

Quick Notes: Recursion

Functions can call themselves.

countdown <- function(from = 10, outcome = "Takeoff!") {
  cat("...")
  if (from > 0) {
    cat(from)
    countdown(from - 1, outcome)
  } else {
    cat(outcome)
  }
}
countdown()
...10...9...8...7...6...5...4...3...2...1...Takeoff!
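A recursive function needs a base case that stops the chain of calls (here, from reaching 0). The same pattern can compute a value rather than print. A sketch using a hypothetical factorial_rec function, assuming n is a non-negative whole number:

```r
# Hypothetical example; base R already provides factorial().
factorial_rec <- function(n) {
  if (n <= 1) {
    1                         # base case: stop recursing
  } else {
    n * factorial_rec(n - 1)  # recursive case: a smaller problem
  }
}

f5 <- factorial_rec(5)
f5
```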

Quick Notes: Functions Making Functions?

Functions can return functions.

make_adder <- function(a) {
  add_this <- function(b) {
    return(a + b)
  }
  add_this # return the inner function
}
add_three <- make_adder(3)
add_three(4)
[1] 7

make_greeting <- function(greeting = "Hello,") {
  greet_this <- function(this = "World") {
    cat(greeting, this)
  }
  greet_this # return the inner function
}
say_howdy <- make_greeting("Howdy")
say_howdy("Polly")
Howdy Polly
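Each returned function remembers the argument values from the call that created it, so one "function factory" can produce several independent functions. A small sketch with a hypothetical make_power:

```r
# Hypothetical factory: returns a function that raises x to a fixed power.
make_power <- function(exponent) {
  function(x) x^exponent
}

square <- make_power(2)
cube   <- make_power(3)

sq <- square(4)  # uses exponent = 2
cb <- cube(2)    # uses exponent = 3
c(sq, cb)
```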

Quick Notes: Passing Functions to Functions?!

Functions can be passed as arguments to other functions.

apply <- function(f, x) {
  f(x)
}
v <- 1:10
apply(max, v)
[1] 10
apply(mean, v)
[1] 5.5
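The function passed in can be one we wrote ourselves, and the receiving function is free to call it more than once. A minimal sketch; apply_twice and add_one are names invented for this example:

```r
# Hypothetical helpers.
apply_twice <- function(f, x) f(f(x))
add_one <- function(x) x + 1

t1 <- apply_twice(add_one, 5)  # add_one(add_one(5))
t2 <- apply_twice(sqrt, 16)    # sqrt(sqrt(16))
c(t1, t2)
```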

Quick Notes: Functions are Regular Objects

That is, we can treat functions just like any other object in R. This includes assignment.

a_name <- function() "is merely a mask I wear..."
a_better_name <- a_name
an_even_better_name <- a_better_name
an_even_better_name()
[1] "is merely a mask I wear..."

Quick Notes: Stop!

We can stop execution of a function with the stop function.

go_driving <- function(age) {
  if (age < 16) 
    stop("You're too young to drive.")
  print("Vroom...")
}
go_driving(15)
Error in go_driving(15): You're too young to drive.
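An error raised by stop ends the whole computation unless the caller chooses to handle it. One way to do so is base R's tryCatch. The sketch below uses a simplified go_driving that returns its string instead of printing it, which is an adaptation for this example:

```r
go_driving <- function(age) {
  if (age < 16)
    stop("You're too young to drive.")
  "Vroom..."
}

# If go_driving() signals an error, the handler runs instead and
# its value becomes the value of the tryCatch() expression.
result <- tryCatch(
  go_driving(15),
  error = function(e) paste("Caught:", conditionMessage(e))
)
result
```

With an age of 16 or more, no error is signalled and result is simply "Vroom...".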

Quick Notes: Just a Warning

We can issue a warning (but continue execution) with the warning function.

go_driving <- function(age) {
  if (age < 16) 
    stop("You're too young to drive.")
  else if (age <= 21)
    warning("Your insurance company frowns.")
  print("Vroom...")
}
go_driving(20)
Warning in go_driving(20): Your insurance company frowns.
[1] "Vroom..."