We’ve already used them extensively. Some examples:
They all:
We’ll eventually encounter a problem that cannot be solved with functions others have written. In this case, we can write our own.
The syntax of a function definition:
function(a0, a1, ..., aN)
<body>
Here, a0, a1, ..., aN denotes an arbitrary number of named arguments.
Functions are objects like any other in R. To use a function you’ve created by name, you’ll need to assign it. An example:
Observe that we specify a set of named arguments then refer to these values by name in the body of the function.
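A minimal sketch of what such a definition might look like. The name example and its four arguments a, b, c, d come from the surrounding discussion; the exact one-line body is an assumption.

```r
# Assign the function to the name `example` so it can be called later.
# The arguments a, b, c, d are referred to by name in the body.
example <- function(a, b, c, d) a + b + c + d

# Positional call: a = 1, b = 2, c = 3, d = 4.
result <- example(1, 2, 3, 4)
print(result)  # 10
```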
With this call to example, we have a = 1, b = 2, c = 3, d = 4, and the body evaluates 1 + 2 + 3 + 4.
Recall that we can reorder the arguments if we specify their names, i.e., the following is equivalent to the above.
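As a sketch of named-argument reordering, assuming example is defined as a sum of its four arguments (an assumption based on the 1 + 2 + 3 + 4 above):

```r
# Assumed definition, repeated here so the snippet runs on its own.
example <- function(a, b, c, d) a + b + c + d

# Positional call: values are matched to a, b, c, d in order.
positional <- example(1, 2, 3, 4)

# Named call: the order no longer matters because each value is labelled.
reordered <- example(d = 4, b = 2, a = 1, c = 3)

identical(positional, reordered)  # TRUE
```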
What happened to a + b + c + d?
The result was returned by the function.
Importantly, all expressions in R evaluate to (return) some value. When you enter an expression in the console, R prints the value.
A function simply returns the value of the expression we’ve written in its body.
Suppose I enter this code into the console.
If the result is the same, why write a function?
To reduce the amount of code you need to write.
Here’s a function that returns the unique values from a sorted vector.
Without the function, you’d need to rewrite this each time you’d like the unique values of a sorted vector!
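One way such a function could be written is sketched below. The name unique_sorted and the implementation are assumptions, not necessarily the original listing. It keeps the first element and every element that differs from its predecessor, which only works because the input is sorted.

```r
# Hypothetical helper: unique values of an already sorted vector.
# An element is kept if it is the first one or differs from the one before it.
unique_sorted <- function(x) x[c(TRUE, x[-1] != x[-length(x)])]

sorted_values <- c(1, 1, 2, 3, 3, 3, 7)
unique_sorted(sorted_values)  # 1 2 3 7
```

Once the function exists, every later use is a single call such as unique_sorted(v) rather than a repeat of the indexing expression.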
There’s an important saying in software engineering: Don’t Repeat Yourself (DRY)!
Less code is only one of many reasons for DRY. Here are a few others:
What if you’d like to change the operation (perhaps to fix a mistake)? Without a common function, you’ll need to pinpoint and fix each use of the operation individually. Ah, but after hunting for each of the 17 instances, you happily run your fixed code… only to recoil in terror as the program outputs garbage. You glue your eyes to each of the 17 instances, one at a time, carefully checking what might be wrong. After only a brief 4 hours, you realize that you set a variable to v when it should have been w!
What if your program doesn’t behave as expected? You glance at your hundreds (thousands?) of lines of code, clueless about the origin of the problem, so you scrutinize each line one by one. If you wrote functions instead, you could view each in isolation and verify their correctness (write tests!), starting with the functions that don’t depend on other functions and moving upwards.
I could go on, but you probably get the point.
If the result is the same, why write a function?
It makes your code easier to read and write as it makes it more expressive. That is, it helps your code indicate what it is doing, not just how it is doing it.
What does this do?
What about this? Are they equivalent?
These are both awful. Our burden reading and writing this is too great for a task so trivial.
Importantly, the code doesn’t convey what it does. Instead, it’s the reader’s responsibility to make sense of the computations.
Ideally, we’d prefer for our code to say what it does directly. This makes it easier to read and write.
In both cases, you know what it does immediately.
There’s no need to make sense of confusing computations.
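A hypothetical illustration of the difference (not the original example): both computations below give the same number, but only the second one says what it computes.

```r
x <- c(4, 8, 15, 16, 23, 42)

# Says *how*: the reader has to work out that this is an average.
total <- 0
for (value in x) total <- total + value
average_how <- total / length(x)

# Says *what*: the function name carries the meaning.
average_what <- mean(x)

all.equal(average_how, average_what)  # TRUE
```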
If the result is the same, why write a function?
It helps you reason about – and eventually solve – your problem by breaking it down into smaller problems.
Now back to functions in R!
Recall the form of a function:
function(a0, a1, ..., aN)
<body>
What if we’d like to write a function consisting of multiple lines?
For this, we need a block of code (i.e., a number of lines grouped together). We’ll use this block as <body> in the function form.
In R, we group code (create blocks) with curly brackets ({ ... }).
This is similar to C/C++, Java, JavaScript, and many other languages.
This is unlike Python, which uses indentation to distinguish blocks.
This is unlike MATLAB, which uses the end keyword to distinguish blocks.
If we have multiple lines of code in brackets, the value of the entire block is the value of its last expression (i.e., the last line).
For this reason, we can write,
What is y?
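One block that fits this description is sketched below. The variable names a and b and their values are assumptions chosen for illustration.

```r
# The block's value is the value of its last expression, a * b.
y <- {
  a <- 2
  b <- 3
  a * b
}

y  # 6
```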
The last line of a function is therefore its return value.
You may find this unclear. If so, we can make the return value explicit with return(), which in R is called like a function: return(value).
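A short sketch comparing the two styles. The function names are hypothetical; both versions return the same value.

```r
# Implicit: the value of the last expression is returned.
sum_of_squares_implicit <- function(x, y) {
  squares <- c(x^2, y^2)
  sum(squares)
}

# Explicit: return() states the returned value directly.
sum_of_squares_explicit <- function(x, y) {
  squares <- c(x^2, y^2)
  return(sum(squares))
}

sum_of_squares_implicit(3, 4)  # 25
sum_of_squares_explicit(3, 4)  # 25
```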
What can we currently do if we want our code to handle different cases?
For example, suppose we would like to get the season (as a string) given a month (as an integer).
As of now, any options we have are too cumbersome.
Let’s consider another case: the famous FizzBuzz problem.
Here’s an idea:
Oh my, can you make sense of this?
Maybe this?
Are your eyes hurting yet?
What about this?
… Isn’t that terrible? Can you explain how it works?
Clearly we need some better tools for handling cases.
if Only!
Recall that we prefer for our code to be expressive.
Our description of fizzbuzz included “IF”. Wouldn’t it be nice to write this directly in our code?
The if Statement
The if statement follows the intuitive structure:
if (<logical>) <body>
where <body> is executed if <logical> is true.
Note that the parentheses are required.
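A small sketch of the if statement in use. The function name absolute_value is hypothetical; base R already provides abs().

```r
# Flip the sign only when the condition holds; otherwise x is left alone.
absolute_value <- function(x) {
  if (x < 0) x <- -x
  x
}

absolute_value(-3)  # 3
absolute_value(5)   # 5
```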
Maybe we’d like to execute some code if a condition is true, and execute some other code otherwise. We could write,
But this evaluates conditional twice. If the first if failed, we already know conditional is FALSE.
It can also go wrong if body A modifies conditional.
The if else Statement
Instead, we use the if else statement:
if (<logical>) <body A> else <body B>
where,
<body A> is executed if <logical> is true.
<body B> is executed if <logical> is false.
The else if Statement
When we need to account for more cases than just two (true or false), we can use an else if statement:
if (<logical 1>)
<body 1>
else if (<logical 2>)
<body 2>
else if (<logical 3>)
<body 3> ...
else
<body N>
The first if statement is required, but the final else is not.
You can have as many else ifs as you’d like.
The curly brackets are optional if the body is one line.
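A sketch of both forms, using hypothetical functions. parity uses a one-line if else without brackets, and sign_label chains an else if with a final else.

```r
# Two cases: brackets are optional because each body is a single expression.
parity <- function(n) if (n %% 2 == 0) "even" else "odd"

# Three cases: if, else if, and a final else, each with a bracketed body.
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

parity(4)        # "even"
sign_label(-2)   # "negative"
sign_label(0)    # "zero"
```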
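Note that R requires else / else if to be placed on the same line as the closing curly bracket (}) of the preceding body, at least for code entered at the top level (such as the console).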
This is not allowed.
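To see the restriction without breaking a script, the sketch below hands two code strings to parse() and records whether each one parses. The helper parses() is hypothetical; parse() and try() are base R.

```r
# TRUE if the code string parses as top-level R code, FALSE otherwise.
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

# else on the same line as the closing bracket.
same_line <- "if (TRUE) {\n  1\n} else {\n  2\n}"

# else on the line after the closing bracket.
next_line <- "if (TRUE) {\n  1\n}\nelse {\n  2\n}"

parses(same_line)  # TRUE
parses(next_line)  # FALSE
```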
You may see these statements referred to as,
which_season
Let’s rewrite our which_season function:
which_season
Now with control flow statements.
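One possible version is sketched below. The month-to-season mapping (Northern Hemisphere, with December through February as winter) and the NA result for invalid months are assumptions, not part of the original problem statement.

```r
# month: an integer from 1 to 12. Returns the season as a string.
which_season <- function(month) {
  if (month %in% c(12, 1, 2)) {
    "winter"
  } else if (month %in% 3:5) {
    "spring"
  } else if (month %in% 6:8) {
    "summer"
  } else if (month %in% 9:11) {
    "autumn"
  } else {
    NA_character_  # assumed behaviour for anything outside 1 to 12
  }
}

which_season(4)   # "spring"
which_season(12)  # "winter"
```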
fizzbuzz
Let’s rewrite our fizzbuzz function:
fizzbuzz
Now with control flow statements. Pick your poison!
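One such version is sketched below, assuming the usual statement of the problem: return "FizzBuzz" for multiples of both 3 and 5, "Fizz" for multiples of 3, "Buzz" for multiples of 5, and the number itself otherwise. Returning the number as a string is a further assumption, made so every case has the same type.

```r
fizzbuzz <- function(n) {
  if (n %% 15 == 0) {        # divisible by both 3 and 5
    "FizzBuzz"
  } else if (n %% 3 == 0) {
    "Fizz"
  } else if (n %% 5 == 0) {
    "Buzz"
  } else {
    as.character(n)
  }
}

fizzbuzz(9)   # "Fizz"
fizzbuzz(10)  # "Buzz"
fizzbuzz(15)  # "FizzBuzz"
fizzbuzz(7)   # "7"
```

The test for 15 comes first because a multiple of 15 would otherwise be caught by the test for 3.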
if else, ifelse()
We can use the function ifelse to apply the if else operation to a vector. Its function signature is
ifelse(test, yes, no)
where,
test is a vector of logical values.
yes is the value to be placed in the result vector if the corresponding logical value is true.
no is the value to be placed in the result vector if the corresponding logical value is false.
ifelse() Example
Functions can call themselves.
Functions can return functions.
Functions can be passed as arguments to other functions.
That is, we can treat functions just like any other object in R. This includes assignment.
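The points above can be sketched with three small, hypothetical functions: one that calls itself, one that returns a function, and one that takes a function as an argument.

```r
# A function that calls itself (recursion).
factorial_recursive <- function(n) {
  if (n <= 1) 1 else n * factorial_recursive(n - 1)
}

# A function that returns a function.
make_adder <- function(k) function(x) x + k

# A function that takes another function as an argument.
apply_twice <- function(f, x) f(f(x))

# Functions are assigned like any other object.
add_three <- make_adder(3)

factorial_recursive(5)      # 120
add_three(10)               # 13
apply_twice(add_three, 10)  # 16
```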
We can stop execution of a function with the stop function.
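A minimal sketch of stop in use. The function safe_sqrt and its message are hypothetical; tryCatch is used here only so the error can be shown without halting the script.

```r
safe_sqrt <- function(x) {
  # Execution ends here for negative input; sqrt() is never reached.
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}

safe_sqrt(9)  # 3

# Capture the error message instead of letting the error halt the script.
outcome <- tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))
outcome  # "x must be non-negative"
```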
We can issue a warning (but continue execution) with the warning function.
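A matching sketch for warning. The function halve and its message are hypothetical. Unlike stop, the function still returns a value after the warning is raised.

```r
halve <- function(x) {
  # Raise a warning for odd input, then carry on with the computation.
  if (x %% 2 != 0) warning("x is odd; the result is not a whole number")
  x / 2
}

# The result is still computed; suppressWarnings() only hides the message.
result <- suppressWarnings(halve(3))
result  # 1.5

# Capture the warning message to show that it was raised.
caught <- tryCatch(halve(3), warning = function(w) conditionMessage(w))
caught  # "x is odd; the result is not a whole number"
```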