Problem Set 1

This problem set will require knowledge of both Lesson 1 and Lesson 2. If you haven’t read over those, go ahead and do that then come back. Or, if you think you got it, carry on!

Basic Math and Logic with R

Store the value 18 in a variable x and the value 23 in the varialbe r_is_cool.

Write an simple expression in your console that adds those two variables
Write an expression to test whether or not x is greater than or equal to r_is_cool
Write an expression to test whether x is less than r_is_cool AND x + 10 is greater than r_is_cool
What is the sum of the following vector: c(TRUE, TRUE, FALSE, TRUE)?

Indexing Vectors

Copy the following into your console:\

what_the_vec <- c(seq.int(from = 8, to = 72, by = 2))

Now, answer the following questions.

What is the 14th value of this vector, what_the_vec?
Store the 2nd, 8th, and 20th value of the vector in a new vector, new_vec
Write an expression to test whether the 2nd value of new_vec is smaller than the first
Remove the first value from the vector new_vec (recall that we can use a ‘-’ sign to remove values) and store the remaining two in new_vec2

Indexing Data Frames

For this exercise, we are going to use the iris dataset, which is pre-loaded into base R. First, we should take a look at it.

What are the dimensions of the iris data set? (How many observations and variables?)
What kind of variables are present? (character, numeric, etc.)

Now that you know what kind of data you are working with, let’s work on finding values within using base R.

Since each column is a vector, we can store them in their own vector, if we’d like. Store the fourth column, Petal.Width in a new variable, pet_width.
Using the mean() function, find the mean of pet_width
What is the second value of the third column of iris?
How many values in the second column, Sepal.Width, are greater than 3.2 (remember that TRUE = 1)?
What is that as a percentage? (recall that nrow() gives the number of rows… so we just need that and the answer to #3)
Using the table() function, what is the count of the various species within our dataset?
There are three different species in our data. Remove “setosa” and store the new data in new_data
What is average Sepal.Length of our new_data
In base R, create a new column for new_data called large_petal_flag. Using the ifelse() function, assign the value 1 to rows with a Petal.Length greater than 4.5, and a value of 0 to those that don’t.
What is the sum of new_data$large_petal_flag? What is this as a proportion? (hint: taking the average of binary variables gives you the proportion if they are 1/0)

Dplyr

Working in base R is not that fun… but it really can come in handy understanding how to do things the hard way. Now let’s switch over to using our dplyr verbs. Remember, to use dplyr, you must have it installed. We will install the entire tidyverse by typing install.packages("tidyverse"). No need to do this if you’ve already done it. If tidyverse is already installed, just type library(tidyverse) and you will have all the tools at your disposal.

Using the iris data set, and our dplyr verbs, do the following…

Create a new column, double_petal, with mutate() that multiplies Petal.Width by 2. Store this new data frame in fancy_iris
Using our filter() command, filter out all values of double_petal less than 2.3 and greater than 4. How many rows are in that data set?
Suppose we want to see the maximum Sepal.Length for each species - how can we do that? (hint: you will need to group by the variable Species and use the summarise verb)

There are a few other verbs that we didn’t cover in our lessons (for the sake of brevity). But some really useful ones to know along the way are pivot_wider and pivot_longer. We will go over these a bit in class but don’t have time to cover them here.