Lab 4: Flights II
No Templates this week
- From this week onward, we will not be creating lab templates for you. You will need to create your own .qmd file and load any packages you need to answer the questions!
The Data
We will be answering more questions using the flights data frame located in the stat20data package!
Question 1
Mutate a new column onto the original flights data frame which, for each flight, contains the average speed of the plane, measured in miles per hour. Call the new column avg_speed, and save the resulting data frame back into the object flights.
Hint: Look through the column names or the help file to find variables that can be used to calculate this.
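As a reminder of the arithmetic involved, average speed is distance divided by elapsed time, with time expressed in hours. The sketch below assumes, purely for illustration, that a distance d is recorded in miles and a time t in minutes; the symbols d and t are placeholders, so check the help file for the columns and units actually used in flights.

```latex
\text{average speed (mph)} = \frac{d \ \text{(miles)}}{t \ \text{(minutes)} / 60}
```

For instance, covering 300 miles in 45 minutes gives 300 / (45 / 60) = 400 miles per hour.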
Question 2
In this question, we’ll be focusing on the relationship between distance and average speed.
part a
Create a scatter plot to visualize the above relationship. Place distance on the x axis and average speed on the y axis. Label your axes.
part b
Describe the relationship between the two variables that you see. Specifically, comment on the direction, shape, and strength of the association.
part c
Add code to title your plot with a brief summary of what you wrote in part b.
part d
Write dplyr code to calculate the correlation coefficient between average speed and distance.
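For reference, the quantity being asked for is the sample (Pearson) correlation coefficient. In the sketch below, x and y stand for the two variables, n for the number of observations, and s_x and s_y for the sample standard deviations; this notation is an assumption made here, not something fixed by the lab.

```latex
r = \frac{1}{n - 1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right),
\qquad -1 \le r \le 1
```

R's cor() function computes this value. Note that cor() returns NA when either variable contains missing values, unless its use argument specifies how they should be handled.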
part e
Based on your answers to parts c and d, do you believe it is appropriate to fit a linear model explaining average speed with distance? Explain in one to two sentences.
Question 3
In this question, we’ll be focusing on the relationship between departure delay and arrival delay.
part a
Create a scatter plot to visualize this relationship. Place departure delay on the x axis and arrival delay on the y axis.
part b
Describe the relationship between the two variables that you see. Specifically, comment on the direction, shape, and strength of the association.
part c
Give the plot a title based on what you wrote in part b.
part d
Write dplyr code to calculate the correlation coefficient between arrival delay and departure delay.
part e
Based on your answers to parts a-d, do you believe it is appropriate to fit a linear model explaining arrival delay with departure delay? Explain in one to two sentences.
Question 4
part a
Fit a linear model using lm() to estimate arrival delay using departure delay.
part b
Write out the mathematical form of the linear model based on the output in part a.
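As a guide to what "mathematical form" means here, a linear model with one predictor can be written in the general shape below. The symbols are notational assumptions: \hat{y} is the predicted arrival delay, x is the departure delay, and b_0 and b_1 stand for the intercept and slope estimates, which you would replace with the numbers reported by lm().

```latex
\hat{y} = b_0 + b_1 x
```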
Question 5
part a
Fit a linear model using lm() to estimate arrival delay using both departure delay and distance.
part b
Write out the mathematical form of the linear model based on the output in part a.
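With two predictors, the general shape gains one more term. In this sketch the symbols are again notational assumptions: \hat{y} is the predicted arrival delay, x_1 is the departure delay, x_2 is the distance, and b_0, b_1, and b_2 are placeholders for the coefficient estimates reported by lm().

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2
```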
Question 6
part a
Create a new data frame from flights with the following columns:
carrier
arrival delay
departure delay
distance
the residuals of the linear model you fit in Question 5.
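As a reminder, the residual for a flight is the difference between its observed response and the value the model predicts for it. The notation below is an assumption made for illustration: y_i is the observed arrival delay of flight i and \hat{y}_i is the model's prediction for that flight.

```latex
e_i = y_i - \hat{y}_i
```

By default, lm() drops rows with missing values in the model variables before fitting, so the vector of residuals can be shorter than the data frame it came from. It is worth checking that the lengths match before attaching residuals as a column.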
part b
Which flight carrier had the smallest residuals, on average?
Write dplyr code and then use it to answer this question in one sentence.
Last Question
Will you ensure that your submission to Gradescope…
- is a PDF generated from a .qmd file,
- has all of your code visible to readers,
- and assigns each of the questions to all pages that show your work for that question?
(This one is easy! Just answer “yes” or “no”)