Lab 9: People’s Park II & MythBusters

The Data

People’s Park

We will again be studying the People’s Park issue and using the ppk dataset in the stat20data package to do so! For more context, see last week’s lab.

MythBusters

During the episode “Is Yawning Contagious?” (2005), the MythBusters tested whether yawning is contagious by having two groups of people (those who were first yawned at and those who were not) sit inside a booth and do nothing. Yawning at someone before they entered the booth was called the “seeded yawn” (or providing a “stimulus”). The MythBusters then compared the proportion of people who yawned while in the booth between the two groups. The experiment was actually performed in the Bay Area! Make sure you watch the video from the episode here before starting your work, as you’ll need it to answer a question at the end of the lab. Most likely, you watched it during class!

In this lab, you will formalize the activity that you completed in class by performing a hypothesis test. The data from the experiment are available in the yawn dataset within the stat20data package.

Questions

Question 1 (People’s Park)

part a

Create a boxplot of the variable corresponding to Question 15 on the Google Form survey.

part b

Based on the visualization, state an appropriate measure of center for this variable.

part c

Using the infer library, calculate a 95% bootstrapped confidence interval for the population version of the appropriate measure of center for Question 15. Use 5000 bootstrapped statistics to create the interval.

part d

Interpret this interval in the context of the problem.

Question 2 (People’s Park)

As in Question 7, part b of last week’s lab, add a new column to the ppk data frame called support_after that takes the response data (in text form) from Question 21 on the Google Form and returns TRUE for answers of "Very strongly support", "Strongly support", and "Somewhat support" and FALSE otherwise. Recall that you can use the %in% operator to check if the response is one of these three values. You may re-use the code that you submitted for last week.
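As a reminder of how %in% behaves, here is a minimal sketch on a made-up data frame. The data frame toy and its column answer are hypothetical stand-ins, not the actual ppk column names, so adapt the names to the real data.

```r
library(dplyr)

# Hypothetical toy data: `answer` stands in for a text survey response column
toy <- tibble(
  answer = c("Strongly support", "Neutral", "Somewhat support", "Strongly oppose")
)

# The three responses that should count as support
supportive <- c("Very strongly support", "Strongly support", "Somewhat support")

# %in% returns TRUE when the value on the left matches any element on the right
toy <- toy %>%
  mutate(support = answer %in% supportive)
```

Each row of support is TRUE only when answer exactly matches one of the three listed strings, so spelling and capitalization must agree with the survey text.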

Suppose we are interested in estimating the proportion of all Berkeley students who would have supported the People’s Park housing project after being exposed to more information about it.

part a

Calculate a point estimate of the proportion described above and save it into q2a.

part b

Using the infer library, generate 5000 bootstrap samples of the survey respondents (specifically, their responses in the support_after column) and save this into q2b.

part c

Using the infer library and q2b, create a bootstrap sampling distribution of the sample proportion. Save this into q2c.

part d

Using the infer library, visualize the bootstrap sampling distribution in q2c.

part e

Using the infer library, calculate a 95% bootstrapped percentile confidence interval for the population proportion of students who would have supported the housing project after hearing more information.
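Parts a through e follow one infer pipeline: specify the response, generate bootstrap samples, calculate a statistic for each sample, then summarize. The sketch below walks through those steps on a made-up sample, assuming a logical column; the data frame toy, its column support, and the 60/40 split are hypothetical and are not the ppk data.

```r
library(dplyr)
library(infer)

set.seed(20)  # make the resampling reproducible

# Hypothetical toy sample: 60 TRUE and 40 FALSE responses
toy <- tibble(support = rep(c(TRUE, FALSE), times = c(60, 40)))

# Point estimate: the sample proportion of TRUE
p_hat <- toy %>%
  summarise(p = mean(support)) %>%
  pull(p)

# Bootstrap samples: resample the rows with replacement, 5000 times
boot_samples <- toy %>%
  specify(response = support, success = "TRUE") %>%
  generate(reps = 5000, type = "bootstrap")

# Bootstrap sampling distribution: one proportion per resample
boot_props <- boot_samples %>%
  calculate(stat = "prop")

# Histogram of the bootstrap proportions (stored rather than printed)
boot_plot <- visualize(boot_props)

# 95% percentile interval: the 2.5th and 97.5th percentiles of the statistics
ci <- boot_props %>%
  get_confidence_interval(level = 0.95, type = "percentile")
```

Here ci is a one-row data frame with columns lower_ci and upper_ci. Storing each intermediate result separately mirrors the way the parts of this question ask for q2b and q2c.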

part f

Interpret the interval you calculated in part e in the context of the problem.

part g

Does the interval contain 0.5? What is the significance of this number, and what implications does containing or not containing 0.5 have for those working in the Chancellor’s office? Answer in at least two sentences.

Question 3 (MythBusters)

part a

What is the unit of observation in the yawn data frame?

part b

  • Visualize the association between a participant receiving the stimulus and whether or not that participant ended up yawning using an appropriate plot type. Hint: what type of variables are involved in this visualization? What plot type(s) are therefore appropriate? Go back to our notes from the beginning of the course to refresh your memory.

  • Describe what you see in at least one sentence.


The MythBusters concluded that yawning was contagious based on their results. In this lab, we would like to apply some statistical rigor to the situation. Using their data, we will test the null hypothesis that yawning is not contagious against the alternative hypothesis that it is contagious.

part c

Write down mathematical equations for the null and alternative hypotheses.

part d

Using the infer library, calculate the observed test statistic for the hypothesis test (the sample version of the parameter you identified in your mathematical equations for the hypotheses).

part e

Using the infer library, generate 9 simulations of the MythBusters experiment under the null hypothesis and save them into q3e.

part f

  • Use q3e to visualize the results of the simulated experiments using ggplot2 and the facet_wrap() layer.

part g

  • Do the plots you made in part f look like the one you made in part b? What does this say about how consistent the null hypothesis is with the data the MythBusters collected? Answer in at least two sentences.

part h

Using the infer library:

  • Generate 5000 simulations of the MythBusters experiment under the null hypothesis. Save this into q3h.

  • With q3h, calculate and visualize the null distribution of 5000 simulated test statistics.
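The general shape of a permutation test in infer can be sketched on a made-up two-group experiment. Everything about the data below is hypothetical (the data frame toy, the columns group and outcome, their levels, and the counts), and 1000 repetitions are used only to keep the sketch quick; none of it is taken from the yawn data.

```r
library(dplyr)
library(infer)

set.seed(20)  # make the shuffling reproducible

# Hypothetical toy experiment: 30 treated and 20 untreated participants
toy <- tibble(
  group = rep(c("treated", "untreated"), times = c(30, 20)),
  outcome = c(
    rep(c("yes", "no"), times = c(12, 18)),  # treated: 12 of 30 say yes
    rep(c("yes", "no"), times = c(5, 15))    # untreated: 5 of 20 say yes
  )
)

# Observed statistic: difference in proportions, treated minus untreated
obs_stat <- toy %>%
  specify(response = outcome, explanatory = group, success = "yes") %>%
  calculate(stat = "diff in props", order = c("treated", "untreated"))

# Null distribution: shuffle the outcome labels across groups, then recompute
null_dist <- toy %>%
  specify(response = outcome, explanatory = group, success = "yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("treated", "untreated"))

# Histogram of the simulated statistics, with the observed value marked
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_stat, direction = "greater")
```

Stopping the second pipeline after generate() leaves the shuffled data sets themselves, with a replicate column identifying each simulation, rather than one statistic per simulation.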

part i

  • Based on the plot from part h, are the observed data consistent with the null hypothesis, or do they favor the alternative hypothesis? Answer and explain in at least one sentence.

part j

Do you still believe yawning is contagious after the results of this hypothesis test? Answer and explain in at least two sentences.