Hypothesis Testing

Measuring the consistency between a model and data

The Ideas in Code

A hypothesis test using permutation can be implemented by introducing one new step into the process used for calculating a bootstrap interval. The key distinction is that in a hypothesis test the researcher puts forth a model for how the data could be generated. That is the role of hypothesize().

hypothesize()

A function to place before generate() in an infer pipeline where you can specify a null model under which to generate data. The one necessary argument is

null: the null hypothesis. Options include "independence" and "point".

The following example sets up a permutation test under the null hypothesis that there is no relationship between the body mass of penguins and their sex.

penguins |>
  specify(response = body_mass_g,
          explanatory = sex) |>
  hypothesize(null = "independence")

Response: body_mass_g (numeric)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 333 × 2
   body_mass_g sex   
         <dbl> <fct> 
 1        3750 male  
 2        3800 female
 3        3250 female
 4        3450 female
 5        3650 male  
 6        3625 female
 7        4675 male  
 8        3200 female
 9        3800 male  
10        4400 male  
# ℹ 323 more rows

Observe:

The output is the original data frame with new information appended to describe what the null hypothesis is for this data set.
There are other forms of hypothesis tests that you will see involving a "point" null hypothesis. Those require adding additional arguments to hypothesize().
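As a rough sketch of what those additional arguments can look like, the code below tests a "point" null hypothesis about a single mean. The toy data frame holds only the first eight body masses printed above, and the hypothesized mean of 4000 grams is an assumed value chosen purely for illustration.

```r
library(infer)

# Toy data: the first eight body masses (in grams) from the output printed above
toy <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200)
)

set.seed(1)

null_point <- toy |>
  specify(response = body_mass_g) |>
  # mu = 4000 is an assumed null value, not one taken from the text
  hypothesize(null = "point", mu = 4000) |>
  generate(reps = 200, type = "bootstrap") |>
  calculate(stat = "mean")

null_point
```

With a point null on a mean, the extra argument is mu. The result has one row per replicate, just like the permutation example later in these notes.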

Calculating an observed statistic

Let’s say for this example you select as your test statistic a difference in means, \(\bar{x}_{female} - \bar{x}_{male}\). While you can calculate this statistic with tools you already know, group_by() and summarize(), you can also recycle much of the code that you’ll use to build the null distribution with infer.

obs_stat <- penguins |>
  specify(response = body_mass_g,
          explanatory = sex) |>
  calculate(stat = "diff in means")

obs_stat

Response: body_mass_g (numeric)
Explanatory: sex (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 -683.
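For comparison, here is a minimal sketch of the group_by() and summarize() route to the same statistic. It assumes that penguins is the data set from the palmerpenguins package with rows missing sex or body mass removed, which is consistent with the 333 rows shown earlier. The names penguins_clean, group_means and obs_diff are made up for this example.

```r
library(dplyr)

# Assumption: the notes use palmerpenguins::penguins with incomplete rows dropped
penguins_clean <- palmerpenguins::penguins |>
  filter(!is.na(sex), !is.na(body_mass_g))

# Mean body mass within each sex
group_means <- penguins_clean |>
  group_by(sex) |>
  summarize(mean_mass = mean(body_mass_g))

# Difference in means, female minus male
obs_diff <- group_means$mean_mass[group_means$sex == "female"] -
  group_means$mean_mass[group_means$sex == "male"]

obs_diff
```

The subtraction order (female minus male) is chosen to match the sign of the statistic that infer reports.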

Calculating the null distribution

To generate a null distribution of the kind of differences in means that you’d observe in a world where body mass had nothing to do with sex, just add the hypothesis with hypothesize() and the generation mechanism with generate().

null <- penguins |>
  specify(response = body_mass_g,
          explanatory = sex) |>
  hypothesize(null = "independence") |>
  generate(reps = 500, type = "permute") |>
  calculate(stat = "diff in means")

null

Response: body_mass_g (numeric)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 500 × 2
   replicate     stat
       <int>    <dbl>
 1         1   95.0  
 2         2  -60.5  
 3         3  -93.6  
 4         4 -132.   
 5         5  -46.7  
 6         6   65.6  
 7         7   18.4  
 8         8   -0.473
 9         9  150.   
10        10   54.8  
# ℹ 490 more rows

Observe:

The output data frame has reps rows and 2 columns: one indicating the replicate and the other with the statistic (a difference in means).
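To see what the permutation step is doing conceptually, here is a base R sketch that shuffles the sex labels by hand and recomputes the difference in means each time. It uses only the ten rows printed earlier, so its numbers will not match the output above. The helper diff_in_means() and the seed are choices made for this illustration, not part of infer.

```r
set.seed(20)

# The first ten rows of the data as printed earlier in these notes
mass <- c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400)
sex <- factor(c("male", "female", "female", "female", "male",
                "female", "male", "female", "male", "male"))

# Hypothetical helper: difference in means, female minus male
diff_in_means <- function(y, g) {
  mean(y[g == "female"]) - mean(y[g == "male"])
}

reps <- 500

# Each replicate shuffles the labels, breaking any link between sex and mass
null_stats <- replicate(reps, {
  shuffled_sex <- sample(sex)
  diff_in_means(mass, shuffled_sex)
})

# Same shape as the infer output: one row per replicate, two columns
null_df <- data.frame(replicate = seq_len(reps), stat = null_stats)
head(null_df)
```

Shuffling the labels keeps the group sizes fixed while making sex unrelated to body mass, which is the world the "independence" null hypothesis describes.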

visualize()

Once you have a collection of test statistics under the null hypothesis saved as null, it can be useful to visualize that approximation of the null distribution. For that, use the function visualize().

penguins |>
  specify(response = body_mass_g,
          explanatory = sex) |>
  hypothesize(null = "independence") |>
  generate(reps = 9, type = "permute") |>
  calculate(stat = "diff in means")

Response: body_mass_g (numeric)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 9 × 2
  replicate    stat
      <int>   <dbl>
1         1  200.  
2         2   53.0 
3         3 -118.  
4         4   70.7 
5         5  -21.2 
6         6   91.7 
7         7    4.03
8         8   21.1 
9         9   17.8

null |>
  visualize() +
  shade_p_value(100, direction = "both")

Observe:

visualize() expects a data frame of statistics.
It is a shortcut for creating a particular type of ggplot, so like any ggplot, you can add layers to it with +.
shade_p_value() is a function you can add to shade the part of the null distribution that corresponds to the p-value. The first argument is the observed statistic, which we’ve set to 100 here to see the behavior of the function. direction is an argument where you specify whether to shade values less than the observed value ("less"), greater than it ("greater"), or "both" for a two-tailed p-value.
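The shaded region corresponds to a proportion of the null statistics, and a small base R sketch can make that concrete. The ten statistics below are copied (rounded as printed) from the first rows of the null output shown earlier, and 100 is the same placeholder used for the observed statistic. Doubling the smaller tail is one common convention for a two-tailed p-value. It is used here as an assumption, not as a statement of exactly what shade_p_value() computes.

```r
# First ten null statistics, copied from the printed output earlier in the notes
null_stats <- c(95.0, -60.5, -93.6, -132, -46.7, 65.6, 18.4, -0.473, 150, 54.8)

# Placeholder observed statistic, matching the value passed to shade_p_value()
obs <- 100

# One-tailed proportions: statistics at least as large / at least as small as obs
p_right <- mean(null_stats >= obs)
p_left <- mean(null_stats <= obs)

# Two-tailed p-value: double the smaller tail, capped at 1
p_two_sided <- min(1, 2 * min(p_left, p_right))

p_two_sided
```

With only ten statistics the estimate is very coarse. In practice you would use the full set of replicates saved in null.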