Randomized Experiments

Principles of experimental design

Ideas in Code

First we create a data frame to store the pdf vs. website data.

format_data <- data.frame(name = c("Evelyn", "Grace", "Juan", "Alex", "Monica", "Sriya"),
                 group = c("pdf", "pdf", "pdf", "website", "website", "website"),
                 understanding = c("deep", "shallow", "deep", "deep", "shallow", 
                                   "shallow"),
                 major = c('Statistics', 'Economics', 'Economics', 'Statistics', 
                           'Economics', 'Statistics'),
                 GPA = c(3.81, 3.63, 3.20, 2.85, 3.19, 3.80),
                 native_speaker = c('Yes','No','Yes','No','Yes','Yes'))

The cobalt package in R contains the function bal.tab, which creates tables of standardized differences. By passing its output to plot you can create a Love plot, which displays the standardized difference for each covariate. Note that cobalt expects treatment variables to be numeric or logical, so we begin by converting group to the logical variable is_website.

library(cobalt)
library(dplyr)  # provides mutate()

# TRUE for students assigned to the website, FALSE for the pdf
format_data <- format_data |>
  mutate(is_website = group == 'website')

bal.tab(is_website ~ major + GPA + native_speaker, data = format_data,
        s.d.denom = 'pooled', binary = 'std') |>
  plot()
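
To make the idea of a standardized difference concrete, here is a minimal sketch in base R that computes it by hand for GPA. It assumes that the 'pooled' denominator means the square root of the average of the two group variances; the names gpa_website, gpa_pdf, mean_diff, pooled_sd, and smd_gpa are ours for illustration and are not part of cobalt.

```r
# GPA and group values copied from format_data above
gpa   <- c(3.81, 3.63, 3.20, 2.85, 3.19, 3.80)
group <- c("pdf", "pdf", "pdf", "website", "website", "website")

gpa_website <- gpa[group == "website"]
gpa_pdf     <- gpa[group == "pdf"]

# Raw difference in means (website minus pdf)
mean_diff <- mean(gpa_website) - mean(gpa_pdf)

# Assumed pooled SD: square root of the average of the two group variances
pooled_sd <- sqrt((var(gpa_website) + var(gpa_pdf)) / 2)

# Standardized difference: the raw difference expressed in SD units
smd_gpa <- mean_diff / pooled_sd
smd_gpa
```

Dividing by the pooled standard deviation puts every covariate on the same scale, which is what lets a Love plot show GPA alongside major and native_speaker on a single axis.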

We can run balance tests with infer, much as we did in the generalization unit. The covariate whose balance we are testing is the response, and the treatment is the explanatory variable.

library(infer)
# Note: R evaluates 2024-3-25 as arithmetic, so the seed used here is 1996
set.seed(2024-3-25)
obs_stat <- format_data |>
  specify(response = GPA,
          explanatory = group) |>
  calculate(stat = "diff in means", order = c("website","pdf"))

null <- format_data |>
  specify(response = GPA,
          explanatory = group) |>
  hypothesize(null = "independence") |>
  generate(reps = 500, type = "permute") |>
  calculate(stat = "diff in means", order = c("website","pdf"))

null |>
  visualize() +
  shade_p_value(obs_stat, direction = 'both')
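
With only six students, the permutation idea behind the null distribution can also be sketched by hand. The base R code below is an illustration under our own assumptions rather than what infer does internally: instead of drawing 500 random shuffles, it lists all 20 ways of assigning three of the six students to the website group and computes the difference in mean GPA for each. The helper diff_in_means and the names assignments, perm_diffs, and p_value are hypothetical.

```r
# GPA and group values copied from format_data above
gpa   <- c(3.81, 3.63, 3.20, 2.85, 3.19, 3.80)
group <- c("pdf", "pdf", "pdf", "website", "website", "website")

# Difference in mean GPA (website minus pdf) for a given set of website students
diff_in_means <- function(website_idx) {
  mean(gpa[website_idx]) - mean(gpa[-website_idx])
}

# Observed statistic under the actual assignment
obs_diff <- diff_in_means(which(group == "website"))

# Every way to choose 3 of the 6 students for the website group, one per column
assignments <- combn(length(gpa), 3)
perm_diffs  <- apply(assignments, 2, diff_in_means)

# Two-sided p-value: share of assignments at least as extreme as the observed one
# (the small tolerance guards against floating-point ties)
p_value <- mean(abs(perm_diffs) >= abs(obs_diff) - 1e-12)
p_value
```

Here 8 of the 20 possible assignments give a difference at least as large in absolute value as the observed one, so the exact two-sided p-value is 0.4. The shaded proportion from the 500 random permutations above should be close to this value.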