ETC5521 Worksheet Week 4

Assessing significance of patterns

Author

Prof. Di Cook

🎯 Objectives

Practice conducting initial data analyses, and make a start on learning how to assess significance of patterns.

Code
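# Packages used in this worksheet
library(tidyverse)  # select(), ggplot()
library(broom)      # tidy(), glance(), augment()
library(nullabor)   # lineup(), null_lm(), null_permute()

# penguins with these column names ships with the datasets package in R >= 4.5
# Keep the four measurements plus species and sex, and drop incomplete rows
penguins_nona <- penguins |>
  select(bill_len, bill_dep, 
         flipper_len, body_mass, 
         species, sex) |>
  na.omit()
# Simple linear regression of body mass on flipper length
penguins_fit <- lm(body_mass~flipper_len,
                   data=penguins_nona)
tidy(penguins_fit)
glance(penguins_fit)
# Add fitted values and residuals (.fitted, .resid) to the data
penguins_m <- augment(penguins_fit, penguins_nona)
# Residual plot, left commented out so the true plot is not seen before the lineup
#ggplot(penguins_m, aes(x=flipper_len, y=.resid)) +
#  geom_hline(yintercept=0, colour="grey70") +
#  geom_point() +
#  theme(aspect.ratio=1)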

🧩 Tasks

Can we believe what we see?

  1. In the previous week’s worksheet we subjectively evaluated the residual plot to determine whether the model was a good fit. We’ll now use randomisation to check the observations we made from that residual plot. The code below makes a lineup of the true plot among null plots made with rotation residuals, which show what residuals from a good fit look like. When you run the code you will get a line decrypt("...."), which you can copy and paste back into the console window to get the location of the true plot (in case you forget which one it is). Does the true plot look like the null plots? If not, describe how it differs.
Code
ggplot(lineup(null_lm(body_mass~flipper_len, method="rotate"),
              penguins_m),
       aes(x=flipper_len, y=.resid)) +
  geom_point() +
  facet_wrap(~.sample, ncol=5) +
  theme_void() +
  theme(axis.text = element_blank(), 
        panel.border = element_rect(fill=NA, colour="black"))

# Alternatively, we can use permutation
ggplot(lineup(null_permute("flipper_len"),
              penguins_m),
       aes(x=flipper_len, y=.resid)) +
  geom_point() +
  facet_wrap(~.sample, ncol=5) +
  theme_void() +
  theme(axis.text = element_blank(), 
        panel.border = element_rect(fill=NA, colour="black"))

# The true residual plot, coloured by species, with a reference line at zero
ggplot(penguins_m, aes(x=flipper_len, y=.resid, colour=species)) +
  geom_hline(yintercept=0, colour="grey70") +
  geom_point() +
  theme(aspect.ratio=1)
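
To make the permutation idea concrete, here is a minimal base R sketch of how a lineup could be assembled by hand. It is not nullabor's implementation: the data are simulated stand-ins rather than the penguins measurements, `manual_lineup()` is a hypothetical helper written only for this illustration, and the position of the true panel is stored in plain sight as an attribute instead of being hidden behind `decrypt()`.

```r
# Simulated stand-in data (an assumption, not the penguins measurements)
set.seed(5521)
n <- 100
flipper <- rnorm(n, mean = 200, sd = 14)
mass <- -5800 + 50 * flipper + rnorm(n, sd = 400)
sim <- data.frame(flipper, mass)

# Fit the model and keep the residuals alongside the data
fit <- lm(mass ~ flipper, data = sim)
sim$.resid <- residuals(fit)

# Hypothetical helper: build an m-panel lineup by permuting one variable.
# Shuffling `var` breaks any association between it and the other columns,
# so each shuffled panel shows what "no pattern" looks like.
manual_lineup <- function(true, var, m = 20) {
  pos <- sample(m, 1)  # panel that will hold the unshuffled (true) data
  panels <- lapply(seq_len(m), function(i) {
    d <- true
    if (i != pos) d[[var]] <- sample(d[[var]])
    d$.sample <- i
    d
  })
  out <- do.call(rbind, panels)
  attr(out, "pos") <- pos
  out
}

lu <- manual_lineup(sim, "flipper")

# Draw the 20 panels in a 4 x 5 grid with base graphics
op <- par(mfrow = c(4, 5), mar = c(1, 1, 1, 1))
for (i in 1:20) {
  panel <- lu[lu$.sample == i, ]
  plot(panel$flipper, panel$.resid, axes = FALSE, ann = FALSE)
  box()
  title(main = i, line = 0.2)
}
par(op)
```

Running `attr(lu, "pos")` reveals which panel holds the true data. Because the simulated model is correctly specified, the true panel should be hard to pick out here, which is the behaviour expected of a good fit.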
  2. Pick one group, males or females, and one of Adelie, Chinstrap or Gentoo, and choose two of the four measurements. Fit a linear model, and make a lineup of the residuals. Can you tell which is the true plot? Show your lineup to your tutorial partner or someone else nearby and ask them
  • to pick the plot that is most different.
  • to explain why they picked that plot.

Using your decrypt() code, locate the true plot. Is the true plot different from the nulls?

Did you or your friend choose the data plot? Was it identifiable from the lineup or indistinguishable from the null plots?
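
As a possible starting point for the second task, the sketch below follows the same steps as the first task for one arbitrary choice: female Adelie penguins, with bill depth modelled on bill length. The subset, the pair of variables and the object names (`adelie_f`, `adelie_fit`, `adelie_m`, `adelie_lineup`) are assumptions made for illustration, so swap in your own selection. It also assumes the `penguins` data from R 4.5 or later, as used above.

```r
library(dplyr)
library(ggplot2)
library(broom)
library(nullabor)

# Illustrative choice only: one sex, one species, two measurements
adelie_f <- penguins |>
  filter(species == "Adelie", sex == "female") |>
  select(bill_len, bill_dep) |>
  na.omit()

# Fit the linear model and attach residuals to the data
adelie_fit <- lm(bill_dep ~ bill_len, data = adelie_f)
adelie_m <- augment(adelie_fit, adelie_f)

# Lineup of the true residual plot among 19 rotation nulls;
# running this prints the decrypt("....") line for locating the true plot
adelie_lineup <- lineup(null_lm(bill_dep ~ bill_len, method = "rotate"),
                        adelie_m)

ggplot(adelie_lineup, aes(x = bill_len, y = .resid)) +
  geom_point() +
  facet_wrap(~.sample, ncol = 5) +
  theme_void() +
  theme(axis.text = element_blank(),
        panel.border = element_rect(fill = NA, colour = "black"))
```

Show the lineup to your partner before running the decrypt() line, so that neither of you knows the position of the true plot when choosing.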