ETC5521 Tutorial 5
Statistical inference for exploratory methods
🎯 Objectives
- Refresh thinking about statistical inference.
- Learn to apply inference for data plots.
🔧 Preparation
The reading for this week is Wickham et al. (2010) Graphical inference for Infovis. It is a basic introduction to inference for exploratory data analysis, especially for data visualisation.
- Complete the weekly quiz before the deadline!
- Make sure you have this list of R packages installed:
  `install.packages(c("tidyverse", "datarium", "broom", "nullabor"))`
- Open your RStudio Project for this unit (the one you created in week 1, `ETC5521`). Create a `.qmd` document for this week's activities.
📥 Exercises
Exercise 1: Skittles experiment
Skittles come in five colours (orange, yellow, red, purple, green), each with its own flavour (orange, lemon, strawberry, grape, green apple). Data was collected by Dr Nick Tierney to explore whether a sample of 3 people could identify the flavour of skittles while blindfolded. You can find the cleaned tidy data here.
- (a) How many skittles did each person taste?
- (b) Loss of taste is called ageusia and loss of smell is called anosmia. A person with both cannot distinguish flavours in food. What is the probability that a person with ageusia and anosmia will guess the flavour of one skittle correctly (out of the five flavours)?
- (c) What is the probability that a person with ageusia and anosmia will guess the skittle flavour correctly for 2 out of 10 skittles, assuming the order of tasting does not matter?
- (d) Test the null hypothesis that people cannot distinguish the flavours correctly, against the alternative that they can. Assume that the order of tasting does not matter and each person has the same ability to correctly identify the flavours. In conducting your test, state your null and alternative hypotheses in statistical notation, your assumptions and the test statistic, and calculate the \(p\)-value.
- (e) In part (d) we disregarded the order of the tasting and the possible variability in people’s ability to correctly identify the flavour. If in fact these do matter, then how would you construct the test statistic? Is it easy?
- (f) Consider the plot below that shows in each tile whether a person guessed correctly, by order of their tasting. Suppose that under the null hypothesis, the order of tasting does not matter and people have no ability to distinguish the flavours. Generate a null plot under this null hypothesis.
- (g) Based on (f), construct a lineup (using `nullabor` or otherwise) of 20 plots. Ask your classmate which plot looks different.
- (h) Suppose that 100 people responded to your lineup from (g) and 76 of them correctly identified the data plot. What is the \(p\)-value from this visual inference?
- (i) Now consider the plot below. Use the same null data as in (g) to construct a lineup based on the visual statistic below. Suppose 28 people out of 100 correctly identified the data plot in this lineup. What is the difference in power between the visual statistic in (f) and this one?
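The chance-guessing questions in this exercise reduce to binomial calculations. The sketch below is one way to set them up in base R, assuming each guess is independent with success probability 1/5 and reading "2 out of 10" as exactly two correct. The counts `n_total` and `x_correct` are hypothetical placeholders, not values from the Skittles data.

```r
# Probability of guessing one skittle correctly among five flavours
p0 <- 1 / 5

# P(exactly 2 correct out of 10) under pure guessing: X ~ Binomial(10, p0)
p_two_of_ten <- dbinom(2, size = 10, prob = p0)

# Exact one-sided test of H0: p = 1/5 against H1: p > 1/5.
# HYPOTHETICAL counts for illustration only; replace with the observed totals.
n_total   <- 30   # total tastings pooled over people (assumed)
x_correct <- 12   # number of correct guesses (assumed)

# p-value = P(X >= x_correct) when X ~ Binomial(n_total, p0)
p_value <- pbinom(x_correct - 1, size = n_total, prob = p0, lower.tail = FALSE)

# binom.test() carries out the same exact test
bt <- binom.test(x_correct, n_total, p = p0, alternative = "greater")
```

Pooling all tastings into one binomial count relies on the assumptions stated in the test question: order does not matter and every person has the same ability.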
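Generating null data for the tile plot and hiding the data plot among the nulls might look like the sketch below. It uses a hypothetical stand-in for the Skittles data (3 people, 10 tastings each, with assumed column names `person`, `order` and `correct`) and simulates each null guess as correct with probability 1/5. The `nullabor` package offers `lineup()` for the same purpose; base R is used here only to keep the steps visible.

```r
set.seed(5521)  # for a reproducible illustration

# HYPOTHETICAL stand-in for the tidy Skittles data: 3 people, 10 tastings each.
# The column names (person, order, correct) are assumptions.
skittles <- expand.grid(order = 1:10, person = c("a", "b", "c"))
skittles$correct <- rbinom(nrow(skittles), size = 1, prob = 0.5)

# One null data set: under H0 every guess is correct with probability 1/5,
# independently of person and tasting order.
make_null <- function(d, p0 = 1 / 5) {
  d$correct <- rbinom(nrow(d), size = 1, prob = p0)
  d
}

# Lineup of m panels: m - 1 null data sets plus the data at a random position.
m   <- 20
pos <- sample(m, 1)  # keep this hidden from the people viewing the lineup
panels <- lapply(seq_len(m), function(i) {
  d <- if (i == pos) skittles else make_null(skittles)
  d$.sample <- i
  d
})
lineup_data <- do.call(rbind, panels)

# Tile plot of each panel (drawn only if ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  lineup_plot <- ggplot(lineup_data, aes(order, person, fill = factor(correct))) +
    geom_tile(colour = "white") +
    facet_wrap(~.sample, ncol = 5) +
    labs(fill = "correct")
}
```

Simulating from Bernoulli(1/5) rather than permuting the observed outcomes matches a null of no ability at all; a permutation would keep the observed number of correct guesses fixed and so would only address the order of tasting.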
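For the visual inference questions, one simple approach treats each observer as picking the data plot with probability 1/m under the null, where m = 20 is the number of panels, so the number of detections is binomial. The sketch below makes that assumption (it ignores any dependence from all observers viewing the same lineup) and estimates power by the proportion of observers who detect the data plot. The helper `visual_p_value()` is a hypothetical name introduced for illustration.

```r
m <- 20    # number of plots in the lineup
K <- 100   # number of independent observers

# Visual p-value: P(X >= x) with X ~ Binomial(K, 1/m) under the null
visual_p_value <- function(x, K, m = 20) {
  pbinom(x - 1, size = K, prob = 1 / m, lower.tail = FALSE)
}

# Lineup where 76 of 100 observers picked the data plot
p_tile <- visual_p_value(76, K, m)

# Estimated power of each visual statistic = proportion detecting the data plot
power_first  <- 76 / K
power_second <- 28 / K
power_diff   <- power_first - power_second
```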
Exercise 3: IDA skill sprint
Set the timer. You have 15 minutes to discover as many problems as possible in this data, cafe.rda.
A small cafe in the city of Melbourne is interested in determining whether the daily earnings depend on the weather. They compiled data for a period over 2000-2001 to study this question. The data has the following variables:
var | description |
---|---|
dt | Date |
wday | Day of the week |
revenue | Daily revenue in hundreds, 11=1100 |
expend | Daily expenses in hundreds |
precip | Precipitation in mm |
mint | Minimum temperature, Celsius |
maxt | Maximum temperature, Celsius |
source | Source of the weather data |
Your tutor has the list of problems.
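A few generic first checks one might run during the sprint are sketched below in base R. The tiny data frame is a hypothetical stand-in that borrows some column names from the table above; it is not the cafe data, and the checks are general-purpose rather than the tutor's list. The helper `ida_checks()` is likewise a name made up for this illustration.

```r
# HYPOTHETICAL toy data using the documented column names (not the real cafe data)
toy <- data.frame(
  dt      = as.Date(c("2000-01-01", "2000-01-02", "2000-01-02", "2000-01-04")),
  revenue = c(11, 9.5, 9.5, NA),
  mint    = c(14, 16, 16, 21),
  maxt    = c(25, 28, 28, 19)
)

# Generic initial data analysis checks
ida_checks <- function(d) {
  list(
    n_missing      = colSums(is.na(d)),                 # missing values per column
    n_dup_rows     = sum(duplicated(d)),                # fully duplicated records
    n_dup_dates    = sum(duplicated(d$dt)),             # dates appearing more than once
    n_date_gaps    = sum(diff(sort(unique(d$dt))) > 1), # breaks in the daily sequence
    n_min_over_max = sum(d$mint > d$maxt, na.rm = TRUE) # impossible temperature pairs
  )
}

checks <- ida_checks(toy)

# With the real file, something like load("cafe.rda") would come first;
# the name of the object it creates is not given here, so inspect ls() after loading.
```

Counts like these only flag candidates; plotting each variable against `dt` is usually the quicker way to see whether a flagged value is a genuine problem.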
👌 Finishing up
Make sure you say thanks and good-bye to your tutor. This is a time to also report what you enjoyed and what you found difficult.