ETC5521 Tutorial 8
Making comparisons between groups and strata
🎯 Objectives
These exercises give you practice making numerical and graphical comparisons for various datasets, and help you think about the comparisons being made.
🔧 Preparation
- The reading for this week is Wilke (2019) Chapters 9, 10.2-4, 11.2.
- Complete the weekly quiz, before the deadline!
- Install the following R-packages if you do not have them already:
install.packages(c("colorspace", "lvplot", "patchwork", "janitor", "lubridate", "vcd", "ggbeeswarm", "kableExtra"))
- Open your RStudio Project for this unit (the one you created in week 1, ETC5521). Create a .qmd document for this week's activities.
📥 Exercises
Exercise 1: Melbourne daily maximum temperature
The csv file melb_temp_2023-09-08.csv contains data on the daily maximum temperature from 1970 to 2023, collected from the weather station at Melbourne Airport. Use this to answer the following questions, with the additional information that in Australia:
- Summer is from the beginning of December to the end of February,
- Autumn is from the beginning of March to the end of May,
- Winter is from the beginning of June to the end of August, and
- Spring is from the beginning of September to the end of November.
- There are four plots below. Write the code to make them yourself. Then think about the three questions (i), (ii) and (iii) below.
  - (i) Are there any winters where the daily maximum temperature is different to winter in other years?
  - (ii) What is the general pattern of maximum daily temperatures in winter?
  - (iii) Is there evidence that winters in Melbourne are getting warmer?
  Which plot best matches each question? If none of them work for a particular question, make an alternative plot. Also, if any of the plots don't help answer any of the questions, think about a question that they might answer.
- Make a transformation of the data, and a new plot with this variable, that will allow a more direct comparison to answer question (iii).
The data can be read and processed using this code:
library(tidyverse) # read_csv(), rename(), mutate()
library(janitor) # clean_names()

melb_df <- read_csv("https://raw.githubusercontent.com/numbats/ddde/main/data/melb_temp_2023-09-08.csv") |>
  clean_names() |>
  rename(temp = maximum_temperature_degree_c) |>
  dplyr::filter(!is.na(temp)) |>
  dplyr::select(year, month, day, temp) |>
  mutate(
    date = as.Date(paste(year, month, day, sep = "-")))
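One way to encode the season definitions above is to derive a season label from the month. The following is a minimal sketch on a small made-up data frame: `toy_df` and its temperatures are hypothetical (not the Melbourne Airport data), and the names `season` and `winter_means` are assumptions for illustration. The same `mutate()` step could be applied to `melb_df`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the same columns as melb_df (values are made up)
toy_df <- tribble(
  ~year, ~month, ~day, ~temp,
  2000,  1, 15, 30.1,
  2000,  6, 10, 13.2,
  2000,  7, 20, 12.5,
  2000,  8,  5, 14.8,
  2000, 10,  1, 19.0,
  2001,  4,  3, 21.4,
  2001,  6, 12, 14.0,
  2001,  7, 22, 13.5,
  2001, 12, 25, 28.3
)

# Label each day with its Australian season, based on the month
toy_df <- toy_df |>
  mutate(season = case_when(
    month %in% c(12, 1, 2) ~ "summer",
    month %in% 3:5 ~ "autumn",
    month %in% 6:8 ~ "winter",
    month %in% 9:11 ~ "spring"
  ))

# Average winter maximum temperature for each year
winter_means <- toy_df |>
  filter(season == "winter") |>
  group_by(year) |>
  summarise(mean_temp = mean(temp), n_days = n())
```

Winter falls within a single calendar year, so grouping by `year` is enough here. Summer spans two calendar years (December to February), so a summer comparison would need extra care when assigning days to a season-year.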
Exercise 2: Hate Crime
A certain person made the following statement about this data and used the graph below to illustrate his point.
The post-9/11 upsurge in hate crimes against Muslims was real and unforgivable, but the horrible truth is that it didn’t loom that large compared with what Blacks face year in and year out.
df <- tribble(
  ~year, ~offense, ~count,
  2000, "Anti-Black", 3535,
  2000, "Sexual Orientation", 1558,
  2000, "Anti-Islamic", 36,
  2001, "Anti-Black", 3700,
  2001, "Sexual Orientation", 1664,
  2001, "Anti-Islamic", 554,
  2002, "Anti-Black", 3076,
  2002, "Sexual Orientation", 1513,
  2002, "Anti-Islamic", 174
) |>
  mutate(offense = fct_reorder(offense, -count))
pop_df <- tribble(
  ~pop, ~size,
  "Anti-Black", 36.4e6,
  "Sexual Orientation", 28.2e6,
  "Anti-Islamic", 3.4e6
)
crime_df <- left_join(df, pop_df, by = c("offense" = "pop")) |>
  mutate(prop = count / size)
Discuss whether the plot supports his statement or not. Is his comparison of the number of crimes against Muslims and Blacks fair? What graph would you suggest making to support or disprove his statement? The data and additional information are provided below.
This uses data from the USA hate crime statistics. The number of victims for three particular hate crime categories is shown in the table below.
Year | Offense | Victims |
---|---|---|
2000 | Anti-Black | 3535 |
2000 | Sexual Orientation | 1558 |
2000 | Anti-Islamic | 36 |
2001 | Anti-Black | 3700 |
2001 | Sexual Orientation | 1664 |
2001 | Anti-Islamic | 554 |
2002 | Anti-Black | 3076 |
2002 | Sexual Orientation | 1513 |
2002 | Anti-Islamic | 174 |
The 2000 USA Census reports that there were a total of 36.4 million people who reported themselves as Black or African American. Weeks (2003) estimated there are 3.4 million Muslims in the USA. The LGBT population is harder to estimate, but reports indicate 2-10% of the population, so it is likely below 28.2 million people in the USA.
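Because the three populations differ greatly in size, one option is to compare victims per 100,000 people rather than raw counts. The sketch below repeats the counts and population figures given above so that it runs on its own; the names `counts`, `pops`, `rate_df` and `rate_per_100k` are assumptions for illustration. The population sizes are themselves estimates (the 28.2 million figure for the LGBT population is an upper bound), so the resulting rates are approximate.

```r
library(dplyr)
library(tibble)

# Victim counts from the table above
counts <- tribble(
  ~year, ~offense, ~count,
  2000, "Anti-Black", 3535,
  2000, "Sexual Orientation", 1558,
  2000, "Anti-Islamic", 36,
  2001, "Anti-Black", 3700,
  2001, "Sexual Orientation", 1664,
  2001, "Anti-Islamic", 554,
  2002, "Anti-Black", 3076,
  2002, "Sexual Orientation", 1513,
  2002, "Anti-Islamic", 174
)

# Estimated population sizes quoted in the text
pops <- tribble(
  ~offense, ~size,
  "Anti-Black", 36.4e6,
  "Sexual Orientation", 28.2e6,
  "Anti-Islamic", 3.4e6
)

# Victims per 100,000 people in each targeted population
rate_df <- counts |>
  left_join(pops, by = "offense") |>
  mutate(rate_per_100k = count / size * 1e5) |>
  arrange(year, desc(rate_per_100k))
```

Plotting `rate_per_100k` against `year`, with one line per offense, puts the three groups on a common scale, which the raw counts do not.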
Exercise 3: Evidence of Simpson’s paradox?
Check the following data set for evidence of Simpson's paradox, in the sense that the pass rate is higher if group2 == "X".
df <- tribble(
  ~group1, ~group2, ~result, ~count,
  "A", "X", "pass", 100,
  "B", "X", "pass", 50,
  "C", "X", "pass", 25,
  "A", "X", "fail", 10,
  "B", "X", "fail", 20,
  "C", "X", "fail", 20,
  "A", "Y", "pass", 10,
  "B", "Y", "pass", 70,
  "C", "Y", "pass", 15,
  "A", "Y", "fail", 20,
  "B", "Y", "fail", 40,
  "C", "Y", "fail", 30)
Exercise 4: Discussion
This week the Women's Weekly published a story about famous Australian model Elle Macpherson's experience with breast cancer. Diagnosed 7 years ago, she is in remission after choosing alternative therapies as treatment. The original diagnosis was accompanied by a lumpectomy that removed the cancerous tissue.
What do the data say about the following statement?
Alternative therapies assisted Elle’s being considered cleared of cancer today.
👌 Finishing up
Make sure you say thanks and good-bye to your tutor. This is also a time to report what you enjoyed and what you found difficult.