ETC5521 Worksheet Week 6

Exploring bivariate dependencies

Author

Prof. Di Cook

Exercise 1: Fisherman’s Reach crabs

Mud crabs are delicious to eat! Prof Cook’s father started a crab farm at Fisherman’s Reach, NSW, when he retired. He caught small crabs (with a special license) and nurtured and fed them until they reached marketable size. They were then sent to markets, such as Queen Victoria Market in Melbourne, for people to buy and eat. Mud crabs have a strong, nutty flavour, and are good to eat simply after steaming or boiling.

Early in the farming setup, he collected measurements of 62 crabs of different sizes, because he wanted to learn the best time to send a crab to market. Crabs re-shell from time to time: they grow too big for their shell and need to discard it. Ideally, crabs should be sent to market just before they re-shell, because the crab will then be fuller in the shell, with less air, less juice and more crab meat.

Note: In NSW it is legal to sell female mud crabs, as long as they are not carrying eggs. In Queensland, it is illegal to keep and sell female mud crabs. Focusing only on males could be worthwhile.

Code
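library(tidyverse)  # provides read_csv(), mutate() and %>%

# Read the crab measurements; Sex is coded 1/2 in the file, relabelled m/f
fr_crabs <- read_csv("https://ddde.numbat.space/data/fr-crab.csv") %>%
  mutate(Sex = factor(Sex, levels = c(1, 2),
                      labels = c("m", "f")))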
  1. Where is Fisherman’s Reach? What would you expect the relationship between Length and Weight of a crab to be?
  1. Make a scatterplot of Weight by NSW Length. Describe the relationship. It might be even better if you can add marginal density plots to the sides of the scatterplot. (Aside: Should one variable be considered a dependent variable? If so, make sure this is on the \(y\) axis.)
  1. Examine transformations to linearise the relationship. (Think about why the relationship between Length and Weight is nonlinear.)
  1. Is there possibly a lurking variable? Examine the variables in the data, and use colour in the plot to check for another variable explaining some of the relationship.
  1. If you have determined that there is a lurking variable, make changes in the plots to find the best model of the relationship between Weight and Length.
  1. How would you select the crabs that were close to re-shelling based on this data?
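As a starting point for the transformation question, here is a minimal sketch of how a log-log scale can straighten a length-weight relationship. It uses simulated stand-in data, not the Fisherman’s Reach measurements: the data frame `sim_crabs`, its columns `length` and `weight`, the cubic exponent and all constants are assumptions made only for illustration (weight is taken to scale roughly with volume, so with length cubed).

```r
library(ggplot2)

# Simulated stand-in data (assumption): NOT the real crab measurements.
set.seed(5521)
n <- 62
sim_crabs <- data.frame(length = runif(n, min = 8, max = 18))

# Assumed form: weight = a * length^3, with multiplicative noise.
# The constants 0.25 and 0.08 are arbitrary illustration values.
sim_crabs$weight <- 0.25 * sim_crabs$length^3 * exp(rnorm(n, sd = 0.08))

# Raw scale: the point cloud curves upward.
p_raw <- ggplot(sim_crabs, aes(x = length, y = weight)) +
  geom_point()

# Log-log scale: a power law becomes a straight line,
# because log(weight) = log(a) + b * log(length).
p_log <- p_raw +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The slope on the log-log scale estimates the exponent b.
fit_log <- lm(log(weight) ~ log(length), data = sim_crabs)
coef(fit_log)
```

Printing `p_raw` and `p_log` side by side shows the change in shape. With the real data, the estimated slope need not be close to 3, and comparing it with 3 is one way to think about why the relationship is nonlinear.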

Exercise 2: Bank discrimination

Code
data(case1202, package = "Sleuth2")
  1. Look at the help page for the case1202 from the Sleuth2 package. What does the variable “Senior” measure? “Exper”? Age?
  1. Make all the pairwise scatterplots of Senior, Exper and Age. What do you learn about the relationship between these three pairs of variables? How can the age be 600? Are there some wizards or witches or vampires in the data?
  1. Colour the observations by Sex. What do you learn?
  1. Instead of scatterplots, make faceted histograms of the three variables by Sex. What do you learn about the difference in distribution of these three variables between the sexes?
  1. The data also has 1975 salary and annual salary. Plot these two variables, in two ways: (1) coloured by Sex, and (2) faceted by Sex. Explain the relationships.
  1. Examine the annual salary against Age, Senior and Exper, separately by Sex, by adding a fitted linear model to the scatterplot where Sex is mapped to colour. What is the relationship and what do you learn?
  1. When you use geom_smooth(method="lm") to add a fitted model to the scatterplot, is it adding a model with interactions?
  1. There is a danger of misinterpreting differences when only examining marginal plots. What we need to know is: for people of the same age, same experience and same seniority, is the salary different for men and women? How would you make plots to try to examine this?
  1. Would you say that this data provides evidence of sex discrimination?
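For the question about `geom_smooth(method="lm")`, one way to investigate is to compare the lines it draws with models you fit yourself. The sketch below does this on simulated data, not the bank data: the data frame `sim` and its columns `x`, `y` and `group` are hypothetical names, and all constants are arbitrary. It relies on the fact that, when colour is mapped to a factor, `geom_smooth()` fits its model separately within each group.

```r
library(ggplot2)

# Simulated data (assumption): two groups with different intercepts and slopes.
set.seed(1202)
n <- 60
sim <- data.frame(
  x = runif(2 * n, min = 0, max = 10),
  group = factor(rep(c("a", "b"), each = n))
)
sim$y <- ifelse(sim$group == "a", 1 + 0.5 * sim$x, 3 + 1.5 * sim$x) +
  rnorm(2 * n, sd = 1)

# One fitted line per colour group.
p <- ggplot(sim, aes(x = x, y = y, colour = group)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Separate simple regressions, one per group.
fit_a <- lm(y ~ x, data = subset(sim, group == "a"))
fit_b <- lm(y ~ x, data = subset(sim, group == "b"))

# A single model with an interaction between x and group.
fit_int <- lm(y ~ x * group, data = sim)
slope_a <- coef(fit_int)[["x"]]
slope_b <- coef(fit_int)[["x"]] + coef(fit_int)[["x:groupb"]]

# The per-group slopes match those implied by the interaction model.
all.equal(slope_a, coef(fit_a)[["x"]])
all.equal(slope_b, coef(fit_b)[["x"]])

# For contrast: a model without the interaction forces a common slope.
fit_par <- lm(y ~ x + group, data = sim)
coef(fit_par)[["x"]]
```

The same comparison could be repeated with the salary variables and Sex in place of `y`, `x` and `group`.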