Working with a single variable
Prof. Di Cook
Load the galaxies data in the MASS package and answer the following questions based on this dataset.
data(galaxies, package = "MASS")
You can access the documentation for the data (if available) with the help() function, giving the package name as an argument, e.g. help(galaxies, package = "MASS").
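Before answering the questions, it can help to take a quick numerical look at the data. The sketch below uses base R only, assuming MASS is installed (it ships with standard R distributions as a recommended package). The 2,000 km/sec bin width is an arbitrary illustrative choice, not one the exercise prescribes.

```r
# Load the galaxies velocities (a numeric vector, in km/sec) from MASS
data(galaxies, package = "MASS")

# Number of observations and the range of velocities they span
n_gal <- length(galaxies)
gal_range <- range(galaxies)
print(n_gal)
print(gal_range)

# Five-number summary plus the mean
print(summary(galaxies))

# A text histogram: counts in 2,000 km/sec bins (bin width chosen arbitrarily)
bins <- cut(galaxies, breaks = seq(8000, 36000, by = 2000))
print(table(bins))
```

Varying the bin width in the last step is a cheap way to see how sensitive the apparent clusters are to binning.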
The code below might help you get started. It generates a lineup of jittered dotplots, but you can use your preferred plot type from part e.
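A lineup hides the plot of the real data among null plots drawn from a model, here a fitted normal mixture. To show what drawing one null sample involves, here is a minimal base-R sketch that is conceptually similar to mixtools::rnormmix: pick a mixture component for each draw, then draw from that component's normal distribution. The helper name sim_normmix and the lambda, mu and sigma values are hypothetical placeholders, not the fitted galaxies_fit estimates. The lineup code itself follows.

```r
# Hypothetical helper: draw n values from a normal mixture.
# lambda = mixing proportions, mu = component means, sigma = component SDs.
sim_normmix <- function(n, lambda, mu, sigma) {
  # Choose a component for each draw, with probabilities lambda
  comp <- sample(seq_along(lambda), size = n, replace = TRUE, prob = lambda)
  # rnorm is vectorised over mean and sd, so each draw uses its component's parameters
  rnorm(n, mean = mu[comp], sd = sigma[comp])
}

set.seed(5521)
# Placeholder parameters (assumed for illustration, not fitted to galaxies)
lambda <- c(0.1, 0.8, 0.1)
mu <- c(10000, 21000, 33000)
sigma <- c(500, 2000, 1000)

null_sample <- sim_normmix(82, lambda, mu, sigma)
print(summary(null_sample))
```

In the exercise, the parameters would instead come from the fitted model, which is what the lineup code does with galaxies_fit$lambda, galaxies_fit$mu and galaxies_fit$sigma.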
# Generate null plots and make a lineup
# galaxies_fit and galaxies_sim1 come from earlier parts of this exercise
# (assumed: a mixtools normal mixture fit, and one sample simulated from it)
library(tidyverse)
library(mixtools)
library(ggbeeswarm)

n <- length(galaxies)
galaxies_null <- tibble(.sample = 1, galaxies = galaxies_sim1)
for (i in 2:19) {
  # Draw another null sample from the fitted normal mixture
  gsim <- rnormmix(n = n,
                   lambda = galaxies_fit$lambda,
                   mu = galaxies_fit$mu,
                   sigma = galaxies_fit$sigma)
  galaxies_null <- bind_rows(galaxies_null,
                             tibble(.sample = i, galaxies = gsim))
}
# Add the real data as the 20th sample
galaxies_null <- bind_rows(galaxies_null,
                           tibble(.sample = 20, galaxies = galaxies))
# Randomise .sample to hide the data plot; the data ends up in panel pos[20]
pos <- sample(1:20)
galaxies_null$.sample <- rep(pos, each = n)
ggplot(galaxies_null, aes(x = 1, y = galaxies)) +
  geom_quasirandom() +
  facet_wrap(~.sample, ncol = 5) +
  coord_flip() +
  theme(
    aspect.ratio = 0.7,
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  )

For each of the variables in the data file which-transform.csv, decide on an appropriate transformation: make the distribution more symmetric for five of the variables, and remove the discreteness from one variable.
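The contents of which-transform.csv are not reproduced here, so the sketch below uses simulated stand-ins (both hypothetical): a right-skewed lognormal variable and an integer-valued Poisson variable. It shows two common moves: a log transform to reduce right skew, checked with a simple moment-based skewness (sample_skewness is a helper defined here, since base R has none), and adding uniform noise of up to half the spacing between integer values to remove discreteness. Log is only one option; square root or other power transformations may suit milder skew better.

```r
set.seed(1)
# Hypothetical stand-ins for variables in which-transform.csv
skewed <- rlnorm(500, meanlog = 0, sdlog = 1)  # long right tail
discrete <- rpois(500, lambda = 4)             # integer-valued

# Moment-based sample skewness (hypothetical helper, not a base R function)
sample_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# A log transform pulls in the long right tail
skew_before <- sample_skewness(skewed)
skew_after <- sample_skewness(log(skewed))
print(c(before = skew_before, after = skew_after))

# Jitter by up to half the gap between integer values (gap = 1 here)
discrete_j <- discrete + runif(length(discrete), min = -0.5, max = 0.5)
print(c(distinct_before = length(unique(discrete)),
        distinct_after = length(unique(discrete_j))))
```

The jitter amount of 0.5 keeps each value within its original integer bin, so the transformed variable still rounds back to the original data.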
