data(galaxies, package = "MASS")
Working with a single variable
Prof. Di Cook
Load the galaxies data in the MASS package and answer the following questions based on this dataset.
You can access the documentation for the data (if available) with the help() function, specifying the package name in the argument, e.g. help(galaxies, package = "MASS").
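As a first look at this single variable, here is a minimal sketch using only base R and MASS. It loads the data, counts the observations and prints a numerical summary. It then draws a histogram with fairly narrow bins so that any multiple modes are visible. The choice of breaks = 30 is an illustrative assumption, not a recommended setting.

```r
# Load the galaxies velocities (km/sec) shipped with MASS
data(galaxies, package = "MASS")

# Quick numerical checks on the single variable
n_gal <- length(galaxies)   # number of galaxies in the sample
print(n_gal)
print(summary(galaxies))

# Histogram; breaks = 30 is an arbitrary choice meant to show multimodality
hist(galaxies, breaks = 30, main = "galaxies",
     xlab = "Velocity (km/sec)")
```

Changing the number of breaks is worthwhile here, because the apparent number of modes in a histogram can depend heavily on the bin width.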
This code might help you get started. It simulates null samples from the fitted mixture model and shows them, together with the data, as a lineup of jittered dotplots. You can swap in your preferred plot type from part e.
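# Packages: MASS (galaxies data), mixtools (normalmixEM(), rnormmix()),
# tidyverse (tibble(), bind_rows(), ggplot2), ggbeeswarm (geom_quasirandom())
library(MASS)
library(mixtools)
library(tidyverse)
library(ggbeeswarm)
data(galaxies, package = "MASS")
# Assumption: galaxies_fit is the normal mixture fitted in an earlier part;
# k = 3 components is only an illustrative choice here
galaxies_fit <- normalmixEM(galaxies, k = 3)
n <- length(galaxies)   # 82 observations
# Generate 19 null samples from the fitted mixture
galaxies_null <- NULL
for (i in 1:19) {
  gsim <- rnormmix(n = n,
                   lambda = galaxies_fit$lambda,
                   mu = galaxies_fit$mu,
                   sigma = galaxies_fit$sigma)
  galaxies_null <- bind_rows(galaxies_null,
                             tibble(.sample = i, galaxies = gsim))
}
# Add the observed data as the 20th sample
galaxies_null <- bind_rows(galaxies_null,
                           tibble(.sample = 20, galaxies = galaxies))
# Randomise .sample to hide the data plot; the data end up in panel pos[20]
pos <- sample(1:20)
galaxies_null$.sample <- rep(pos, each = n)
# Draw the lineup of jittered dotplots
ggplot(galaxies_null, aes(x = 1, y = galaxies)) +
  geom_quasirandom() +
  facet_wrap(~.sample, ncol = 5) +
  coord_flip() +
  theme(
    aspect.ratio = 0.7,
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  )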
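To show what happens when rnormmix() generates each null sample, here is a minimal base-R sketch of drawing from a univariate normal mixture. The sketch first picks a component for each draw with probabilities lambda, then draws from that component's normal distribution. The helper sim_normmix() is hypothetical and is not the mixtools source code. The parameter values are also made up for illustration. In the exercise they would come from galaxies_fit.

```r
# Hypothetical helper: simulate n draws from a univariate normal mixture
sim_normmix <- function(n, lambda, mu, sigma) {
  stopifnot(length(lambda) == length(mu), length(mu) == length(sigma))
  # Choose a component for each observation with probabilities lambda
  comp <- sample(seq_along(lambda), size = n, replace = TRUE, prob = lambda)
  # Draw each observation from its own component's normal distribution
  rnorm(n, mean = mu[comp], sd = sigma[comp])
}

set.seed(2024)
# Illustrative parameter values only, NOT the fitted galaxies mixture
lambda <- c(0.1, 0.8, 0.1)
mu     <- c(10000, 21000, 33000)
sigma  <- c(500, 2000, 1000)

sim <- sim_normmix(82, lambda, mu, sigma)  # same size as the galaxies data
print(summary(sim))
```

Each null panel in a lineup is a sample like sim. If the mixture describes the data well, the data panel should be hard to pick out from the 19 nulls.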
For each of the variables in the data file which-transform.csv, decide on an appropriate transformation: make the distribution more symmetric for five of the variables, and remove the discreteness in one variable.
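The following minimal sketch shows both kinds of transformation on simulated data. The two variables are hypothetical stand-ins, not columns of which-transform.csv. For symmetry, it compares a simple moment-based skewness before and after a log transform. It also uses the MASS boxcox() profile likelihood to suggest a power, where a value near 0 points to a log. For discreteness, it adds small uniform noise with jitter(). The noise stays below half the gap between adjacent integers, so the original values can still be recovered by rounding.

```r
library(MASS)  # boxcox()

# Simple moment estimator: mean cubed deviation divided by cubed SD
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

set.seed(5521)
# Hypothetical variables, NOT taken from which-transform.csv
right_skewed <- rlnorm(1000, meanlog = 0, sdlog = 1)  # positive, long right tail
discrete     <- rpois(1000, lambda = 4)               # integer-valued

# Symmetry: skewness before and after a log transform
print(c(raw = skew(right_skewed), log = skew(log(right_skewed))))

# Box-Cox profile log-likelihood over powers from -2 to 2; pick the maximiser
bc <- boxcox(right_skewed ~ 1, plotit = FALSE)
best_lambda <- bc$x[which.max(bc$y)]
print(best_lambda)

# Discreteness: uniform noise in (-0.4, 0.4), less than half the integer gap
discrete_jit <- jitter(discrete, amount = 0.4)
print(c(before = length(unique(discrete)), after = length(unique(discrete_jit))))
```

The same checks apply to real columns. Transformations such as log or square root need positive values, so shift or choose a different power when a variable contains zeros or negatives.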