ETC5521 Worksheet Week 9

Exploring data having a space and time context Part I

Author

Prof. Di Cook

Exercise 1: Men’s heights

The heights data provided in the brolgar package contains average male heights in 144 countries from 1500 to 1989.

  1. What’s the time index for this data? What is the key?

The time index is year, and the key is country.
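A tsibble requires each row to be identified uniquely by its key and index together. A quick sanity check on a key/index choice is therefore to look for duplicated (key, index) pairs. The sketch below does this in base R on a small made-up data frame. The countries, years and heights are hypothetical, not values from brolgar. tsibble's as_tsibble() enforces the same rule and stops with an error when it is violated.

```r
# Hypothetical long-format data: one row per country (key) per year (index)
toy <- data.frame(
  country   = c("A", "A", "A", "B", "B"),
  year      = c(1800, 1810, 1820, 1800, 1820),
  height_cm = c(165.1, 165.8, 166.4, 170.2, 171.0)
)

# TRUE when no two rows share the same combination of key and index
is_valid_key_index <- function(df, key, index) {
  anyDuplicated(df[c(key, index)]) == 0
}

is_valid_key_index(toy, "country", "year")                   # TRUE
is_valid_key_index(rbind(toy, toy[1, ]), "country", "year")  # FALSE: repeated row
is_valid_key_index(toy, character(0), "year")                # FALSE: year alone repeats
```

The last call shows why a key is needed at all: several countries share the same years, so year on its own cannot identify a row.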

  1. Filter the data to keep only measurements since 1700, when there are records for many countries. Make a spaghetti plot for the values from Australia. Does it look like Australian males are getting taller?

It looks like Australian males are getting taller, BUT there are few measurements in the 1900s and none since 1975, so the data for Australia look unreliable.

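```r
library(brolgar)   # heights data, add_n_obs(), facet_strata()
library(dplyr)
library(ggplot2)

# Keep measurements from 1700 onwards, when many countries have records
heights <- brolgar::heights |> filter(year >= 1700)

# Spaghetti plot for Australia alone
heights_oz <- heights |> 
  filter(country == "Australia") 
ggplot(heights_oz,
       aes(x = year,
           y = height_cm,
           group = country)) + 
  geom_point() + 
  geom_line()
```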
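The doubts about the Australian series come from its coverage in time rather than from its values. A small base-R helper can make that quick to check for any one country. The helper below, coverage(), is hypothetical and not part of brolgar. It reports the number of observations, the first and last year, and the largest gap between consecutive years. The years used here are made up for illustration. With the real data you would pass heights_oz$year.

```r
# Summarise how well one series is covered in time:
# number of observations, first and last year, and the largest gap
coverage <- function(years) {
  years <- sort(unique(years))
  gaps <- diff(years)
  c(n_obs   = length(years),
    first   = min(years),
    last    = max(years),
    max_gap = if (length(gaps) > 0) max(gaps) else NA)
}

# Hypothetical years for one country (not the actual Australian records)
oz_years <- c(1850, 1860, 1870, 1880, 1890, 1900, 1930, 1975)
coverage(oz_years)   # n_obs 8, first 1850, last 1975, max_gap 45
```

A last year well before the end of the collection, or a large max_gap, is the kind of pattern that makes a single country's trend hard to trust.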

  1. Check the number of observations for each country. How many countries have fewer than five years of measurements? Filter these countries out of the data, because we can’t study temporal trends without sufficient measurements.
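```r
# Count the countries with fewer than five measurements
heights |> as_tibble() |> count(country) |> filter(n < 5) |> nrow()
# Drop them: too few points to study a temporal trend
heights <- heights |> 
  add_n_obs() |> 
  filter(n_obs >= 5)
```
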
  1. Make a spaghetti plot of all the data, with a smoother overlaid. Does it look like men are generally getting taller?

Generally the trend is upward, so yes, it does look like men are getting taller across the globe.

```r
# All countries as faint lines, with one pooled smoother on top
ggplot(heights,
       aes(x = year,
           y = height_cm)) + 
  geom_line(aes(group = country), alpha = 0.3) + 
  geom_smooth(se = FALSE)
```
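Because group is set only inside geom_line(), geom_smooth() fits a single smoother to the pooled points from every country. When no method is given, ggplot2 uses stats::loess() for fewer than 1,000 observations and mgcv::gam() otherwise. The sketch below fits a loess curve directly to simulated spaghetti data: 20 hypothetical series that share an upward trend but sit at different levels. It illustrates that the pooled smoother recovers the common trend. The data are synthetic, not the brolgar heights.

```r
set.seed(1)

# Simulated spaghetti data: 20 hypothetical series measured every 10 years,
# sharing a linear upward trend but starting at different levels
years <- seq(1750, 1980, by = 10)
toy <- expand.grid(year = years, country = paste0("c", 1:20))
offset <- rnorm(20, sd = 2)                     # one level shift per series
toy$height_cm <- 160 + 0.04 * (toy$year - 1750) +
  offset[as.integer(toy$country)] +
  rnorm(nrow(toy), sd = 1)                      # measurement noise

# One loess smoother fitted to all points pooled together
fit <- loess(height_cm ~ year, data = toy)
trend <- predict(fit, newdata = data.frame(year = years))

# Estimated overall change across the period (the simulated true change is 9.2 cm)
trend[length(trend)] - trend[1]
```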

  1. Use facet_strata to break the data into subsets using the year, and plot in several facets. What sort of patterns are there in terms of the earliest year that a country appears in the data?

The countries are fairly evenly distributed across the facets, which suggests that similar numbers of countries joined the collection at regular intervals over time.

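```r
library(tsibble)

# Declare year as the (irregularly spaced) index and country as the key
heights <- as_tsibble(heights,
                      index = year,
                      key = country,
                      regular = FALSE)
set.seed(530)
# Group the countries into strata along (negated) year, one facet per stratum
ggplot(heights, aes(x = year,
                    y = height_cm,
                    group = country)) + 
  geom_line() + 
  facet_strata(along = -year)
```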
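facet_strata() places the countries into groups of roughly equal size, ordered by a per-country summary of the along variable, and draws one facet per group. The base-R sketch below shows that idea using each country's earliest year as the summary. The first years are hypothetical. The summary function, the number of strata and the tie handling are assumptions made for illustration, so check ?facet_strata for what brolgar actually does.

```r
# Hypothetical earliest year in which each country appears
first_year <- c(A = 1700, B = 1710, C = 1750, D = 1800, E = 1810,
                F = 1820, G = 1850, H = 1900, I = 1910)

# Rank keys by their summary, then cut the ranks into n_strata equal groups
assign_strata <- function(x, n_strata) {
  r <- rank(x, ties.method = "first")
  ceiling(r * n_strata / length(x))
}

strata <- assign_strata(first_year, n_strata = 3)
table(strata)                         # 3 countries in each stratum
tapply(first_year, strata, range)     # span of first years within each stratum
```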

  1. Compute the three number summary (min, median, max) for each country. Make density plots of these statistics, overlaid in a single plot, and a parallel coordinate plot of these three statistics. What is the average minimum (median, maximum) height across countries? Are there some countries that have roughly the same minimum, median and maximum height?

The average minimum height is about 164 cm, the average median is about 168 cm and the average maximum is about 172 cm. The distribution of maximum heights appears bimodal, with a small second peak around 178 cm.

Most countries show the expected increase from minimum to median to maximum. A few, though, have very similar values for all three, which is a bit surprising: it means that their heights have barely changed over time.

```r
library(fabletools)   # features()
library(tidyr)        # pivot_longer()
library(forcats)      # fct_reorder()
library(colorspace)   # scale_fill_discrete_qualitative()
library(patchwork)    # combining plots with a layout design

# Three number summary (min, median, max) of height for each country
heights_three <- heights |>
  features(height_cm, c(
    min = min,
    median = median,
    max = max
  ))
# Long form: one row per country and statistic
heights_three_l <- heights_three |> 
  pivot_longer(cols = min:max,
               names_to = "feature",
               values_to = "value")

# Overlaid densities of the three statistics
p1 <- heights_three_l |> 
  ggplot(aes(x = value,
             fill = feature)) + 
  geom_density(alpha = 0.5) +
  scale_fill_discrete_qualitative(palette = "Dark 3") +
  labs(x = "Height", y = "", fill = "Feature") +
  theme(legend.position = "none",
        aspect.ratio = 1)

# Parallel coordinate plot: one line per country
p2 <- heights_three_l |> 
  ggplot(aes(x = factor(feature, 
                        levels = c("min", "median", "max")),
             y = value,
             group = country)) + 
  geom_line(alpha = 0.4) +
  labs(x = "", y = "Height") +
  theme(aspect.ratio = 1)

# Median and min-to-max range per country, sorted by median
heights_three <- heights_three |> 
  mutate(country = fct_reorder(factor(country), median)) 
p3 <- heights_three |>
  ggplot() + 
  geom_point(aes(x = country,
                 y = median)) +
  geom_errorbar(aes(x = country, 
                    ymin = min, ymax = max), 
                alpha = 0.6, width = 0) +
  labs(x = "", y = "Height") +
  coord_flip() +
  theme(axis.text.y = element_text(size = 6),
        aspect.ratio = 2)

# Densities and parallel coordinates on the left, country ranges on the right
design <- "
1133
1133
2233
2233"
p1 + p2 + p3 + 
  plot_layout(design = design)
```
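The averages quoted above are means, across countries, of each country's own minimum, median and maximum. The base-R sketch below computes the same per-country summaries and their averages for three hypothetical countries. It also flags countries whose minimum and maximum lie within 1 cm of each other. That 1 cm threshold is arbitrary and chosen only for illustration. This is the "roughly the same minimum, median and maximum" pattern.

```r
# Hypothetical heights for three countries
toy <- data.frame(
  country   = rep(c("A", "B", "C"), each = 4),
  height_cm = c(163, 165, 167, 170,         # A: clear increase
                168, 169, 171, 172,         # B
                160, 160.2, 160.1, 160.3)   # C: almost flat
)

# One row per country: min, median and max of its heights
three <- do.call(rbind, lapply(split(toy$height_cm, toy$country), function(h)
  c(min = min(h), median = median(h), max = max(h))))

colMeans(three)                      # about 163.7, 165.4 and 167.4
three[, "max"] - three[, "min"] < 1  # TRUE only for C, the flat country
```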

  1. Which country has the tallest men? Which country has highest median male height? Which country has the shortest men? Would you say that the distribution of heights within a country is similar for all countries?

Denmark has the tallest men (highest maximum). Estonia has the highest median height. Papua New Guinea has the shortest men on all three metrics. The distribution of heights over the years is not the same for every country: most show a clear increase from minimum to maximum, while a few barely change.
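Answering these questions amounts to finding the row with the largest or smallest value of each statistic in the per-country summary table. The sketch below uses which.max() and which.min() on a hypothetical summary with the same columns as heights_three, and the numbers are made up. The helper shortest_on_all() is also hypothetical. It returns a country only when that country is lowest on every statistic, and NA otherwise.

```r
# Hypothetical per-country summaries with the same columns as heights_three
three <- data.frame(
  country = c("A", "B", "C"),
  min     = c(160, 168, 155),
  median  = c(166, 170, 157),
  max     = c(182, 175, 159),
  stringsAsFactors = FALSE
)

three$country[which.max(three$max)]      # tallest single value: "A"
three$country[which.max(three$median)]   # highest median: "B"

# Country that is lowest on every statistic, or NA if there is none
shortest_on_all <- function(df, cols) {
  idx <- vapply(df[cols], which.min, integer(1))
  if (length(unique(idx)) == 1) df$country[idx[1]] else NA_character_
}
shortest_on_all(three, c("min", "median", "max"))   # "C"
```

With the real summary table the same indexing applies, for example heights_three$country[which.max(heights_three$max)].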