ETC5521 Tutorial 10

Exploring data having a space and time context

Author

Prof. Di Cook

🎯 Objectives

This tutorial practices rearranging spatiotemporal data to focus on spatial or temporal patterns, and constructing choropleth maps and cartograms.

🔧 Preparation

Resources for this week are Moraga (2019), Spatial data and R packages for mapping; cubble: A Vector Spatio-Temporal Data Structure for Data Analysis; Making maps plot faster Simplify spatial polygons; sf: Simple Features for R.
Complete the weekly quiz, before the deadline!
Install the following R-packages if you do not have them already:

install.packages(c("tidyverse","here","lubridate","GGally","tsibble","cubble","forcats","cartogram","sf","patchwork","ggthemes", "sugarbag", "viridis", "rmapshaper", "remotes"))
remotes::install_github("runapp-aus/strayr")

Open your RStudio Project for this unit (the one you created in week 1, ETC5521). Create a .qmd document for this week's activities.

📥 Exercises

Exercise 1: Melbourne Covid-19 outbreak

In Melbourne we were in a strict lockdown for much of 2020, and large chunks of 2021. Each week we got our hopes up that restrictions might be eased, and once again these hopes were dashed by announcements each week, keeping the restrictions a little longer. The data we have collected here are the case counts by Victorian local government area (LGA) since the beginning of July, 2020. We will examine the spatiotemporal distribution of these counts.

Working with spatial data is always painful! It almost always requires some ugly code.

Part of the reason for the difficulty is the use of special data objects that describe maps. There are several different choices, and some packages and tools use one while others use another, so not all tools work together. The sf package helps enormously, but when you run into errors it can be hard to debug.
Another reason is that map objects can be very large, which makes sense for accurate mapping, but for data analysis and visualisation we’d rather have smaller, even if slightly inaccurate, spatial objects. It is virtually always necessary to thin out map data before doing further analysis. You need special tools for this, e.g. rmapshaper. We don’t really need this for the exercises here, because the strayr version of the LGAs is already thinned.
Another problem commonly encountered is that there are numerous coordinate systems, and many types of projection of the 3D globe onto a 2D canvas. We have become accustomed to lat/long but, like time, it’s an awkward scale to compute on because E/W and N/S need to be translated to positive and negative values. More commonly, Universal Transverse Mercator (UTM) is the standard, but it’s far less intuitive to use.
And yet another reason is that keys linking data tables and spatial tables may not match perfectly because there are often synonyms or slightly different name preferences between different data collectors.
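The coordinate system point can be made concrete with a small sketch. It assumes a single illustrative point in central Melbourne and uses EPSG:32755 (WGS 84 / UTM zone 55S) as the projected system; neither choice comes from the tutorial data.

```r
library(sf)

# One point in central Melbourne as longitude/latitude (WGS 84, EPSG:4326).
# West and south are encoded as negative values, east and north as positive.
melb <- st_sfc(st_point(c(144.9631, -37.8136)), crs = 4326)

# Project to UTM zone 55S (EPSG:32755), whose units are metres.
melb_utm <- st_transform(melb, crs = 32755)

st_coordinates(melb)      # degrees
st_coordinates(melb_utm)  # metres (easting, northing)
```

Distances and areas are simpler to compute on the projected coordinates, at the cost of values that are harder to read at a glance.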

The code for all the analysis is provided for you in the solution. We recommend that you run the code in steps to see what it is doing, and why the mutating and text manipulations are necessary. Talk about the code with each other to help you understand it.

a. Read case counts for 2020

The file melb_lga_covid.csv contains the cases by LGA. Read the data in and inspect the result. You should find that some variables are type chr because “null” has been used to code entries on some days. This needs fixing, and missing values should also be converted to 0. Why does it make sense to substitute missing values with 0 here?
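A minimal sketch of the "null" problem, using a small made-up extract rather than the real file. The column names and values below are hypothetical, and the wide layout (one column per LGA) is an assumption; the provided solution may handle this differently.

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical extract (NOT the real melb_lga_covid.csv).
csv_text <- "Date,Brimbank,Hume,Wyndham
2020-07-01,null,12,5
2020-07-02,3,null,7
2020-07-03,8,20,null"

# Read naively: any column containing "null" comes in as chr.
raw <- read_csv(I(csv_text), show_col_types = FALSE)

# Declare "null" as a missing-value code, then replace missings with 0.
covid <- read_csv(I(csv_text), na = c("", "NA", "null"),
                  show_col_types = FALSE) |>
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

glimpse(raw)
glimpse(covid)
```

Declaring the missing-value code at read time avoids a separate text-to-number conversion step afterwards.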

b. Check the data

Check the case counts to learn whether they are daily or cumulative. The best way to do this is to select one LGA where there were substantial cases, and plot its time series. If the counts are cumulative, calculate the daily counts, and re-check the temporal trend for your chosen LGA. Describe the temporal trend, and any visible artifacts.
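If the counts do turn out to be cumulative, differencing within each LGA recovers the daily counts. The sketch below uses made-up data in long form (the names lga, date and cases are hypothetical) and assumes the series starts from zero, so the first day's daily count equals its cumulative value.

```r
library(dplyr)

# Hypothetical cumulative counts for two LGAs.
cumul <- tibble(
  lga   = rep(c("A", "B"), each = 4),
  date  = rep(as.Date("2020-07-01") + 0:3, times = 2),
  cases = c(10, 15, 15, 22, 0, 2, 1, 6)
)

# Daily count = today's cumulative total minus yesterday's, within each LGA.
daily <- cumul |>
  group_by(lga) |>
  arrange(date, .by_group = TRUE) |>
  mutate(new_cases = cases - lag(cases, default = 0)) |>
  ungroup()

# Negative daily counts signal a downward revision of the cumulative total.
filter(daily, new_cases < 0)
```

A cumulative series should never decrease, so any negative difference is an artifact worth noting when describing the trend.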

c. Spatial polygons size

Now let’s get polygon data of Victorian LGAs using the strayr package. The map is already fairly small, so it doesn’t need any more thinning, but we’ll look at how thinning works.

Get a copy of the lga2018 using strayr::read_absmap(). Save the resulting data as an .rda file, and plot the map.

Now run rmapshaper::ms_simplify(), saving it as a different object. Save the object as an .rda file, and plot the map.

What is the difference in file size before and after thinning? Can you see a difference in the map?
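The thinning workflow can be tried without downloading anything. This sketch substitutes the North Carolina counties file that ships with sf for the LGA map; the keep value of 0.1 is an arbitrary choice for illustration.

```r
library(sf)
library(rmapshaper)

# Stand-in map: North Carolina counties bundled with sf (not the LGA map).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Keep roughly 10% of the vertices; keep_shapes = TRUE stops small
# polygons from being dropped entirely.
nc_small <- ms_simplify(nc, keep = 0.1, keep_shapes = TRUE)

# Save both versions and compare their size on disk (in bytes).
f_full  <- tempfile(fileext = ".rda")
f_small <- tempfile(fileext = ".rda")
save(nc, file = f_full)
save(nc_small, file = f_small)
c(full = file.size(f_full), simplified = file.size(f_small))

# Plot both to judge whether the loss of detail is visible.
plot(st_geometry(nc))
plot(st_geometry(nc_small))
```

The same pattern applies to the object returned by strayr::read_absmap(), with the object names swapped.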

d. Spatial polygons matching

Now let’s match polygon data of Victorian LGAs to the COVID counts. The function cubble::check_key() can be used to check whether the keys match between the spatial and temporal data sets.

You will find that we need to fix some names of LGAs, even though cubble does a pretty good job working out which are supposed to match.
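As a manual complement to cubble::check_key(), an anti-join lists the keys that have no exact partner. The names below are hypothetical examples of the kind of mismatch to expect (suffixes and hyphenation); they are not taken from the actual files, and this is not how cubble does its matching.

```r
library(dplyr)

# Hypothetical keys: the map carries bracketed suffixes, the case data does not.
map_keys  <- tibble(lga_name = c("Melbourne (C)", "Hume (C)", "Colac-Otway (S)"))
case_keys <- tibble(lga_name = c("Melbourne", "Hume", "Colac Otway"))

# Names in the case data with no exact match in the map.
before <- anti_join(case_keys, map_keys, by = "lga_name")
before

# Strip the trailing " (X)" suffix and replace hyphens, then re-check.
map_fixed <- map_keys |>
  mutate(lga_name = sub(" \\([A-Za-z]+\\)$", "", lga_name),
         lga_name = gsub("-", " ", lga_name, fixed = TRUE))

unmatched <- anti_join(case_keys, map_fixed, by = "lga_name")
unmatched
```

An empty result from the second anti-join means every case-data key now has a partner in the map.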

e. Choropleth map

Sum the counts over the time period for each LGA, merge the COVID data with the map polygons (LGA) and create a choropleth map. The LGA data is an sf object, so geom_sf() will automatically grab the geometry from the object to make the spatial polygons. Where was the highest COVID incidence?
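The sum, join and plot steps can be sketched on stand-in data. This example again uses the North Carolina map bundled with sf and simulated counts, so the region names, the cases variable and the colour scale are all assumptions rather than the tutorial's own code.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Simulated daily counts for every region, in long form (made-up values).
set.seed(5521)
counts <- expand.grid(NAME = nc$NAME, day = 1:5, stringsAsFactors = FALSE) |>
  mutate(cases = rpois(n(), lambda = 3))

# Sum over the time period for each region.
totals <- counts |>
  group_by(NAME) |>
  summarise(total = sum(cases), .groups = "drop")

# Put the sf object on the left of the join so the geometry is kept.
nc_totals <- left_join(nc, totals, by = "NAME")

# geom_sf() finds the geometry column itself; only fill needs mapping.
p <- ggplot(nc_totals) +
  geom_sf(aes(fill = total), colour = "white") +
  scale_fill_viridis_c(option = "magma", direction = -1) +
  theme_void()
p
```

Joining with the map on the left matters: with the tibble on the left, the result would be a plain data frame and geom_sf() would no longer find the geometry automatically.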

f. Cartogram

To make a population-transformed polygon we need to get population data for each LGA. The file VIF2019_Population_Service_Ages_LGA_2036.xlsx has been extracted from the Vic Gov web site. It is a complicated xlsx file, with the data in sheet 3, and starting 13 rows down. The readxl package is handy here to extract the population data needed. You’ll need to join the population counts to the map data to make a cartogram. Once you have the transformed polygon data, the same plotting code that created the choropleth map can be used.
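The polygon transformation itself can be sketched with cartogram::cartogram_cont(). This example uses the North Carolina map bundled with sf, with its BIR74 birth counts standing in for population; the projection (EPSG:32119) and itermax = 5 are choices made only for this illustration.

```r
library(sf)
library(cartogram)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# cartogram_cont() expects projected coordinates, not lat/long.
# EPSG:32119 is a metre-based projection for North Carolina.
nc_proj <- st_transform(nc, 32119)

# Distort each polygon so its area is roughly proportional to BIR74,
# which stands in for a population column here.
nc_carto <- cartogram_cont(nc_proj, weight = "BIR74", itermax = 5)

# The result is still an sf object, so the usual plotting code applies.
plot(nc_carto["BIR74"])
```

Raising itermax gives areas closer to the target proportions, at the cost of run time and more heavily distorted shapes.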

g. Hexagon tile map

Use the provided code to make a hexagon tile map, with functions from the sugarbag package. Is it easier to see the spatial distribution of incidence from the hexagon tile map, the choropleth or the cartogram?

👌 Finishing up

Make sure you say thanks and good-bye to your tutor. This is a time to also report what you enjoyed and what you found difficult.