ETC5521: Diving Deeply into Data Exploration

Lecturer/Chief Examiner

Tutors

  • Krisanat Anukarnsakulchularp
    • Tutorials: Wed 9:30-11:00am and 7:30-9:00pm CL_Anc-19.LTB_134
    • Consultation: Mon 9:30-11:00am, Menzies W9.20

Weekly schedule

  • Lecture: Tues 10-12 on Zoom (link in Moodle)
  • Tutorial: 1.5 hours
  • Weekly learning quizzes due each Wednesday 9am, from week 2

Week | Topic | Reference | Assessments
22 Jul | Overview. Why this course? What is EDA? | The Landscape of R Packages for Automated Exploratory Data Analysis |
29 Jul | Learning from history | EDA Case Study: Bay area blues |
05 Aug | Initial data analysis and model diagnostics: Model dependent exploration and how it differs from EDA | The initial examination of data | Assignment 1
12 Aug | Using computational tools to determine whether what is seen in the data can be assumed to apply more broadly | Wickham et al. (2010) Graphical inference for Infovis |
19 Aug | Working with a single variable, making transformations, detecting outliers, using robust statistics | Wilke (2019) Ch 6 Visualizing Amounts; Ch 7 Visualizing distributions |
26 Aug | Bivariate dependencies and relationships, transformations to linearise | Wilke (2019) Ch 12 Visualising associations | Assignment 2
02 Sep | Making comparisons between groups and strata | Wilke (2019) Ch 9, 10.2-4, 11.2 |
09 Sep | Going beyond two variables, exploring high dimensions | Cook and Laa (2023) Interactively exploring high-dimensional data and models in R Chapter 1 |
16 Sep | Exploring data having a space and time context Part I | brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R | Assignment 3
23 Sep | Mid-semester break | |
30 Sep | Exploring data having a space and time context Part II | cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data |
07 Oct | Sculpting data using models, checking assumptions, co-dependency and performing diagnostics | How to use a tour to check if your model suffers from multicollinearity | Assignment 4 Part 1
14 Oct | Extending beyond the data, what can and cannot be inferred more generally, given the data collection | |
28 Oct | | | Assignment 4 Part 2

Assessments

Software

We will be using the latest versions of R and RStudio.

Here is the code to install (most of) the R packages we will be using in this unit.

install.packages(
  c("tidyr", "dplyr", "readr", "readxl", "readabs", "forcats",
    "tsibble", "cubble", "lubridate", "ggplot2", "GGally", "ggthemes",
    "sugrrants", "ggbeeswarm", "plotly", "gganimate", "tourr", "sugarbag",
    "tsibbletalk", "visdat", "inspectdf", "naniar", "validate", "vcd",
    "mvtnorm", "nullabor", "visage", "forecast", "cassowaryr", "brolgar",
    "palmerpenguins", "housingData", "broom", "kableExtra", "lvplot",
    "colorspace", "patchwork"),
  dependencies = TRUE
)

From GitHub, install detourr (this call uses the remotes package, which must already be installed):

remotes::install_github("casperhart/detourr")
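
After running the install commands, it can be useful to confirm which packages actually made it into your library, since individual installs sometimes fail without stopping the rest. The following is a minimal sketch, not part of the official unit instructions: the vector `pkgs` is an illustrative subset of the names above, and you would substitute the full list yourself.

```r
# Illustrative subset of the unit's packages; replace with the full list as needed
pkgs <- c("tidyr", "dplyr", "ggplot2", "tourr", "nullabor", "detourr")

# rownames(installed.packages()) gives the names of all packages
# found in the current library paths
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All listed packages are installed.")
} else {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Any names reported as not yet installed can be passed to install.packages() again, or checked individually for error messages.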

If you are relatively new to R, working through the materials at https://learnr.numbat.space is an excellent way to up-skill. You are especially encouraged to work through Chapter 3, on troubleshooting and asking for help: at some point you will need help with your code, and how you ask affects how well others can help you.
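
One piece of context that is commonly useful when asking for help is a description of your R setup. The sketch below uses only base R functions to collect it. It is an assumption about what is helpful to share, not a summary of the linked chapter, which may recommend a different workflow. The package name `ggplot2` is used purely as an example.

```r
# Report the version of R in use
print(R.version.string)

# Report the installed version of a package involved in the problem,
# but only if that package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  print(packageVersion("ggplot2"))
}

# Summarise the platform, locale, and currently loaded packages
info <- sessionInfo()
print(info)
```

Pasting this output alongside the smallest piece of code that reproduces your problem gives others a concrete starting point.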

Creative Commons License
These materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.