ETC5521: Diving Deeply into Data Exploration

Lecturer/Chief Examiner

Tutors

  • Krisanat Anukarnsakulchularp
    • Tutorials: Thu 9am (CL_LTB_188), 10am (CL_LTB_188), 3pm (CL_LTB_188)
    • Consultation: TBD

Weekly schedule

  • Lecture+workshop: Tues 2-5pm on Zoom (link in Moodle)
  • Tutorial: 1 hour
  • Weekly learning quizzes due each Thursday 9am, from week 2

| Week | Topic | Reference | Assessments |
|---|---|---|---|
| 29 Jul | Overview. Why this course? What is EDA? | The Landscape of R Packages for Automated Exploratory Data Analysis | |
| 05 Aug | Learning from history | EDA Case Study: Bay area blues | |
| 12 Aug | Initial data analysis and model diagnostics: Model dependent exploration and how it differs from EDA | The initial examination of data | |
| 19 Aug | Using computational tools to determine whether what is seen in the data can be assumed to apply more broadly | Wickham et al. (2010) Graphical inference for Infovis | Exercises 1 |
| 26 Aug | Working with a single variable, making transformations, detecting outliers, using robust statistics | Wilke (2019) Ch 6 Visualizing Amounts; Ch 7 Visualizing distributions | |
| 02 Sep | Bivariate dependencies and relationships, transformations to linearise | Wilke (2019) Ch 12 Visualising associations | Exercises 2 |
| 09 Sep | Making comparisons between groups and strata | Wilke (2019) Ch 9, 10.2-4, 11.2 | |
| 16 Sep | Going beyond two variables, exploring high dimensions | Cook and Laa (2023) Interactively exploring high-dimensional data and models in R Chapter 1 | |
| 23 Sep | Exploring data having a space and time context Part I | brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R | Exercises 3 |
| 30 Sep | Mid-semester break | | |
| 07 Oct | Exploring data having a space and time context Part II | cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data | |
| 14 Oct | Sculpting data using models, checking assumptions, co-dependency and performing diagnostics | How to use a tour to check if your model suffers from multicollinearity | Project Part 1 |
| 21 Oct | Help session | | |
| 04 Nov | | | Project Part 2 |

Assessments

Software

We will be using the latest versions of R and RStudio.

Here is the code to install (most of) the R packages we will be using in this unit.

install.packages(
  c(
    "tidyr", "dplyr", "readr", "readxl", "readabs", "forcats",
    "tsibble", "cubble", "lubridate", "ggplot2", "GGally", "ggthemes",
    "sugrrants", "ggbeeswarm", "plotly", "gganimate", "tourr", "sugarbag",
    "tsibbletalk", "visdat", "inspectdf", "naniar", "validate", "vcd",
    "mvtnorm", "nullabor", "visage", "forecast", "cassowaryr", "brolgar",
    "palmerpenguins", "housingData", "broom", "kableExtra", "lvplot",
    "colorspace", "patchwork"
  ),
  dependencies = TRUE
)

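The detourr package is installed from GitHub:

# Requires the remotes package; if needed, run install.packages("remotes") first
remotes::install_github("casperhart/detourr")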
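
After a long install, it can be hard to tell from the console output whether everything succeeded. The sketch below is one way to check. The helper `missing_packages()` and the vector `unit_pkgs` are hypothetical names made up for this illustration and are not part of the unit materials. Only a handful of the packages are listed, so extend the vector as needed.

```r
# Hypothetical helper (not part of the unit materials): returns the names
# in `pkgs` that are not found in any library on this machine.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# A few of the packages used in this unit; extend with the rest as needed.
unit_pkgs <- c(
  "tidyr", "dplyr", "ggplot2", "tourr", "nullabor",
  "naniar", "brolgar", "cubble", "detourr"
)

still_missing <- missing_packages(unit_pkgs)

if (length(still_missing) == 0) {
  message("All listed packages are installed.")
} else {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
}
```

Any names reported as not yet installed can be passed to `install.packages()` again, or installed from GitHub in the case of detourr.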

If you are relatively new to R, working through the materials at https://startr.numbat.space is an excellent way to up-skill. You are especially encouraged to work through Chapter 3, on Troubleshooting and asking for help. At some point you will need help with your code, and how you ask affects how well others are able to help you.
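
As a rough illustration of what a helpful question can include, here is a minimal sketch using only base R. The data frame `example_data` is made up for this example, and the materials linked above may recommend different or additional tools. The idea is simply to share a small copy of the data, the shortest code that shows the problem, and details of your R setup.

```r
# A small made-up data frame standing in for your own data.
example_data <- data.frame(
  group = c("a", "b", "c"),
  value = c(1.5, NA, 3.2)
)

# dput() prints R code that recreates the object, so a helper can
# paste it into their own session and work with the same data.
dput(example_data)

# The smallest piece of code that shows the unexpected behaviour:
# here the mean is NA because one value is missing.
problem_result <- mean(example_data$value)
print(problem_result)

# Report the R version, operating system and loaded packages.
sessionInfo()
```

Including the output of `sessionInfo()` lets others see whether a version difference might explain the behaviour.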

Creative Commons License
These materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.