ETC5521: Diving Deeply into Data Exploration

Lecturer/Chief Examiner

  • Professor Di Cook
    • Email: etc5521.clayton-x@monash.edu
    • Consultation: Fridays 11am-1pm, Clayton: Education Building (Building 6), 29 Ancora Imparo Way, Room 352, and on Zoom (see link in Moodle)

Tutors

  • Krisanat Anukarnsakulchularp
    • Tutorials: Thu 9am (CL_LTB_188), 10am (CL_LTB_188), 3pm (CL_LTB_387)
    • Consultation: Thursdays 4-6pm, Clayton: Education Building (Building 6), 29 Ancora Imparo Way, Room 232A

Weekly schedule

  • Lecture+workshop: Tues 2-5pm on Zoom (link in Moodle)
  • Tutorial: 1 hour
  • Weekly learning quizzes due each Thursday 9am, from week 2

Week | Topic | Reference | Assessments
29 Jul | Overview. Why this course? What is EDA? | The Landscape of R Packages for Automated Exploratory Data Analysis |
05 Aug | Learning from history | EDA Case Study: Bay area blues | Quiz 1
12 Aug | Initial data analysis and model diagnostics: Model dependent exploration and how it differs from EDA | The initial examination of data | Quiz 2
19 Aug | Using computational tools to determine whether what is seen in the data can be assumed to apply more broadly | Wickham et al. (2010) Graphical inference for Infovis | Exercises 1, Quiz 3
26 Aug | Working with a single variable, making transformations, detecting outliers, using robust statistics | Wilke (2019) Ch 6 Visualizing Amounts; Ch 7 Visualizing distributions | Quiz 4
02 Sep | Bivariate dependencies and relationships, transformations to linearise | Wilke (2019) Ch 12 Visualising associations | Exercises 2, Quiz 5
09 Sep | Making comparisons between groups and strata | Wilke (2019) Ch 9, 10.2-4, 11.2 | Quiz 6
16 Sep | Going beyond two variables, exploring high dimensions | Cook and Laa (2023) Interactively exploring high-dimensional data and models in R, Chapter 1 | Quiz 7
23 Sep | Exploring data having a space and time context Part I | brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R | Exercises 3, Quiz 8
30 Sep | Mid-semester break | |
07 Oct | Exploring data having a space and time context Part II | cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data | Quiz 9
14 Oct | Sculpting data using models, checking assumptions, co-dependency and performing diagnostics | How to use a tour to check if your model suffers from multicollinearity | Project Part 1, Quiz 10
21 Oct | Help session | | Quiz 11
04 Nov | | | Project Part 2

Assessments

Software

We will be using the latest versions of R and RStudio.

Here is the code to install (most of) the R packages we will be using in this unit.

install.packages(
  c(
    "tidyr", "dplyr", "readr", "readxl", "readabs", "forcats",
    "tsibble", "cubble", "lubridate", "ggplot2", "GGally", "ggthemes",
    "sugrrants", "ggbeeswarm", "plotly", "gganimate", "tourr", "sugarbag",
    "tsibbletalk", "visdat", "inspectdf", "naniar", "validate", "vcd",
    "mvtnorm", "nullabor", "visage", "forecast", "cassowaryr", "brolgar",
    "palmerpenguins", "housingData", "broom", "kableExtra", "lvplot",
    "colorspace", "patchwork"
  ),
  dependencies = TRUE
)

From GitHub, install detourr (this requires the remotes package):

remotes::install_github("casperhart/detourr")
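
After running the installation commands, it can be useful to confirm which packages are actually available. The following is a minimal sketch, not part of the unit materials: the helper `missing_packages()` and the shortened list `unit_pkgs` are hypothetical names introduced here for illustration, and the list is only a small subset of the packages named above.

```r
# Hypothetical helper (an illustration, not part of the unit materials):
# return the names in `pkgs` that are not installed in the current R library.
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# A small, illustrative subset of the packages used in this unit.
unit_pkgs <- c("tidyr", "dplyr", "ggplot2", "nullabor", "tourr", "detourr")

to_install <- missing_packages(unit_pkgs)

if (length(to_install) > 0) {
  # Report only; nothing is installed automatically.
  message("Still to install: ", paste(to_install, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

The sketch only reports what is missing, so you can decide whether each package should come from CRAN with `install.packages()` or from GitHub with `remotes::install_github()`.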

If you are relatively new to R, working through the materials at https://startr.numbat.space is an excellent way to upskill. You are especially encouraged to work through Chapter 3, on troubleshooting and asking for help: at some point you will need help with your coding, and how you ask affects how well others can help you.

Creative Commons License
These materials are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.