ETC5521 Tutorial 4

Initial data analysis

Author

Prof. Di Cook

🎯 Objectives

Practice conducting initial data analyses, and make a start on learning how to assess the significance of patterns.

🔧 Preparation

  • The reading for this week is Wickham et al. (2010) Graphical inference for Infovis.
  • Complete the weekly quiz, before the deadline!
  • Make sure you have this list of R packages installed:

install.packages(c("tidyverse"))
  • Open your RStudio Project for this unit (the one you created in week 1, ETC5521). Create a .qmd document for this week's activities.

📥 Exercises

This tutorial focuses on IDA for the gardenR data, with the goal of answering this question:

Which variety of tomato produces the most return on investment, as measured by weight?

Exercise 1

  1. How many types of vegetables were grown in each year?
  2. How many vegetables were grown in 2020 that were not grown in 2021?
  3. What are some of the data recording errors that can be seen by comparing vegetables grown in each year?
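As a starting point, the comparison between years can be sketched on made-up data. The data frames `harvest_2020` and `harvest_2021`, their values, and the column name `vegetable` are hypothetical stand-ins for the gardenR harvest tables, so check the real names before adapting this.

```r
library(dplyr)

# Hypothetical toy harvest records, NOT the real gardenR data.
harvest_2020 <- data.frame(
  vegetable = c("tomatoes", "tomatoes", "lettuce", "beets"),
  weight = c(120, 340, 20, 83),
  stringsAsFactors = FALSE
)
harvest_2021 <- data.frame(
  vegetable = c("tomatoes", "lettuce", "Lettuce"),
  weight = c(200, 15, 30),
  stringsAsFactors = FALSE
)

# Number of distinct vegetable names recorded in each year.
n_2020 <- n_distinct(harvest_2020$vegetable)
n_2021 <- n_distinct(harvest_2021$vegetable)

# Vegetables recorded in 2020 that do not appear in 2021.
only_2020 <- setdiff(unique(harvest_2020$vegetable),
                     unique(harvest_2021$vegetable))

# Inconsistent capitalisation inflates the count; lower-casing is one
# simple check for this kind of recording error.
n_2021_cleaned <- n_distinct(tolower(harvest_2021$vegetable))
```

In the toy data, "lettuce" and "Lettuce" are counted as two vegetables until the names are lower-cased, which is the kind of recording error worth looking for when the yearly lists disagree.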

Exercise 2

  1. Add a new variable called year to each of the harvest, spending and planting data sets, then join the two years of each. Show your code.
  2. Make a subset containing just the tomatoes, for each set.
  3. Are the varieties of tomatoes grown each year the same?
  4. Are the tomato varieties grown in the same plots each year?
  5. When are tomatoes planted and harvested in Lisa’s garden?
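A minimal sketch of the first few steps, on made-up data. It reads "join" as stacking the two years row-wise, which is an assumption about what the exercise intends. The data frames `planting_2020` and `planting_2021`, the variety names, and the column names `plot`, `vegetable` and `variety` are all hypothetical.

```r
library(dplyr)

# Hypothetical toy planting records, NOT the real gardenR data.
planting_2020 <- data.frame(
  plot = c("A", "B", "C"),
  vegetable = c("tomatoes", "tomatoes", "lettuce"),
  variety = c("Variety A", "Variety B", "Variety L"),
  stringsAsFactors = FALSE
)
planting_2021 <- data.frame(
  plot = c("A", "D"),
  vegetable = c("tomatoes", "tomatoes"),
  variety = c("Variety A", "Variety C"),
  stringsAsFactors = FALSE
)

# Tag each table with its year, then stack the two years.
planting <- bind_rows(
  mutate(planting_2020, year = 2020),
  mutate(planting_2021, year = 2021)
)

# Keep only the tomato rows.
tomatoes <- filter(planting, vegetable == "tomatoes")

# Count how many years each variety appears in.
variety_years <- tomatoes %>%
  distinct(year, variety) %>%
  count(variety, name = "n_years")

# Varieties grown in both years.
both_years <- variety_years$variety[variety_years$n_years == 2]
```

The same pattern (add year, stack, filter) would apply to the harvest and spending tables. Replacing `variety` with `plot` in the last step gives a comparable check for plots.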

Exercise 3

Try to answer the original question.

  1. How should you calibrate weight of harvest by amount of seeds planted?

  2. Which variety produces the most return on investment?
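One possible calibration is total harvested weight divided by the number of seeds planted, per variety. The sketch below shows this on made-up data. It is only one choice (dividing by the amount spent would be another), and the data frames `harvest` and `planting`, their values, and the column names `weight` and `number_seeds_planted` are hypothetical.

```r
library(dplyr)

# Hypothetical toy tomato records, NOT the real gardenR data.
harvest <- data.frame(
  variety = c("Variety A", "Variety A", "Variety B"),
  weight = c(100, 300, 500),
  stringsAsFactors = FALSE
)
planting <- data.frame(
  variety = c("Variety A", "Variety A", "Variety B"),
  number_seeds_planted = c(2, 2, 10),
  stringsAsFactors = FALSE
)

# Total harvested weight per variety.
harvest_totals <- harvest %>%
  group_by(variety) %>%
  summarise(total_weight = sum(weight), .groups = "drop")

# Total seeds planted per variety (a variety may be in several plots).
seed_totals <- planting %>%
  group_by(variety) %>%
  summarise(seeds = sum(number_seeds_planted), .groups = "drop")

# Weight per seed, largest first.
yield <- harvest_totals %>%
  inner_join(seed_totals, by = "variety") %>%
  mutate(weight_per_seed = total_weight / seeds) %>%
  arrange(desc(weight_per_seed))
```

In the toy data, Variety B has the larger total harvest but Variety A has the higher weight per seed, which shows why the raw totals need calibrating before varieties are compared. Summarising before joining avoids duplicating harvest rows when a variety is planted in more than one plot.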

👌 Finishing up

Make sure you say thanks and good-bye to your tutor. This is also a time to report what you enjoyed and what you found difficult.