To work out which varieties differ between years, use the anti_join() function. Because there are multiple plots of each vegetable, use distinct() to remove duplicates, and then arrange the result alphabetically for ease of comparison.
# A tibble: 19 × 3
   variety          `2020` `2021`
   <chr>             <int>  <int>
 1 Amish Paste          30     19
 2 Better Boy           23     NA
 3 Big Beef             21     24
 4 Black Krim           12     12
 5 Bonny Best           27     15
 6 Brandywine           16     NA
 7 Bush Goliath         NA      6
 8 Cherokee Purple      14     10
 9 Early Girl           NA     29
10 Jet Star             13     NA
11 Mortgage Lifter      18     22
12 Old German           19      4
13 San Marzano          NA     19
14 Striped German       NA      8
15 Sweet 100 Cherry     NA     28
16 grape                39     NA
17 volunteer            NA     16
18 volunteers           31     NA
19 yellow               NA      4
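The anti_join() comparison described above can be sketched as follows. This is a minimal illustration, not the original solution code: the data frame names `planted_2020` and `planted_2021` are assumptions, and the rows are a small illustrative subset of the varieties in the table, with duplicates added to stand in for multiple plots.

```r
library(dplyr)

# Plot-level planting records. The variety names come from the table above,
# but these rows are an illustrative subset, not the real data.
planted_2020 <- tibble(
  variety = c("Amish Paste", "Amish Paste", "Better Boy", "Brandywine", "Big Beef")
)
planted_2021 <- tibble(
  variety = c("Amish Paste", "Big Beef", "Big Beef", "Early Girl", "Bush Goliath")
)

# Varieties planted in 2020 that have no match in 2021
only_2020 <- planted_2020 %>%
  anti_join(planted_2021, by = "variety") %>%
  distinct(variety) %>%   # one row per variety, however many plots
  arrange(variety)        # alphabetical, for easy comparison

# Varieties planted in 2021 that have no match in 2020
only_2021 <- planted_2021 %>%
  anti_join(planted_2020, by = "variety") %>%
  distinct(variety) %>%
  arrange(variety)

only_2020
only_2021
```

Note that anti_join() is directional, so it has to be run once in each direction to find the varieties unique to each year.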
# A tibble: 14 × 6
   variety         year      D     J     N     O
   <chr>           <chr> <int> <int> <int> <int>
 1 Amish Paste     2021      1    NA     1    NA
 2 Amish Paste     2020     NA     1     1    NA
 3 Big Beef        2021      1    NA     1    NA
 4 Big Beef        2020     NA    NA     1    NA
 5 Black Krim      2020     NA    NA     1    NA
 6 Black Krim      2021     NA    NA     1    NA
 7 Bonny Best      2020     NA     1    NA    NA
 8 Bonny Best      2021     NA    NA    NA     1
 9 Cherokee Purple 2020     NA     1    NA    NA
10 Cherokee Purple 2021     NA    NA     1     1
11 Mortgage Lifter 2021      1    NA     1    NA
12 Mortgage Lifter 2020     NA     1     1    NA
13 Old German      2021      1    NA    NA    NA
14 Old German      2020     NA     1    NA    NA
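Black Krim is the only variety grown in exactly the same plot (N) in both years. Amish Paste, Big Beef and Mortgage Lifter keep plot N but change or add a second plot, while Bonny Best, Cherokee Purple and Old German move entirely. This might cause problems if the plots are not equally good for growing tomatoes.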
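To compare plot assignments across years directly, one option is to list the plots used by each variety in each year and count how many are shared. The sketch below transcribes the non-missing cells of the variety-by-plot table above into long form; the object names `plots` and `plot_overlap` are assumptions made for illustration.

```r
library(dplyr)

# One row per variety, year and plot, transcribed from the table above
# (a cell with a count becomes a row; NA cells are dropped).
plots <- tribble(
  ~variety,          ~year,  ~plot,
  "Amish Paste",     "2020", "J",
  "Amish Paste",     "2020", "N",
  "Amish Paste",     "2021", "D",
  "Amish Paste",     "2021", "N",
  "Big Beef",        "2020", "N",
  "Big Beef",        "2021", "D",
  "Big Beef",        "2021", "N",
  "Black Krim",      "2020", "N",
  "Black Krim",      "2021", "N",
  "Bonny Best",      "2020", "J",
  "Bonny Best",      "2021", "O",
  "Cherokee Purple", "2020", "J",
  "Cherokee Purple", "2021", "N",
  "Cherokee Purple", "2021", "O",
  "Mortgage Lifter", "2020", "J",
  "Mortgage Lifter", "2020", "N",
  "Mortgage Lifter", "2021", "D",
  "Mortgage Lifter", "2021", "N",
  "Old German",      "2020", "J",
  "Old German",      "2021", "D"
)

# For each variety: plots used in each year, and how many plots appear in both
plot_overlap <- plots %>%
  group_by(variety) %>%
  summarise(
    plots_2020 = paste(sort(plot[year == "2020"]), collapse = ","),
    plots_2021 = paste(sort(plot[year == "2021"]), collapse = ","),
    n_shared   = length(intersect(plot[year == "2020"], plot[year == "2021"]))
  )

plot_overlap
```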
Again, use only the tomato varieties that are grown in both years.
Planting is usually in late May, and it is consistent for all the varieties.
Harvesting starts around 50 days after planting. 2020 had an earlier harvest than 2021 for all varieties. Big Beef tends to be harvested first, and Black Krim later. Old German had a poor harvest in 2021 relative to 2020.
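The time from planting to first harvest can be computed by joining planting dates to the earliest harvest date per variety. The sketch below shows the mechanics only: the dates are HYPOTHETICAL values invented for illustration, and the names `planting`, `harvest`, `planted` and `date` are assumptions rather than the actual data set.

```r
library(dplyr)

# HYPOTHETICAL dates for illustration only; these are not the recorded values.
planting <- tibble(
  variety = c("Big Beef", "Black Krim"),
  planted = as.Date(c("2020-05-20", "2020-05-20"))
)
harvest <- tibble(
  variety = c("Big Beef", "Big Beef", "Black Krim", "Black Krim"),
  date    = as.Date(c("2020-07-11", "2020-07-25", "2020-07-20", "2020-08-03"))
)

days_to_harvest <- harvest %>%
  group_by(variety) %>%
  summarise(first_harvest = min(date)) %>%           # earliest harvest per variety
  left_join(planting, by = "variety") %>%
  mutate(days = as.numeric(first_harvest - planted)) %>%  # Date difference, in days
  arrange(days)

days_to_harvest
```

Adding a year column to both tables and grouping by it as well would allow the same calculation to be compared between years.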
Exercise 3 Try to answer the original question.
How should you calibrate weight of harvest by amount of seeds planted?
Which variety produces the most return on investment?
Solution
We’ll divide the harvest weight by the number of seeds planted. An interesting observation is that so few tomato seeds were planted! Packets of tomato seeds contain far more seeds than that.
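A minimal sketch of this calculation is given below. The weights and seed counts are HYPOTHETICAL numbers chosen for illustration, and the column names `weight` and `number_seeds_planted` are assumptions about how the harvest and planting tables are laid out. Both tables are summed to one row per variety before joining, so that varieties planted in several plots are not double counted.

```r
library(dplyr)

# HYPOTHETICAL values for illustration only (weights in grams).
harvest <- tibble(
  variety = c("Amish Paste", "Amish Paste", "Big Beef", "Black Krim"),
  weight  = c(1200, 800, 3000, 900)
)
planting <- tibble(
  variety = c("Amish Paste", "Amish Paste", "Big Beef", "Black Krim"),
  plot    = c("J", "N", "N", "N"),
  number_seeds_planted = c(1, 1, 2, 1)
)

# Total seeds per variety, summed over plots
seeds <- planting %>%
  group_by(variety) %>%
  summarise(seeds = sum(number_seeds_planted))

# Total harvest weight per variety, scaled by seeds planted
yield_per_seed <- harvest %>%
  group_by(variety) %>%
  summarise(total_weight = sum(weight)) %>%
  left_join(seeds, by = "variety") %>%
  mutate(weight_per_seed = total_weight / seeds) %>%
  arrange(desc(weight_per_seed))   # best return per seed first

yield_per_seed
```

Sorting by weight per seed puts the variety with the best return at the top, which is one way to read "return on investment" when seed cost per variety is not available.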