Instructions

Solutions to the exercises in homework 7 should, just as for HW1-HW6, be written in an R-Markdown document with output: github_document. Both the R-Markdown document (.Rmd file) and the compiled Markdown document (.md file), as well as any figures needed for properly rendering the Markdown file on GitHub, should be uploaded to your Homework repository as part of a HW7 folder. Code should be written clearly and in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. For example, code should be easily readable and avoid unnecessary repetition of variable names.

Note that there are new data sets available in the HW_data repository. Download them by opening the associated R project and issuing a “pull”. If that fails, delete the HW_data folder on your computer and clone the repository again according to the instructions in HW2.

Deadline

This homework exercise corresponds to the re-examination of the homework part of the HT2019 course for students who participated actively in the homework submissions during the semester, but for various reasons did not pass one or several of their homeworks. Who has to do which exercises depends slightly on which homework was missed and is specified individually. All who followed the protocol should have instructions in their “Participating in HW7?” issue on what they need to do.

The deadline for this re-examination homework is 2020-02-28. Submission occurs as usual by pushing to your homework repository and then either opening a new issue with the title “HW7 ready for grading” in your repository or re-using the above-mentioned issue. The date of passing the homework part, if you pass this additional homework, will be the date you raised your HW7 issue. Submissions after the final February deadline are not accepted. Please also add a link from your repository’s README.md file to HW7/HW7.md.

Exercise 1 - Birdwatching

At Artportalen you can register your sightings of animals, plants or mushrooms. In this task you should analyse sightings recorded in Stockholm during January to October 2018, downloaded from The Analysis portal for biodiversity data. We will focus on birds, since they are by far the most popular to report.

A glimpse of data available in HW_data/SpeciesObservations-2018-11-07-14-23-54.csv is given by

library(tidyverse)  # provides read_csv() and glimpse()
species_data <- read_csv(file.path("HW_data","SpeciesObservations-2018-11-07-14-23-54.csv"))
glimpse(species_data)
## Observations: 55,374
## Variables: 16
## $ `Scientific name`            <chr> "Lepus europaeus", "Larus fuscus", "Laru…
## $ `Common name`                <chr> "fälthare", "silltrut", "silltrut", "bre…
## $ `Organism group`             <chr> "Däggdjur", "Fåglar", "Fåglar", "Kärlväx…
## $ `Occurrence status`          <chr> "Present", "Present", "Present", "Presen…
## $ `Recorded by`                <chr> "Henry Gudmundson", "Pär Grönhagen", "To…
## $ Locality                     <chr> "Brännkyrka sn, Enskedefältet", "Västber…
## $ County                       <chr> "Stockholm", "Stockholm", "Stockholm", "…
## $ `Coordinate uncertainty (m)` <dbl> 10, 1962, 1962, 10, 70, 70, 70, 70, 70, …
## $ X                            <dbl> 2010152, 2004678, 2004678, 2006696, 2006…
## $ Y                            <dbl> 8242684, 8243673, 8243673, 8249223, 8249…
## $ Start                        <chr> "06/04/2018", "29/03/2018", "24/06/2018"…
## $ End                          <chr> "06/04/2018", "29/03/2018", "24/06/2018"…
## $ Dataset                      <chr> "Species Observations System (Artportale…
## $ `Uncertain determination`    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
## $ Identification               <chr> "Unvalidated", "Unvalidated", "Unvalidat…
## $ ObservationId                <dbl> 113645773, 113482322, 115220647, 1158302…

Task 1 - Data cleaning

There are a few things we would like to fix initially:

  • Some variable names contain spaces, which can be inconvenient. Either replace these spaces in variable names with underscores or use backtick quotation in your dplyr code, i.e. `Common name`.

  • The Start and End variables should be of class Date, not chr (depending on your local settings they may already be interpreted correctly). Use either as.Date or lubridate functionality within a mutate to fix this.

  • Also turn Uncertain determination into a logical variable using as.logical.

  • Create a new variable Date that equals the value of Start. In the following, this will represent the observation date.
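The cleaning steps listed above can be sketched on a small made-up tibble. This is only an illustration under stated assumptions: the object names toy and toy_clean and the toy rows are hypothetical, and the date format %d/%m/%Y is inferred from the glimpse output, so it may need adjusting if your file is read differently.

```r
library(dplyr)
library(stringr)
library(tibble)

# Made-up rows mimicking a few columns of the real file (assumption).
toy <- tibble(
  `Common name` = c("talgoxe", "silltrut"),
  `Uncertain determination` = c("FALSE", "TRUE"),
  Start = c("06/04/2018", "29/03/2018"),
  End = c("06/04/2018", "29/03/2018")
)

toy_clean <- toy %>%
  # Replace spaces in all column names with underscores.
  rename_with(~ str_replace_all(.x, " ", "_")) %>%
  mutate(
    # Parse day/month/year strings into Date objects.
    Start = as.Date(Start, format = "%d/%m/%Y"),
    End = as.Date(End, format = "%d/%m/%Y"),
    Uncertain_determination = as.logical(Uncertain_determination),
    # The observation date is taken to be the start date.
    Date = Start
  )
```

Passing an explicit format to as.Date avoids depending on locale settings; lubridate::dmy would be an alternative for the same step.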

Task 2 - Working with the data

  1. List the most recorded bird species (Organism_group == "Fåglar") in January and in July (two tables with Common_name and number observed as columns). The function knitr::kable is useful for rendering tables in Markdown.

  2. Introduce a variable Weekday by applying the function weekdays to Date. Visualise the weekly activity of observers by a bar graph showing the number of bird sightings each day of the week. Make sure the bars are ordered from Monday to Sunday (see textbook 15.4).

  3. The Willow warbler (lövsångare, Phylloscopus trochilus) is one of the most common migratory birds in Sweden, spending winters in sub-Saharan Africa. List the first five (unique) observers (Recorded_by) recording a Willow warbler in Stockholm, together with the date recorded.

  4. Plot the monthly number of recorded “lövsångare” and “talgoxe” in the same figure.
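For the weekday ordering in item 2, one possible approach is to turn Weekday into a factor with explicit levels. The sketch below uses made-up dates, and the names toy, week_levels and per_weekday are hypothetical. Because weekdays() returns names in the current locale, the levels are derived from a week known to start on a Monday rather than typed by hand.

```r
library(dplyr)
library(tibble)

# Made-up observation dates (assumption); 2018-01-01 was a Monday.
toy <- tibble(Date = as.Date(c("2018-01-01", "2018-01-06", "2018-01-07",
                               "2018-01-07", "2018-07-03")))

# Monday-to-Sunday names in whatever locale the session uses.
week_levels <- weekdays(as.Date("2018-01-01") + 0:6)

per_weekday <- toy %>%
  mutate(Weekday = factor(weekdays(Date), levels = week_levels)) %>%
  # .drop = FALSE keeps weekdays without any observation as zero counts.
  count(Weekday, .drop = FALSE)
```

A bar graph built from a factor with these levels keeps the Monday-to-Sunday order on the axis, and knitr::kable(per_weekday) would render the same counts as a Markdown table.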

Exercise 2

In this exercise we re-visit the Rubik’s cube setup from Homework 4, Exercise 1.

Tasks

  1. Write stringr code which, given an algorithm, returns the left-right (L/R) mirrored version of the algorithm. See for example http://cube.crider.co.uk/algtrans.html for what the L/R mirrored version is. Determine the L/R mirrored version of the T-perm using your code.

  2. Use the http://cube.crider.co.uk/visualcube.php API to generate a .png image of applying the L/R mirrored version of the T-Perm to a solved cube through R. Include the resulting image using knitr::include_graphics(). Hint: Just use download.file on the appropriate URL.
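As a rough illustration of the mirroring in item 1, the sketch below assumes that an algorithm is a space-separated string of face moves (U, D, F, B, L, R), each optionally followed by ' or 2. Under that assumption, L/R mirroring swaps R and L and reverses the direction of every quarter turn, while half turns stay as they are. The function name mirror_lr is hypothetical, and wide moves, slice moves and rotations are not handled.

```r
library(stringr)

# Hypothetical helper: L/R mirror of a space-separated face-move algorithm.
mirror_lr <- function(alg) {
  moves <- str_split(str_squish(alg), " ")[[1]]
  face <- str_sub(moves, 1, 1)   # the face letter, e.g. "R"
  suffix <- str_sub(moves, 2)    # "", "'" or "2"

  # Swap the left and right faces; other faces are unchanged.
  face <- ifelse(face == "R", "L", ifelse(face == "L", "R", face))

  # Reverse quarter turns; half turns ("2") are their own mirror image.
  suffix <- ifelse(suffix == "", "'", ifelse(suffix == "'", "", suffix))

  str_c(face, suffix, collapse = " ")
}

mirror_lr("R U R' U'")
```

The same function can be applied to the T-perm string from Homework 4, provided it uses this notation.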

Exercise 3

In this exercise we study longitudinal data on the protein content of milk from 79 cows on three different diets. The protein content is measured per cow for each week from 1 week up to 19 weeks after calving.

Tasks

  1. Install the nlme package and load the Milk data from the package. Write code which computes the number of cows in the dataset and report the result (it should be 79).
  2. Create a single ggplot showing the trajectory in protein content as a function of time since calving for each cow and use geom_smooth to show a smooth curve for the expectation in each of the 3 diet groups. Which diet appears to provide the highest level of protein in the milk?
  3. Use spread to convert the data into wide format (Milk_wide), such that each cow has one row. Study the NAs in the resulting dataset and describe which two patterns you observe and how these patterns are probably caused. Hint: Before spread works on the milk dataset, you might need to massage the input data slightly.
  4. Convert the data back to long format, so that the result (Milk_long) now contains explicit NAs where measurements are missing. Use the naniar package to extend your plot from task 2 using the naniar::geom_miss_point geometry. Comment on the patterns you see in this plot.
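The effect of the wide/long round trip in items 3 and 4 can be seen on a small made-up data set. This is a sketch only: the values and the names toy_long, toy_wide and toy_long_na are hypothetical, and the real Milk data may first need to be converted to a plain tibble or data frame, as the hint suggests.

```r
library(tibble)
library(tidyr)

# Made-up measurements (assumption): cow B misses week 2, cow C stops after week 1.
toy_long <- tribble(
  ~Cow, ~Time, ~protein,
  "A",  1,     3.6,
  "A",  2,     3.4,
  "A",  3,     3.3,
  "B",  1,     3.5,
  "B",  3,     3.2,
  "C",  1,     3.7
)

# One row per cow, one column per week; absent combinations become NA.
toy_wide <- spread(toy_long, key = Time, value = protein)

# Back to long format; the NAs are now explicit rows.
# convert = TRUE turns the week labels back into numbers.
toy_long_na <- gather(toy_wide, key = "Time", value = "protein", -Cow,
                      convert = TRUE)
```

The six original rows become nine after the round trip, three of them with protein equal to NA, which is the kind of explicit missingness that naniar::geom_miss_point is designed to display.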