Solutions to the exercises of this homework 7 should, just as for HW1-HW6, be written in an R-Markdown document with output: github_document. Both the R-Markdown document (.Rmd-file) and the compiled Markdown document (.md file), as well as any figures needed for properly rendering the Markdown file on GitHub, should be uploaded to your Homework repository as part of a HW7 folder. Code should be written clearly in a consistent style, see in particular Hadley Wickham’s tidyverse style guide. As an example, code should be easily readable and avoid unnecessary repetition of variable names.
Note that there are new data sets available in the HW_data repository. Download them by opening the associated R project and issuing a “pull”. If it fails, delete the HW_data folder on your computer and clone the repository again according to the instructions in HW2.
This homework exercise corresponds to the re-examination of the homework part of the HT2019 course for students who participated actively in the submission of homework during the semester, but for various reasons did not pass one or several of their homeworks. Who has to do which exercises depends slightly on which homework was missed and is specified individually. All who followed the protocol should have instructions in their “Participating in HW7?” issue on what they need to do.
Deadline for this re-examination homework is 2020-02-28. Submission occurs as usual by pushing into your homework repository, followed by either opening a new issue with the title “HW7 ready for grading” in your repository or re-using the above-mentioned issue. The date of passing the homework part, if you pass this additional homework, will be the date you raised your HW7 issue. Submissions beyond the final February deadline are not accepted. Please also add a link from your repository’s README.md file to HW7/HW7.md.
At Artportalen you can register your sightings of animals, plants or mushrooms. In this task you should analyse sightings recorded in Stockholm during January to October 2018, downloaded from The Analysis portal for biodiversity data. We will focus on birds, since they are by far the most popular to report.
A glimpse of the data available in HW_data/SpeciesObservations-2018-11-07-14-23-54.csv is given by
species_data <- read_csv(file.path("HW_data","SpeciesObservations-2018-11-07-14-23-54.csv"))
glimpse(species_data)
## Observations: 55,374
## Variables: 16
## $ `Scientific name` <chr> "Lepus europaeus", "Larus fuscus", "Laru…
## $ `Common name` <chr> "fälthare", "silltrut", "silltrut", "bre…
## $ `Organism group` <chr> "Däggdjur", "Fåglar", "Fåglar", "Kärlväx…
## $ `Occurrence status` <chr> "Present", "Present", "Present", "Presen…
## $ `Recorded by` <chr> "Henry Gudmundson", "Pär Grönhagen", "To…
## $ Locality <chr> "Brännkyrka sn, Enskedefältet", "Västber…
## $ County <chr> "Stockholm", "Stockholm", "Stockholm", "…
## $ `Coordinate uncertainty (m)` <dbl> 10, 1962, 1962, 10, 70, 70, 70, 70, 70, …
## $ X <dbl> 2010152, 2004678, 2004678, 2006696, 2006…
## $ Y <dbl> 8242684, 8243673, 8243673, 8249223, 8249…
## $ Start <chr> "06/04/2018", "29/03/2018", "24/06/2018"…
## $ End <chr> "06/04/2018", "29/03/2018", "24/06/2018"…
## $ Dataset <chr> "Species Observations System (Artportale…
## $ `Uncertain determination` <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
## $ Identification <chr> "Unvalidated", "Unvalidated", "Unvalidat…
## $ ObservationId <dbl> 113645773, 113482322, 115220647, 1158302…
There are a few things we would like to fix initially:
Some variable names contain spaces, which can be inconvenient. Either replace these spaces in variable names with underscores or use backtick quotation in your dplyr code, i.e. `Common name`.
The Start and End variables should be of Date format, not chr (depending on your local settings they may actually be correctly interpreted). Use either as.Date or lubridate functionality within a mutate to fix this.
Also turn Uncertain determination into a logical variable using as.logical.
Create a new variable Date that equals the value of Start; in the following, this will represent the observation date.
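The clean-up steps above can be sketched on a small made-up tibble. The data below are invented for illustration and only mimic a few columns of the real file; the day/month/year order is read off the glimpse output, and the use of rename_with and dmy is one possible choice among several, not a prescribed solution.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical toy data mimicking a few columns of the real file
toy <- tibble(
  `Common name` = c("talgoxe", "silltrut"),
  `Recorded by` = c("Observer A", "Observer B"),
  Start = c("06/04/2018", "29/03/2018"),
  End = c("06/04/2018", "29/03/2018"),
  `Uncertain determination` = c("FALSE", "TRUE")
)

cleaned <- toy %>%
  # Replace spaces in all column names with underscores
  rename_with(~ str_replace_all(.x, " ", "_")) %>%
  mutate(
    # Dates are assumed to be written as day/month/year
    Start = dmy(Start),
    End = dmy(End),
    Uncertain_determination = as.logical(Uncertain_determination),
    # Date represents the observation date
    Date = Start
  )
```

If the dates are already parsed correctly on your system, the dmy calls would need to be dropped or guarded, since dmy expects character input in day/month/year order.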
List the most recorded bird species (Organism_group == "Fåglar") in January and in July (two tables with Common_name and number observed as columns). The function knitr::kable is useful for rendering tables in Markdown.
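As a rough sketch of how such a table could be produced, the following counts species per month on invented toy data. The tibble, the column name n_observed and the restriction to January are assumptions made for illustration; the real data would additionally need the filter on Organism_group.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy sightings; the real data has many more rows and columns
toy <- tibble(
  Common_name = c("talgoxe", "talgoxe", "silltrut", "talgoxe"),
  Date = as.Date(c("2018-01-03", "2018-01-15", "2018-01-20", "2018-07-02"))
)

january_counts <- toy %>%
  filter(month(Date) == 1) %>%
  # sort = TRUE puts the most recorded species first
  count(Common_name, sort = TRUE, name = "n_observed")

# Render the table in Markdown
knitr::kable(january_counts)
```

The same pipeline with month(Date) == 7 would give the July table.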
Introduce a variable Weekday by applying the function weekdays to Date. Visualise the weekly activity of observers by a bar graph showing the number of bird sightings each day of the week. Make sure the bars are ordered from Monday to Sunday (see textbook 15.4).
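One way to get the Monday-to-Sunday ordering is to turn Weekday into a factor with explicit levels. The sketch below uses invented dates and derives the level names from a week known to start on a Monday (2018-01-01), so that it should work regardless of the locale in which weekdays returns its names; this trick is an assumption of the sketch, not something the exercise requires.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy dates; 2018-01-01 was a Monday, 2018-01-07 a Sunday
toy <- tibble(
  Date = as.Date(c("2018-01-07", "2018-01-01", "2018-01-03", "2018-01-01"))
)

# Weekday names in Monday-to-Sunday order, in the current locale
weekday_levels <- weekdays(as.Date("2018-01-01") + 0:6)

toy <- toy %>%
  mutate(Weekday = factor(weekdays(Date), levels = weekday_levels))

# geom_bar counts rows per weekday; drop = FALSE keeps days without sightings
weekday_plot <- ggplot(toy, aes(x = Weekday)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)
```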
The Willow warbler (lövsångare, Phylloscopus trochilus) is one of the most common migratory birds in Sweden, spending winters in sub-Saharan Africa. List the first five (unique) observers (Recorded_by) recording a Willow warbler in Stockholm, together with the date recorded.
Plot the monthly number of recorded “lövsångare” and “talgoxe” in the same figure.
In this exercise we re-visit the Rubik’s cube setup from Homework 4, Exercise 1.
Write stringr code which, given an algorithm, returns the left-right (L/R) mirrored version of the algorithm. See for example http://cube.crider.co.uk/algtrans.html for what the L/R mirrored version is. Determine the L/R mirrored version of the T-perm using your code.
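A minimal sketch of the mirroring idea, under the assumption that the algorithm only contains the face moves U, D, F, B, L and R, separated by spaces and optionally followed by ' or 2: mirroring left-right swaps L and R and reverses the direction of every quarter turn, while half turns stay as they are. The function name mirror_lr is hypothetical, and base R's chartr is used for the simultaneous L/R swap because chained replacements would undo each other.

```r
library(stringr)

# Hypothetical helper: L/R mirror of an algorithm made of face moves only
mirror_lr <- function(alg) {
  moves <- str_split(str_squish(alg), " ")[[1]]
  face <- str_sub(moves, 1, 1)
  suffix <- str_sub(moves, 2)

  # Swap L and R in one pass; other faces are unchanged
  face <- chartr("LR", "RL", face)

  # Reverse quarter turns: "" becomes "'" and "'" becomes ""; "2" is kept
  suffix <- ifelse(suffix == "", "'", ifelse(suffix == "'", "", suffix))

  str_c(face, suffix, collapse = " ")
}

mirror_lr("R U R' U'")
```

Slice moves, wide moves and cube rotations are not handled by this sketch and would need their own rules.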
Use the http://cube.crider.co.uk/visualcube.php API to generate a .png image of applying the L/R mirrored version of the T-Perm to a solved cube through R. Include the resulting image using knitr::include_graphics(). Hint: Just use download.file on the appropriate URL.
In this exercise we study longitudinal data on the protein content of milk from 79 cows on three different diets. The protein content is measured per cow for each week from 1 week up to 19 weeks after calving.
[...] nlme package and load the Milk data from the package. Write code which computes the number of cows in the dataset and report the result (it should be 79).
[...] geom_smooth to show a smooth curve for the expectation in each of the 3 diet groups. Which diet appears to provide the highest level of protein in the milk?
[...] spread to convert the data into wide format (Milk_wide), such that each cow has one row. Study the NAs in the resulting dataset and describe which two patterns you observe and how these patterns are probably caused. Hint: Before spread works on the milk dataset, you might need to massage the input data slightly.
[...] (Milk_long) now contain appropriate NAs. Use the naniar package to extend your plot from task a. using the naniar::geom_miss_point geometry. Comment on the patterns you see in this plot.
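As a starting point for the first two tasks, the sketch below loads the Milk data, counts the cows and sets up a smoothed curve per diet. Converting to a plain data frame first and choosing method = "loess" are assumptions of this sketch; the column names protein, Time, Cow and Diet are those of the nlme dataset.

```r
library(dplyr)
library(ggplot2)

# Load the Milk data from nlme and drop its groupedData class
data(Milk, package = "nlme")
milk <- as.data.frame(Milk)

# Number of distinct cows in the data
n_cows <- n_distinct(milk$Cow)
n_cows

# Smoothed protein content over time, one curve per diet group
milk_plot <- ggplot(milk, aes(x = Time, y = protein, colour = Diet)) +
  geom_smooth(method = "loess")
```

Printing milk_plot renders the figure; adding geom_point or geom_line per cow before the smoother is a matter of taste.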