group_by and summarise, more ggplot2
Read R4DS chapters 5.6-5.7, 3.7-3.10.
Solve the Grouping and Summarizing and Types of visualizations chapters of the Introduction to the Tidyverse course at DataCamp.
Open your Class_files project and “Pull Branch” (under Tools > Version control in RStudio) to make sure your files are ready and up to date.
The script Class_files/SR_music.R contains a simple function, get_SR_music, for grabbing music played on Swedish Radio channels from their open API. Load it by
source("Class_files/SR_music.R")
and grab e.g. the songs on P3 (channel 164) at the turn of 2018, i.e. 2018-12-31, by
get_SR_music(channel = 164, date = "2018-12-31") %>% select(title, artist, start_time) %>% head()
## title artist start_time
## 1 Waste It On Me Steve Aoki feat. BTS 2018-12-31 23:56:16
## 2 Taki Taki DJ Snake 2018-12-31 23:55:56
## 3 Let You Love Me Rita Ora 2018-12-31 23:52:35
## 4 Pillowtalk ZAYN 2018-12-31 23:49:17
## 5 In My Mind Dynoro 2018-12-31 23:46:35
## 6 Ruin My Life Zara Larsson 2018-12-31 23:42:56
If you want multiple dates, the map functions from the purrr package (included in the tidyverse) are convenient (more about these later in the course). Grabbing music played from e.g. 2019-01-01 to 2019-01-07 into music is done by
days <- seq(as.Date("2019-01-01"), as.Date("2019-01-07"), "days")
music <- map_df(days, get_SR_music, channel = 164)
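To see how map_df passes its arguments, here is a minimal sketch that needs no API access. The function fake_playlist is a hypothetical stand-in for get_SR_music (it is not part of the course files), and the sketch assumes purrr, tibble and dplyr are installed.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for get_SR_music(): returns one row per call,
# without contacting any API.
fake_playlist <- function(date, channel) {
  tibble(date = as.character(date), channel = channel)
}

days <- seq(as.Date("2019-01-01"), as.Date("2019-01-03"), "days")

# Each element of `days` becomes the first argument of fake_playlist();
# `channel = 164` is forwarded unchanged to every call, and the
# resulting tibbles are bound together row-wise.
toy_music <- map_df(days, fake_playlist, channel = 164)
```

The result has one row per day, which is the same pattern the call above uses to stack a week of playlists into music.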
Note: The data is not entirely clean, and the same artist/song may be coded in multiple ways (e.g. "Cherrie & Z.E.", "Cherrie, Z.e" and "Cherrie, Z.E"). You may ignore this for now.
start_times are distributed over the day. Repeat for another channel, e.g. P2 (channel 163). You can grab components of a date-time (POSIXct) object with format, as in
as.POSIXct("2019-01-01 23:57:04 CET") %>% format("%H:%M")
## [1] "23:57"
for extracting the hour and minute; see ?format.POSIXct for more examples. Note that the above code returns a value of character type, so you may want to convert it to numeric format (e.g. minutes or hours after midnight) before plotting. The tidyverse package for working with dates is called lubridate and has functions to extract hours and minutes as well, e.g.
as.POSIXct("2019-01-01 23:57:04 CET") %>% lubridate::hour()
## [1] 23
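One possible way to get a numeric time of day for plotting is to combine lubridate's hour and minute. This is a sketch on made-up timestamps; the tibble toy_times and the column hours_after_midnight are illustrative names, not part of the course data.

```r
library(dplyr)
library(lubridate)

# Made-up timestamps standing in for the start_time column.
toy_times <- tibble(
  start_time = as.POSIXct(
    c("2019-01-01 00:05:00", "2019-01-01 12:30:00", "2019-01-01 23:57:04"),
    tz = "UTC"
  )
)

# Hours after midnight as a number: whole hours plus the minutes as a fraction.
toy_times <- toy_times %>%
  mutate(hours_after_midnight = hour(start_time) + minute(start_time) / 60)
```

A numeric column like this can be mapped directly to the x-axis of a histogram or density plot.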
Kammarkollegiet is a public agency that, among other things, issues insurance. The file Class_files/claims.csv contains data on claims from one of their personal insurances. Each claim has a unique Claim id, a Claim date, a Closing date and a number of Payments disbursed at Payment dates. If the claim is not closed (there may be more payments coming), Closing date is given the value NA. Null claims, i.e. claims that have been closed without payment, are not included.
Read the data by
claim_data <- read_csv("Class_files/claims.csv")
Claim id should only be counted once!).
Actuaries are very fond of loss triangles. A loss triangle is a table where the value in row \(i\), column \(j\) is the sum of all payments on claims with Claim date in year \(i\) that are disbursed until the \(j\)th calendar year after the year of the claim/accident. The table will be a triangle since future payments are not available.
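The definition above can be sketched on a tiny made-up data set. The column names (claim_id, claim_date, payment_date, payment) and the values are assumptions for illustration and differ from those in claims.csv; the sketch also reads "disbursed until" as a cumulative sum over development years.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up payments: claim 1 is paid in two different calendar years.
toy_claims <- tibble(
  claim_id     = c(1, 1, 2, 3),
  claim_date   = as.Date(c("2016-03-01", "2016-03-01", "2016-07-15", "2017-02-10")),
  payment_date = as.Date(c("2016-05-01", "2017-01-20", "2017-09-01", "2017-03-05")),
  payment      = c(100, 50, 200, 30)
)

triangle <- toy_claims %>%
  # Row index i: year of the claim. Column index j: calendar years elapsed.
  mutate(claim_year = year(claim_date),
         dev_year   = year(payment_date) - claim_year) %>%
  group_by(claim_year, dev_year) %>%
  summarise(paid = sum(payment), .groups = "drop") %>%
  arrange(claim_year, dev_year) %>%
  # Accumulate payments within each claim year.
  group_by(claim_year) %>%
  mutate(cumulative = cumsum(paid)) %>%
  ungroup() %>%
  select(-paid) %>%
  # One column per development year; missing future cells become NA.
  pivot_wider(names_from = dev_year, values_from = cumulative,
              names_prefix = "dev_")
```

Here the 2017 row has NA in dev_1 because that payment year lies in the future of the toy data. Note that a development year without any payments would also show up as NA in this sketch, so real data may need extra handling.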
knitr::kable. Try to do it in a single sequence of pipes. If future payments are coded as NA, using options(knitr.kable.NA = '') will result in a nicer-looking table.
All political parties participating in the 2018 Swedish elections can be downloaded from Valmyndigheten by
parties_2018 <- read_csv2("https://data.val.se/val/val2018/valsedlar/partier/deltagande_partier.skv",
locale = locale("sv", encoding = "ISO-8859-1"))
How many unique parties participated in each of the three elections (VALTYP equals R for Riksdagen, L for Landsting and K for Kommun)? Note that the same party may appear multiple times (based on e.g. multiple reasons for inclusion in DELTAGANDEGRUND).
How many local parties (parties only participating within a single VALKRETSKOD) participated in the Kommunalval (VALTYP equals K)?
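Counting distinct values within groups is what n_distinct is for. The sketch below uses a made-up table; VALTYP and VALKRETSKOD are taken from the text, while the column party and all values are hypothetical, since the name of the party column in the real file is not given here.

```r
library(dplyr)

# Made-up rows; party "A" is listed twice for Riksdagen on purpose.
toy_parties <- tibble(
  VALTYP      = c("R", "R", "R", "K", "K", "K", "K"),
  VALKRETSKOD = c("00", "00", "00", "0114", "0115", "0114", "0114"),
  party       = c("A", "A", "B", "A", "A", "C", "C")
)

# Unique parties per election type; duplicates are counted once.
parties_per_type <- toy_parties %>%
  group_by(VALTYP) %>%
  summarise(n_parties = n_distinct(party))

# Parties that appear in exactly one VALKRETSKOD in the Kommunalval.
local_parties <- toy_parties %>%
  filter(VALTYP == "K") %>%
  group_by(party) %>%
  summarise(n_districts = n_distinct(VALKRETSKOD)) %>%
  filter(n_districts == 1)
```

In the toy data, party "A" appears in two districts and is therefore not local, while "C" appears in only one.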
As in the last class, load Systembolaget’s assortment and select the regular product range.
Varugrupp)? Use filter and is.na to filter out beverages where Varugrupp is not available.
PrisPerLiter for each vintage and visualise using ggplot.
PrisPerLiter) in each Varugrupp.
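A minimal sketch of the filter, is.na, group_by and summarise pattern on made-up rows. The column names Varugrupp and PrisPerLiter come from the text; the values are invented, and the mean is only an assumed example of a summary statistic.

```r
library(dplyr)

# Made-up assortment with one missing Varugrupp.
toy_assortment <- tibble(
  Varugrupp    = c("Cider", "Cider", NA, "Whisky"),
  PrisPerLiter = c(40, 60, 100, 120)
)

price_summary <- toy_assortment %>%
  # Drop beverages where Varugrupp is not available.
  filter(!is.na(Varugrupp)) %>%
  group_by(Varugrupp) %>%
  # One row per group: number of products and their mean price per litre.
  summarise(n = n(), mean_price = mean(PrisPerLiter)) %>%
  arrange(Varugrupp)
```

Swapping mean for another function, such as median or max, gives other per-group summaries with the same structure.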