dplyr and ggplot2.
Read R4DS chapters 3.1-3.6, 5.1-5.5.
Complete assignments Data wrangling and Data visualization (first two chapters of Introduction to the Tidyverse) at DataCamp.
Start by creating a new R project “Classroom” that you will use for your class activities. Some activities require data or scripts from the repo Class_files; we therefore recommend that you clone it into a subfolder of your Classroom directory. Now create a new R Markdown document Class1.Rmd where you will do your work for this class.
Systembolaget’s assortment of beverages from 2019-10-30 is available in the file Class_files/systembolaget2019-10-30.csv. It was downloaded from Systembolaget’s public API and saved in csv format by the script Class_files/Systembolaget.R. Load its contents with
# Define date when scraping took place - can then be easily changed.
date_systembolaget_scrape <- "2019-10-30"
library(tidyverse)
file_name <- paste0("systembolaget",date_systembolaget_scrape,".csv")
Sortiment_hela <- read_csv(file.path("Class_files", file_name))
(arrange, filter, mutate, select, %>%)
The variable Alkoholhalt (alcohol by volume) has been classified as character by read_csv, since it contains a percent sign. Convert it to numeric using mutate: first remove the percent sign (e.g. with gsub), then transform the result with as.numeric.
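A minimal sketch of this conversion, assuming a toy tibble in place of Sortiment_hela (the values are invented for illustration; only the column name Alkoholhalt comes from the data set):

```r
library(dplyr)

# Toy stand-in for Sortiment_hela (made-up values, for illustration only)
toy <- tibble(
  Namn = c("A", "B", "C"),
  Alkoholhalt = c("4.5%", "12%", "40%")
)

# Remove the literal percent sign, then convert the remaining text to numeric
toy <- toy %>%
  mutate(Alkoholhalt = as.numeric(gsub("%", "", Alkoholhalt, fixed = TRUE)))

toy$Alkoholhalt
```

Using fixed = TRUE makes gsub treat "%" as plain text rather than as a regular expression, which is enough here because only a single character is removed.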
A few wines are labelled as Röda - lägre alkoholhalt and Vita - lägre alkoholhalt instead of Rött vin (red wine) and Vitt vin (white wine), respectively, in the Varugrupp (group of products) column. Merge these wines into Rött vin and Vitt vin, respectively, e.g. by using mutate and ifelse.
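One possible way to do the merge is sketched below on a toy tibble (an assumption standing in for Sortiment_hela; the group "Öl" is only there to show that other groups are left untouched):

```r
library(dplyr)

# Toy stand-in for the Varugrupp column (illustrative rows only)
toy <- tibble(
  Varugrupp = c("Rött vin", "Röda - lägre alkoholhalt",
                "Vita - lägre alkoholhalt", "Öl")
)

# Recode the two low-alcohol labels; every other value is kept as it is
toy <- toy %>%
  mutate(
    Varugrupp = ifelse(Varugrupp == "Röda - lägre alkoholhalt", "Rött vin", Varugrupp),
    Varugrupp = ifelse(Varugrupp == "Vita - lägre alkoholhalt", "Vitt vin", Varugrupp)
  )

toy$Varugrupp
```

Each ifelse call returns the replacement where the condition holds and the original value elsewhere, so the two recodings can be chained inside a single mutate.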
What beverage has the highest PrisPerLiter? Display the answer (the Namn of the beverage) as dynamically coded in the text body of your .Rmd document.
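A sketch of how the name could be extracted, again on an invented toy tibble; most_expensive is a hypothetical variable name chosen for this example:

```r
library(dplyr)

# Toy stand-in for Sortiment_hela (made-up values)
toy <- tibble(
  Namn = c("A", "B", "C"),
  PrisPerLiter = c(120, 950, 310)
)

# Sort by descending price per litre, keep the first row, and pull out its name
most_expensive <- toy %>%
  arrange(desc(PrisPerLiter)) %>%
  slice(1) %>%
  pull(Namn)

most_expensive
```

In the text body of the R Markdown document, inline code such as `r most_expensive` is then replaced by the computed value when the document is knitted, so the sentence stays correct if the data file changes.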
Create a new data frame Sortiment_ord with the regular product range (where SortimentText equals Ordinarie sortiment). Make a table (with kable from the knitr library) of the 10 most expensive (by PrisPerLiter) beverages from this range. Use select to choose suitable columns for the table.
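The steps could look roughly like the sketch below. The toy tibble and the value "Other" in SortimentText are assumptions made for illustration, and top_table is a hypothetical name; only the column names and the value Ordinarie sortiment come from the exercise.

```r
library(dplyr)
library(knitr)

# Toy stand-in for Sortiment_hela: 24 made-up beverages, every other one regular range
toy <- tibble(
  Namn = paste("Beverage", 1:24),
  PrisPerLiter = seq(100, 2400, by = 100),
  SortimentText = rep(c("Ordinarie sortiment", "Other"), times = 12)
)

# Keep only the regular product range
Sortiment_ord <- toy %>%
  filter(SortimentText == "Ordinarie sortiment")

# The 10 most expensive beverages, restricted to the columns shown in the table
top_table <- Sortiment_ord %>%
  arrange(desc(PrisPerLiter)) %>%
  head(10) %>%
  select(Namn, PrisPerLiter)

kable(top_table)
```

Written as one pipe sequence, each step reads top to bottom in the order it is applied: filter, sort, truncate, select.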
If you have not already done so, write the code from the previous exercise using a sequence of pipes (%>%).
(ggplot, geom_point, geom_line, facet_wrap)
For the regular product range in Sortiment_ord:
Plot PrisPerLiter against Alkoholhalt, color the points by Varugrupp and consider using a log scale for PrisPerLiter.
Plot PrisPerLiter (possibly on a log scale) against Varugrupp. Consider coord_flip to improve readability.
For the groups c("Vitt vin", "Rött vin", "Rosévin", "Mousserande vin") of vintage (Argang) 2010-2019, plot PrisPerLiter against Argang. Try both using a facet for each group and coloring by group in the same facet.
The Stockholm International Film Festival takes place in early November each year. In Class_files/Film_events_2018-11-07.csv you will find their event schedule for the 2018 edition, scraped on 2018-11-07. The data were obtained by a query using the Stockholm Film Festival API for developers.
(arrange, filter, mutate, select, %>%)
The file Class_files/Winter_medals2019-10-30.csv contains the number of medals per country and Olympic year at the Winter Olympics since 1980, together with the total population of the country. The data set was scraped from Wikipedia using the script Class_files/Winter_medals.R, which contains more information, in particular on countries that have been split or joined during the period.
Load the file using
winter_medals <- read_csv("Class_files/Winter_medals2019-10-30.csv")
(arrange, filter, mutate, select, %>%)
Create a variable medals_per_mill, the number of medals per million inhabitants.
… medals_per_mill, during the 2018 Winter Olympics.
(ggplot, geom_point, geom_line, facet_wrap)
… (see ?geom_point for a list of aesthetics geom_point understands).
… “facet” for each of Sweden, Norway and Finland.
Use ggplot to recreate static versions of some figures from Hans Rosling’s talks. Data are available in the package gapminder.
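As a starting point, a static bubble chart in the spirit of those talks might be sketched as follows. The choice of year, aesthetics and labels is an assumption made for this example, and gap_2007, p and p_facets are hypothetical names; the columns year, gdpPercap, lifeExp, pop and continent are those of the gapminder data frame.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# One year of data: one point per country
gap_2007 <- gapminder %>%
  filter(year == 2007)

# Life expectancy against GDP per capita, point size by population, color by continent
p <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

# The same plot for a few selected years, one facet per year
p_facets <- gapminder %>%
  filter(year %in% c(1957, 1982, 2007)) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  facet_wrap(~ year)

p
p_facets
```

Faceting by year gives a static substitute for the animation over time used in the original talks.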