Solutions to the exercises of Homework 5 should, just as for HW1-HW4, be written in an R Markdown document with `output: github_document`. Both the R Markdown document (.Rmd file) and the compiled Markdown document (.md file), as well as any figures needed to render the Markdown file properly on GitHub, should be uploaded to your Homework repository in a `HW5` folder. Code should be written clearly in a consistent style; see in particular Hadley Wickham’s tidyverse style guide. For example, code should be easy to read and avoid unnecessary repetition of variable names.
Note that new data sets are available in the `HW_data` repository. Download them by opening the associated R project and issuing a “pull”. If this fails, delete the `HW_data` folder on your computer and clone the repository again according to the instructions in HW2.
The deadline for the homework is 2019-12-08 at 23.59. Submission occurs as usual by creating a new issue with the title “HW5 ready for grading” in your repository. Please also add a link from your repository’s `README.md` file to `HW5/HW5.md`.
The file `../HW_data/LoofLofvenTweets.Rdata` contains the tables `Loof` and `Lofven` of tweets from the period 2018-11-20 to 2018-11-30 mentioning “Lööf” and “Löfven”, respectively. The data were fetched from the Twitter API using the R package `rtweet`, which provides convenient access to the Twitter API from R. Load the data using the R function `load`.
Construct a table `tweets` that joins the two tables and contains a variable `Person` identifying whether the observation comes from the “Lööf” or the “Löfven” table. Tweets common to both tables should not be included in the join.
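A minimal sketch of one way to build `tweets`, using small hypothetical stand-in tables instead of the real data and assuming that each tweet is identified by a `status_id` column (as in `rtweet` output from that time). `bind_rows()` with `.id` takes the `Person` labels from the list names, and tweets whose `status_id` occurs in both tables are then dropped.

```r
library(dplyr)

# Hypothetical stand-ins for the Loof and Lofven tables; the real rtweet tables
# have many more columns. `status_id` is assumed to identify each tweet.
loof <- tibble(
  status_id = c("1", "2", "3"),
  text = c("Lööf talar", "Lööf och Löfven", "Lööf som statsminister?")
)
lofven <- tibble(
  status_id = c("2", "4"),
  text = c("Lööf och Löfven", "Statsminister Löfven")
)

# Tweets that occur in both tables
common_ids <- intersect(loof$status_id, lofven$status_id)

# Stack the tables, label each row by its source table, then drop common tweets
tweets <- bind_rows(list("Lööf" = loof, "Löfven" = lofven), .id = "Person") %>%
  filter(!status_id %in% common_ids)

tweets
```

With the real data, `loof` and `lofven` would be replaced by the `Loof` and `Lofven` tables obtained from `load()`.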
Illustrate how the intensity of tweets containing the word “statsminister” (or “Statsminister”) has evolved over time for each `Person`, e.g. using histograms with time on the x-axis.
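One possible approach, sketched on hypothetical toy data that assumes `tweets` has the columns `Person`, `created_at` and `text`: keep the matching tweets with a regular expression and draw one histogram per `Person` with one-day bins (for date-time axes, `binwidth` is given in seconds).

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Hypothetical toy data in the assumed shape of `tweets`
tweets <- tibble(
  Person = c("Lööf", "Lööf", "Löfven", "Löfven"),
  created_at = as.POSIXct(c("2018-11-20 10:00", "2018-11-22 12:00",
                            "2018-11-21 09:00", "2018-11-23 15:00"), tz = "UTC"),
  text = c("Ny statsminister?", "Inget om regeringen",
           "Statsminister Löfven", "statsministeromröstning idag")
)

# Keep tweets mentioning "statsminister" or "Statsminister"
statsminister_tweets <- tweets %>%
  filter(str_detect(text, "[Ss]tatsminister"))

# One histogram per Person with time on the x-axis, one bin per day
p <- ggplot(statsminister_tweets, aes(x = created_at)) +
  geom_histogram(binwidth = 24 * 60 * 60) +
  facet_wrap(~Person, ncol = 1)
p
```

Note that this pattern also matches compound words such as “statsministeromröstning”; whether that is desired is a judgment call.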
Compute and plot the daily average sentiment of words in the tweet texts for the two `Person`s. We define the average sentiment as the average strength of the words common to the text and the sentiment lexicon at https://svn.spraakdata.gu.se/sb-arkiv/pub/lmf/sentimentlex/sentimentlex.csv. The function `separate_rows` can be useful for splitting the text into words.
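A sketch of the sentiment computation on hypothetical toy data. It assumes the lexicon has been read into a table with columns `word` and `strength` (check the actual column names in the CSV file), and it splits words on whitespace only.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy tweets and lexicon; the lexicon columns are an assumption
tweets <- tibble(
  Person = c("Lööf", "Lööf", "Löfven"),
  created_at = as.POSIXct(c("2018-11-20 10:00", "2018-11-20 18:00",
                            "2018-11-21 09:00"), tz = "UTC"),
  text = c("bra dag", "dålig idé", "bra bra")
)
lexicon <- tibble(word = c("bra", "dålig"), strength = c(1, -1))

daily_sentiment <- tweets %>%
  mutate(date = as.Date(created_at)) %>%
  # One row per word
  separate_rows(text, sep = "\\s+") %>%
  rename(word = text) %>%
  # Keep only words that also occur in the lexicon
  inner_join(lexicon, by = "word") %>%
  group_by(Person, date) %>%
  summarise(sentiment = mean(strength), .groups = "drop")

# Daily average sentiment per Person; print `p` to draw it
p <- ggplot(daily_sentiment, aes(x = date, y = sentiment, colour = Person)) +
  geom_line() +
  geom_point()
```

With real tweets, converting to lower case and stripping punctuation before the join (e.g. with `stringr::str_to_lower()`) will let more words match the lexicon.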
The 2019 Nobel lectures start this week. The Nobel Foundation even maintains an API for looking up information about the Nobel Laureates.
Fetch a list in JSON format with information on the Nobel Prizes in Literature from the Nobel Prize API.
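A sketch of fetching and parsing the data, assuming version 1 of the Nobel Prize API, whose `prize.json` endpoint accepts a `category` parameter and nests each motivation inside `prizes`, then `laureates`. Since the live call needs network access, the example parses a small hypothetical response with the same assumed layout; prizes without laureates simply contribute nothing.

```r
library(jsonlite)

# Assumed v1 endpoint (requires network access):
# nobel <- fromJSON("http://api.nobelprize.org/v1/prize.json?category=literature",
#                   simplifyVector = FALSE)

# Hypothetical miniature response in the assumed v1 layout, parsed offline
json_text <- '{"prizes": [
  {"year": "2019", "category": "literature",
   "laureates": [{"firstname": "A", "motivation": "\\"for an influential work\\""}]},
  {"year": "1943", "category": "literature"}
]}'
nobel <- fromJSON(json_text, simplifyVector = FALSE)

# Walk prizes -> laureates -> motivation and flatten to a character vector
motivations <- unlist(lapply(nobel$prizes, function(prize) {
  lapply(prize$laureates, function(laureate) laureate$motivation)
}))
motivations
```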
Extract all the prize motivations from the JSON list, convert them into a character vector of words, remove stop words, and visualise the relative frequencies of the remaining words in a word cloud. R packages for plotting word clouds include wordcloud, wordcloud2 and ggwordcloud. A list of stop words can be fetched with:
```r
library(readr)
# English stop words, one word per line
stop_words_url <- "https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.txt"
stopwords <- read_table(stop_words_url, col_names = "words")
```
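Starting from a character vector of motivations, the remaining steps could look as follows. The motivations and stop words below are hypothetical stand-ins, and lower-casing plus splitting on runs of non-letters are choices made for this sketch.

```r
library(dplyr)

# Hypothetical inputs; in the exercise these come from the Nobel JSON and from
# read_table(stop_words_url, col_names = "words")
motivations <- c('"for an influential work"', '"for a work of great imagination"')
stopwords <- tibble(words = c("a", "an", "for", "of"))

word_freq <- tibble(word = unlist(strsplit(tolower(motivations), "[^[:alpha:]]+"))) %>%
  filter(word != "") %>%                                 # drop empty strings left by leading quotes
  anti_join(stopwords, by = c("word" = "words")) %>%     # remove stop words
  count(word, sort = TRUE) %>%
  mutate(freq = n / sum(n))                              # relative frequency

word_freq
# Word cloud, e.g. with the wordcloud package:
# wordcloud::wordcloud(word_freq$word, word_freq$freq, min.freq = 0)
```

If you plot with `wordcloud`, set `min.freq = 0`: its default `min.freq = 3` would drop every word with a relative frequency below 3, i.e. all of them.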
The repository you are to review will be assigned on 2019-12-09. The deadline for the peer review is Wed, 2019-12-11 at 12:00 (noon). The specific tasks to do during peer review:
The main purpose of the peer review is to encourage you to read/consider other people’s coding solutions.
As for the review itself, focus on the positive sides and leave the “marking” to the teachers. You will not be graded on the length of your review. Sometimes a quick “Nice work!” with 1-2 examples of what was nice is sufficient, and it may even be preferable to a long review submitted late or not submitted at all.
Furthermore, please connect your comments to a particular revision by using the commit hash to refer to the commit in the issue, e.g.
I’ve reviewed your commit 287b1ae. Nice work! What I find cool in your code is that…
The commit hash can be found in the upper right corner when looking at the file on GitHub (see screenshot below).
This helps put your comments in context in case future revisions change fundamental things you commented on. Reviews in Swedish are fine!