Day 10: Collecting data from the internet using APIs

During class

a-pi

Write a function

get_pi <- function(start, numberOfDigits){
    ...
}

that calls a-pi and returns the digits of \(\pi\) from start to start + numberOfDigits - 1. A sample call is https://api.pi.delivery/v1/pi?start=1000&numberOfDigits=5.
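One possible way to structure such a function is sketched below. It is not the only solution. The helper names pi_url and parse_pi are made up for illustration, and the assumption that the reply is JSON with the digits in a field called content should be checked against the sample call in a browser.

```r
library(httr)
library(jsonlite)

# Build the request URL from the two query parameters.
# format(..., scientific = FALSE) avoids strings such as "1e+05" for large values.
pi_url <- function(start, numberOfDigits) {
  modify_url(
    "https://api.pi.delivery/v1/pi",
    query = list(
      start = format(start, scientific = FALSE),
      numberOfDigits = format(numberOfDigits, scientific = FALSE)
    )
  )
}

# Pull the digit string out of a JSON reply.
# ASSUMPTION: the digits are stored in a field named "content".
parse_pi <- function(json_text) {
  fromJSON(json_text)$content
}

# Call the API, stop on an HTTP error, and return the digits as a string.
get_pi <- function(start, numberOfDigits) {
  response <- GET(pi_url(start, numberOfDigits))
  stop_for_status(response)
  parse_pi(content(response, "text", encoding = "UTF-8"))
}
```

Splitting URL construction and parsing from the network call makes each piece easy to check without contacting the server.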

TimeEdit

A TimeEdit schedule can be obtained in JSON format by changing .html to .json in the URL. Try it in a web browser with https://cloud.timeedit.net/su/web/stud1/ri107455X48Z06Q5Z16g3Y05y5006Y48Q02gQY6Q55727.html

Part 1: We can use GET to import it into R

library(httr)
schema_response <- GET("https://cloud.timeedit.net/su/web/stud1/ri107455X48Z06Q5Z16g3Y05y5006Y48Q02gQY6Q55727.json")
schema_json <- content(schema_response, "text", encoding = "UTF-8")

The result can be explored with jsonedit in the package listviewer

library(knitr)
library(tidyverse)
library(jsonlite)
library(listviewer)
jsonedit(schema_json)

We convert it with fromJSON and choose the reservations

schema_df <- fromJSON(schema_json)$reservations

The result is now a data.frame with six columns, where the last column contains a vector in each cell. In order to extract elements from this column, we use mutate in combination with map_chr. The family of map functions comes from the purrr package, which is part of the tidyverse; more about them later. Here map_chr(columns, 1) corresponds to sapply(columns, function (x) x[1]) in base R.

schema_df %>% mutate(sal = map_chr(columns, 3), 
                     kurs = map_chr(columns, 1), 
                     tid = paste(starttime, endtime, sep = " - ")) %>% 
    select(kurs, datum = startdate, tid, sal) %>% 
    kable()

kurs	datum	tid	sal
MT5013	2019-11-07	13:15 - 16:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-08	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-12	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-13	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-19	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-20	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-22	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-26	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-11-27	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-04	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-06	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-10	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-11	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-13	09:15 - 12:00	Sal 22. Kräftriket hus 5
MT5013	2019-12-18	15:00 - 18:00	Sal 36. Kräftriket hus 5
MT5013	2020-01-15	14:00 - 17:00	Sal 36. Kräftriket hus 5
MT5013	2020-01-17	09:00 - 15:00	Sal 14. Kräftriket hus 5
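The correspondence between map_chr(columns, 1) and the sapply call mentioned above can be checked on a small data frame. The sketch below uses made-up toy data (toy_df and its values are not taken from TimeEdit) with a list column built by hand.

```r
library(purrr)

# Toy data frame with a list column, mimicking the structure of schema_df.
# The values are invented for illustration only.
toy_df <- data.frame(startdate = c("2019-11-07", "2020-01-17"),
                     stringsAsFactors = FALSE)
toy_df$columns <- list(c("MT5013", "Lecture", "Sal 22"),
                       c("MT5013", "Exam", "Sal 14"))

# An integer as the second argument extracts the element at that position
# from each cell of the list column.
kurs_purrr <- map_chr(toy_df$columns, 1)
kurs_base <- sapply(toy_df$columns, function(x) x[1])
sal <- map_chr(toy_df$columns, 3)

identical(kurs_purrr, kurs_base)
```

Unlike sapply, map_chr always returns a character vector or fails with an error, which makes it safer inside mutate.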

Fun fact: The schedule on our GitHub page is based on such a TimeEdit API call (see schedule.html), which makes it easy to adapt the page each year the course is given.

Part 2: The schedule for room 14 during the 2nd part of the winter semester can be found at

https://cloud.timeedit.net/su/web/stud1/ri107455X28Z07Q5Z76g0Y05y5076Y31Q09gQY6Q55777.html

Which teacher spends the most time teaching in room 14 during the 2nd part of the winter semester?

SCB

If you generate a table at Statistikdatabasen, you will find a link “API för denna tabell” that gives a URL and a query to be sent by POST in order to fetch the table. Try fetching a table with httr::POST; the query should be placed in the body.

Note: at the end of the query you may change "format": "px" to "format": "json" in order to get a reply in JSON-format. Fetch a table, examine its structure and try to extract a suitable data.frame. See SCB_KPI_API.R for a simple example.

Note: A dedicated R package, pxweb, exists for querying the SCB database through the API in a slightly more comfortable way. See the vignette for a demonstration.
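A minimal sketch of the POST step with httr is given below. The variable code "Tid" and the value "2019M10" are placeholders, and fetch_scb is a hypothetical helper name; copy the real URL and query from the “API för denna tabell” link of the table you generated.

```r
library(httr)
library(jsonlite)

# Placeholder query: one variable code with one selected value.
# Replace it with the query shown for your own table.
scb_query <- list(
  query = list(
    list(code = "Tid",
         selection = list(filter = "item",
                          values = list("2019M10")))  # list() keeps this a JSON array
  ),
  response = list(format = "json")  # ask for a JSON reply instead of px
)

# Convert the R list to a JSON string to be used as the request body.
scb_body <- as.character(toJSON(scb_query, auto_unbox = TRUE))

# Hypothetical helper: send the query and parse the JSON reply.
# `url` is the table URL given by Statistikdatabasen.
fetch_scb <- function(url, body) {
  response <- POST(url, body = body, content_type_json())
  stop_for_status(response)
  fromJSON(content(response, "text", encoding = "UTF-8"))
}
```

Building the query as an R list rather than pasting a string makes it easy to change the selected values programmatically.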

Latest day at Bromma

The last day's hourly temperatures (parameter 1) at Bromma (station 97200) can be fetched from SMHI by (replace xml with json in the URL if you want to change format)

temp_response <- GET("https://opendata-download-metobs.smhi.se/api/version/1.0/parameter/1/station/97200/period/latest-day/data.xml")
http_type(temp_response)

## [1] "application/xml"

We extract the XML-content by

library(xml2)
temp_xml <- read_xml(temp_response)
class(temp_xml)

## [1] "xml_document" "xml_node"

The structure can be viewed by opening

https://opendata-download-metobs.smhi.se/api/version/1.0/parameter/1/station/97200/period/latest-day/data.xml

in your web browser. We see that the temperatures can be found at the XPath "/metObsSampleData/value/value":

xml_ns_strip(temp_xml) # Strip the default namespace (optional, beyond the course scope)
xml_find_all(temp_xml, "/metObsSampleData/value/value")

## {xml_nodeset (25)}
##  [1] <value>3.0</value>
##  [2] <value>2.9</value>
##  [3] <value>3.0</value>
##  [4] <value>3.1</value>
##  [5] <value>3.4</value>
##  [6] <value>3.5</value>
##  [7] <value>3.6</value>
##  [8] <value>3.5</value>
##  [9] <value>3.4</value>
## [10] <value>3.4</value>
## [11] <value>3.6</value>
## [12] <value>4.0</value>
## [13] <value>4.4</value>
## [14] <value>5.0</value>
## [15] <value>5.3</value>
## [16] <value>5.3</value>
## [17] <value>5.4</value>
## [18] <value>5.0</value>
## [19] <value>4.9</value>
## [20] <value>5.2</value>
## ...

(use xml_text to get the values).
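The step from a node set to a numeric vector can be sketched on a small hand-written document. The toy XML below only mimics the value/value nesting seen above; its namespace URL and numbers are made up.

```r
library(xml2)

# Toy document with a default namespace, similar in shape to the SMHI reply.
toy_xml <- read_xml('<metObsSampleData xmlns="http://example.com/toy">
  <value><value>3.0</value></value>
  <value><value>2.9</value></value>
</metObsSampleData>')

# Without stripping the default namespace, the plain XPath below matches nothing.
xml_ns_strip(toy_xml)

# xml_text returns the node contents as character; convert to numeric.
temp_nodes <- xml_find_all(toy_xml, "/metObsSampleData/value/value")
temps <- as.numeric(xml_text(temp_nodes))
```

Note that xml_ns_strip modifies the document in place, so its result does not need to be assigned.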

Also extract the times.
Try with another station/parameter (see SMHI). Also try the JSON-format.

Systembolaget

Systembolaget’s API uses XML. The list of stores from HW4 can be fetched by

stores_response <- GET("https://www.systembolaget.se/api/assortment/stores/xml")
http_type(stores_response)

## [1] "application/xml"

We extract XML with

stores_xml <- read_xml(stores_response)

and look at the first store with

xml_find_first(stores_xml, "/ButikerOmbud/ButikOmbud")

## {xml_node}
## <ButikOmbud type="StoreAssortmentViewModel">
##  [1] <Typ>Butik</Typ>
##  [2] <Nr>0102</Nr>
##  [3] <Namn>Fältöversten</Namn>
##  [4] <Address1>Karlaplan 13</Address1>
##  [5] <Address2/>
##  [6] <Address3>115 20</Address3>
##  [7] <Address4>STOCKHOLM</Address4>
##  [8] <Address5>Stockholms län</Address5>
##  [9] <Telefon>08/662 22 89</Telefon>
## [10] <ButiksTyp/>
## [11] <Tjanster/>
## [12] <SokOrd>STOCKHOLM;STHLM;ÖSTERMALM;KARLAPLANSRONDELLEN;FÄLTAN</SokOrd>
## [13] <Oppettider>2019-11-26;10:00;19:00;;;0;_*2019-11-27;10:00;19:00;;;0;_*20 ...
## [14] <RT90x>6582011</RT90x>
## [15] <RT90y>1630064</RT90y>

In order to extract the names we may use

xml_find_all(stores_xml, "//Namn")[1:10]

## {xml_nodeset (10)}
##  [1] <Namn>Fältöversten</Namn>
##  [2] <Namn/>
##  [3] <Namn>Garnisonen</Namn>
##  [4] <Namn>Norra Djurgårdsstaden</Namn>
##  [5] <Namn/>
##  [6] <Namn>Sergel</Namn>
##  [7] <Namn>PK-Huset</Namn>
##  [8] <Namn/>
##  [9] <Namn>Marieberg</Namn>
## [10] <Namn/>

Evidently, not all stores have names.

Convert the result to a data.frame.
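One way to approach the conversion is sketched below on a toy document with two fields. The toy XML, the second store number, and the helper name field are invented for illustration; the real document has many more child elements.

```r
library(xml2)

# Toy document mimicking the ButikerOmbud/ButikOmbud structure shown above.
toy_stores <- read_xml('<ButikerOmbud>
  <ButikOmbud><Typ>Butik</Typ><Nr>0102</Nr><Namn>Garnisonen</Namn></ButikOmbud>
  <ButikOmbud><Typ>Butik</Typ><Nr>0000</Nr><Namn/></ButikOmbud>
</ButikerOmbud>')

stores <- xml_find_all(toy_stores, "/ButikerOmbud/ButikOmbud")

# Hypothetical helper: text of one child element per store.
# xml_find_first keeps one result per store, so the columns stay aligned;
# empty elements such as <Namn/> are turned into NA.
field <- function(nodes, name) {
  txt <- xml_text(xml_find_first(nodes, paste0("./", name)))
  ifelse(txt == "", NA_character_, txt)
}

toy_df <- data.frame(nr = field(stores, "Nr"),
                     namn = field(stores, "Namn"),
                     stringsAsFactors = FALSE)
```

Searching within each store node, rather than using "//Namn" on the whole document, guarantees that every row of the data.frame refers to a single store.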