web scraping

COVID-19 Scraping

NB: This was last updated on March 25, 2020. Building Oregon COVID data I have a few days of data now. To rebuild it, I will have to use the waybackmachine. The files that I need to locate and follow updates to this page from Oregon’s OHA. A Scraper Let me explain the logic for the scraper. NB: I had to rewrite it; the original versions of the website had three tables without data on hospitalizations.

Visualising COVID-19 in Oregon

Oregon COVID data I now have a few days of data. These data are current as of March 24, 2020. I will present the first version of these visualizations here and then move the auto-update to a different location. A messy first version of the scraping exercise is at the bottom of this post. paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData") ## [1] "https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID2020-03-24.RData" load(url(paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData"))) A base map Load the tigris library then grab the map as an sf object; there is a geom_sf that makes them easy to work with.

a quick tidyTuesday on Passwords

First, I wanted to acquire the distribution of letters and then play with that. I embedded the result here. The second step is to import the tidyTuesday data. library(tidyverse) Letter.Freq <- data.frame(stringsAsFactors=FALSE, Letter = c("E", "T", "A", "O", "I", "N", "S", "R", "H", "D", "L", "U", "C", "M", "F", "Y", "W", "G", "P", "B", "V", "K", "X", "Q", "J", "Z"), Frequency = c(12.02, 9.1, 8.12, 7.68, 7.31, 6.95, 6.28, 6.

Scraping EPL Salary Data

EPL Scraping In a previous post, I scraped some NFL data and learned the structure of Sportrac. Now, I want to scrape the available data on the EPL. The EPL data is organized in a few distinct but potentially linked tables. The basic structure is organized around team folders. Let me begin by isolating those URLs. library(rvest) library(tidyverse) base_url <- "http://www.spotrac.com/epl/" read.base <- read_html(base_url) team.URL <- read.base %>% html_nodes(".team-name") %>% html_attr('href') team.