tidyverse

Non-Profits in Oregon: Socrata is Cool

Socrata: The Open Data Portal. I did not previously know much about how open data portals had evolved. Oregon’s is quite nice, and I will take the opportunity to map and summarise non-profits throughout the state. Here is the data.

library(RSocrata)
Oregon.Nonprofits <- read.socrata("https://data.oregon.gov/resource/8kyv-b2kw.csv")
glimpse(Oregon.Nonprofits)

The glimpse shows 163,489 rows and 18 columns, including the registry number, business name, entity type, registry date, nonprofit type, associated name type, name fields (first, middle, last, suffix), entity-of-record fields, address, city, state, and zip code.

A basic zip code map

or_zips <- zctas(cb = TRUE, starts_with = "97", class="sf")
or_zips %>% ggplot(.
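Below is a rough sketch of how that map might be finished, joining a count of non-profits per zip code to the ZCTA geometry. The join column name (ZCTA5CE10) and the fill scale are assumptions, not the post's actual code.

```r
# Sketch: choropleth of non-profits per zip code (assumed join column ZCTA5CE10)
library(tidyverse)
library(sf)
library(tigris)

options(tigris_use_cache = TRUE)
or_zips <- zctas(cb = TRUE, starts_with = "97", class = "sf")

# Count filings by zip code from the Socrata data loaded above
np_by_zip <- Oregon.Nonprofits %>%
  count(zip_code, name = "nonprofits")

or_zips %>%
  left_join(np_by_zip, by = c("ZCTA5CE10" = "zip_code")) %>%
  ggplot() +
  geom_sf(aes(fill = nonprofits), color = NA) +
  scale_fill_viridis_c(na.value = "grey90") +
  labs(fill = "Non-profits", title = "Oregon Non-Profits by Zip Code") +
  theme_minimal()
```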

A GeoFacet of Credit Quality

In previous work with Skip Krueger, we conceptualized bond ratings as a multiple-rater problem and extracted a measure of state-level creditworthiness. I had always had it on my list to do something like this, and I recently ran across a package called geofacet that makes it easy to do. So here goes. The code is below the post.

library(haven)
library(dplyr)
Pew.Data <- read_dta(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/Pew/modeledforprediction.dta"))
library(tidyverse)
load(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/Pew/Scaled-BR-Pew.RData"))
state.ratings <- data.
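For a flavor of what geofacet does, here is a minimal sketch on made-up data rather than the scaled bond-rating measures the post actually uses; facet_geo() swaps facet_wrap() for a grid of panels arranged like a US map.

```r
# Sketch: a geofaceted line plot on simulated state-level data (not the real measures)
library(tidyverse)
library(geofacet)

fake_ratings <- expand_grid(state = state.abb, year = 2000:2010) %>%
  mutate(creditworthiness = rnorm(n()))

ggplot(fake_ratings, aes(x = year, y = creditworthiness)) +
  geom_line() +
  facet_geo(~ state, grid = "us_state_grid2") +  # one panel per state, laid out like a map
  theme_minimal(base_size = 7)
```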

COVID-19 Scraping

NB: This was last updated on March 25, 2020. Building Oregon COVID data: I have a few days of data now. To rebuild it, I will have to use the Wayback Machine. I need to locate the files and follow updates to this page from Oregon’s OHA. A Scraper: Let me explain the logic for the scraper. NB: I had to rewrite it; the original versions of the website had three tables without data on hospitalizations.
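As a rough sketch of that logic (assuming rvest and a placeholder URL for the OHA page; the real page and its table layout changed over time), the scraper pulls every HTML table and checks how many there are before extracting counts:

```r
# Sketch: pull all tables from the OHA page and stamp them with the scrape date
library(rvest)
library(dplyr)

oha_url <- "https://www.oregon.gov/oha/..."  # placeholder; the real page is linked in the post

tables <- read_html(oha_url) %>%
  html_nodes("table") %>%
  html_table()

# Earlier versions of the page had three tables and no hospitalization data,
# so check the count and order of tables before using them
length(tables)
covid_counts <- tables[[1]] %>% mutate(scrape_date = Sys.Date())
```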

Visualising COVID-19 in Oregon

Oregon COVID data: I now have a few days of data. These data are current as of March 24, 2020. I will present the first version of these visualizations here and then move the auto-update to a different location. A messy first version of the scraping exercise is at the bottom of this post.

paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")
## [1] "https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID2020-03-24.RData"
load(url(paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")))

A base map: load the tigris library, then grab the map as an sf object; there is a geom_sf that makes them easy to work with.
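A hedged sketch of that base map follows; choosing Oregon counties and the cb and class arguments are my assumptions, not the post's exact code.

```r
# Sketch: Oregon county base map from tigris, drawn with geom_sf()
library(tigris)
library(ggplot2)
options(tigris_use_cache = TRUE)

or_counties <- counties(state = "OR", cb = TRUE, class = "sf")

ggplot(or_counties) +
  geom_sf(fill = "white", color = "grey50") +
  theme_minimal() +
  labs(title = "Oregon Counties")
```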

Mapping COVID-19 in Oregon

Oregon COVID data: The Oregon data are available from OHA here. I cut and pasted the first two days because it was easy with datapasta. As the data accumulated, it became easier to write a script, which I detail elsewhere, that updates itself. urbnmapr: The Urban Institute has an excellent state and county mapping package. I want to make use of the county-level data and plot the starter map.
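A minimal sketch of that starter map, using the counties data frame that ships with urbnmapr; the projection and styling are my choices, not the post's.

```r
# Sketch: Oregon county polygons from urbnmapr's counties data
library(tidyverse)
library(urbnmapr)

oregon_counties <- counties %>% filter(state_abbv == "OR")

ggplot(oregon_counties, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "grey40") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +  # requires the mapproj package
  theme_void()
```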

tidyTuesday on the Office

The Office

library(tidyverse)
office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv')

A First Plot: the number of episodes for the Office by season.

library(janitor)
TableS <- office_ratings %>% tabyl(season)
p1 <- TableS %>% ggplot(., aes(x=as.factor(season), y=n, fill=as.factor(season))) +
  geom_col() +
  labs(x="Season", y="Episodes", title="The Office: Episodes") +
  guides(fill=FALSE)
p1

Ratings: How are the various seasons and episodes rated?

p2 <- office_ratings %>% ggplot(., aes(x=as.factor(season), y=imdb_rating, fill=as.factor(season), color=as.factor(season))) +
  geom_violin(alpha=0.3) +
  guides(fill=FALSE, color=FALSE) +
  labs(x="Season", y="IMDB Rating") +
  geom_point()
p2

Patchwork: Using patchwork, we can combine multiple plots.
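With patchwork loaded, combining the two plots is a one-liner; the side-by-side layout and title are my choices rather than the post's.

```r
# Sketch: combine the episode-count and ratings plots built above
library(patchwork)

p1 + p2 + plot_annotation(title = "The Office on IMDB")
```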

Tracking COVID-19 2020-03-24

R to Import COVID Data

library(tidyverse)
library(gganimate)
COVID.states <- read.csv(url("http://covidtracking.com/api/states/daily.csv"))
COVID.states <- COVID.states %>% mutate(Date = as.Date(as.character(date), format = "%Y%m%d"))

The Raw Testing Incidence: I want to use patchwork to show the testing rate by state in the United States. Then I want to show where things currently stand. In both cases, a base-10 log is used on the number of tests.
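A rough sketch of the "where things stand" piece; defining tests as positive plus negative results is my assumption about the covidtracking.com columns, and the bar-chart layout is a choice.

```r
# Sketch: total tests by state on a log-10 scale, using the latest date per state
library(tidyverse)

latest <- COVID.states %>%
  mutate(tests = positive + negative) %>%  # assumed definition of total tests
  group_by(state) %>%
  filter(Date == max(Date)) %>%
  ungroup()

ggplot(latest, aes(x = reorder(state, tests), y = tests)) +
  geom_col() +
  scale_y_log10() +
  coord_flip() +
  labs(x = "State", y = "Total tests (log base 10)")
```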

Quick and Dirty Fredr

Some Data from FREDr: Downloading the FRED data on national debt as a percentage of GDP. I first want to examine the US data and will then turn to some comparisons. fredr makes it remarkably easy to do! I will use two core tools from fredr. First, fredr_series_search allows one to enter search text and retrieve the series responsive to that search text. The results can be sorted in particular ways; two such options are shown below.
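A hedged sketch of that workflow: in current fredr the search function is fredr_series_search_text(), a FRED API key is required, and the search text, sort options, and series id below are my choices.

```r
# Sketch: search FRED for debt-to-GDP series, then pull one series of observations
library(fredr)

fredr_set_key("YOUR_FRED_API_KEY")  # placeholder key

# Two ways of ordering the same search results
by_popularity <- fredr_series_search_text("federal debt percent of GDP", order_by = "popularity")
by_frequency  <- fredr_series_search_text("federal debt percent of GDP", order_by = "frequency")

# GFDEGDQ188S: federal debt as a percent of GDP
us_debt <- fredr(series_id = "GFDEGDQ188S")
head(us_debt)
```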

The Carbon Footprint of Food Produced for Consumption

tidyTuesday on the Carbon Footprint of Feeding the Planet: The tidyTuesday for this week relies on data scraped from the Food and Agriculture Organization of the United Nations. The blog post for obtaining the data can be found on r-tastic. The scraping exercise is nice and easy to follow and explores a case of cleaning up a very messy data structure. I took this exercise as practice for using pivot_wider and pivot_longer.
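As a quick illustration of that practice, here is a sketch using the tidyTuesday food_consumption file; the column names follow that file (including its co2_emmission spelling), and the direction of the reshape is arbitrary.

```r
# Sketch: widen CO2 emissions by food category, then pivot back to long form
library(tidyverse)

food_consumption <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv')

# Wide: one column of CO2 emissions per food category
food_wide <- food_consumption %>%
  select(country, food_category, co2_emmission) %>%
  pivot_wider(names_from = food_category, values_from = co2_emmission)

# And back to long form
food_long <- food_wide %>%
  pivot_longer(-country, names_to = "food_category", values_to = "co2_emmission")
```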

Mapping San Francisco Trees

Trees in San Francisco: This week’s data cover trees in San Francisco.

sf_trees <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-28/sf_trees.csv')
library(tidyverse); library(ggmap); library(skimr)
skim(sf_trees)

The skim summary (Table 1) reports 192,987 rows and 12 columns: 6 character variables, 1 date, and 5 numeric, with no grouping variables.
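Since the point of the post is mapping, here is a bare-bones sketch that plots the tree locations straight from the latitude and longitude columns, skipping a ggmap basemap (which needs an API key); dropping missing coordinates is my choice.

```r
# Sketch: scatter of tree locations in San Francisco
library(tidyverse)

sf_trees %>%
  filter(!is.na(latitude), !is.na(longitude)) %>%
  ggplot(aes(x = longitude, y = latitude)) +
  geom_point(size = 0.1, alpha = 0.1) +
  coord_quickmap() +
  theme_void() +
  labs(title = "Street Trees in San Francisco")
```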