R Markdown

Non-Profits in Oregon: Socrata is Cool

Socrata: The Open Data Portal

I did not previously know much about precisely how open data portals had evolved. Oregon’s is quite nice, and I will take the opportunity to map and summarise non-profits throughout the state. Here is the data.

library(RSocrata)
Oregon.Nonprofits <- read.socrata("https://data.oregon.gov/resource/8kyv-b2kw.csv")
glimpse(Oregon.Nonprofits)

glimpse() reports 163,489 rows and 18 columns: registry_number, business_name, entity_type, registry_date, nonprofit_type, associated_name_type, first_name, middle_name, last_name, suffix, not_of_record_entity, entity_of_record_reg_number, entity_of_record_name, address, address_continued, city, state, and zip_code.

A basic zip code map

or_zips <- zctas(cb = TRUE, starts_with = "97", class = "sf")
or_zips %>% ggplot(.
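The excerpt cuts off mid-plot. As a minimal sketch of where that zip-code map could go (not the post's code), one might count non-profits per zip code and shade the Oregon ZCTA polygons drawn above; the ZCTA id column name is an assumption, since it varies by Census vintage.

```r
# A minimal sketch, not the post's code: count non-profits per zip code
# and shade the or_zips polygons created above by that count.
# Assumes the ZCTA id column is ZCTA5CE10 (it varies by Census vintage).
library(dplyr)
library(ggplot2)
library(sf)

zip_counts <- Oregon.Nonprofits %>%
  count(zip_code, name = "nonprofits")

or_zips %>%
  left_join(zip_counts, by = c("ZCTA5CE10" = "zip_code")) %>%
  ggplot() +
  geom_sf(aes(fill = nonprofits), color = NA) +
  scale_fill_viridis_c(na.value = "grey90") +
  theme_minimal() +
  labs(title = "Oregon non-profits by zip code", fill = "Count")
```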

Dog Movements: a tidyTuesday

Adoptable Dogs

# devtools::install_github("thebioengineer/tidytuesdayR", force=TRUE)
tuesdata51 <- tidytuesdayR::tt_load(2019, week = 51)
dog_moves <- tuesdata51$dog_moves
dog_des <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_descriptions.csv')
library(tidyverse); library(scatterpie)
library(rgeos); library(maptools)
library(rgdal); library(usmap); library(ggthemes)

The Base Map

My.Map <- us_map(regions = "states")
Base.Plot <- ggplot() +
  geom_polygon(data = My.Map, aes(x = x, y = y, group = group), fill = "white", color = "black") +
  theme_map()
Base.Plot

A fifty state map to plot this information on.

New.Dat <- left_join(My.Map, dog_moves, by = c("full" = "location"))
ggplot() + geom_polygon(data=New.
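The code above breaks off mid-layer. As a minimal sketch of how the joined frame might be mapped, one could fill each state by the number of dogs exported; the exported column name is my assumption from the tidyTuesday data dictionary, not something shown in the excerpt.

```r
# A minimal sketch continuing from New.Dat above; `exported` is an
# assumed column name in dog_moves, not taken from the post.
ggplot() +
  geom_polygon(data = New.Dat,
               aes(x = x, y = y, group = group, fill = exported),
               color = "black") +
  scale_fill_viridis_c(na.value = "white") +
  theme_map() +
  labs(fill = "Dogs exported")
```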

Trying out Leaflet

International Murders are among the data for analysis in the tidyTuesday for December 10, 2019. These are made for a map.

library(tidyverse)
library(leaflet)
library(stringr)
library(sf)
library(here)
library(widgetframe)
library(htmlwidgets)
library(htmltools)
options(digits = 3)
set.seed(1234)
theme_set(theme_minimal())
library(tidytuesdayR)
tuesdata <- tt_load(2019, week = 50)
murders <- tuesdata$gun_murders

There isn't much data, so that should make this a bit easier. Now for some data. As it happens, the best way I currently know how to do this is going to involve acquiring a spatial frame.
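As a sketch of the kind of spatial join the excerpt points toward (my own illustration, not the post's code), the rates can be attached to country polygons and drawn with leaflet. The rnaturalearth package and the gun_murders column names (country, count) are assumptions here.

```r
# A minimal sketch: join the murder rates to world polygons and draw a
# leaflet choropleth. rnaturalearth and the gun_murders column names
# (country, count) are assumptions, not taken from the post.
library(rnaturalearth)

world <- ne_countries(scale = "medium", returnclass = "sf")

map_dat <- world %>%
  left_join(murders, by = c("name" = "country"))

pal <- colorNumeric("Reds", domain = map_dat$count, na.color = "#f0f0f0")

leaflet(map_dat) %>%
  addTiles() %>%
  addPolygons(fillColor = ~pal(count), fillOpacity = 0.8,
              weight = 1, color = "white",
              label = ~paste0(name, ": ", count)) %>%
  addLegend(pal = pal, values = ~count, title = "Murders per 100k")
```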

Tables, Pivots, Bars, and Mosaics

R Markdown

There is detailed help for everything Markdown can do under Help in RStudio. The key to it is knitting documents with the Knit button in RStudio. If we use helpers like the R Commander, Radiant, or esquisse, we will need the R code embedded in the Markdown document in particular ways. I will use Markdown for everything. I even use a close relation of Markdown in my scholarly pursuits.
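As a generic illustration (not drawn from the post) of what "embedded in particular ways" means, R code lives in fenced chunks inside the .Rmd file, and the Knit button executes them when it renders the document:

````markdown
---
title: "Tables, Pivots, Bars, and Mosaics"
output: html_document
---

Prose goes here; the chunk below is run when the document is knit.

```{r cyl-by-gear}
table(mtcars$cyl, mtcars$gear)
```
````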

The Economist's Visualization Errors

The Economist’s Errors and Credit Where Credit is Due

The Economist is serious about its use of data visualization, and it has occasionally owned up to errors in its visualizations. Visualizations can be deceptive, uninformative, confusing, excessively busy, or put up a host of other barriers to clean communication. The Economist's blog post on its errors is great. I have drawn the following example from a #tidyTuesday earlier this year that explores this.

nflscrapR is amazing

Scraping NFL data

Note: an original version of this post had issues induced by overtime games. There is a better way to handle all of this that I learned from a brief analysis of a tie game between Cleveland and Pittsburgh in Week One. The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from GitHub.
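A minimal sketch of that install step; the repository path is my assumption rather than something stated in the excerpt.

```r
# Install nflscrapR from GitHub; the repo path is an assumption, not
# taken from the post.
# install.packages("devtools")
devtools::install_github("maksimhorowitz/nflscrapR")
library(nflscrapR)
```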

Visualisation with Archigos: Leaders of the World

Archigos is an amazing collaboration that produced a comprehensive dataset of world leaders going pretty far back; see Archigos on the web. For thinking about leadership, it is quite natural. In this post, I want to do some reshaping into country-year and leader-year datasets and explore the basic confines of Archigos. I also want to use gganimate for a few things. So what do we know?
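Since the excerpt promises a reshape into leader-year form, here is a minimal sketch of one way that could look. The archigos object and its column names (leader, startdate, enddate as Dates) are assumptions, not the post's code.

```r
# A minimal sketch: expand each leader spell into one row per calendar
# year in office, i.e. a leader-year dataset. Column names are assumed.
library(tidyverse)
library(lubridate)

leader_year <- archigos %>%
  mutate(start_year = year(startdate),
         end_year   = year(enddate),
         year       = map2(start_year, end_year, seq)) %>%
  unnest(year)
```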

fredr is very neat

FRED via fredr

The Federal Reserve Economic Database [FRED] is a wonderful public resource for data, and the R API that connects to it is very easy to use for the things that I have previously needed. For example, one of my students was interested in commercial credit default data. I used the FRED search instructions from the following vignette to find that data. My first step was the vignette for using fredr.
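As a minimal sketch of the search-then-fetch pattern that vignette describes, the API key placeholder and search text below are illustrative, not the post's values.

```r
# A sketch of searching FRED and pulling a series with fredr.
# The key and search text are placeholders, not the post's values.
library(fredr)
fredr_set_key("YOUR_FRED_API_KEY")

# Search FRED for candidate series on commercial credit defaults
hits <- fredr_series_search_text("delinquency rate commercial loans")
head(hits[, c("id", "title")])

# Pull the first matching series by its id
series <- fredr(series_id = hits$id[1])
```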

Trump's Tweets, Part II

Trump’s Tone

A cool post on sentiment analysis can be found here. I will now get at the time series characteristics of his tweets and the sentiment stuff. I start by loading the tmls object that I created in the previous post.

Trump’s Overall Tweeting

What does it look like?

library(tidyverse)
library(tidytext)
library(SnowballC)
library(tm)
library(syuzhet)
library(rtweet)
load(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/TMLS.RData"))
names(tml.djt)

names() lists the rtweet columns: user_id, status_id, created_at, screen_name, text, source, display_text_width, reply_to_status_id, reply_to_user_id, reply_to_screen_name, is_quote, is_retweet, favorite_count, retweet_count, hashtags, symbols, urls_url, urls_t…
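The excerpt stops at the column listing. As a minimal sketch of the two steps it promises (tweet volume over time, then sentiment), assuming tml.djt carries rtweet's created_at and text columns:

```r
# A minimal sketch, assuming tml.djt has created_at and text columns:
# weekly tweet counts, then NRC sentiment scores via syuzhet.
library(dplyr)
library(ggplot2)
library(lubridate)
library(syuzhet)

# Tweets per week
tml.djt %>%
  mutate(week = floor_date(created_at, "week")) %>%
  count(week) %>%
  ggplot(aes(week, n)) +
  geom_line() +
  labs(x = NULL, y = "Tweets per week")

# NRC sentiment for each tweet's text
sentiments <- get_nrc_sentiment(as.character(tml.djt$text))
head(sentiments)
```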

Trump Tweet Word Clouds

Mining Twitter data is rather easy. You have to arrange a developer account with Twitter and set up an app. After that, Twitter gives you access to a consumer key and secret and an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR, but it was harder to work with for what I wanted.
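A minimal sketch of that setup using the pre-1.0 rtweet workflow; the app name and credential strings are placeholders, and the screen name is only illustrative.

```r
# A sketch of rtweet authentication and a timeline pull; all credential
# values are placeholders, not real keys.
library(rtweet)

token <- create_token(
  app             = "my_twitter_app",
  consumer_key    = "CONSUMER_KEY",
  consumer_secret = "CONSUMER_SECRET",
  access_token    = "ACCESS_TOKEN",
  access_secret   = "ACCESS_SECRET"
)

# Pull a user's recent timeline as a tidy data frame
tweets <- get_timeline("realDonaldTrump", n = 3200, token = token)
```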