tidyverse

Some Basic Text on the Mueller Report

So this Robert Mueller guy wrote a report; I may as well analyse it a bit. First, let me see if I can get hold of the data. I grabbed the report directly from the Department of Justice website; you can follow this link. With pdftools, each page arrives as one long string, which I then split into lines.

library(tidyverse)
library(pdftools)
# Download report from link above and read its text page by page
mueller_report_txt <- pdf_text("../data/report.pdf")
# Create a tibble of the text with line numbers and pages
mueller_report <- tibble(
  page = 1:length(mueller_report_txt),
  text = mueller_report_txt) %>%
  separate_rows(text, sep = "\n") %>%
  group_by(page) %>%
  mutate(line = row_number()) %>%
  ungroup() %>%
  select(page, line, text)
write_csv(mueller_report, "data/mueller_report.csv")

nflscrapR is amazing

Scraping NFL data. Note: an original version of this post had issues induced by overtime games. There is a better way to handle all of this that I learned from a brief analysis of the Week One tie between Cleveland and Pittsburgh. The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from GitHub.

NFL ScrapR

Scraping NFL data with nflscrapR. The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from GitHub.

devtools::install_github(repo = "maksimhorowitz/nflscrapR")

The GitHub page for nflscrapR is quite informative; it has a lot of useful insight for working with the data, and the data set itself is quite large. Getting Some Data: following the guide to the package on GitHub, let me try their example.
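A minimal sketch of that first pull, assuming current nflscrapR exports scrape_season_play_by_play() and that the 2018 regular season is the target (both are assumptions on my part, and a full season takes a while to download):

library(nflscrapR)
# Play-by-play for one regular season; the season and type are illustrative choices
pbp_2018 <- scrape_season_play_by_play(season = 2018, type = "reg")
dim(pbp_2018)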

Visualisation with Archigos: Leaders of the World

Archigos is an amazing collaboration that produced a comprehensive dataset of world leaders going pretty far back; see Archigos on the web. For thinking about leadership, it is quite natural. In this post, I want to do some reshaping into country-year and leader-year datasets and explore the basic contours of Archigos. I also want to use gganimate for a few things. So what do we know?
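As a rough illustration of the leader-year reshape, here is a minimal sketch; it assumes the spell data are loaded as archigos with columns leader, startdate, and enddate (the actual column names in the file may differ):

library(tidyverse)
library(lubridate)
# Expand each leadership spell into one row per calendar year served
leader_year <- archigos %>%
  mutate(start_year = year(startdate),
         end_year = year(enddate),
         year = map2(start_year, end_year, seq)) %>%
  unnest(year) %>%
  select(leader, year)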

fredr is very neat

FRED via fredr. The Federal Reserve Economic Data [FRED] is a wonderful public resource for data, and the R API that connects to it is very easy to use for the things that I have previously needed. For example, one of my students was interested in commercial credit default data. I used the FRED search instructions from the following vignette to find that data. My first step was the vignette for using fredr.
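A minimal sketch of that search-then-fetch workflow, assuming a FRED API key is already in hand; the key, the search string, and the choice of the first match are placeholders rather than the series from the original post:

library(fredr)
library(dplyr)
# Register the API key (placeholder)
fredr_set_key("YOUR_FRED_API_KEY")
# Search FRED for series about delinquency on business loans
candidates <- fredr_series_search_text("delinquency rate business loans")
candidates %>% select(id, title) %>% head()
# Fetch one matching series by id (illustrative choice of the first hit)
defaults <- fredr(series_id = candidates$id[1])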

Stocks and gganimate

tidyquant automates a lot of equity research and calculation using tidy concepts. Here, I will first use it to get the components of the S&P 500 and pick out those with weights over 1.25 percent. In the next step, I download the data and finally calculate daily returns and a cumulative wealth index.

library(tidyquant)
library(tidyverse)
tq_index("SP500") %>%
  filter(weight > 0.0125) %>%
  select(symbol, company) -> Tickers
Tickers <- Tickers %>% filter(symbol!
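A minimal sketch of the download and the return calculations, assuming the Tickers tibble from above and tidyquant's tq_get()/tq_transmute() workflow; the start date is an illustrative choice:

# Daily prices for the selected tickers (illustrative date range)
Prices <- tq_get(Tickers$symbol, get = "stock.prices", from = "2015-01-01")
# Daily returns and a cumulative wealth index per symbol
Returns <- Prices %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "daily",
               col_rename = "ret") %>%
  mutate(wealth = cumprod(1 + ret))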

Trump Tweet Word Clouds

Mining Twitter data is rather easy. You have to arrange a developer account with Twitter and set up an app. After that, Twitter gives you access to a consumer key and secret and an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR, but it was harder to work with for what I wanted.
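A minimal sketch of the credential and collection step with rtweet, assuming the app credentials above; every value here is a placeholder, and the handle is simply the one implied by the post title:

library(rtweet)
# Build an access token from the app credentials (placeholders)
token <- create_token(app = "my_twitter_app",
                      consumer_key = "CONSUMER_KEY",
                      consumer_secret = "CONSUMER_SECRET",
                      access_token = "ACCESS_TOKEN",
                      access_secret = "ACCESS_SECRET")
# Pull recent tweets from one timeline; 3200 is the per-request maximum
trump_tweets <- get_timeline("realDonaldTrump", n = 3200, token = token)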

tidyTuesday: coffee chains

The tidyTuesday for this week is coffee chain locations. For this week: 1. The basic link to the #tidyTuesday shows an original article for Week 6. First, let’s import the data; it is a single Excel spreadsheet. The page notes that Starbucks, Tim Hortons, and Dunkin' Donuts have raw data available.

library(readxl)
library(tidyverse)
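A minimal import sketch, assuming the spreadsheet has been downloaded locally and that each chain sits on its own sheet; the file name and sheet name below are placeholders:

# List the sheets, then read one chain's locations
excel_sheets("week6_coffee_chains.xlsx")
starbucks <- read_excel("week6_coffee_chains.xlsx", sheet = "starbucks")
glimpse(starbucks)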

Global mortality tidyTuesday

tidyTuesday on Global Mortality. The three generic challenge graphics involve two global summaries, a raw count by type and a percentage by type. The individual country breakdowns are recorded for a predetermined year below. This can all be seen in the original. For whatever reason, I cannot open this data remotely. Here is this week’s tidyTuesday.

library(skimr)
library(tidyverse)
library(rlang)
# global_mortality <- readRDS("../../data/global_mortality.rds")
global_mortality <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/global_mortality.rds"))
skim(global_mortality)

Table 1: Data summary
Name: global_mortality
Number of rows: 6156
Number of columns: 35
Column type frequency: character 2, numeric 33
Group variables: None
Variable type: character
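A minimal sketch of a global summary by cause, under the assumption that the cause-of-death columns carry a "(%)" suffix and hold shares; the column selection and the choice of 2016 are assumptions, not taken from the post:

# Reshape the cause-of-death columns long (assumes names end in "(%)")
mortality_long <- global_mortality %>%
  pivot_longer(cols = contains("(%)"),
               names_to = "cause",
               values_to = "share")
# Mean share by cause for one illustrative year
mortality_long %>%
  filter(year == 2016) %>%
  group_by(cause) %>%
  summarise(mean_share = mean(share, na.rm = TRUE)) %>%
  arrange(desc(mean_share))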

Scraping the NFL Salary Cap Data with Python and R

The NFL Data. [SporTrac](http://www.sportrac.com) has a wonderful array of financial data on sports. A student going to work for the Seattle Seahawks wanted the NFL salary cap data, and I also found data on the English Premier League there. Now I have a source to scrape the data from. With a source in hand, the key tool is SelectorGadget, a browser add-in for Chrome that allows us to select text and identify the CSS or XPath selector needed to scrape the data.
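A minimal sketch of the R side, assuming rvest and a CSS selector found with SelectorGadget; the URL path and the ".datatable" selector are placeholders rather than the ones used in the post:

library(rvest)
# Read the salary cap page and pull the table matched by the selector
page <- read_html("http://www.sportrac.com/nfl/cap/")
cap_table <- page %>%
  html_nodes(".datatable") %>%
  html_table(fill = TRUE) %>%
  .[[1]]
head(cap_table)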