tidyverse | robertwwalker.work

Dog Movements: a tidyTuesday

Adoptable Dogs # devtools::install_github("thebioengineer/tidytuesdayR", force=TRUE) tuesdata51 <- tidytuesdayR::tt_load(2019, week = 51) dog_moves <- tuesdata51$dog_moves dog_des <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_descriptions.csv') library(tidyverse); library(scatterpie) library(rgeos) library(maptools) library(rgdal); library(usmap); library(ggthemes) The Base Map My.Map <- us_map(regions = "states") Base.Plot <- ggplot() + geom_polygon(data=My.Map, aes(x=x, y=y, group=group), fill="white", color="black") + theme_map() Base.Plot A fifty state map to plot this information on. New.Dat <- left_join(My.Map, dog_moves, by= c("full" = "location")) ggplot() + geom_polygon(data=New.

The Generation Squeeze

Hashtag OKBoomer The generational banter that has followed the use of #OKBoomer reminded me of an interesting feature of US population data. I believe it to be true that Generation X has never and will never be the largest generation of Americans. There are tons of Millenials and Baby Boomers alike, though the rate of decline in the latter means that the former are about to surpass them. Or perhaps they have.

Fariss Human Rights Data with Animation

Fariss Data Is neat and complete. load("FarissHRData.RData") skimr::skim(HR.Data) Table 1: Data summary Name HR.Data Number of rows 11717 Number of columns 27 _______________________ Column type frequency: factor 1 numeric 26 ________________________ Group variables None Variable type: factor skim_variable n_missing complete_rate ordered n_unique top_counts COW_YEAR 0 1 FALSE 11717 100: 1, 100: 1, 100: 1, 100: 1 Variable type: numeric

Tables, Pivots, Bars, and Mosaics

R Markdown There is detailed help for all that Markdown can do under Help in the RStudio. The key to it is knitting documents with the Knit button in the RStudio. If we use helpers like the R Commander, Radiant, or esquisse, we will need the R code implanted in the Markdown document in particular ways. I will use Markdown for everything. I even use a close relation of Markdown in my scholarly pursuits.

The Economist's Visualization Errors

The Economist’s Errors and Credit Where Credit is Due The Economist is serious about their use of data visualization and they have occasionally owned up to errors in their visualizations. They can be deceptive, uninformative, confusing, excessively busy, and present a host of other barriers to clean communication. Their blog post on their errors is great. I have drawn the following example from a #tidyTuesday earlier this year that explores this.

nflscrapR is amazing

Scraping NFL data Note: An original version of this post had issues induced by overtime games. There is a better way to handle all of this that I learned from a brief analysis of a tie game between Cleveland and Pittsburgh in Week One. The nflscrapR package is designed to make data on NFL games more easily available. To install the package, we need to grab it from github.

Visualisation with Archigos: Leaders of the World

Archigos Is an amazing collaboration that produced a comprehensive dataset of world leaders going pretty far back; see Archigos on the web. For thinking about leadership, it is quite natural. In this post, I want to do some reshaping into country year and leader year datasets and explore the basic confines of Archigos. I also want to use gganimate for a few things. So what do we know?

Trump Tweet Word Clouds

Mining Twitter Data Is rather easy. You have to arrange a developer account with Twitter and set up an app. After that, Twitter gives you access to a consumer key and secret and an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR but it was harder to work with for what I wanted.

tidyTuesday: coffee chains

The tidyTuesday for this week is coffee chain locations For this week: 1. The basic link to the #tidyTuesday shows an original article for Week 6. First, let’s import the data; it is a single Excel spreadsheet. The page notes that starbucks, Tim Horton, and Dunkin Donuts have raw data available. library(readxl) library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ── ## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4 ## ✓ tibble 3.

Global mortality tidyTuesday

tidyTuesday on Global Mortality The three generic challenge graphics involve two global summaries, a raw count by type and a percentage by type. The individual county breakdowns are recorded for a predetermined year below. This can all be seen in the original. For whatever reason, I cannot open this data remotely. Here is this week’s tidyTuesday. library(skimr) library(tidyverse) library(rlang) # global_mortality <- readRDS("../../data/global_mortality.rds") global_mortality <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/global_mortality.rds")) skim(global_mortality) Table 1: Data summary Name global_mortality Number of rows 6156 Number of columns 35 _______________________ Column type frequency: character 2 numeric 33 ________________________ Group variables None Variable type: character