tidyTuesday | robertwwalker.work

Dog Movements: a tidyTuesday

Adoptable Dogs # devtools::install_github("thebioengineer/tidytuesdayR", force=TRUE) tuesdata51 <- tidytuesdayR::tt_load(2019, week = 51) dog_moves <- tuesdata51$dog_moves dog_des <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-17/dog_descriptions.csv') library(tidyverse); library(scatterpie) library(rgeos) library(maptools) library(rgdal); library(usmap); library(ggthemes) The Base Map My.Map <- us_map(regions = "states") Base.Plot <- ggplot() + geom_polygon(data=My.Map, aes(x=x, y=y, group=group), fill="white", color="black") + theme_map() Base.Plot A fifty state map to plot this information on. New.Dat <- left_join(My.Map, dog_moves, by= c("full" = "location")) ggplot() + geom_polygon(data=New.

tidyTuesday Measles

tidyTuesday: December 10, 2019 Replicating plots from simplystatistics. One nice twist is the development of a tidytuesdayR package to grab the necessary data in an easy way. You can install the package via github. I will also use fiftystater and ggflags. devtools::install_github("thebioengineer/tidytuesdayR") devtools::install_github("ellisp/ggflags") devtools::install_github("wmurphyrd/fiftystater") tuesdata <- tidytuesdayR::tt_load(2019, week = 50) ## --- Downloading #TidyTuesday Information for 2019-12-10 ---- ## --- Identified 4 files available for download ---- ## --- Downloading files --- ## Warning in identify_delim(temp_file): Not able to detect delimiter for the file.

Trying out Leaflet

International Murders Are among the data for analysis in the tidyTuesday for December 10, 2019. These are made for a map. library(tidyverse) library(leaflet) library(stringr) library(sf) library(here) library(widgetframe) library(htmlwidgets) library(htmltools) options(digits = 3) set.seed(1234) theme_set(theme_minimal()) library(tidytuesdayR) tuesdata <- tt_load(2019, week = 50) murders <- tuesdata$gun_murders There isn’t much data so it should make this a bit easier. Now for some data. As it happens, the best way I currently know how to do this is going to involve acquiring a spatial frame.

Philadelphia Parking Tickets: a tidyTuesday

Philadelphia Map Use ggmap for the base layer. library(ggmap); library(osmdata); library(tidyverse) PHI <- get_map(getbb("Philadelphia, PA"), maptype = "stamen", zoom=12) Get the Tickets Data TidyTuesday covers 1.26 million parking tickets in Philadelphia. tickets <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv") ## Parsed with column specification: ## cols( ## violation_desc = col_character(), ## issue_datetime = col_datetime(format = ""), ## fine = col_double(), ## issuing_agency = col_character(), ## lat = col_double(), ## lon = col_double(), ## zip_code = col_double() ## ) Two Lines of Code Left library(lubridate); library(ggthemes) tickets <- tickets %>% mutate(Day = wday(issue_datetime, label=TRUE)) # use lubridate to extract the day of the month.

US Census Mapping

Searching and Mapping the Census Searching for the Asian Population via the Census To use tidycensus, there are limitations imposed by the available tables. There is ACS – a survey of about 3 million people – and the two main decennial census files [SF1] and [SF2]. I will search SF1 for the Asian population. library(tidycensus); library(kableExtra) library(tidyverse); library(stringr) v10 <- load_variables(2010, "sf1", cache = TRUE) v10 %>% filter(str_detect(concept, "ASIAN")) %>% filter(str_detect(label, "Female")) %>% kable() %>% scroll_box(width = "100%") name label concept P012D026 Total!

The Economist's Visualization Errors

The Economist’s Errors and Credit Where Credit is Due The Economist is serious about their use of data visualization and they have occasionally owned up to errors in their visualizations. They can be deceptive, uninformative, confusing, excessively busy, and present a host of other barriers to clean communication. Their blog post on their errors is great. I have drawn the following example from a #tidyTuesday earlier this year that explores this.

tidyTuesday does Pizza

Pizza Ratings The #tidyTuesday for this week involves pizza shop ratings data. Let’s see what we have. pizza_jared <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-01/pizza_jared.csv") ## Parsed with column specification: ## cols( ## polla_qid = col_double(), ## answer = col_character(), ## votes = col_double(), ## pollq_id = col_double(), ## question = col_character(), ## place = col_character(), ## time = col_double(), ## total_votes = col_double(), ## percent = col_double() ## ) pizza_barstool <- readr::read_csv("https://raw.

tidyTuesday meets the Economics of Majors

This week’s tidyTuesday focuses on degrees and majors and their deployment in the labor market. The original data came from 538. A description of sources and measures. The tidyTesday writeup is here. library(tidyverse) options(scipen=6) library(extrafont) font_import() ## Importing fonts may take a few minutes, depending on the number of fonts and the speed of the system. ## Continue? [y/n] Major.Employment <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-10-16/recent-grads.csv") library(skimr) skim(Major.Employment) Table 1: Data summary Name Major.

tidyTuesday: coffee chains

The tidyTuesday for this week is coffee chain locations For this week: 1. The basic link to the #tidyTuesday shows an original article for Week 6. First, let’s import the data; it is a single Excel spreadsheet. The page notes that starbucks, Tim Horton, and Dunkin Donuts have raw data available. library(readxl) library(tidyverse) ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ── ## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4 ## ✓ tibble 3.

Global mortality tidyTuesday

tidyTuesday on Global Mortality The three generic challenge graphics involve two global summaries, a raw count by type and a percentage by type. The individual county breakdowns are recorded for a predetermined year below. This can all be seen in the original. For whatever reason, I cannot open this data remotely. Here is this week’s tidyTuesday. library(skimr) library(tidyverse) library(rlang) # global_mortality <- readRDS("../../data/global_mortality.rds") global_mortality <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/global_mortality.rds")) skim(global_mortality) Table 1: Data summary Name global_mortality Number of rows 6156 Number of columns 35 _______________________ Column type frequency: character 2 numeric 33 ________________________ Group variables None Variable type: character