Spending on Kids

Last updated on Apr 17, 2021 4 min read R, tidyTuesday

Spending on Kids

First, let me import the data.

kids <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')
# kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')

Now let me summarise it and show a table of the variables.

summary(kids)

##     state             variable              year           raw          
##  Length:23460       Length:23460       Min.   :1997   Min.   :  -60139  
##  Class :character   Class :character   1st Qu.:2002   1st Qu.:   71985  
##  Mode  :character   Mode  :character   Median :2006   Median :  252002  
##                                        Mean   :2006   Mean   : 1181359  
##                                        3rd Qu.:2011   3rd Qu.:  836324  
##                                        Max.   :2016   Max.   :83666088  
##                                                       NA's   :102       
##     inf_adj         inf_adj_perchild  
##  Min.   :  -60799   Min.   :-0.01361  
##  1st Qu.:   85876   1st Qu.: 0.12456  
##  Median :  298778   Median : 0.32757  
##  Mean   : 1359983   Mean   : 0.91448  
##  3rd Qu.:  985049   3rd Qu.: 0.83362  
##  Max.   :84584960   Max.   :20.27326  
##  NA's   :102        NA's   :102

A table of the variables. The definitions are best found here.

table(kids$variable)

## 
##         addCC           CTC       edservs        edsubs       fedEITC 
##          1020          1020          1020          1020          1020 
##        fedSSI           HCD HeadStartPriv      highered           lib 
##          1020          1020          1020          1020          1020 
## Medicaid_CHIP  other_health othercashserv       parkrec          pell 
##          1020          1020          1020          1020          1020 
##        PK12ed     pubhealth          SNAP        socsec     stateEITC 
##          1020          1020          1020          1020          1020 
##     TANFbasic         unemp         wcomp 
##          1020          1020          1020

It is very tidy. It is probably better shown after a pivot. 50 states, the District of Columbia, and 20 years gives us 1,020 observations. Let me show it wide.

Big.Wide <- pivot_wider(kids, id_cols = c(state,year), names_from = "variable", values_from = c("raw","inf_adj","inf_adj_perchild"))
Big.Wide

## # A tibble: 1,020 x 71
##    state  year raw_PK12ed raw_highered raw_edsubs raw_edservs raw_pell
##    <chr> <int>      <dbl>        <dbl>      <dbl>       <dbl>    <dbl>
##  1 Alab…  1997    3271969       956505     107733      246057  120833.
##  2 Alas…  1997    1042311       209433       5550       52355    7575.
##  3 Ariz…  1997    3388165       847032     111735      170281  120450.
##  4 Arka…  1997    1960613       457171      62447      189808   65904.
##  5 Cali…  1997   28708364      6858657    1121672      943805  775292.
##  6 Colo…  1997    3332994       861733      84129       77419   79004.
##  7 Conn…  1997    4014870       502177      71053      138932   36453.
##  8 Dela…  1997     776825       185114      31284       81880    9965.
##  9 Dist…  1997     544051        56693          0           0   18972.
## 10 Flor…  1997   11498394      2039186     391935      269777  318611.
## # … with 1,010 more rows, and 64 more variables: raw_HeadStartPriv <dbl>,
## #   raw_TANFbasic <dbl>, raw_othercashserv <dbl>, raw_SNAP <dbl>,
## #   raw_socsec <dbl>, raw_fedSSI <dbl>, raw_fedEITC <dbl>, raw_CTC <dbl>,
## #   raw_addCC <dbl>, raw_stateEITC <dbl>, raw_unemp <dbl>, raw_wcomp <dbl>,
## #   raw_Medicaid_CHIP <dbl>, raw_pubhealth <dbl>, raw_other_health <dbl>,
## #   raw_HCD <dbl>, raw_lib <dbl>, raw_parkrec <dbl>, inf_adj_PK12ed <dbl>,
## #   inf_adj_highered <dbl>, inf_adj_edsubs <dbl>, inf_adj_edservs <dbl>,
## #   inf_adj_pell <dbl>, inf_adj_HeadStartPriv <dbl>, inf_adj_TANFbasic <dbl>,
## #   inf_adj_othercashserv <dbl>, inf_adj_SNAP <dbl>, inf_adj_socsec <dbl>,
## #   inf_adj_fedSSI <dbl>, inf_adj_fedEITC <dbl>, inf_adj_CTC <dbl>,
## #   inf_adj_addCC <dbl>, inf_adj_stateEITC <dbl>, inf_adj_unemp <dbl>,
## #   inf_adj_wcomp <dbl>, inf_adj_Medicaid_CHIP <dbl>, inf_adj_pubhealth <dbl>,
## #   inf_adj_other_health <dbl>, inf_adj_HCD <dbl>, inf_adj_lib <dbl>,
## #   inf_adj_parkrec <dbl>, inf_adj_perchild_PK12ed <dbl>,
## #   inf_adj_perchild_highered <dbl>, inf_adj_perchild_edsubs <dbl>,
## #   inf_adj_perchild_edservs <dbl>, inf_adj_perchild_pell <dbl>,
## #   inf_adj_perchild_HeadStartPriv <dbl>, inf_adj_perchild_TANFbasic <dbl>,
## #   inf_adj_perchild_othercashserv <dbl>, inf_adj_perchild_SNAP <dbl>,
## #   inf_adj_perchild_socsec <dbl>, inf_adj_perchild_fedSSI <dbl>,
## #   inf_adj_perchild_fedEITC <dbl>, inf_adj_perchild_CTC <dbl>,
## #   inf_adj_perchild_addCC <dbl>, inf_adj_perchild_stateEITC <dbl>,
## #   inf_adj_perchild_unemp <dbl>, inf_adj_perchild_wcomp <dbl>,
## #   inf_adj_perchild_Medicaid_CHIP <dbl>, inf_adj_perchild_pubhealth <dbl>,
## #   inf_adj_perchild_other_health <dbl>, inf_adj_perchild_HCD <dbl>,
## #   inf_adj_perchild_lib <dbl>, inf_adj_perchild_parkrec <dbl>

My brief plan

I recently came across a geofacet for R. I want to use it to plot a little bit of this data. If you want to get a head start, try install.packages("geofacet", dependencies=TRUE). You can google geofacet to get an idea of what a geofacet plot is. I will build one on the fly using a couple of tidy tools: filter, mutate, and joins and then put it together.

library(viridis)

## Loading required package: viridisLite

library(geofacet)
state_ranks %>% filter(variable=="education") %>% select(state,name) -> mergeMe
p1 <- kids %>% 
  left_join(., mergeMe, by = c('state' = 'name')) %>% 
  filter(variable=="PK12ed")%>% 
  ggplot(., aes(x=year, y=inf_adj_perchild, color=inf_adj_perchild)) + 
  geom_line() + 
  facet_geo(~state.y) + 
  labs(x="year", y="Inflation Adjust Expenditures per child", title="Pre-K Through 12 Education Spending", color="Spend per child", caption="Data from #tidyTuesday: @PieRatio") + 
  scale_color_viridis_c() + theme_void()
p1

An Animation

library(gganimate)
p2 <- p1 + transition_reveal(year)
p3 <- animate(p2, renderer = gifski_renderer())
save_animation(p3, file = "./GeoAnimFacet.gif")

Animation

Neat-o an Oregon Grid

This isn’t very good though….

load(url("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID2020-09-15.RData"))
OR.County.COVID %>% 
  mutate(County = str_replace(County, " ", "")) %>% 
  ggplot(., aes(x=date, y=Number.of.cases, color=Number.of.cases)) + 
  geom_line(size=1.5) + 
  facet_geo(~ County, grid = "us_or_counties_grid1", label = "name", scales = "free_y") +
  scale_color_viridis_c(option = "plasma") +
  theme_void()

## Warning: Removed 3 row(s) containing missing values (geom_path).

tidyTuesday tidyverse

Robert W. Walker

Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.

Spending on Kids

Spending on Kids

My brief plan

An Animation

Neat-o an Oregon Grid

Robert W. Walker

Associate Professor of Quantitative Methods

Related