Resources to support my MBA data analysis course.
This is a more convenient place than the WISE ecosystem to host materials. The R directory also contains some useful howtos.
Stuff
Alumni Giving Prediction Example
A Linear Model
Mod.AG <- lm(Giving~SFR+Small.Classes+Big.Classes+Graduation.Rate+Freshman.Retention+Special, data=AlumniGiving)
summary(Mod.AG)
##
## Call:
## lm(formula = Giving ~ SFR + Small.Classes + Big.Classes + Graduation.Rate +
##     Freshman.Retention + Special, data = AlumniGiving)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.124888 -0.030048 -0.005409  0.027063  0.145876
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.188483   0.096503  -1.953  0.05317 .
## SFR         -0.
Searching for the Asian Population via the Census
tidycensus can only query the tables that the Census Bureau makes available. There is the ACS, a survey of about 3 million people, and the two main decennial census files [SF1] and [SF2]. I will search SF1 for the Asian population.
library(tidycensus); library(kableExtra)
library(tidyverse); library(stringr)
v10 <- load_variables(2010, "sf1", cache = TRUE)
v10 %>% filter(str_detect(concept, "ASIAN")) %>% filter(str_detect(label, "Female")) %>% kable() %>% scroll_box(width = "100%")
name      label   concept
P012D026  Total!
A Citation
I found a starting point on local maps in Seattle.
library(ggmap)
## Loading required package: ggplot2
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.
A Citation
I found a starting point on local maps in Seattle.
library(ggmap)
library(osmdata)
library(tidyverse)
# SLE <- get_map(getbb("Salem, OR"), source="osm")
# SLE %>% ggmap()
An Oregon Map of Liquor Stores
Setting up a Google Cloud account is kind of a pain, and it requires a billing option. That was annoying but eventually fixed. The account is required for geocoding addresses, as OSM doesn’t do that anymore.
OR <- get_map(getbb("Oregon"), source="osm")
OR %>% ggmap()
# Get the Data on Liquor Stores from https://ryano.
Mapping Points in R
My goal is a streamlined and self-contained freeware map maker with points denoting addresses. It is a three-step process:
Get a map.
Geocode the addresses into latitude and longitude.
Combine the two with a first map layer and a second layer on top that contains the points.
From there, it is pretty easy to get fancy using ggplotly to put relevant text hovers into place.
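The three steps can be sketched roughly as follows. This is only an illustration: the two network-dependent steps are left as comments (they reuse the get_map, getbb and geocode calls from ggmap and osmdata), the data frame pts is a hypothetical stand-in for geocoded output with approximate city-centre coordinates, and an empty coordinate frame stands in for the downloaded map.

```r
library(ggplot2)

# Steps 1 and 2 need network access (and a registered Google key for
# geocoding), so they are shown commented out. `addresses` would be a
# character vector of street addresses.
# base_map <- ggmap::get_map(osmdata::getbb("Salem, OR"), source = "osm")
# pts      <- ggmap::geocode(addresses)   # returns lon / lat columns

# Hypothetical stand-in for geocoded output (approximate coordinates).
pts <- data.frame(
  place = c("Salem", "Portland"),
  lon   = c(-123.04, -122.68),
  lat   = c(44.94, 45.52)
)

# Step 3: the first layer is the map, the second layer holds the points.
# With a real map, replace ggplot() with ggmap::ggmap(base_map).
p <- ggplot() +
  geom_point(data = pts, aes(x = lon, y = lat), colour = "red", size = 3) +
  geom_text(data = pts, aes(x = lon, y = lat, label = place), vjust = -1) +
  coord_quickmap()
p
```

If plotly is installed, plotly::ggplotly(p) should turn the same object into an interactive plot with hover text.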
Patchwork
Patchwork is a really neat R package that allows us to combine graphics in a fairly intuitive and ggplot-friendly fashion. Placing plots next to each other happens with + and stacking them into rows works with /. It also respects order of operations with parentheses. Let me use the Berkeley data to demonstrate a bit of this. I will make use of three packages that you may or may not have.
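Here is a small sketch of the operators in action. It assumes that the Berkeley data means the UCBAdmissions table that ships with base R, which may not be the exact dataset used in the post, and the three plots are arbitrary choices made only to have something to arrange.

```r
library(ggplot2)
library(patchwork)

# Assumption: "Berkeley data" is taken to be the built-in UCBAdmissions table.
ucb <- as.data.frame(UCBAdmissions)

p1 <- ggplot(ucb, aes(x = Dept, y = Freq, fill = Admit)) + geom_col()
p2 <- ggplot(ucb, aes(x = Gender, y = Freq, fill = Admit)) + geom_col()
p3 <- ggplot(ucb, aes(x = Dept, y = Freq, fill = Gender)) + geom_col()

# Parentheses group p1 and p2 into the top row; p3 fills the bottom row.
combined <- (p1 + p2) / p3
combined
```

Dropping the parentheses changes the layout, because / binds more tightly than +, just as in ordinary arithmetic.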
Fake Data
I will fake some data to work with according to the following equation.
\[ y = 2 + 2*x_{1} + 1*x_{2} -1*x_{3} + \epsilon \]
where each x and \(\epsilon\) are random draws from a standard normal distribution with mean zero and standard deviation 1.
x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100); e <- rnorm(100)
y <- 2 + 2*x1 + x2 - x3 + e
My.
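To check that the fake data behave as the equation says, here is a minimal sketch that regenerates them and fits the matching linear model. The seed and the data frame name fake are my own additions for illustration, not part of the original post.

```r
set.seed(42)  # assumption: seed added so the draws are reproducible

n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); e <- rnorm(n)
y  <- 2 + 2 * x1 + x2 - x3 + e

# Hypothetical container for the simulated variables.
fake <- data.frame(y, x1, x2, x3)

# Fit the model implied by the equation.
fit <- lm(y ~ x1 + x2 + x3, data = fake)

# Estimates should land near the true values 2, 2, 1 and -1.
round(coef(fit), 2)
confint(fit)
```

Because the errors are random, the estimates will not match the true values exactly, and each 95% interval will miss its true value about one time in twenty across repeated simulations.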
tidyquant
Automates a lot of equity research and calculation using tidy concepts.
library(tidyquant)
ticker <- "GS"
GS <- ticker %>% tq_get(from="2017/02/01", to="2020/03/01")
GS.Returns <- GS %>% tq_transmute(close, periodReturn, period="monthly")
GSPC <- "^GSPC" %>% tq_get(from="2017/02/01", to="2020/03/01")
GSPC.Returns <- GSPC %>% tq_transmute(close, periodReturn, period="monthly")
My.data <- left_join(GSPC.Returns,GS.Returns,by="date")
summary(lm(monthly.returns.y~monthly.returns.x, data=My.data))
##
## Call:
## lm(formula = monthly.returns.y ~ monthly.returns.x, data = My.data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.
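For intuition about what this regression estimates, here is a small offline sketch. It uses simulated returns rather than real GS or S&P 500 data, and the names market and stock and the true slope of 1.3 are hypothetical. It shows that the slope from lm equals the covariance of the two return series divided by the variance of the market returns.

```r
set.seed(1)  # assumption: simulated returns, not real market data

# Hypothetical monthly returns: a market series and a stock that loads on it.
market <- rnorm(36, mean = 0.01, sd = 0.04)
stock  <- 0.002 + 1.3 * market + rnorm(36, sd = 0.03)

# Slope from the regression of stock returns on market returns.
fit     <- lm(stock ~ market)
beta_lm <- unname(coef(fit)["market"])

# The same slope computed directly from covariance and variance.
beta_cov <- cov(stock, market) / var(market)

all.equal(beta_lm, beta_cov)
```

In the tidyquant code above, monthly.returns.x plays the role of market and monthly.returns.y the role of stock, because the S&P 500 returns sit on the left side of the join.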