GSM 5103: Data Analysis, Modelling, and Decision-Making

Resources to support my MBA data analysis course.

This is a more convenient place than the WISE ecosystem to host materials. The R directory also contains some useful how-tos.


Stuff

AlumniGiving

Alumni Giving Prediction Example

A Linear Model

Mod.AG <- lm(Giving~SFR+Small.Classes+Big.Classes+Graduation.Rate+Freshman.Retention+Special, data=AlumniGiving)
summary(Mod.AG)

##
## Call:
## lm(formula = Giving ~ SFR + Small.Classes + Big.Classes + Graduation.Rate +
##     Freshman.Retention + Special, data = AlumniGiving)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124888 -0.030048 -0.005409 0.027063 0.145876
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.188483 0.096503 -1.953 0.05317 .
## SFR -0.
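
As a quick check on how summary() builds the coefficient table: each t value is the estimate divided by its standard error, and Pr(>|t|) is the two-sided tail area of a t distribution with the model's residual degrees of freedom. The sketch below recomputes the intercept's t value from the numbers shown above. The helper name coef_test is hypothetical, and because the residual degrees of freedom are cut off in the excerpt, the p-value part is illustrated with a very large df, where the t distribution is close to normal.

```r
# Hypothetical helper: recompute a t statistic and two-sided p-value
# from an estimate, its standard error, and residual degrees of freedom.
coef_test <- function(estimate, std_error, df) {
  t_value <- estimate / std_error
  p_value <- 2 * pt(-abs(t_value), df = df)
  c(t = t_value, p = p_value)
}

# Intercept row from the Mod.AG output: estimate / standard error.
intercept_t <- -0.188483 / 0.096503
round(intercept_t, 3)   # -1.953, the printed t value

# With very many degrees of freedom the t distribution is nearly normal,
# so |t| = 1.96 gives a two-sided p-value close to 0.05.
coef_test(1.96, 1, df = 1e6)
```

Plugging in the df reported on the "Residual standard error" line of the full summary() output should reproduce the printed p-value for the intercept.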

Asian Identification in the United States

Searching for the Asian Population via the Census

Using tidycensus means working within the limits of the tables it can reach. There is the ACS, a survey sent to about 3.5 million addresses each year, and there are the two main decennial census files, Summary File 1 (SF1) and Summary File 2 (SF2). I will search SF1 for the Asian population.

library(tidycensus); library(kableExtra)
library(tidyverse); library(stringr)
v10 <- load_variables(2010, "sf1", cache = TRUE)
v10 %>%
  filter(str_detect(concept, "ASIAN")) %>%
  filter(str_detect(label, "Female")) %>%
  kable() %>%
  scroll_box(width = "100%")

name      label    concept
P012D026  Total!
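
The two filter() calls keep only variables whose concept mentions ASIAN and whose label mentions Female. To see that logic in isolation, without downloading anything, the sketch below applies the same two conditions with base R's grepl() to a tiny hypothetical stand-in for the load_variables() output. The rows, names, and labels in it are invented placeholders, not real SF1 codes.

```r
# Hypothetical stand-in for load_variables(2010, "sf1") output:
# the same three columns (name, label, concept), with invented rows.
v_demo <- data.frame(
  name    = c("DEMO001", "DEMO002", "DEMO003", "DEMO004"),
  label   = c("Total!!Female", "Total!!Male", "Total!!Female", "Total"),
  concept = c("SEX BY AGE (ASIAN ALONE)", "SEX BY AGE (ASIAN ALONE)",
              "SEX BY AGE (WHITE ALONE)", "TOTAL POPULATION"),
  stringsAsFactors = FALSE
)

# Same two conditions as the str_detect() pipeline, as fixed-string matches.
# Both grepl() and str_detect() are case-sensitive by default.
asian_female <- v_demo[grepl("ASIAN", v_demo$concept, fixed = TRUE) &
                       grepl("Female", v_demo$label, fixed = TRUE), ]
asian_female
```

Case sensitivity is why the concept pattern is written in upper case: the concept strings in SF1 are upper case, while the labels mix cases.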

Local Portland Area Maps

A Citation

I found a starting point on local maps in Seattle.

library(ggmap)
## Loading required package: ggplot2
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.

Local Salem Area Maps

A Citation

I found a starting point on local maps in Seattle.

library(ggmap)
library(osmdata)
library(tidyverse)
# SLE <- get_map(getbb("Salem, OR"), source="osm")
# SLE %>% ggmap()

An Oregon Map of Liquor Stores

Setting up a Google Cloud account is a bit of a pain, and it requires a billing option. That was annoying but eventually fixed. The account is required for geocoding addresses, since OSM no longer does that.

OR <- get_map(getbb("Oregon"), source="osm")
OR %>% ggmap()
# Get the Data on Liquor Stores from https://ryano.

Local Salem Area Maps

A Citation

I found a starting point on local maps in Seattle.

library(ggmap)
## Loading required package: ggplot2
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.

Making a Point Map in R

Mapping Points in R

My goal is a streamlined, self-contained, freeware map maker with points marking addresses. It is a three-step process:

1. Get a map.
2. Geocode the addresses into latitude and longitude.
3. Combine the two, with the map as the first layer and a second layer on top that contains the points.

From there, it is fairly easy to get fancy, using ggplotly to add relevant hover text.
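
The three steps can be sketched in base R graphics without any map service or API key. This is a stand-in under stated assumptions: the bounding box mimics the 2 x 2 matrix layout that osmdata::getbb() returns (rows x and y, columns min and max), the coordinates are hypothetical rather than real geocoder output, and a blank plot stands in for the map tiles from step 1. The helper in_bbox() is also hypothetical.

```r
# Step 1 stand-in: a bounding box laid out like osmdata::getbb() output
# (rows "x" = longitude, "y" = latitude; columns "min", "max").
# The numbers are hypothetical, roughly around Salem, OR.
bb <- matrix(c(-123.12, 44.85, -122.95, 45.00), nrow = 2,
             dimnames = list(c("x", "y"), c("min", "max")))

# Step 2 stand-in: hypothetical geocoded addresses.
pts <- data.frame(
  address = c("Stop A", "Stop B", "Stop C"),
  lon = c(-123.03, -123.05, -122.60),
  lat = c(44.94, 44.92, 45.52),
  stringsAsFactors = FALSE
)

# Keep only points that fall inside the map's extent.
in_bbox <- function(pts, bb) {
  pts$lon >= bb["x", "min"] & pts$lon <= bb["x", "max"] &
    pts$lat >= bb["y", "min"] & pts$lat <= bb["y", "max"]
}
keep <- pts[in_bbox(pts, bb), ]

# Step 3: first layer is the (blank) base map, second layer the points.
# asp = 1/cos(latitude) keeps degrees of longitude and latitude roughly to scale.
plot(NA, xlim = bb["x", ], ylim = bb["y", ],
     asp = 1 / cos(mean(bb["y", ]) * pi / 180),
     xlab = "Longitude", ylab = "Latitude")
points(keep$lon, keep$lat, pch = 19, col = "red")
text(keep$lon, keep$lat, labels = keep$address, pos = 3, cex = 0.8)
```

In the ggmap version, ggmap() on the downloaded map would supply the first layer, and a geom_point() layer drawn from the geocoded data frame would supply the second.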

Patchwork

Patchwork

Patchwork is a really neat R package that lets us combine ggplot2 graphics in a fairly intuitive way. The | operator places plots side by side, / stacks them vertically, and + adds them to a grid that is filled in automatically. Parentheses group plots, so nested layouts follow the usual order of operations. Let me use the Berkeley admissions data to demonstrate a bit of this. I will make use of three packages that you may or may not have.

R Notes on Linear Models

Fake Data

I will fake some data to work with according to the following equation.

\[ y = 2 + 2x_{1} + x_{2} - x_{3} + \epsilon \]

where each \(x\) and \(\epsilon\) is an independent draw from a standard normal distribution (mean zero, standard deviation 1).

x1 <- rnorm(100); x2 <- rnorm(100); x3 <- rnorm(100); e <- rnorm(100)
y <- 2 + 2*x1 + x2 - x3 + e
My.
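
A self-contained version of this simulation fits the model with lm() and compares the estimates with the true coefficients 2, 2, 1, and -1. The seed is an addition (its value is arbitrary) so the fake data are reproducible. With 100 observations and unit-variance noise, each coefficient's standard error is roughly 0.1, so estimates within a few tenths of the truth are what one would expect.

```r
set.seed(5103)  # arbitrary seed so the fake data are reproducible

n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); e <- rnorm(n)
y  <- 2 + 2*x1 + x2 - x3 + e

fit <- lm(y ~ x1 + x2 + x3)
true_beta <- c("(Intercept)" = 2, x1 = 2, x2 = 1, x3 = -1)

# Estimates next to the values used to generate the data.
cbind(estimate = coef(fit), truth = true_beta)

# Approximate 95% intervals; each should cover its true value
# in about 95% of repeated simulations.
confint(fit)
```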

tidyquant

tidyquant automates a lot of equity research and calculation using tidy concepts.

library(tidyquant)
ticker <- "GS"
GS <- ticker %>% tq_get(from="2017/02/01", to="2020/03/01")
GS.Returns <- GS %>% tq_transmute(close, periodReturn, period="monthly")
GSPC <- "^GSPC" %>% tq_get(from="2017/02/01", to="2020/03/01")
GSPC.Returns <- GSPC %>% tq_transmute(close, periodReturn, period="monthly")
My.data <- left_join(GSPC.Returns, GS.Returns, by="date")
summary(lm(monthly.returns.y ~ monthly.returns.x, data=My.data))

##
## Call:
## lm(formula = monthly.returns.y ~ monthly.returns.x, data = My.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.
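
Because left_join() puts the S&P 500 returns first, monthly.returns.x holds the index and monthly.returns.y holds Goldman Sachs, so the slope of this regression is GS's market beta. The sketch below shows the two ingredients in base R using hypothetical month-end prices (not real GS or ^GSPC data): simple (arithmetic) returns, which I believe match periodReturn()'s default type, and the identity that the OLS slope equals cov(stock, market) / var(market). The helper simple_returns() is hypothetical.

```r
# Hypothetical helper: simple (arithmetic) period returns, P_t / P_{t-1} - 1.
simple_returns <- function(prices) {
  prices[-1] / prices[-length(prices)] - 1
}

# Hypothetical month-end closing prices, for illustration only.
market_px <- c(100, 102, 101, 105, 107, 106, 110)
stock_px  <- c(50, 51.5, 50.8, 53.2, 54.9, 54.0, 57.1)

r_m <- simple_returns(market_px)
r_s <- simple_returns(stock_px)

# Beta two ways: regression slope, and covariance over market variance.
beta_fit <- lm(r_s ~ r_m)
beta_lm  <- unname(coef(beta_fit)["r_m"])
beta_cov <- cov(r_s, r_m) / var(r_m)

c(beta_lm = beta_lm, beta_cov = beta_cov)
```

The two numbers agree because, in a simple regression with an intercept, the OLS slope is exactly the sample covariance divided by the sample variance of the regressor.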