GSMPR 622: Spring 2022

Classroom videos

1 and 2 Sample Means-Proportions

The Basic Idea

Two classes of statistics have known sampling distributions. Standardized sample means follow a t distribution. Sample proportions are approximately normal if the expected counts in both categories exceed 5.

The Mean: t

The t distribution

- is entirely defined by its degrees of freedom.
- has as its metric the standard error [in this case, of the mean].

The equations:

\[ \Large t = \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} \]

and, solving for the true mean,

\[ \Large \mu = \overline{x} - t\left(\frac{s}{\sqrt{n}}\right) \]

Because the t distribution is symmetric about zero, the plausible values of the true mean are symmetric about the sample mean, with t defining the number of standard errors of the mean above and below.
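As a minimal sketch of the two equations above, the following R code computes the t statistic and a 95% interval by hand and compares them with the built-in `t.test()`. The simulated sample `x`, the seed, and the hypothesized mean `mu0` are illustrative assumptions, not values from the course materials.

```r
# Simulated sample; the data, seed, and mu0 are illustrative assumptions.
set.seed(622)
x <- rnorm(25, mean = 10, sd = 2)
mu0 <- 10

n <- length(x)
se <- sd(x) / sqrt(n)            # standard error of the mean: s / sqrt(n)
t_stat <- (mean(x) - mu0) / se   # standard errors between x-bar and mu0

# 95% interval: x-bar plus and minus the t quantile times the standard error
t_crit <- qt(0.975, df = n - 1)
ci <- mean(x) + c(-1, 1) * t_crit * se

# Cross-check against the built-in one-sample t test
check <- t.test(x, mu = mu0)
```

The hand-computed `t_stat` and `ci` should agree with `check$statistic` and `check$conf.int`, which is one way to see that the interval is just the t equation solved for the mean.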

telco

telco Churn

The loss of customers is known as churn. Here are some data on telephone company customers, with a number of customer features and the billing history relevant to churn.

    library(tidyverse)
    library(readr)
    library(skimr)
    telco <- read_csv(url("https://github.com/robertwwalker/DADMStuff/raw/master/WA_Fn-UseC_-Telco-Customer-Churn.csv"))
    skim(telco)

Table 1: Data summary

Name: telco
Number of rows: 7043
Number of columns: 21

Column type frequency:
character: 17
numeric: 4

Group variables: None

Variable type: character
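Churn is naturally summarized as a proportion, so a rough sketch of the normal approximation for a proportion may help here. The vector `churn` below is a hypothetical toy example with made-up counts; it is not the telco data, and the "Yes"/"No" coding is an assumption for illustration.

```r
# Hypothetical churn indicator with made-up counts (NOT the telco data)
churn <- c(rep("Yes", 30), rep("No", 70))

n <- length(churn)
p_hat <- mean(churn == "Yes")    # sample proportion of churners

# Rule of thumb: expected counts in both categories should exceed 5
counts_ok <- n * p_hat > 5 && n * (1 - p_hat) > 5

# Normal-approximation standard error and 95% interval for the proportion
se_p <- sqrt(p_hat * (1 - p_hat) / n)
ci_p <- p_hat + c(-1, 1) * qnorm(0.975) * se_p
```

With the real data, the same steps would apply to whichever column records churn, provided the expected-count check passes.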

Probability Distributions

Probability: The Logic of Science

Jaynes presents a few core ideas and requirements for his rational system. Probability emerges as the representation of circumstances in which any given realization of a process is either TRUE or FALSE, but both are possible and expressible by probabilities that

- sum to one for all events
- are greater than or equal to zero for any given event

General Representation of Probability

Is of necessity two-dimensional,

A Fast Food Solution

Fast Food Data

These data come courtesy of a Tidy Tuesday from a while ago. The data consist of menu items for a selection of fast food chains; the units are menu items. We have the chain [restaurant], the item [the item name], and a series of variables (columns) representing sodium, cholesterol, fat, calories, and other information. Some values are missing. The data can be imported from the tidytuesday repository on GitHub.

FlexDashboard

There is wonderful documentation for the flexdashboard package. Furthermore, because it is built on underlying scripting, things like plotly just work. Here is an example.

Gender Gaps and Black Boxes

Variance in the Outcome: The Black Box

Regression models engage in an exercise in variance accounting: how much of the outcome is explained by the inputs, individually (the slope divided by its standard error is t) and collectively (the average explained divided by the average unexplained, averaging over degrees of freedom, is F)? This, of course, assumes normal errors. This document provides a function for making use of the black box. Just as in common parlance, a black box is the unexplained.
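The variance accounting described above can be sketched in a few lines of R. This is not the function the post provides; it is a minimal illustration on simulated data, where the variable names, coefficients, and seed are all assumptions, showing t as slope over standard error and F as mean explained over mean unexplained.

```r
# Simulated data; names, coefficients, and seed are illustrative assumptions.
set.seed(622)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - 0.25 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Individually: each estimate divided by its standard error is t
coefs <- summary(fit)$coefficients
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Collectively: split the total sum of squares into explained and unexplained
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(residuals(fit)^2)     # the unexplained "black box"
ss_model <- ss_total - ss_resid       # explained by the inputs

# Average each piece over its degrees of freedom; the ratio is F
df_model <- length(coef(fit)) - 1
df_resid <- df.residual(fit)
f_by_hand <- (ss_model / df_model) / (ss_resid / df_resid)
```

Both hand-computed quantities should match what `summary(fit)` reports in its `t value` column and its `fstatistic` element.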

R Notes on Linear Models

Fake Data

I will fake some data to work with according to the following equation.

\[ y = 2 + 2*x_{1} + 1*x_{2} -1*x_{3} + \epsilon \]

where each x and \(\epsilon\) is a random draw from a standard normal distribution (mean zero, standard deviation 1).

    x1 <- rnorm(100)
    x2 <- rnorm(100)
    x3 <- rnorm(100)
    e <- rnorm(100)
    y <- 2 + 2*x1 + x2 - x3 + e

My.