AlumniGiving
Alumni Giving Prediction Example
A Linear Model
Mod.AG <- lm(Giving~SFR+Small.Classes+Big.Classes+Graduation.Rate+Freshman.Retention+Special, data=AlumniGiving)
summary(Mod.AG)
##
## Call:
## lm(formula = Giving ~ SFR + Small.Classes + Big.Classes + Graduation.Rate +
## Freshman.Retention + Special, data = AlumniGiving)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124888 -0.030048 -0.005409 0.027063 0.145876
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.188483 0.096503 -1.953 0.05317 .
## SFR -0.001085 0.001519 -0.715 0.47620
## Small.Classes 0.166839 0.054459 3.064 0.00271 **
## Big.Classes -0.023674 0.101584 -0.233 0.81613
## Graduation.Rate 0.108767 0.081848 1.329 0.18645
## Freshman.Retention 0.250587 0.148554 1.687 0.09428 .
## SpecialYes 0.184869 0.028313 6.529 1.74e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04809 on 118 degrees of freedom
## Multiple R-squared: 0.6626, Adjusted R-squared: 0.6454
## F-statistic: 38.61 on 6 and 118 DF, p-value: < 2.2e-16
A predicted value for each row is:
\[ Fitted value = Intercept + slope_{SFR}*SFR + slope_{Small.Classes}*Small.Classes + slope_{Big.Classes}*Big.Classes + slope_{Freshman.Retention}*Freshman.Retention + + slope_{Special}*Special=Yes \]
Let’s see the first row of the data.
library(tidyverse)
Arkansas <- AlumniGiving %>% filter(School == "University of Arkansas")
Arkansas
## School SFR Small.Classes Big.Classes Graduation.Rate
## 1 University of Arkansas 18 0.28 0.18 0.58
## Freshman.Retention Giving Special
## 1 0.83 0.23 No
So the predicted value for Arkansas is:
-0.1884827 + -0.0010855 x 18 + 0.1668386 x *0.28 + -0.0236738 x 0.18 + 0.1087666 x 0.58 + 0.2505868 x 0.83 + 0.1848693 x 0
which computes to:
# Intercept then Slopes on left and values on right with 1 first for the intercept
coefficients(Mod.AG)*c(1,18,0.28,0.18,0.58,0.83,0)
## (Intercept) SFR Small.Classes Big.Classes
## -0.188482706 -0.019538524 0.046714802 -0.004261286
## Graduation.Rate Freshman.Retention SpecialYes
## 0.063084620 0.207987053 0.000000000
# That gives us each piece of it, now add them
sum(coefficients(Mod.AG)*c(1,18,0.28,0.18,0.58,0.83,0))
## [1] 0.105504
predict(Mod.AG, newdata=Arkansas)
## 1
## 0.105504
or a 0.1055 giving rate. Their actual giving rate is 0.23, so the residual is 0.1245 or +12.45% residual giving. Or giving that is 0.1245 more than expected given Arkansas’s data. The R code for it is fitted.values
so I tend to use that term also.
AlumniGiving$resids <- residuals(Mod.AG)
AlumniGiving$FV <- fitted.values(Mod.AG)
Arkansas <- AlumniGiving %>% filter(School == "University of Arkansas")
Arkansas
## School SFR Small.Classes Big.Classes Graduation.Rate
## 1 University of Arkansas 18 0.28 0.18 0.58
## Freshman.Retention Giving Special resids FV
## 1 0.83 0.23 No 0.124496 0.105504
Predicting averages and outcome distributions. The confidence interval is obtained with:
predict(Mod.AG, newdata=Arkansas, interval = "confidence")
## fit lwr upr
## 1 0.105504 0.09058656 0.1204214
The prediction interval covering the hypothetical y given the values of the predictors that we choose/specify will have variation that closely follows a plus or minus two standard errors of the residual because that’s the variation in the data [averages vary much less because of, well, averaging].
predict(Mod.AG, newdata=Arkansas, interval = "confidence")
## fit lwr upr
## 1 0.105504 0.09058656 0.1204214
For Radiant users, this whole thing is a black box. And it does not give us the option of generating the prediction
interval in selection boxes. It only gives us that option with a change in the code that it generates.
library(radiant)
result <- regress(
AlumniGiving,
rvar = "Giving",
evar = c(
"SFR", "Small.Classes", "Big.Classes", "Graduation.Rate",
"Freshman.Retention", "Special"
)
)
summary(result)
## Linear regression (OLS)
## Data : AlumniGiving
## Response variable : Giving
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special
## Null hyp.: the effect of x on Giving is zero
## Alt. hyp.: the effect of x on Giving is not zero
##
## coefficient std.error t.value p.value
## (Intercept) -0.188 0.097 -1.953 0.053 .
## SFR -0.001 0.002 -0.715 0.476
## Small.Classes 0.167 0.054 3.064 0.003 **
## Big.Classes -0.024 0.102 -0.233 0.816
## Graduation.Rate 0.109 0.082 1.329 0.186
## Freshman.Retention 0.251 0.149 1.687 0.094 .
## Special|Yes 0.185 0.028 6.529 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-squared: 0.663, Adjusted R-squared: 0.645
## F-statistic: 38.614 df(6,118), p.value < .001
## Nr obs: 125
pred <- predict(result, pred_data = Arkansas, interval = "confidence")
print(pred, n = 10)
## Linear regression (OLS)
## Data : AlumniGiving
## Response variable : Giving
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special
## Interval : confidence
## Prediction dataset : Arkansas
##
## SFR Small.Classes Big.Classes Graduation.Rate Freshman.Retention Special
## 18.000 0.280 0.180 0.580 0.830 No
## Prediction 2.5% 97.5% +/-
## 0.106 0.091 0.120 0.015
Arkansas <- store(Arkansas, pred, name = "pred_reg")
pred <- predict(result, pred_data = Arkansas, interval = "prediction")
print(pred, n = 10)
## Linear regression (OLS)
## Data : AlumniGiving
## Response variable : Giving
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special
## Interval : prediction
## Prediction dataset : Arkansas
##
## SFR Small.Classes Big.Classes Graduation.Rate Freshman.Retention Special
## 18.000 0.280 0.180 0.580 0.830 No
## Prediction 2.5% 97.5% +/-
## 0.106 0.009 0.202 0.096