AlumniGiving

Alumni Giving Prediction Example

A Linear Model

Mod.AG <- lm(Giving~SFR+Small.Classes+Big.Classes+Graduation.Rate+Freshman.Retention+Special, data=AlumniGiving)
summary(Mod.AG)
## 
## Call:
## lm(formula = Giving ~ SFR + Small.Classes + Big.Classes + Graduation.Rate + 
##     Freshman.Retention + Special, data = AlumniGiving)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.124888 -0.030048 -0.005409  0.027063  0.145876 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -0.188483   0.096503  -1.953  0.05317 .  
## SFR                -0.001085   0.001519  -0.715  0.47620    
## Small.Classes       0.166839   0.054459   3.064  0.00271 ** 
## Big.Classes        -0.023674   0.101584  -0.233  0.81613    
## Graduation.Rate     0.108767   0.081848   1.329  0.18645    
## Freshman.Retention  0.250587   0.148554   1.687  0.09428 .  
## SpecialYes          0.184869   0.028313   6.529 1.74e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04809 on 118 degrees of freedom
## Multiple R-squared:  0.6626, Adjusted R-squared:  0.6454 
## F-statistic: 38.61 on 6 and 118 DF,  p-value: < 2.2e-16

A predicted value for each row is:

\[ Fitted value = Intercept + slope_{SFR}*SFR + slope_{Small.Classes}*Small.Classes + slope_{Big.Classes}*Big.Classes + slope_{Freshman.Retention}*Freshman.Retention + + slope_{Special}*Special=Yes \]

Let’s see the first row of the data.

library(tidyverse)
Arkansas <- AlumniGiving %>% filter(School == "University of Arkansas")
Arkansas
##                   School SFR Small.Classes Big.Classes Graduation.Rate
## 1 University of Arkansas  18          0.28        0.18            0.58
##   Freshman.Retention Giving Special
## 1               0.83   0.23      No

So the predicted value for Arkansas is:

-0.1884827 + -0.0010855 x 18 + 0.1668386 x *0.28 + -0.0236738 x 0.18 + 0.1087666 x 0.58 + 0.2505868 x 0.83 + 0.1848693 x 0

which computes to:

# Intercept then Slopes on left and values on right with 1 first for the intercept
coefficients(Mod.AG)*c(1,18,0.28,0.18,0.58,0.83,0)
##        (Intercept)                SFR      Small.Classes        Big.Classes 
##       -0.188482706       -0.019538524        0.046714802       -0.004261286 
##    Graduation.Rate Freshman.Retention         SpecialYes 
##        0.063084620        0.207987053        0.000000000
# That gives us each piece of it, now add them
sum(coefficients(Mod.AG)*c(1,18,0.28,0.18,0.58,0.83,0))
## [1] 0.105504
predict(Mod.AG, newdata=Arkansas)
##        1 
## 0.105504

or a 0.1055 giving rate. Their actual giving rate is 0.23, so the residual is 0.1245 or +12.45% residual giving. Or giving that is 0.1245 more than expected given Arkansas’s data. The R code for it is fitted.values so I tend to use that term also.

AlumniGiving$resids <- residuals(Mod.AG)
AlumniGiving$FV <- fitted.values(Mod.AG)
Arkansas <- AlumniGiving %>% filter(School == "University of Arkansas")
Arkansas
##                   School SFR Small.Classes Big.Classes Graduation.Rate
## 1 University of Arkansas  18          0.28        0.18            0.58
##   Freshman.Retention Giving Special   resids       FV
## 1               0.83   0.23      No 0.124496 0.105504

Predicting averages and outcome distributions. The confidence interval is obtained with:

predict(Mod.AG, newdata=Arkansas, interval = "confidence")
##        fit        lwr       upr
## 1 0.105504 0.09058656 0.1204214

The prediction interval covering the hypothetical y given the values of the predictors that we choose/specify will have variation that closely follows a plus or minus two standard errors of the residual because that’s the variation in the data [averages vary much less because of, well, averaging].

predict(Mod.AG, newdata=Arkansas, interval = "confidence")
##        fit        lwr       upr
## 1 0.105504 0.09058656 0.1204214

For Radiant users, this whole thing is a black box. And it does not give us the option of generating the prediction interval in selection boxes. It only gives us that option with a change in the code that it generates.

library(radiant)
result <- regress(
  AlumniGiving, 
  rvar = "Giving", 
  evar = c(
    "SFR", "Small.Classes", "Big.Classes", "Graduation.Rate", 
    "Freshman.Retention", "Special"
  )
)
summary(result)
## Linear regression (OLS)
## Data     : AlumniGiving 
## Response variable    : Giving 
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special 
## Null hyp.: the effect of x on Giving is zero
## Alt. hyp.: the effect of x on Giving is not zero
## 
##                     coefficient std.error t.value p.value    
##  (Intercept)             -0.188     0.097  -1.953   0.053 .  
##  SFR                     -0.001     0.002  -0.715   0.476    
##  Small.Classes            0.167     0.054   3.064   0.003 ** 
##  Big.Classes             -0.024     0.102  -0.233   0.816    
##  Graduation.Rate          0.109     0.082   1.329   0.186    
##  Freshman.Retention       0.251     0.149   1.687   0.094 .  
##  Special|Yes              0.185     0.028   6.529  < .001 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-squared: 0.663,  Adjusted R-squared: 0.645 
## F-statistic: 38.614 df(6,118), p.value < .001
## Nr obs: 125
pred <- predict(result, pred_data = Arkansas, interval = "confidence")
print(pred, n = 10)
## Linear regression (OLS)
## Data                 : AlumniGiving 
## Response variable    : Giving 
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special 
## Interval             : confidence 
## Prediction dataset   : Arkansas 
## 
##     SFR Small.Classes Big.Classes Graduation.Rate Freshman.Retention Special
##  18.000         0.280       0.180           0.580              0.830      No
##  Prediction  2.5% 97.5%   +/-
##       0.106 0.091 0.120 0.015
Arkansas <- store(Arkansas, pred, name = "pred_reg")
pred <- predict(result, pred_data = Arkansas, interval = "prediction")
print(pred, n = 10)
## Linear regression (OLS)
## Data                 : AlumniGiving 
## Response variable    : Giving 
## Explanatory variables: SFR, Small.Classes, Big.Classes, Graduation.Rate, Freshman.Retention, Special 
## Interval             : prediction 
## Prediction dataset   : Arkansas 
## 
##     SFR Small.Classes Big.Classes Graduation.Rate Freshman.Retention Special
##  18.000         0.280       0.180           0.580              0.830      No
##  Prediction  2.5% 97.5%   +/-
##       0.106 0.009 0.202 0.096
Avatar
Robert W. Walker
Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.

Previous