telco
telco Churn
The loss of customers is known as churn. Here is some data on telephone companies that gives us a number of features of the customer and the billing history relevant to this.
library(tidyverse)
library(readr)
library(skimr)
telco <- read_csv(url("https://github.com/robertwwalker/DADMStuff/raw/master/WA_Fn-UseC_-Telco-Customer-Churn.csv"))
skim(telco)
| Name | telco |
| Number of rows | 7043 |
| Number of columns | 21 |
| _______________________ | |
| Column type frequency: | |
| character | 17 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| customerID | 0 | 1 | 10 | 10 | 0 | 7043 | 0 |
| gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| Partner | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| Dependents | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| PhoneService | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| MultipleLines | 0 | 1 | 2 | 16 | 0 | 3 | 0 |
| InternetService | 0 | 1 | 2 | 11 | 0 | 3 | 0 |
| OnlineSecurity | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| OnlineBackup | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| DeviceProtection | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| TechSupport | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| StreamingTV | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| StreamingMovies | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
| Contract | 0 | 1 | 8 | 14 | 0 | 3 | 0 |
| PaperlessBilling | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| PaymentMethod | 0 | 1 | 12 | 25 | 0 | 4 | 0 |
| Churn | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| SeniorCitizen | 0 | 1 | 0.16 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
| tenure | 0 | 1 | 32.37 | 24.56 | 0.00 | 9.00 | 29.00 | 55.00 | 72.00 | ▇▃▃▃▆ |
| MonthlyCharges | 0 | 1 | 64.76 | 30.09 | 18.25 | 35.50 | 70.35 | 89.85 | 118.75 | ▇▅▆▇▅ |
| TotalCharges | 11 | 1 | 2283.30 | 2266.77 | 18.80 | 401.45 | 1397.47 | 3794.74 | 8684.80 | ▇▂▂▂▁ |
telco <- telco[complete.cases(telco),] # Keep only rows with complete cases
telco.analyse <- telco %>% select(-customerID) # Drop customer ID
telco.F <- mutate_if(telco.analyse, is.character, as.factor)
install.packages("DataExplorer")
library(DataExplorer)
# Run separately
create_report(telco.F)
Remove the missing data. The outcome of interest is Churn. Let’s look at it and take on a few examples from proportions. Because it is Yes and No, everything will be about the probability of No unless we turn them to factors and order them backwards. I will just describe this as the probability of retention.
Paperless <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessY <- xtabs(~PaperlessBilling+Churn, data=Paperless)
prop.table(Tab.PaperlessY, 1)
## Churn
## PaperlessBilling No Yes
## Yes 0.6641075 0.3358925
prop.test(Tab.PaperlessY)
##
## 1-sample proportions test with continuity correction
##
## data: Tab.PaperlessY, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6495026 0.6784077
## sample estimates:
## p
## 0.6641075
For the paperless customers, the probability of retention is 0.65 to 0.68 with 95% confidence. For the paper customers, we have…
PaperBill <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessN <- xtabs(~PaperlessBilling+Churn, data=PaperBill)
prop.table(Tab.PaperlessN, 1)
## Churn
## PaperlessBilling No Yes
## Yes 0.6641075 0.3358925
prop.test(Tab.PaperlessN)
##
## 1-sample proportions test with continuity correction
##
## data: Tab.PaperlessN, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6495026 0.6784077
## sample estimates:
## p
## 0.6641075
The probability of retention ranges from 0.82 to 0.85 with 95% confidence. Because they do not overlap, there must be a difference and paperless customers churn more.
Comparisons
( Tab.Paperless <- xtabs(~PaperlessBilling+Churn, data=telco.analyse) )
## Churn
## PaperlessBilling No Yes
## No 2395 469
## Yes 2768 1400
prop.table(Tab.Paperless, 1)
## Churn
## PaperlessBilling No Yes
## No 0.8362430 0.1637570
## Yes 0.6641075 0.3358925
prop.test(Tab.Paperless)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: Tab.Paperless
## X-squared = 256.87, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1521111 0.1921600
## sample estimates:
## prop 1 prop 2
## 0.8362430 0.6641075
Paperless clients have a probability of churning that is 0.152 to 0.192 higher than non-paperless clients.
Phone Service?
Is there a difference in Churn between those with and without phone service?
( Tab.PS <- xtabs(~PhoneService+Churn, data=telco.analyse) )
## Churn
## PhoneService No Yes
## No 510 170
## Yes 4653 1699
prop.table(Tab.PS, 1)
## Churn
## PhoneService No Yes
## No 0.7500000 0.2500000
## Yes 0.7325252 0.2674748
prop.test(Tab.PS)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: Tab.PS
## X-squared = 0.87373, df = 1, p-value = 0.3499
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.01765710 0.05260672
## sample estimates:
## prop 1 prop 2
## 0.7500000 0.7325252
It seems not. Those without phone service could be 0.017 less likely or 0.053 more likely to be retained; no difference is certainly among the plausible values.
Contracts?
(Tab.Contract <- table(telco.analyse$Contract, telco.analyse$Churn))
##
## No Yes
## Month-to-month 2220 1655
## One year 1306 166
## Two year 1637 48
prop.table(Tab.Contract, 1)
##
## No Yes
## Month-to-month 0.57290323 0.42709677
## One year 0.88722826 0.11277174
## Two year 0.97151335 0.02848665
prop.test(Tab.Contract)
##
## 3-sample test for equality of proportions without continuity
## correction
##
## data: Tab.Contract
## X-squared = 1179.5, df = 2, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3
## 0.5729032 0.8872283 0.9715134
With more than two proportions, the question is, are the probabilities of churn the same or not.
Perhaps we should reexamine the row dimension to combine those under contracts of different lengths to one.
Contracts: Yes or No
First, use mutate to separate those under contract from those that are month-to-month. Second, I will examine the table and the proportions from the table calculated by rows.
TAC <- telco.analyse %>% mutate(No.Contract = (Contract=="Month-to-month"))
table(TAC$No.Contract, TAC$Churn)
##
## No Yes
## FALSE 2943 214
## TRUE 2220 1655
prop.table(table(TAC$No.Contract, TAC$Churn), 1)
##
## No Yes
## FALSE 0.93221413 0.06778587
## TRUE 0.57290323 0.42709677
Those under contract have a six percent churn rate while those not under contract churn just under forty-three percent of the time. It is likely that contracts are a very useful way to diminish the likelihood of churn. How much? We can use the test of proportions to examine this.
prop.test(table(TAC$No.Contract, TAC$Churn))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: table(TAC$No.Contract, TAC$Churn)
## X-squared = 1149.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.3411501 0.3774717
## sample estimates:
## prop 1 prop 2
## 0.9322141 0.5729032
In evaluating the claim of no difference, the probability of no difference is estimated to be basically zero [2.2e-16]. So what are the likely values? With 95% confidence, the probability of churning is between 0.34 and 0.377 more likely for those without a contract.