telco
telco Churn
The loss of customers is known as churn. Here is some data on telephone companies that gives us a number of features of the customer and the billing history relevant to this.
library(tidyverse)
library(readr)
library(skimr)
telco <- read_csv(url("https://github.com/robertwwalker/DADMStuff/raw/master/WA_Fn-UseC_-Telco-Customer-Churn.csv"))
skim(telco)
Name | telco |
Number of rows | 7043 |
Number of columns | 21 |
_______________________ | |
Column type frequency: | |
character | 17 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
customerID | 0 | 1 | 10 | 10 | 0 | 7043 | 0 |
gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Partner | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Dependents | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
PhoneService | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
MultipleLines | 0 | 1 | 2 | 16 | 0 | 3 | 0 |
InternetService | 0 | 1 | 2 | 11 | 0 | 3 | 0 |
OnlineSecurity | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
OnlineBackup | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
DeviceProtection | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
TechSupport | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
StreamingTV | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
StreamingMovies | 0 | 1 | 2 | 19 | 0 | 3 | 0 |
Contract | 0 | 1 | 8 | 14 | 0 | 3 | 0 |
PaperlessBilling | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
PaymentMethod | 0 | 1 | 12 | 25 | 0 | 4 | 0 |
Churn | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
SeniorCitizen | 0 | 1 | 0.16 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
tenure | 0 | 1 | 32.37 | 24.56 | 0.00 | 9.00 | 29.00 | 55.00 | 72.00 | ▇▃▃▃▆ |
MonthlyCharges | 0 | 1 | 64.76 | 30.09 | 18.25 | 35.50 | 70.35 | 89.85 | 118.75 | ▇▅▆▇▅ |
TotalCharges | 11 | 1 | 2283.30 | 2266.77 | 18.80 | 401.45 | 1397.47 | 3794.74 | 8684.80 | ▇▂▂▂▁ |
telco <- telco[complete.cases(telco),] # Keep only rows with complete cases
telco.analyse <- telco %>% select(-customerID) # Drop customer ID
telco.F <- mutate_if(telco.analyse, is.character, as.factor)
install.packages("DataExplorer")
library(DataExplorer)
# Run separately
create_report(telco.F)
Remove the missing data. The outcome of interest is Churn. Let’s look at it and take on a few examples from proportions. Because it is Yes and No, everything will be about the probability of No unless we turn them to factors and order them backwards. I will just describe this as the probability of retention.
Paperless <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessY <- xtabs(~PaperlessBilling+Churn, data=Paperless)
prop.table(Tab.PaperlessY, 1)
## Churn
## PaperlessBilling No Yes
## Yes 0.6641075 0.3358925
prop.test(Tab.PaperlessY)
##
## 1-sample proportions test with continuity correction
##
## data: Tab.PaperlessY, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6495026 0.6784077
## sample estimates:
## p
## 0.6641075
For the paperless customers, the probability of retention is 0.65 to 0.68 with 95% confidence. For the paper customers, we have…
PaperBill <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessN <- xtabs(~PaperlessBilling+Churn, data=PaperBill)
prop.table(Tab.PaperlessN, 1)
## Churn
## PaperlessBilling No Yes
## Yes 0.6641075 0.3358925
prop.test(Tab.PaperlessN)
##
## 1-sample proportions test with continuity correction
##
## data: Tab.PaperlessN, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6495026 0.6784077
## sample estimates:
## p
## 0.6641075
The probability of retention ranges from 0.82 to 0.85 with 95% confidence. Because they do not overlap, there must be a difference and paperless customers churn more.
Comparisons
( Tab.Paperless <- xtabs(~PaperlessBilling+Churn, data=telco.analyse) )
## Churn
## PaperlessBilling No Yes
## No 2395 469
## Yes 2768 1400
prop.table(Tab.Paperless, 1)
## Churn
## PaperlessBilling No Yes
## No 0.8362430 0.1637570
## Yes 0.6641075 0.3358925
prop.test(Tab.Paperless)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: Tab.Paperless
## X-squared = 256.87, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1521111 0.1921600
## sample estimates:
## prop 1 prop 2
## 0.8362430 0.6641075
Paperless clients have a probability of churning that is 0.152 to 0.192 higher than non-paperless clients.
Phone Service?
Is there a difference in Churn between those with and without phone service?
( Tab.PS <- xtabs(~PhoneService+Churn, data=telco.analyse) )
## Churn
## PhoneService No Yes
## No 510 170
## Yes 4653 1699
prop.table(Tab.PS, 1)
## Churn
## PhoneService No Yes
## No 0.7500000 0.2500000
## Yes 0.7325252 0.2674748
prop.test(Tab.PS)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: Tab.PS
## X-squared = 0.87373, df = 1, p-value = 0.3499
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.01765710 0.05260672
## sample estimates:
## prop 1 prop 2
## 0.7500000 0.7325252
It seems not. Those without phone service could be 0.017 less likely or 0.053 more likely to be retained; no difference is certainly among the plausible values.
Contracts?
(Tab.Contract <- table(telco.analyse$Contract, telco.analyse$Churn))
##
## No Yes
## Month-to-month 2220 1655
## One year 1306 166
## Two year 1637 48
prop.table(Tab.Contract, 1)
##
## No Yes
## Month-to-month 0.57290323 0.42709677
## One year 0.88722826 0.11277174
## Two year 0.97151335 0.02848665
prop.test(Tab.Contract)
##
## 3-sample test for equality of proportions without continuity
## correction
##
## data: Tab.Contract
## X-squared = 1179.5, df = 2, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## prop 1 prop 2 prop 3
## 0.5729032 0.8872283 0.9715134
With more than two proportions, the question is, are the probabilities of churn the same or not
.
Perhaps we should reexamine the row dimension to combine those under contracts of different lengths to one.
Contracts: Yes or No
First, use mutate to separate those under contract from those that are month-to-month. Second, I will examine the table and the proportions from the table calculated by rows.
TAC <- telco.analyse %>% mutate(No.Contract = (Contract=="Month-to-month"))
table(TAC$No.Contract, TAC$Churn)
##
## No Yes
## FALSE 2943 214
## TRUE 2220 1655
prop.table(table(TAC$No.Contract, TAC$Churn), 1)
##
## No Yes
## FALSE 0.93221413 0.06778587
## TRUE 0.57290323 0.42709677
Those under contract have a six percent churn rate while those not under contract churn just under forty-three percent of the time. It is likely that contracts are a very useful way to diminish the likelihood of churn. How much? We can use the test of proportions to examine this.
prop.test(table(TAC$No.Contract, TAC$Churn))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: table(TAC$No.Contract, TAC$Churn)
## X-squared = 1149.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.3411501 0.3774717
## sample estimates:
## prop 1 prop 2
## 0.9322141 0.5729032
In evaluating the claim of no difference, the probability of no difference is estimated to be basically zero [2.2e-16]. So what are the likely values? With 95% confidence, the probability of churning is between 0.34 and 0.377 more likely for those without a contract.