telco

Last updated on Apr 28, 2022

telco Churn

The loss of customers is known as churn. Here is some data on telephone companies that gives us a number of features of the customer and the billing history relevant to this.

library(tidyverse)
library(readr)
library(skimr)
telco <- read_csv(url("https://github.com/robertwwalker/DADMStuff/raw/master/WA_Fn-UseC_-Telco-Customer-Churn.csv"))
skim(telco)

Table 1: Data summary
Name	telco
Number of rows	7043
Number of columns	21
_______________________
Column type frequency:
character	17
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
customerID	1	10	10	7043
gender	1	4	6	2
Partner	1	2	3	2
Dependents	1	2	3	2
PhoneService	1	2	3	2
MultipleLines	1	2	16	3
InternetService	1	2	11	3
OnlineSecurity	1	2	19	3
OnlineBackup	1	2	19	3
DeviceProtection	1	2	19	3
TechSupport	1	2	19	3
StreamingTV	1	2	19	3
StreamingMovies	1	2	19	3
Contract	1	8	14	3
PaperlessBilling	1	2	3	2
PaymentMethod	1	12	25	4
Churn	1	2	3	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
SeniorCitizen	0	1	0.16	0.37	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
tenure	0	1	32.37	24.56	0.00	9.00	29.00	55.00	72.00	▇▃▃▃▆
MonthlyCharges	0	1	64.76	30.09	18.25	35.50	70.35	89.85	118.75	▇▅▆▇▅
TotalCharges	11	1	2283.30	2266.77	18.80	401.45	1397.47	3794.74	8684.80	▇▂▂▂▁

telco <- telco[complete.cases(telco),] # Keep only rows with complete cases
telco.analyse <- telco %>% select(-customerID) # Drop customer ID
telco.F <- mutate_if(telco.analyse, is.character, as.factor)

install.packages("DataExplorer")
library(DataExplorer)
# Run separately
create_report(telco.F)

Remove the missing data. The outcome of interest is Churn. Let’s look at it and take on a few examples from proportions. Because it is Yes and No, everything will be about the probability of No unless we turn them to factors and order them backwards. I will just describe this as the probability of retention.

Paperless <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessY <- xtabs(~PaperlessBilling+Churn, data=Paperless)
prop.table(Tab.PaperlessY, 1)

##                 Churn
## PaperlessBilling        No       Yes
##              Yes 0.6641075 0.3358925

prop.test(Tab.PaperlessY)

## 
##  1-sample proportions test with continuity correction
## 
## data:  Tab.PaperlessY, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.6495026 0.6784077
## sample estimates:
##         p 
## 0.6641075

For the paperless customers, the probability of retention is 0.65 to 0.68 with 95% confidence. For the paper customers, we have…

PaperBill <- telco.analyse %>% filter(PaperlessBilling=="Yes")
Tab.PaperlessN <- xtabs(~PaperlessBilling+Churn, data=PaperBill)
prop.table(Tab.PaperlessN, 1)

##                 Churn
## PaperlessBilling        No       Yes
##              Yes 0.6641075 0.3358925

prop.test(Tab.PaperlessN)

## 
##  1-sample proportions test with continuity correction
## 
## data:  Tab.PaperlessN, null probability 0.5
## X-squared = 448.34, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.6495026 0.6784077
## sample estimates:
##         p 
## 0.6641075

The probability of retention ranges from 0.82 to 0.85 with 95% confidence. Because they do not overlap, there must be a difference and paperless customers churn more.

Comparisons

( Tab.Paperless <- xtabs(~PaperlessBilling+Churn, data=telco.analyse) )

##                 Churn
## PaperlessBilling   No  Yes
##              No  2395  469
##              Yes 2768 1400

prop.table(Tab.Paperless, 1)

##                 Churn
## PaperlessBilling        No       Yes
##              No  0.8362430 0.1637570
##              Yes 0.6641075 0.3358925

prop.test(Tab.Paperless)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  Tab.Paperless
## X-squared = 256.87, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1521111 0.1921600
## sample estimates:
##    prop 1    prop 2 
## 0.8362430 0.6641075

Paperless clients have a probability of churning that is 0.152 to 0.192 higher than non-paperless clients.

Phone Service?

Is there a difference in Churn between those with and without phone service?

( Tab.PS <- xtabs(~PhoneService+Churn, data=telco.analyse) )

##             Churn
## PhoneService   No  Yes
##          No   510  170
##          Yes 4653 1699

prop.table(Tab.PS, 1)

##             Churn
## PhoneService        No       Yes
##          No  0.7500000 0.2500000
##          Yes 0.7325252 0.2674748

prop.test(Tab.PS)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  Tab.PS
## X-squared = 0.87373, df = 1, p-value = 0.3499
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.01765710  0.05260672
## sample estimates:
##    prop 1    prop 2 
## 0.7500000 0.7325252

It seems not. Those without phone service could be 0.017 less likely or 0.053 more likely to be retained; no difference is certainly among the plausible values.

Contracts?

(Tab.Contract <- table(telco.analyse$Contract, telco.analyse$Churn))

##                 
##                    No  Yes
##   Month-to-month 2220 1655
##   One year       1306  166
##   Two year       1637   48

prop.table(Tab.Contract, 1)

##                 
##                          No        Yes
##   Month-to-month 0.57290323 0.42709677
##   One year       0.88722826 0.11277174
##   Two year       0.97151335 0.02848665

prop.test(Tab.Contract)

## 
##  3-sample test for equality of proportions without continuity
##  correction
## 
## data:  Tab.Contract
## X-squared = 1179.5, df = 2, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3 
## 0.5729032 0.8872283 0.9715134

With more than two proportions, the question is, are the probabilities of churn the same or not.

Perhaps we should reexamine the row dimension to combine those under contracts of different lengths to one.

Contracts: Yes or No

First, use mutate to separate those under contract from those that are month-to-month. Second, I will examine the table and the proportions from the table calculated by rows.

TAC <- telco.analyse %>% mutate(No.Contract = (Contract=="Month-to-month"))
table(TAC$No.Contract, TAC$Churn)

##        
##           No  Yes
##   FALSE 2943  214
##   TRUE  2220 1655

prop.table(table(TAC$No.Contract, TAC$Churn), 1)

##        
##                 No        Yes
##   FALSE 0.93221413 0.06778587
##   TRUE  0.57290323 0.42709677

Those under contract have a six percent churn rate while those not under contract churn just under forty-three percent of the time. It is likely that contracts are a very useful way to diminish the likelihood of churn. How much? We can use the test of proportions to examine this.

prop.test(table(TAC$No.Contract, TAC$Churn))

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  table(TAC$No.Contract, TAC$Churn)
## X-squared = 1149.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.3411501 0.3774717
## sample estimates:
##    prop 1    prop 2 
## 0.9322141 0.5729032

In evaluating the claim of no difference, the probability of no difference is estimated to be basically zero [2.2e-16]. So what are the likely values? With 95% confidence, the probability of churning is between 0.34 and 0.377 more likely for those without a contract.

Robert W. Walker

Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.