A Fast Food Solution

Last updated on Apr 28, 2022

Fast Food Data

These data came care of a Tidy Tuesday a while ago. The data consist of Fast Food menu items for a selection of fast food chains. The units are menu items. We have the chain [restaurant], item [the item name], and a series of variables (columns) representing sodium, cholesterol, fat, calories, and other information. Some is missing. The data can be imported from the tidytuesday website on github as .csv. It is worth noting that the code in the setup chunk is not echoed.

library(plotly); library(widgetframe)
FFood <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-09-04/fastfood_calories.csv")

Basic Summary

A basic summary of the data.

Table 1: Data summary
Name	FFood
Number of rows	515
Number of columns	18
_______________________
Column type frequency:
character	3
numeric	15
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
restaurant	1	5	11	8
item	1	5	63	505
salad	1	5	5	1

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
X	0	1.00	258.00	148.81	1	129.5	258.0	386.5	515	▇▇▇▇▇
calories	0	1.00	530.91	282.44	20	330.0	490.0	690.0	2430	▇▆▁▁▁
cal_fat	0	1.00	238.81	166.41	0	120.0	210.0	310.0	1270	▇▃▁▁▁
total_fat	0	1.00	26.59	18.41	0	14.0	23.0	35.0	141	▇▃▁▁▁
sat_fat	0	1.00	8.15	6.42	0	4.0	7.0	11.0	47	▇▃▁▁▁
trans_fat	0	1.00	0.47	0.84	0	0.0	0.0	1.0	8	▇▁▁▁▁
cholesterol	0	1.00	72.46	63.16	0	35.0	60.0	95.0	805	▇▁▁▁▁
sodium	0	1.00	1246.74	689.95	15	800.0	1110.0	1550.0	6080	▇▆▁▁▁
total_carb	0	1.00	45.66	24.88	0	28.5	44.0	57.0	156	▅▇▂▁▁
fiber	12	0.98	4.14	3.04	0	2.0	3.0	5.0	17	▇▅▂▁▁
sugar	0	1.00	7.26	6.76	0	3.0	6.0	9.0	87	▇▁▁▁▁
protein	1	1.00	27.89	17.68	1	16.0	24.5	36.0	186	▇▂▁▁▁
vit_a	214	0.58	18.86	31.38	0	4.0	10.0	20.0	180	▇▁▁▁▁
vit_c	210	0.59	20.17	30.59	0	4.0	10.0	30.0	400	▇▁▁▁▁
calcium	210	0.59	24.85	25.52	0	8.0	20.0	30.0	290	▇▁▁▁▁

A Cautionary Note

The NA….

For Next Time

In Excel, construct a pivot table of the data by fast food en terprise that displays five key quantities: the count of the number of menu i tems, the mean and sample standard deviation of calories, and the mean and sample standard deviation of sodium.
Which chain has the highest average calories? Mcdonalds
Which chain has the lower average calories? Chick Fil-A
Which chain has the highest average sodium content? Arbys

** The pivot table is the first sheet in the Excel file.**

To answer them in R:

FFood %>% group_by(restaurant) %>% skim(calories,sodium)

Table 2: Data summary
Name	Piped data
Number of rows	515
Number of columns	18
_______________________
Column type frequency:
numeric	2
________________________
Group variables	restaurant

Variable type: numeric

skim_variable	restaurant	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
calories	Arbys	1	532.73	210.34	70	360.0	550	690.0	1030	▃▆▇▇▂
calories	Burger King	1	608.57	290.42	190	365.0	555	760.0	1550	▇▇▃▂▁
calories	Chick Fil-A	1	384.44	220.49	70	220.0	390	480.0	970	▇▇▇▁▂
calories	Dairy Queen	1	520.24	259.34	20	350.0	485	630.0	1260	▂▇▆▂▁
calories	Mcdonalds	1	640.35	410.70	140	380.0	540	740.0	2430	▇▅▁▁▁
calories	Sonic	1	631.70	300.88	100	410.0	570	740.0	1350	▃▇▆▂▃
calories	Subway	1	503.02	282.22	50	287.5	460	740.0	1160	▅▇▃▃▂
calories	Taco Bell	1	443.65	184.34	140	320.0	420	575.0	880	▆▇▇▃▂
sodium	Arbys	1	1515.27	663.67	100	960.0	1480	2020.0	3350	▂▇▇▃▁
sodium	Burger King	1	1223.57	499.88	310	850.0	1150	1635.0	2310	▅▇▅▆▂
sodium	Chick Fil-A	1	1151.48	726.92	220	700.0	1000	1405.0	3660	▇▇▂▁▁
sodium	Dairy Queen	1	1181.79	609.94	15	847.5	1030	1362.5	3500	▂▇▂▁▁
sodium	Mcdonalds	1	1437.89	1036.17	20	870.0	1120	1780.0	6080	▇▅▁▁▁
sodium	Sonic	1	1350.75	665.13	470	900.0	1250	1550.0	4520	▇▆▂▁▁
sodium	Subway	1	1272.97	743.63	65	697.5	1130	1605.0	3540	▅▇▃▁▂
sodium	Taco Bell	1	1013.91	474.05	290	615.0	960	1300.0	2260	▇▇▆▂▂

In R, construct the same pivot table equivalent using the tidyverse,the piping operator, and the skim function from the skim library

FFood %>% group_by(restaurant) %>% summarise(Median.Calories = median(calories)) %>% arrange(Median.Calories)

## # A tibble: 8 x 2
##   restaurant  Median.Calories
##   <chr>                 <dbl>
## 1 Chick Fil-A             390
## 2 Taco Bell               420
## 3 Subway                  460
## 4 Dairy Queen             485
## 5 Mcdonalds               540
## 6 Arbys                   550
## 7 Burger King             555
## 8 Sonic                   570

Which chain has the highest median calories? Sonic
Which chain has the lowest median calories? Chick Fil-A

FFood %>% group_by(restaurant) %>% summarise(Median.Sodium = median(sodium))

## # A tibble: 8 x 2
##   restaurant  Median.Sodium
##   <chr>               <dbl>
## 1 Arbys                1480
## 2 Burger King          1150
## 3 Chick Fil-A          1000
## 4 Dairy Queen          1030
## 5 Mcdonalds            1120
## 6 Sonic                1250
## 7 Subway               1130
## 8 Taco Bell             960

Which chain has the highest median sodium? Arbys

FFood %>% group_by(restaurant) %>% summarise(Q3.Calories = quantile(calories, 0.75))

## # A tibble: 8 x 2
##   restaurant  Q3.Calories
##   <chr>             <dbl>
## 1 Arbys               690
## 2 Burger King         760
## 3 Chick Fil-A         480
## 4 Dairy Queen         630
## 5 Mcdonalds           740
## 6 Sonic               740
## 7 Subway              740
## 8 Taco Bell           575

Which chain has the lowest 3rd quartile of calories? Chick Fil-A

FFood %>% group_by(restaurant) %>% summarise(Q3.Protein = quantile(protein, 0.75, na.rm = TRUE))

## # A tibble: 8 x 2
##   restaurant  Q3.Protein
##   <chr>            <dbl>
## 1 Arbys               38
## 2 Burger King         36
## 3 Chick Fil-A         37
## 4 Dairy Queen         34
## 5 Mcdonalds           46
## 6 Sonic               35
## 7 Subway              40
## 8 Taco Bell           22

Which chain has the highest 3rd quartile of protein? Taco Bell
In R, provide two graphics – a boxplot and something resembling a density plot or histogram – that shows the distributions of sodium and cholesterol.

boxplot(FFood$sodium)

hist(FFood$sodium)

boxplot(FFood$cholesterol)

hist(FFood$cholesterol)

In R, provide two graphics – a boxplot and something resembling a density plot – that compares the distributions of total fat and cholesterol by chain. ** I will use an R package called patchwork to create multipane graphics with add (+).

install.packages("patchwork")

The structure of each graphic is identical with only the y changing or the geom changing amogst violin and boxplot.

ggplot(FFood) + aes(x=restaurant, y=total_fat, fill=restaurant) + geom_boxplot() + coord_flip() + guides(fill=FALSE)

library(patchwork)
ggplot(FFood) + aes(x=restaurant, y=total_fat, fill=restaurant) + geom_boxplot() + coord_flip() + guides(fill=FALSE) + 
ggplot(FFood) + aes(x=restaurant, y=total_fat, fill=restaurant) + geom_violin() + coord_flip() + guides(fill=FALSE) + 
ggplot(FFood) + aes(x=restaurant, y=cholesterol, fill=restaurant) + geom_boxplot() + coord_flip() + guides(fill=FALSE) + 
ggplot(FFood) + aes(x=restaurant, y=cholesterol, fill=restaurant) + geom_violin() + coord_flip() + guides(fill=FALSE)

In R, let’s construct some summary visualizations. We will use the piping operator, group_by() and summarise, to plot by chain: (a) average calories and (b) average sodium.

library(tidyverse)
( My.Plot <- FFood %>% group_by(restaurant) %>% summarise(calories = mean(calories, na.rm=TRUE)) %>% ggplot(.) + aes(x=reorder(x=restaurant, calories), y=calories) + geom_bar(stat="identity", fill="magenta") + labs(title="Average Calories for Fast Food Chains", x="Chain", y="Average Calories") + theme_light() + coord_flip() )

Ok. Now, something quite fancy.

MP <- ggplotly(My.Plot)
frameWidget(MP)

A Great Feature, filter grepl

# Let's Fix the Chicken Variable
FFood <- FFood %>% mutate(Chicken = grepl('Chicken|Chick-n',item))
FFood$Chicken[FFood$restaurant=="Chick Fil-A"] <- TRUE

Graphing Just Chicken

FFood %>% filter(Chicken==TRUE) %>% ggplot(.) + aes(x=restaurant, y=calories, fill=restaurant, color=restaurant) + geom_violin(alpha=0.3) + scale_fill_viridis_d() + scale_color_viridis_d() + geom_jitter(alpha=0.2) + coord_flip() + guides(fill=FALSE, color=FALSE) + labs(x="Chain", y="Calories", main="Calories of Chicken Items by Chain")

Robert W. Walker

Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.