Scraping EPL Salary Data
EPL Scraping
In a previous post, I scraped some NFL data and learned the structure of Sportrac. Now, I want to scrape the available data on the EPL. The EPL data is organized in a few distinct but potentially linked tables. The basic structure is organized around team folders. Let me begin by isolating those URLs.
library(rvest)
library(tidyverse)
base_url <- "http://www.spotrac.com/epl/"
read.base <- read_html(base_url)
team.URL <- read.base %>% html_nodes(".team-name") %>% html_attr('href')
team.URL
## [1] "https://www.spotrac.com/epl/afc-bournemouth/"
## [2] "https://www.spotrac.com/epl/arsenal-f.c/"
## [3] "https://www.spotrac.com/epl/aston-villa-f.c/"
## [4] "https://www.spotrac.com/epl/brighton-hove-albion/"
## [5] "https://www.spotrac.com/epl/burnley-f.c/"
## [6] "https://www.spotrac.com/epl/chelsea-f.c/"
## [7] "https://www.spotrac.com/epl/crystal-palace/"
## [8] "https://www.spotrac.com/epl/everton-f.c/"
## [9] "https://www.spotrac.com/epl/leicester-city/"
## [10] "https://www.spotrac.com/epl/liverpool-f.c/"
## [11] "https://www.spotrac.com/epl/manchester-city-f.c/"
## [12] "https://www.spotrac.com/epl/manchester-united-f.c/"
## [13] "https://www.spotrac.com/epl/newcastle-united-f.c/"
## [14] "https://www.spotrac.com/epl/norwich-city-f.c/"
## [15] "https://www.spotrac.com/epl/sheffield-united-f.c/"
## [16] "https://www.spotrac.com/epl/southampton-f.c/"
## [17] "https://www.spotrac.com/epl/tottenham-hotspur-f.c/"
## [18] "https://www.spotrac.com/epl/watford/"
## [19] "https://www.spotrac.com/epl/west-ham-united-f.c/"
## [20] "https://www.spotrac.com/epl/wolverhampton-wanderers-f.c/"
# Clean up the URLs to get the team names by themselves.
team.names <- gsub(base_url, "", team.URL)
team.names <- gsub("-f.c", " FC", team.names)
team.names <- gsub("afc", "AFC", team.names)
team.names <- gsub("a.f.c", "AFC", team.names)
# Dashes and slashes need to removed.
team.names <- gsub("-", " ", team.names)
team.names <- gsub("/", "", team.names)
# Fix FC and AFC for Bournemouth
simpleCap <- function(x) {
s <- strsplit(x, " ")[[1]]
paste(toupper(substring(s, 1,1)), substring(s, 2), sep="", collapse=" ")
}
# Capitalise and trim white space
team.names <- sapply(team.names, simpleCap)
#team.names <- sapply(team.names, trimws)
names(team.names) <- NULL
# Now I have a vector of 20 names.
short.names <- gsub(" FC","", team.names)
short.names <- gsub(" AFC","", short.names)
EPL.names <- data.frame(team.names,short.names,team.URL)
EPL.names
## team.names
## 1 Https:www.spotrac.comeplAFC Bournemouth
## 2 Https:www.spotrac.comeplarsenal FC
## 3 Https:www.spotrac.comeplaston Villa FC
## 4 Https:www.spotrac.comeplbrighton Hove Albion
## 5 Https:www.spotrac.comeplburnley FC
## 6 Https:www.spotrac.comeplchelsea FC
## 7 Https:www.spotrac.comeplcrystal Palace
## 8 Https:www.spotrac.comepleverton FC
## 9 Https:www.spotrac.comeplleicester City
## 10 Https:www.spotrac.comeplliverpool FC
## 11 Https:www.spotrac.comeplmanchester City FC
## 12 Https:www.spotrac.comeplmanchester United FC
## 13 Https:www.spotrac.comeplnewcastle United FC
## 14 Https:www.spotrac.comeplnorwich City FC
## 15 Https:www.spotrac.comeplsheffield United FC
## 16 Https:www.spotrac.comeplsouthampton FC
## 17 Https:www.spotrac.comepltottenham Hotspur FC
## 18 Https:www.spotrac.comeplwatford
## 19 Https:www.spotrac.comeplwest Ham United FC
## 20 Https:www.spotrac.comeplwolverhampton Wanderers FC
## short.names
## 1 Https:www.spotrac.comeplAFC Bournemouth
## 2 Https:www.spotrac.comeplarsenal
## 3 Https:www.spotrac.comeplaston Villa
## 4 Https:www.spotrac.comeplbrighton Hove Albion
## 5 Https:www.spotrac.comeplburnley
## 6 Https:www.spotrac.comeplchelsea
## 7 Https:www.spotrac.comeplcrystal Palace
## 8 Https:www.spotrac.comepleverton
## 9 Https:www.spotrac.comeplleicester City
## 10 Https:www.spotrac.comeplliverpool
## 11 Https:www.spotrac.comeplmanchester City
## 12 Https:www.spotrac.comeplmanchester United
## 13 Https:www.spotrac.comeplnewcastle United
## 14 Https:www.spotrac.comeplnorwich City
## 15 Https:www.spotrac.comeplsheffield United
## 16 Https:www.spotrac.comeplsouthampton
## 17 Https:www.spotrac.comepltottenham Hotspur
## 18 Https:www.spotrac.comeplwatford
## 19 Https:www.spotrac.comeplwest Ham United
## 20 Https:www.spotrac.comeplwolverhampton Wanderers
## team.URL
## 1 https://www.spotrac.com/epl/afc-bournemouth/
## 2 https://www.spotrac.com/epl/arsenal-f.c/
## 3 https://www.spotrac.com/epl/aston-villa-f.c/
## 4 https://www.spotrac.com/epl/brighton-hove-albion/
## 5 https://www.spotrac.com/epl/burnley-f.c/
## 6 https://www.spotrac.com/epl/chelsea-f.c/
## 7 https://www.spotrac.com/epl/crystal-palace/
## 8 https://www.spotrac.com/epl/everton-f.c/
## 9 https://www.spotrac.com/epl/leicester-city/
## 10 https://www.spotrac.com/epl/liverpool-f.c/
## 11 https://www.spotrac.com/epl/manchester-city-f.c/
## 12 https://www.spotrac.com/epl/manchester-united-f.c/
## 13 https://www.spotrac.com/epl/newcastle-united-f.c/
## 14 https://www.spotrac.com/epl/norwich-city-f.c/
## 15 https://www.spotrac.com/epl/sheffield-united-f.c/
## 16 https://www.spotrac.com/epl/southampton-f.c/
## 17 https://www.spotrac.com/epl/tottenham-hotspur-f.c/
## 18 https://www.spotrac.com/epl/watford/
## 19 https://www.spotrac.com/epl/west-ham-united-f.c/
## 20 https://www.spotrac.com/epl/wolverhampton-wanderers-f.c/
With clean names, I can take each of the scraping tasks in order.
Payroll Data
The teams have payroll information that is broken down into active players, reserves, and loanees. The workflow is first to create the relevant URLs to scrape the payroll data.
team_links <- paste0(team.URL,"payroll/",sep="")
With URLs, I am going to set forth on the task. First, the SelectorGadget and a glimpse of the documents suggests an easy solution. I want to isolate the table nodes and keep the tables. First, a function for the URLs.
data.creator <- function(link) {
read_html(link) %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
}
Now I want to apply data scraping function to the URLs. Then, I want to name the list items, assess the size of the active roster, and then clean up the relevant data.
EPL.salary <- sapply(team_links, function(x) {data.creator(x)})
names(EPL.salary) <- EPL.names$short.names
team.len <- sapply(seq(1,20), function(x) { dim(EPL.salary[[x]][[1]])[[1]]})
Team <- rep(EPL.names$short.names, team.len)
Players <- sapply(seq(1,20), function(x) { str_split(EPL.salary[[x]][[1]][,1], "\t", simplify=TRUE)[,31]})
Position <- sapply(seq(1,20), function(x) { EPL.salary[[x]][[1]][,2]})
Base.Salary <- sapply(seq(1,20), function(x) { Res <- gsub("£", "", EPL.salary[[x]][[1]][,3]); gsub(",","",Res)})
EPL.Result <- data.frame(Players=unlist(Players), Team=Team, Position=unlist(Position), Base.Salary=unlist(Base.Salary))
EPL.Result$Base.Salary <- str_replace(EPL.Result$Base.Salary, "-", NA_character_)
EPL.Result$Base.Num <- as.numeric(EPL.Result$Base.Salary)
EPL.Result %>% group_by(Position) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 4 x 3
## Position Mean.Base.Salary sdBS
## <fct> <dbl> <dbl>
## 1 D 25.8 3.77
## 2 F 24.6 3.80
## 3 GK 28.9 4.63
## 4 M 24.8 5.21
EPL.Result %>% group_by(Position,Team) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 80 x 4
## # Groups: Position [4]
## Position Team Mean.Base.Salary sdBS
## <fct> <fct> <dbl> <dbl>
## 1 D Https:www.spotrac.comeplAFC Bournemouth 25.1 4.30
## 2 D Https:www.spotrac.comeplarsenal 24.9 3.59
## 3 D Https:www.spotrac.comeplaston Villa 24.9 3.09
## 4 D Https:www.spotrac.comeplbrighton Hove Albion 26.5 3.96
## 5 D Https:www.spotrac.comeplburnley 28.2 2.60
## 6 D Https:www.spotrac.comeplchelsea 25.3 2.66
## 7 D Https:www.spotrac.comeplcrystal Palace 27.8 3.19
## 8 D Https:www.spotrac.comepleverton 27 3.82
## 9 D Https:www.spotrac.comeplleicester City 26 5.81
## 10 D Https:www.spotrac.comeplliverpool 25 3.42
## # … with 70 more rows
Finally, a little picture to describe spending on the active roster.
fplot <- ggplot(EPL.Result, aes(Base.Num,Team))
gpl <- fplot + geom_jitter(height=0.25, width=0) + facet_wrap(~Position) + labs(x="Base Salary")
gpl
Contracts
The contracts are stored in a different URL structure that is accessible via contracts in the html tree by tean. Firstm I want to paste the names together with links to explore.
team_links <- paste0(team.URL,"contracts/",sep="")
Now I have all the links that I need and can turn to processing the data. This is something of a mess. Let me first grab some data to showcase the problem. In what follows, first I will grab the HTML files.
Base.Contracts <- lapply(team_links, read_html)
Processing them is a bit more difficult. What does the basic table look like?
Base.Contracts[[1]] %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
## [[1]]
## Player (31) Pos. Age
## 1 DefoeJermain Defoe F 37
## 2 BegovicAsmir Begovic GK 32
## 3 LermaJefferson Lerma M 25
## 4 WilsonCallum Wilson F 27
## 5 KingJoshua King F 28
## 6 AkeNathan Ake D 24
## 7 CookSteve Cook D 28
## 8 SurmanAndrew Surman M 33
## 9 ArterHarry Arter M 30
## 10 IbeJordon Ibe F 24
## 11 CookLewis Cook M 23
## 12 GoslingDan Gosling M 30
## 13 SmithAdam Smith D 28
## 14 FraserRyan Fraser F 25
## 15 DanielsCharlie Daniels D 33
## 16 StanislasJunior Stanislas F 30
## 17 SmithBrad Smith D 25
## 18 BrooksDavid Brooks M 22
## 19 RicoDiego Rico D 26
## 20 SolankeDominic Solanke F 22
## 21 BillingPhilip Billing M 23
## 22 FrancisSimon Francis D 34
## 23 MephamChris Mepham D 22
## 24 BorucArtur Boruc GK 39
## 25 RamsdaleAaron Ramsdale GK 21
## 26 SimpsonJack Simpson D 23
## 27 SurridgeSam Surridge F 22
## 28 KellyLloyd Kelly D 21
## 29 StaceyJack Stacey D 23
## 30 TraversMark Travers GK 20
## 31 DanjumaArnaut Danjuma F 23
## Contract Terms
## 1 16140000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£16,140,000
## 2 14560000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£14,560,000
## 3 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 4 12480000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£12,480,000
## 5 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 6 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 7 9360000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,360,000
## 8 9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 9 8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 10 7904000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,904,000
## 11 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 12 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 13 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 14 5616000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,616,000
## 15 5460000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,460,000
## 16 5460000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,460,000
## 17 5200000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 18 4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 19 4004000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,004,000
## 20 3900000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,900,000
## 21 3900000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,900,000
## 22 3120000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,120,000
## 23 3035000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,035,000
## 24 2080000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 25 780000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£780,000
## 26 468000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£468,000
## 27 0\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## 28 0\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## 29 0\n\t\t\t\t\t\t\t\t\t\t\t-\n\t\t\t\t\t\t\t\t\t\t\t-
## 30 0\n\t\t\t\t\t\t\t\t\t\t\t-\n\t\t\t\t\t\t\t\t\t\t\t-
## 31 0\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## Avg. Salary Transfer Fee Expires
## 1 £5,380,000 - 2020
## 2 £3,640,000 £9,780,000 2021
## 3 £2,600,000 £25,125,000 2023
## 4 £3,120,000 £3,000,000 2023
## 5 £2,340,000 - 2021
## 6 £2,080,000 £19,380,000 2022
## 7 £2,340,000 £150,000 2021
## 8 £1,820,000 £536,000 2020
## 9 £2,080,000 - 2021
## 10 £1,976,000 £15,000,000 2020
## 11 £1,820,000 £5,950,000 2022
## 12 £1,820,000 - 2021
## 13 £1,820,000 - 2021
## 14 £1,404,000 £400,000 2020
## 15 £1,820,000 £191,000 2020
## 16 £1,820,000 - 2021
## 17 £1,300,000 £3,060,000 2020
## 18 £1,040,000 £10,125,000 2022
## 19 £1,001,000 £10,750,000 2022
## 20 £1,300,000 £24,170,000 2022
## 21 £780,000 £18,810,000 2024
## 22 £1,560,000 - 2020
## 23 £607,000 £15,500,000 2023
## 24 £2,080,000 £10,000,000 2020
## 25 £156,000 £799,000 2021
## 26 £156,000 - 2020
## 27 - - 2021
## 28 - £16,870,000 -
## 29 - £5,070,000 -
## 30 - - -
## 31 - £20,520,000 2024
##
## [[2]]
## Player (31) Pos. Age Contract Terms Avg. Salary Transfer Fee
## 1 Harry Wilson M 21 1 yr\n\t\t\t\t\t\t\t\t\t\t\t- - -
## Expires
## 1 -
The names and the contract year and terms are going to require parsing. I have chosen the first html that corresponds to Bournemouth; other teams are worse because loan players are in a second table. That impacts the wage bill, perhaps, depending on the arrangement in the loan, but the contract details from the player do not have that team as signatory. This has to be fixed. That is easy enough to fix, there are two embedded tables and I can select the first one. When it comes to the names, there is no easy separation for the first column; I will grab them from nodes in the html.
data.creator <- function(data) {
data %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE) -> ret.tab
nrowsm <- dim(ret.tab[[1]])[[1]]
split.me <- ret.tab[[1]][,4]
tempdf <- data.frame(matrix(data=gsub("\t|-","",unlist(strsplit(split.me, "\\n"))), nrow=nrowsm, byrow=TRUE))
names(tempdf) <- c("value","years","value.pds")
data %>% html_nodes(".player") %>% html_nodes("a") %>% html_text() -> Player.Names
Player.Names <- Player.Names[c(1:nrowsm)]
data %>% html_nodes(".player") %>% html_nodes("a") %>% html_attr("href") -> Player.Links
Player.links <- Player.Links[c(1:nrowsm)]
data %>% html_nodes(".player") %>% html_nodes("span") %>% html_text() -> Last.Name
Last.Name <- Last.Name[c(1:nrowsm)]
names(ret.tab[1][[1]])[c(1:2)] <- c("Player","Position")
# data.frame(ret.tab[,c(5,6,7)])
return(data.frame(ret.tab[1][[1]],tempdf,Player.Names,Player.links,Last.Name))
}
EPL.Contracts <- lapply(Base.Contracts, data.creator)
names(EPL.Contracts) <- EPL.names$short.names
EPL.Contracts[[1]]
## Player Position Age
## 1 DefoeJermain Defoe F 37
## 2 BegovicAsmir Begovic GK 32
## 3 LermaJefferson Lerma M 25
## 4 WilsonCallum Wilson F 27
## 5 KingJoshua King F 28
## 6 AkeNathan Ake D 24
## 7 CookSteve Cook D 28
## 8 SurmanAndrew Surman M 33
## 9 ArterHarry Arter M 30
## 10 IbeJordon Ibe F 24
## 11 CookLewis Cook M 23
## 12 GoslingDan Gosling M 30
## 13 SmithAdam Smith D 28
## 14 FraserRyan Fraser F 25
## 15 DanielsCharlie Daniels D 33
## 16 StanislasJunior Stanislas F 30
## 17 SmithBrad Smith D 25
## 18 BrooksDavid Brooks M 22
## 19 RicoDiego Rico D 26
## 20 SolankeDominic Solanke F 22
## 21 BillingPhilip Billing M 23
## 22 FrancisSimon Francis D 34
## 23 MephamChris Mepham D 22
## 24 BorucArtur Boruc GK 39
## 25 RamsdaleAaron Ramsdale GK 21
## 26 SimpsonJack Simpson D 23
## 27 SurridgeSam Surridge F 22
## 28 KellyLloyd Kelly D 21
## 29 StaceyJack Stacey D 23
## 30 TraversMark Travers GK 20
## 31 DanjumaArnaut Danjuma F 23
## Contract.Terms
## 1 16140000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£16,140,000
## 2 14560000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£14,560,000
## 3 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 4 12480000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£12,480,000
## 5 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 6 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 7 9360000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,360,000
## 8 9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 9 8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 10 7904000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,904,000
## 11 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 12 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 13 7280000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,280,000
## 14 5616000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,616,000
## 15 5460000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,460,000
## 16 5460000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,460,000
## 17 5200000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 18 4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 19 4004000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,004,000
## 20 3900000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,900,000
## 21 3900000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,900,000
## 22 3120000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,120,000
## 23 3035000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£3,035,000
## 24 2080000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 25 780000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£780,000
## 26 468000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£468,000
## 27 0\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## 28 0\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## 29 0\n\t\t\t\t\t\t\t\t\t\t\t-\n\t\t\t\t\t\t\t\t\t\t\t-
## 30 0\n\t\t\t\t\t\t\t\t\t\t\t-\n\t\t\t\t\t\t\t\t\t\t\t-
## 31 0\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## Avg..Salary Transfer.Fee Expires value years value.pds Player.Names
## 1 £5,380,000 - 2020 16140000 3 yr £16,140,000 Jermain Defoe
## 2 £3,640,000 £9,780,000 2021 14560000 4 yr £14,560,000 Asmir Begovic
## 3 £2,600,000 £25,125,000 2023 13000000 5 yr £13,000,000 Jefferson Lerma
## 4 £3,120,000 £3,000,000 2023 12480000 4 yr £12,480,000 Callum Wilson
## 5 £2,340,000 - 2021 11700000 5 yr £11,700,000 Joshua King
## 6 £2,080,000 £19,380,000 2022 10400000 5 yr £10,400,000 Nathan Ake
## 7 £2,340,000 £150,000 2021 9360000 4 yr £9,360,000 Steve Cook
## 8 £1,820,000 £536,000 2020 9100000 5 yr £9,100,000 Andrew Surman
## 9 £2,080,000 - 2021 8320000 4 yr £8,320,000 Harry Arter
## 10 £1,976,000 £15,000,000 2020 7904000 4 yr £7,904,000 Jordon Ibe
## 11 £1,820,000 £5,950,000 2022 7280000 4 yr £7,280,000 Lewis Cook
## 12 £1,820,000 - 2021 7280000 4 yr £7,280,000 Dan Gosling
## 13 £1,820,000 - 2021 7280000 4 yr £7,280,000 Adam Smith
## 14 £1,404,000 £400,000 2020 5616000 4 yr £5,616,000 Ryan Fraser
## 15 £1,820,000 £191,000 2020 5460000 3 yr £5,460,000 Charlie Daniels
## 16 £1,820,000 - 2021 5460000 3 yr £5,460,000 Junior Stanislas
## 17 £1,300,000 £3,060,000 2020 5200000 4 yr £5,200,000 Brad Smith
## 18 £1,040,000 £10,125,000 2022 4160000 4 yr £4,160,000 David Brooks
## 19 £1,001,000 £10,750,000 2022 4004000 4 yr £4,004,000 Diego Rico
## 20 £1,300,000 £24,170,000 2022 3900000 3 yr £3,900,000 Dominic Solanke
## 21 £780,000 £18,810,000 2024 3900000 5 yr £3,900,000 Philip Billing
## 22 £1,560,000 - 2020 3120000 2 yr £3,120,000 Simon Francis
## 23 £607,000 £15,500,000 2023 3035000 5 yr £3,035,000 Chris Mepham
## 24 £2,080,000 £10,000,000 2020 2080000 1 yr £2,080,000 Artur Boruc
## 25 £156,000 £799,000 2021 780000 5 yr £780,000 Aaron Ramsdale
## 26 £156,000 - 2020 468000 3 yr £468,000 Jack Simpson
## 27 - - 2021 0 3 yr Sam Surridge
## 28 - £16,870,000 - 0 3 yr Lloyd Kelly
## 29 - £5,070,000 - 0 Jack Stacey
## 30 - - - 0 Mark Travers
## 31 - £20,520,000 2024 0 5 yr Arnaut Danjuma
## Player.links Last.Name
## 1 https://www.spotrac.com/redirect/player/23836/ Defoe
## 2 https://www.spotrac.com/redirect/player/22625/ Begovic
## 3 https://www.spotrac.com/redirect/player/27878/ Lerma
## 4 https://www.spotrac.com/redirect/player/22694/ Wilson
## 5 https://www.spotrac.com/redirect/player/22685/ King
## 6 https://www.spotrac.com/redirect/player/15521/ Ake
## 7 https://www.spotrac.com/redirect/player/22677/ Cook
## 8 https://www.spotrac.com/redirect/player/22692/ Surman
## 9 https://www.spotrac.com/redirect/player/22674/ Arter
## 10 https://www.spotrac.com/redirect/player/22684/ Ibe
## 11 https://www.spotrac.com/redirect/player/22676/ Cook
## 12 https://www.spotrac.com/redirect/player/22682/ Gosling
## 13 https://www.spotrac.com/redirect/player/22689/ Smith
## 14 https://www.spotrac.com/redirect/player/22680/ Fraser
## 15 https://www.spotrac.com/redirect/player/22678/ Daniels
## 16 https://www.spotrac.com/redirect/player/22691/ Stanislas
## 17 https://www.spotrac.com/redirect/player/22690/ Smith
## 18 https://www.spotrac.com/redirect/player/27877/ Brooks
## 19 https://www.spotrac.com/redirect/player/27879/ Rico
## 20 https://www.spotrac.com/redirect/player/22637/ Solanke
## 21 https://www.spotrac.com/redirect/player/23192/ Billing
## 22 https://www.spotrac.com/redirect/player/22679/ Francis
## 23 https://www.spotrac.com/redirect/player/28460/ Mepham
## 24 https://www.spotrac.com/redirect/player/22660/ Boruc
## 25 https://www.spotrac.com/redirect/player/22661/ Ramsdale
## 26 https://www.spotrac.com/redirect/player/24102/ Simpson
## 27 https://www.spotrac.com/redirect/player/22693/ Surridge
## 28 https://www.spotrac.com/redirect/player/30108/ Kelly
## 29 https://www.spotrac.com/redirect/player/32060/ Stacey
## 30 https://www.spotrac.com/redirect/player/32306/ Travers
## 31 https://www.spotrac.com/redirect/player/32702/ Danjuma
The data now have some junk alongside workable versions of the variables of interest. It is worth noting that the header of the contracts data allows us to verify the size of the table as we picked it up [though I do rename them to allow the rbind to work]. This also suggests a strategy for picking up the rownames that is different than the above method that uses the dimension of the html table. Perhaps I should just gsub the header to recover the integer number of players. To tidy the data, they need to be stacked. A simple do.call and row bind will probably work.
Team.Base <- sapply(EPL.Contracts, dim)[1,]
Team <- rep(as.character(names(Team.Base)),Team.Base)
EPL.Contracts.df <- do.call("rbind",EPL.Contracts)
rownames(EPL.Contracts.df) <- NULL
EPL.Contracts.df$Team <- Team
EPL.Contracts.df$value <- as.numeric(as.character(EPL.Contracts.df$value))
EPL.Contracts.df %>% group_by(Team) %>% summarise(Team.Mean=mean(value, na.rm=TRUE)/1e3, Team.SD=sd(value, na.rm=TRUE)) -> Team.mean
pp <- Team.mean %>% arrange(Team.Mean)
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Team.Mean,Team, size=Team.SD)) + geom_point() + labs(x="Avg. Contract (1000s)") -> cplot
cplot
EPL.Contracts.df %>% group_by(Team) %>% summarise(Age.Mean=mean(Age, na.rm=TRUE), Age.SD=sd(Age, na.rm=TRUE)) -> Team.mean
Team.mean %>% ungroup() %>% arrange(., Age.Mean) -> pp
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Age.Mean,Team,size=Age.SD)) + geom_point() + labs(x="Age") -> cplot
cplot