Importing Excel Data
How To Import a Microsoft Excel File
The go to tool comes from the readxl
library in R. We can install it with:
install.packages("readxl")
To use it, the Markdown must call it – make it active – just as we must at the command line to make it work. The Files pane will make this easier, we can right click to import and get code from the subsequent interaction.
library(readxl)
The command to read Excel files comes in three forms:
- read_excel()
- read_xls()
- read_xlsx()
where the first works for all file types while the second is written specifically for older .xls files and the third is written for newer .xlsx file types. If we type ?read_excel
we can obtain the help file that guides us through a host of situations including specifying a range of cells, whether or not to use the first row as column names, the data type in a column, what is missing [NA], whether rows should be skipped, and a host of others. The one thing that is required is the file that we wish to import [known to this command as the path]. We can acquire the file from the following link. If my file is known to my computer as /home/rob/Downloads/BondFunds.xlsx
then I also want to be careful about the sheet, in this case, it is JustData:
library(tidyverse)
library(readxl)
Bonds <- read_excel(path="/home/rob/Downloads/BondFunds.xlsx", sheet="JustData")
Bonds
## # A tibble: 184 x 9
## `Fund Number` Type Assets Fees `Expense Ratio` `Return 2009`
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 FN-1 Inte… 7268. No 0.45 6.9
## 2 FN-2 Inte… 475. No 0.5 9.8
## 3 FN-3 Inte… 193 No 0.71 6.3
## 4 FN-4 Inte… 18604. No 0.13 5.4
## 5 FN-5 Inte… 143. No 0.6 5.9
## 6 FN-6 Inte… 1402. No 0.54 5.7
## 7 FN-7 Inte… 986. No 0.49 3
## 8 FN-8 Inte… 2189. No 0.55 7.4
## 9 FN-9 Inte… 391. No 0.67 5.3
## 10 FN-10 Inte… 544. No 0.63 5.7
## # … with 174 more rows, and 3 more variables: `3-Year Return` <dbl>, `5-Year
## # Return` <dbl>, Risk <chr>
That works just as planned. If I leave the specification of the sheet out, it will load the first sheet.
FirstSheet <- read_excel(path="/home/rob/Downloads/BondFunds.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
FirstSheet
## # A tibble: 18 x 9
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 `mosaic(xtabs(~Typ…
## <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl>
## 1 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 2 Fees an… Fees <NA> <NA> <NA> NA NA NA NA
## 3 Type No Yes Grand … <NA> NA NA NA NA
## 4 Interme… 0.2880… 0.1847… 0.4728… <NA> NA NA NA NA
## 5 Short T… 0.4184… 0.1086… 0.5271… <NA> NA NA NA NA
## 6 Grand T… 0.7065… 0.2934… 1 <NA> NA NA NA NA
## 7 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 8 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 9 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 10 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 11 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 12 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 13 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 14 Risk an… Risk <NA> <NA> <NA> NA NA NA NA
## 15 Type Above … Average Below … Grand… NA NA NA NA
## 16 Interme… 29 32 26 87 NA NA NA NA
## 17 Short T… 30 37 30 97 NA NA NA NA
## 18 Grand T… 59 69 56 184 NA NA NA NA
That is not well-formatted because the first sheet is a table of sorts. A little bit of Rvangelism.
A look at the magic of R.
I am going to use R’s ability to define variable(s) to make my life easier. I do not wish to know what the tabs are; let R figure it out.
AllSheets <- function(path) { # Feed a path to an Excel file
WorkBook <- excel_sheets(path=path) %>% # Get a vector of sheet names
map(~ read_excel(path=path, sheet=.x)) # Load each sheet by name
names(WorkBook) <- excel_sheets(path=path) # Rename the sheets by their names
return(WorkBook)
}
MyWB <- AllSheets(path="/home/rob/Downloads/BondFunds.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## New names:
## * `` -> ...10
MyWB
## $ContingencyTable
## # A tibble: 18 x 9
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 `mosaic(xtabs(~Typ…
## <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl>
## 1 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 2 Fees an… Fees <NA> <NA> <NA> NA NA NA NA
## 3 Type No Yes Grand … <NA> NA NA NA NA
## 4 Interme… 0.2880… 0.1847… 0.4728… <NA> NA NA NA NA
## 5 Short T… 0.4184… 0.1086… 0.5271… <NA> NA NA NA NA
## 6 Grand T… 0.7065… 0.2934… 1 <NA> NA NA NA NA
## 7 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 8 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 9 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 10 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 11 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 12 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 13 <NA> <NA> <NA> <NA> <NA> NA NA NA NA
## 14 Risk an… Risk <NA> <NA> <NA> NA NA NA NA
## 15 Type Above … Average Below … Grand… NA NA NA NA
## 16 Interme… 29 32 26 87 NA NA NA NA
## 17 Short T… 30 37 30 97 NA NA NA NA
## 18 Grand T… 59 69 56 184 NA NA NA NA
##
## $DATA
## # A tibble: 184 x 11
## `Fund Number` Type Assets Fees `Expense Ratio` `Return 2009`
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 FN-1 Inte… 7268. No 0.45 6.9
## 2 FN-2 Inte… 475. No 0.5 9.8
## 3 FN-3 Inte… 193 No 0.71 6.3
## 4 FN-4 Inte… 18604. No 0.13 5.4
## 5 FN-5 Inte… 143. No 0.6 5.9
## 6 FN-6 Inte… 1402. No 0.54 5.7
## 7 FN-7 Inte… 986. No 0.49 3
## 8 FN-8 Inte… 2189. No 0.55 7.4
## 9 FN-9 Inte… 391. No 0.67 5.3
## 10 FN-10 Inte… 544. No 0.63 5.7
## # … with 174 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## # Return` <dbl>, Risk <chr>, ...10 <lgl>, `par(mfrow=c(2,2))` <chr>
##
## $IGDATA
## # A tibble: 87 x 11
## `Fund Number` Type Assets Fees `Expense Ratio` `Return 2009`
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 FN-1 Inte… 7268. No 0.45 6.9
## 2 FN-2 Inte… 475. No 0.5 9.8
## 3 FN-3 Inte… 193 No 0.71 6.3
## 4 FN-4 Inte… 18604. No 0.13 5.4
## 5 FN-5 Inte… 143. No 0.6 5.9
## 6 FN-6 Inte… 1402. No 0.54 5.7
## 7 FN-7 Inte… 986. No 0.49 3
## 8 FN-8 Inte… 2189. No 0.55 7.4
## 9 FN-9 Inte… 391. No 0.67 5.3
## 10 FN-10 Inte… 544. No 0.63 5.7
## # … with 77 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## # Return` <dbl>, Risk <chr>, Bins <dbl>, Midpoints <dbl>
##
## $STCDATA
## # A tibble: 97 x 11
## `Fund Number` Type Assets Fees `Expense Ratio` `Return 2009`
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 FN-88 Shor… 139. No 0.51 5.5
## 2 FN-89 Shor… 124. No 0.32 5
## 3 FN-90 Shor… 1922 Yes 1.08 12.1
## 4 FN-91 Shor… 203. Yes 1 8.3
## 5 FN-92 Shor… 66.1 No 0.71 6.8
## 6 FN-93 Shor… 1346 No 0.65 8.6
## 7 FN-94 Shor… 4773. No 0.56 5
## 8 FN-95 Shor… 77.5 No 0.51 2.2
## 9 FN-96 Shor… 76.2 No 0.68 2.5
## 10 FN-97 Shor… 146. No 0.55 12.2
## # … with 87 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## # Return` <dbl>, Risk <chr>, Bins <dbl>, Midpoints <dbl>
##
## $RETURN2009
## # A tibble: 97 x 2
## `Intermediate Government` `Short Term Corporate`
## <dbl> <dbl>
## 1 6.9 5.5
## 2 9.8 5
## 3 6.3 12.1
## 4 5.4 8.3
## 5 5.9 6.8
## 6 5.7 8.6
## 7 3 5
## 8 7.4 2.2
## 9 5.3 2.5
## 10 5.7 12.2
## # … with 87 more rows
##
## $JustData
## # A tibble: 184 x 9
## `Fund Number` Type Assets Fees `Expense Ratio` `Return 2009`
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 FN-1 Inte… 7268. No 0.45 6.9
## 2 FN-2 Inte… 475. No 0.5 9.8
## 3 FN-3 Inte… 193 No 0.71 6.3
## 4 FN-4 Inte… 18604. No 0.13 5.4
## 5 FN-5 Inte… 143. No 0.6 5.9
## 6 FN-6 Inte… 1402. No 0.54 5.7
## 7 FN-7 Inte… 986. No 0.49 3
## 8 FN-8 Inte… 2189. No 0.55 7.4
## 9 FN-9 Inte… 391. No 0.67 5.3
## 10 FN-10 Inte… 544. No 0.63 5.7
## # … with 174 more rows, and 3 more variables: `3-Year Return` <dbl>, `5-Year
## # Return` <dbl>, Risk <chr>