Importing Excel Data

How To Import a Microsoft Excel File

The go to tool comes from the readxl library in R. We can install it with:

install.packages("readxl")

To use it, the Markdown must call it – make it active – just as we must at the command line to make it work. The Files pane will make this easier, we can right click to import and get code from the subsequent interaction.

library(readxl)

The command to read Excel files comes in three forms:
- read_excel()
- read_xls()
- read_xlsx()

where the first works for all file types while the second is written specifically for older .xls files and the third is written for newer .xlsx file types. If we type ?read_excel we can obtain the help file that guides us through a host of situations including specifying a range of cells, whether or not to use the first row as column names, the data type in a column, what is missing [NA], whether rows should be skipped, and a host of others. The one thing that is required is the file that we wish to import [known to this command as the path]. We can acquire the file from the following link. If my file is known to my computer as /home/rob/Downloads/BondFunds.xlsx then I also want to be careful about the sheet, in this case, it is JustData:

library(tidyverse)
library(readxl)
Bonds <- read_excel(path="/home/rob/Downloads/BondFunds.xlsx", sheet="JustData")
Bonds
## # A tibble: 184 x 9
##    `Fund Number` Type  Assets Fees  `Expense Ratio` `Return 2009`
##    <chr>         <chr>  <dbl> <chr>           <dbl>         <dbl>
##  1 FN-1          Inte…  7268. No               0.45           6.9
##  2 FN-2          Inte…   475. No               0.5            9.8
##  3 FN-3          Inte…   193  No               0.71           6.3
##  4 FN-4          Inte… 18604. No               0.13           5.4
##  5 FN-5          Inte…   143. No               0.6            5.9
##  6 FN-6          Inte…  1402. No               0.54           5.7
##  7 FN-7          Inte…   986. No               0.49           3  
##  8 FN-8          Inte…  2189. No               0.55           7.4
##  9 FN-9          Inte…   391. No               0.67           5.3
## 10 FN-10         Inte…   544. No               0.63           5.7
## # … with 174 more rows, and 3 more variables: `3-Year Return` <dbl>, `5-Year
## #   Return` <dbl>, Risk <chr>

That works just as planned. If I leave the specification of the sheet out, it will load the first sheet.

FirstSheet <- read_excel(path="/home/rob/Downloads/BondFunds.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
FirstSheet
## # A tibble: 18 x 9
##    ...1     ...2    ...3    ...4    ...5   ...6  ...7  ...8  `mosaic(xtabs(~Typ…
##    <chr>    <chr>   <chr>   <chr>   <chr>  <lgl> <lgl> <lgl> <lgl>              
##  1 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  2 Fees an… Fees    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  3 Type     No      Yes     Grand … <NA>   NA    NA    NA    NA                 
##  4 Interme… 0.2880… 0.1847… 0.4728… <NA>   NA    NA    NA    NA                 
##  5 Short T… 0.4184… 0.1086… 0.5271… <NA>   NA    NA    NA    NA                 
##  6 Grand T… 0.7065… 0.2934… 1       <NA>   NA    NA    NA    NA                 
##  7 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  8 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  9 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 10 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 11 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 12 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 13 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 14 Risk an… Risk    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 15 Type     Above … Average Below … Grand… NA    NA    NA    NA                 
## 16 Interme… 29      32      26      87     NA    NA    NA    NA                 
## 17 Short T… 30      37      30      97     NA    NA    NA    NA                 
## 18 Grand T… 59      69      56      184    NA    NA    NA    NA

That is not well-formatted because the first sheet is a table of sorts. A little bit of Rvangelism.

A look at the magic of R.

I am going to use R’s ability to define variable(s) to make my life easier. I do not wish to know what the tabs are; let R figure it out.

AllSheets <- function(path) { # Feed a path to an Excel file
  WorkBook <- excel_sheets(path=path) %>% # Get a vector of sheet names
    map(~ read_excel(path=path, sheet=.x)) # Load each sheet by name
  names(WorkBook) <- excel_sheets(path=path) # Rename the sheets by their names
  return(WorkBook)
}
MyWB <- AllSheets(path="/home/rob/Downloads/BondFunds.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
## New names:
## * `` -> ...10
MyWB
## $ContingencyTable
## # A tibble: 18 x 9
##    ...1     ...2    ...3    ...4    ...5   ...6  ...7  ...8  `mosaic(xtabs(~Typ…
##    <chr>    <chr>   <chr>   <chr>   <chr>  <lgl> <lgl> <lgl> <lgl>              
##  1 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  2 Fees an… Fees    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  3 Type     No      Yes     Grand … <NA>   NA    NA    NA    NA                 
##  4 Interme… 0.2880… 0.1847… 0.4728… <NA>   NA    NA    NA    NA                 
##  5 Short T… 0.4184… 0.1086… 0.5271… <NA>   NA    NA    NA    NA                 
##  6 Grand T… 0.7065… 0.2934… 1       <NA>   NA    NA    NA    NA                 
##  7 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  8 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
##  9 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 10 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 11 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 12 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 13 <NA>     <NA>    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 14 Risk an… Risk    <NA>    <NA>    <NA>   NA    NA    NA    NA                 
## 15 Type     Above … Average Below … Grand… NA    NA    NA    NA                 
## 16 Interme… 29      32      26      87     NA    NA    NA    NA                 
## 17 Short T… 30      37      30      97     NA    NA    NA    NA                 
## 18 Grand T… 59      69      56      184    NA    NA    NA    NA                 
## 
## $DATA
## # A tibble: 184 x 11
##    `Fund Number` Type  Assets Fees  `Expense Ratio` `Return 2009`
##    <chr>         <chr>  <dbl> <chr>           <dbl>         <dbl>
##  1 FN-1          Inte…  7268. No               0.45           6.9
##  2 FN-2          Inte…   475. No               0.5            9.8
##  3 FN-3          Inte…   193  No               0.71           6.3
##  4 FN-4          Inte… 18604. No               0.13           5.4
##  5 FN-5          Inte…   143. No               0.6            5.9
##  6 FN-6          Inte…  1402. No               0.54           5.7
##  7 FN-7          Inte…   986. No               0.49           3  
##  8 FN-8          Inte…  2189. No               0.55           7.4
##  9 FN-9          Inte…   391. No               0.67           5.3
## 10 FN-10         Inte…   544. No               0.63           5.7
## # … with 174 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## #   Return` <dbl>, Risk <chr>, ...10 <lgl>, `par(mfrow=c(2,2))` <chr>
## 
## $IGDATA
## # A tibble: 87 x 11
##    `Fund Number` Type  Assets Fees  `Expense Ratio` `Return 2009`
##    <chr>         <chr>  <dbl> <chr>           <dbl>         <dbl>
##  1 FN-1          Inte…  7268. No               0.45           6.9
##  2 FN-2          Inte…   475. No               0.5            9.8
##  3 FN-3          Inte…   193  No               0.71           6.3
##  4 FN-4          Inte… 18604. No               0.13           5.4
##  5 FN-5          Inte…   143. No               0.6            5.9
##  6 FN-6          Inte…  1402. No               0.54           5.7
##  7 FN-7          Inte…   986. No               0.49           3  
##  8 FN-8          Inte…  2189. No               0.55           7.4
##  9 FN-9          Inte…   391. No               0.67           5.3
## 10 FN-10         Inte…   544. No               0.63           5.7
## # … with 77 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## #   Return` <dbl>, Risk <chr>, Bins <dbl>, Midpoints <dbl>
## 
## $STCDATA
## # A tibble: 97 x 11
##    `Fund Number` Type  Assets Fees  `Expense Ratio` `Return 2009`
##    <chr>         <chr>  <dbl> <chr>           <dbl>         <dbl>
##  1 FN-88         Shor…  139.  No               0.51           5.5
##  2 FN-89         Shor…  124.  No               0.32           5  
##  3 FN-90         Shor… 1922   Yes              1.08          12.1
##  4 FN-91         Shor…  203.  Yes              1              8.3
##  5 FN-92         Shor…   66.1 No               0.71           6.8
##  6 FN-93         Shor… 1346   No               0.65           8.6
##  7 FN-94         Shor… 4773.  No               0.56           5  
##  8 FN-95         Shor…   77.5 No               0.51           2.2
##  9 FN-96         Shor…   76.2 No               0.68           2.5
## 10 FN-97         Shor…  146.  No               0.55          12.2
## # … with 87 more rows, and 5 more variables: `3-Year Return` <dbl>, `5-Year
## #   Return` <dbl>, Risk <chr>, Bins <dbl>, Midpoints <dbl>
## 
## $RETURN2009
## # A tibble: 97 x 2
##    `Intermediate Government` `Short Term Corporate`
##                        <dbl>                  <dbl>
##  1                       6.9                    5.5
##  2                       9.8                    5  
##  3                       6.3                   12.1
##  4                       5.4                    8.3
##  5                       5.9                    6.8
##  6                       5.7                    8.6
##  7                       3                      5  
##  8                       7.4                    2.2
##  9                       5.3                    2.5
## 10                       5.7                   12.2
## # … with 87 more rows
## 
## $JustData
## # A tibble: 184 x 9
##    `Fund Number` Type  Assets Fees  `Expense Ratio` `Return 2009`
##    <chr>         <chr>  <dbl> <chr>           <dbl>         <dbl>
##  1 FN-1          Inte…  7268. No               0.45           6.9
##  2 FN-2          Inte…   475. No               0.5            9.8
##  3 FN-3          Inte…   193  No               0.71           6.3
##  4 FN-4          Inte… 18604. No               0.13           5.4
##  5 FN-5          Inte…   143. No               0.6            5.9
##  6 FN-6          Inte…  1402. No               0.54           5.7
##  7 FN-7          Inte…   986. No               0.49           3  
##  8 FN-8          Inte…  2189. No               0.55           7.4
##  9 FN-9          Inte…   391. No               0.67           5.3
## 10 FN-10         Inte…   544. No               0.63           5.7
## # … with 174 more rows, and 3 more variables: `3-Year Return` <dbl>, `5-Year
## #   Return` <dbl>, Risk <chr>
Avatar
Robert W. Walker
Associate Professor of Quantitative Methods

My research interests include causal inference, statistical computation and data visualization.

Next
Previous