R Read and Write xlsx Files

An xlsx is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data.

R has a built-in functionality that makes it easy to read and write a xlsx file.


Sample xlsx File

To demonstrate how we read xlsx files in R, let's suppose we have an excel file named studentinfo.xlsx with following data:

Sample xlsx File
Sample xlsx File

We will be reading these datas with the help of R's built-in functions.


Install and Load xlsx Package

In order to read, write, and format Excel files into R, we first need to install and load the xlsx package as:

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

Here, we have successfully installed and loaded the xlsx package.

Now, we are able to read data from an xlsx file.


Read a xlsx File in R

In R, we use the read.xlsx() function to read a xlsx file available in our current directory. For example,

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# read studentinfo.xlsx file from our current directory
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam         2     Arts             Arizona

In the above example, we have read the studentinfo.xlsx file that is available in our current directory. Notice the code,

read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)

Here,

  • read.xlsx() - reads the xlsx file studentinfo.xlsx and creates a dataframe which is stored in the read_data variable.
  • sheetIndex = 1 - reads specified worksheet i.e. 1

Note:

  • If the file is in some other location, we have to specify the path along with the file name as: read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1).
  • We can also use the read.xlsx2() function if the dataset we are working on is larger.

xlsx rowIndex and colIndex Argument in R

In R, we can also read a specific range of data from excel files. We can pass the rowIndex and colIndex argument inside read.xlsx() to read specific range.

  • rowIndex - reads a specific range of rows
  • colIndex - read a specific range of columns

Example: Read Range of Rows

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# read first five rows of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  rowIndex = 1:5
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas

In the above example, we have passed rowIndex = 1:5 inside read.xlsx() so the function reads only the first five rows from the studentinfo.xlsx file.

Example: Read Range of Columns

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# read first three columns of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  colIndex = 1:3
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           
1    Abby       24     Business         
2    Hazzle     23     Engineering   
3    Cathy      20     Engineering    
4    Paterson  22     Arts                
5    Sammy    20    Economics      
6    Pam         21    Arts                

Here, colIndex = 1:3 inside read.xlsx() reads only the first three columns from the studentinfo.xlsx file.


xlsx startRow Argument in R

Sometimes the excel file may contain headers at the beginning which we may not want to include. For example,

startRow Argument
Headers at the Beginning

Here, the 1st Row of excel file contains a header, and the 2nd row is empty. So we don't want to include these two rows.

To start reading data from a specific row in excel worksheet, we pass the startRow argument inside read.xlsx().

Let's take a look at example,

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# start reading from 3rd row
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  startRow = 3
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam        21     Arts             Arizona

In the above example, we have used the startRow argument inside the read.xlsx() function to start reading from the specified row.

startRow = 3 means the first two rows are ignored and read.xlsx() starts reading data from the 3rd row.


Write Into xlsx File in R

In R, we use the write.xlsx() function to write into a xlsx file. We pass the data in the form of dataframe. For example,

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# write dataframe1 into file1 xlsx file
write.xlsx(dataframe1, "file1.xlsx")

In the above example, we have used the write.xlsx() function to export a data frame named dataframe1 to a xlsx file. Notice the arguments passed inside write.xlsx(),

write.xlsx(dataframe1, "file1.xlsx")

Here,

  • dataframe1 - name of the data frame we want to export
  • file1.xlsx - name of the xlsx file

Finally, the file1.xlsx file would look like this in our directory:

write into xlsx file output
file1.xlsx

Rename Current Worksheet

We can rename the current worksheet by using the sheetName argument inside the write.xlsx() function. For example,

# install xlsx package 
install.package("xlsx")

# load xlsx file
library("xlsx")

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# name current worksheet
write.xlsx(dataframe1, "file1.xlsx",
  sheetName = "Voting Eligibility"
)

Here, we have passed sheetname = "Voting Eligibility" inside write.xlsx(), so the name of the sheet is changed to "Voting Eligibility".

So the file1.xlsx looks like this:

xlsx File
file1.xlsx