read.xlsx and colClasses

user2690051 picture user2690051 · Aug 16, 2013 · Viewed 24k times · Source

Does anyone know why argument colClasses does not seem to work in read.xlsx?

I create a sample *.xlsx file:

> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: int  1 2 3
 $ D: num  1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)

then try to read it with read.xlsx, without and with colClasses argument:

> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: num  1 2 3
 $ D: num  1.1 NA NA

The problem is colClasses seems to have no effect. Any ideas?

Thank you for your help.

Aleksey

P.S. I have R 3.0.1, xlsx 0.5.1

Answer

Didzis Elferts picture Didzis Elferts · Aug 16, 2013

colClasses= is working but the problem is that on your system default action when import data is to convert character columns to factor.

If you import test.xlsx and set that all columns should be "character", you see that all columns are made as factors (also numbers).

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
 str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: Factor w/ 3 levels "A","B","C": 1 2 3
 $ B: Factor w/ 3 levels "a","b","c": 1 2 3
 $ C: Factor w/ 3 levels "1","2","3": 1 2 3
 $ D: Factor w/ 1 level "1.1": 1 NA NA

To ensure that characters are not converted to factors you can add argument stringsAsFactors=FALSE to function read.xlsx().

d2 = read.xlsx('test.xlsx', sheetName='Sheet1', 
                colClasses=c(B='character', A='character'),stringsAsFactors=FALSE)

str(d2)
'data.frame':   3 obs. of  4 variables:
 $ A: chr  "A" "B" "C"
 $ B: chr  "a" "b" "c"
 $ C: num  1 2 3
 $ D: num  1.1 NA NA