Does anyone know why argument colClasses
does not seem to work in read.xlsx
?
I create a sample *.xlsx file:
> library(xlsx)
> d1 = data.frame(A=LETTERS[1:3], B=letters[1:3], C=1:3, D=c(1.1, NA, NA))
> str(d1)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: int 1 2 3
$ D: num 1.1 NA NA
> write.xlsx(d1, 'test.xlsx', sheetName='Sheet1', row.names=F, showNA=F)
then try to read it with read.xlsx
, without and with colClasses
argument:
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1')
> str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: num 1 2 3
$ D: num 1.1 NA NA
> d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=c(B='character', 'A'='character'))
> str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: num 1 2 3
$ D: num 1.1 NA NA
The problem is colClasses
seems to have no effect. Any ideas?
Thank you for your help.
Aleksey
P.S. I have R 3.0.1, xlsx
0.5.1
colClasses=
is working but the problem is that on your system default action when import data is to convert character columns to factor.
If you import test.xlsx
and set that all columns should be "character"
, you see that all columns are made as factors (also numbers).
d2 = read.xlsx('test.xlsx', sheetName='Sheet1', colClasses=rep("character",4))
str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: Factor w/ 3 levels "A","B","C": 1 2 3
$ B: Factor w/ 3 levels "a","b","c": 1 2 3
$ C: Factor w/ 3 levels "1","2","3": 1 2 3
$ D: Factor w/ 1 level "1.1": 1 NA NA
To ensure that characters are not converted to factors you can add argument stringsAsFactors=FALSE
to function read.xlsx()
.
d2 = read.xlsx('test.xlsx', sheetName='Sheet1',
colClasses=c(B='character', A='character'),stringsAsFactors=FALSE)
str(d2)
'data.frame': 3 obs. of 4 variables:
$ A: chr "A" "B" "C"
$ B: chr "a" "b" "c"
$ C: num 1 2 3
$ D: num 1.1 NA NA