I'm trying to read an Excel created .csv file into R where all the vectors (columns in the CSV file) are numeric. However, they get converted to factor everytime I import them.
Here's a sample of how my CSV looks like:
This is my code:
options(StringsAsFactors=F)
data<-read.csv("in.csv",dec=",",sep=";")
As you can see, I take care of setting de dec to "," and the sep to ";". However, all the vectors that should be numerics are factors!
Can someone give me some advice? Thanks!
Your NA
strings in the csv file, N/A
, are interpreted as character
and then the whole column is converted to character
. If you have stringsAsFactors = TRUE
in options
or in read.csv
(default), the column is further converted to factor
. You can use the argument na.strings
to tell read.csv
which strings should be interpreted as NA
.
A small example:
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";")
str(df)
df <- read.csv(text = "x;y
N/A;2,2
3,3;4,4", dec = ",", sep = ";", na.strings = "N/A")
str(df)
Update following comment
Although not apparent from the sample data provided, there is also a problem with instances of '$' concatenated to the numbers, e.g. '$3,3'. Such values will be interpreted as character
, and then the dec = ","
doesn't help us. We need to replace both the '$' and the ',' before the variable is converted to numeric.
df <- read.csv(text = "x;y;z
N/A;1,1;2,2$
$3,3;5,5;4,4", dec = ",", sep = ";", na.strings = "N/A")
df
str(df)
df[] <- lapply(df, function(x){
x2 <- gsub(pattern = "$", replacement = "", x = x, fixed = TRUE)
x3 <- gsub(pattern = ",", replacement = ".", x = x2, fixed = TRUE)
as.numeric(x3)
}
)
df
str(df)