How can I import an .xlsx file into R so that numbers are represented as numbers, when their original decimal separator is a comma, not a dot? The only package I know of for dealing with Excel is readxl from the tidyverse.

I'm looking for a solution that doesn't require opening and editing the Excel files in any other software (and that can deal with hundreds of columns to import). If that were an option, I'd export all the Excel files to .csv and import them using tools I know of that can take the dec= argument.
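For context, the dec= route I have in mind looks roughly like this on an exported CSV (the inline string stands in for a file; Excel's CSV export on comma-decimal locales is typically semicolon-separated):

```r
# A stand-in for an exported CSV; on comma-decimal systems Excel
# usually writes semicolon-separated files with comma decimals.
csv <- "a;b\n2,1;3,44\n3,2;2,2"

# dec = "," tells base R how to parse the decimals; read.csv2()
# uses these separators as its defaults.
df <- read.csv(text = csv, sep = ";", dec = ",")
```
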
So far my best working solution is to import the numbers as characters and then transform them:
library(dplyr)
library(stringr)

var1 <- c("2,1", "3,2", "4,5")
var2 <- c("1,2", "3,33", "5,55")
var3 <- c("3,44", "2,2", "8,88")
df <- data.frame(var1, var2, var3)

df %>%
  mutate_at(vars(contains("var")),
            str_replace,
            pattern = ",",
            replacement = "\\.") %>%
  mutate_at(vars(contains("var")), funs(as.numeric))
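As a side note, the comma-to-dot replacement can be skipped entirely in this last step: base R's utils::type.convert() accepts the same dec argument mentioned above. A minimal sketch on the example data:

```r
# Same example data as above, stored as character columns.
var1 <- c("2,1", "3,2", "4,5")
var2 <- c("1,2", "3,33", "5,55")
df <- data.frame(var1, var2)

# type.convert() understands dec = "," directly, like read.csv();
# as.is = TRUE keeps character columns from becoming factors.
df[] <- lapply(df, type.convert, dec = ",", as.is = TRUE)
```
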
I strongly suspect there is some other reason these columns are being read as character: most likely, they are the dreaded "Number Stored as Text".
For ordinary numbers (stored as numbers), after switching to a comma as the decimal separator, either for an individual file or in the overall system settings, readxl::read_excel reads the values in as numeric properly. (This is on my Windows system.) Even when I add a character to one of the cells in that column, or set col_types = "text", I get the number read in with a period as the decimal separator, not a comma, which is further evidence that readxl uses the internally stored data type.
The only way I have gotten R to read in a comma as a decimal separator is when the data is stored in Excel as text instead of as a number. (You can enter this by prefacing the number with a single quote, like '1,7.) I then get a little green triangle in the corner of the cell, with the popup warning "Number Stored as Text". In my exploration, I was surprised to discover that Excel will do calculations on numbers stored as text, so that is not a valid way of checking for this.
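Given that behavior, one workaround is to force every column to text on import and convert afterwards. A rough sketch; the read_excel call is shown in a comment because "data.xlsx" is a hypothetical path, and the data frame below mimics what such an import might return:

```r
# With readxl, one would first read every column as text, e.g.
#   raw <- readxl::read_excel("data.xlsx", col_types = "text")
# For illustration, a frame that mimics such an import, mixing
# comma-decimal text and the dot-decimal text readxl produces
# for numbers stored as numbers:
raw <- data.frame(x = c("1,7", "3.2"), y = c("2,2", "8,88"))

# sub() only touches commas, so dot-decimal text passes through unchanged;
# fixed = TRUE treats "," and "." as literal strings, not regex.
raw[] <- lapply(raw, function(col) as.numeric(sub(",", ".", col, fixed = TRUE)))
```

Note this coerces any genuinely non-numeric column to NA, so it should only be applied to the columns that are supposed to be numeric.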