R: splitting a numeric string

rvrvrv · Jun 3, 2012 · Viewed 15.5k times

I'm trying to split a numeric string of 40 digits into its individual digits (e.g. splitting 123456789123456789123456789 into 1 2 3 4 etc.).

Unfortunately strsplit doesn't work, as it requires a character string, and converting the number with as.character doesn't work either: the number is very long and R automatically cuts off digits for long numbers (maximum is 22 digits). I thus end up with "1.2345e+35" as a character string instead of the full number.

Is there a numeric variant of strsplit, or a workaround for the digit-cutting-off issue? I can't seem to find the answer on Stack Overflow, but apologies if this has already been answered before. Thanks in advance!

Answer

Mark Miller · Jun 4, 2012

If R is calculating the number, I do not know the solution. If the number is in a data file, I think the code below might work: reading it with colClasses = 'character' keeps it as a string, so no digits are lost. That said, if the number is in a data file there are probably much easier solutions.

a1 <- read.table("c:/users/Mark W Miller/simple R programs/long_number.txt", colClasses = 'character')

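# Alternative without a file:
# a1 <- c('1234567891234567891234567891234567891234')

a1 <- as.character(a1)   # one-cell data frame -> single character string
a2 <- strsplit(a1, "")   # split into one-character pieces
a3 <- unlist(a2)
a4 <- as.numeric(a3)     # convert each digit back to a number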
a4
# [1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4
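
For what it's worth, here is a minimal sketch of why the value has to stay a string from the start. A double holds only about 15 to 17 significant digits, so a 40-digit literal is already rounded before as.character or format ever sees it. The names s, n, back and digits are only illustrative.

```r
# A 40-digit number kept as a string from the start (example value)
s <- "1234567891234567891234567891234567891234"

# Stored as a double, the same value keeps only about 15-17 significant
# digits, so converting it back to text cannot recover the original.
n <- 1234567891234567891234567891234567891234
back <- format(n, scientific = FALSE)
identical(back, s)   # FALSE: the trailing digits were lost

# Splitting the string directly keeps every digit.
digits <- as.integer(strsplit(s, "")[[1]])
length(digits)       # 40
```

So the practical rule is to never let the value become numeric: read or create it as character, then split.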

EDIT

I realize I might not understand the question, and my answer is probably pretty silly. Nevertheless, if you have an entire data set of really long numbers, you could split all of them with the code below. Note that the numbers are not quoted in the file 'three_long_numbers.txt'; they look numeric there, and colClasses = 'character' makes read.table keep them as strings:

a1 <- read.table("c:/users/Mark W Miller/simple R programs/three_long_numbers.txt", colClasses = 'character')
as.matrix(a1)   # print as a character matrix

#      V1                                        
# [1,] "1234567891234567891234567891234567891234"
# [2,] "1888678912345678912345678912345678912388"
# [3,] "1234999891234567891234567891234567891239"

# a1 <- matrix(c(
# "1234567891234567891234567891234567891234",
# "1888678912345678912345678912345678912388",
# "1234999891234567891234567891234567891239"), nrow=3, byrow=T)

a1 <- as.matrix(a1)      # data frame -> character matrix
a2 <- strsplit(a1, "")   # list with one vector of digits per number
a3 <- unlist(a2)
a3 <- as.numeric(a3)
a4 <- matrix(a3, nrow = nrow(a1), byrow = TRUE)   # one row per number
a4
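
The same idea can be wrapped in a small function. This is only a sketch: split_digits is a made-up helper name, and it assumes every input consists of digits only and that all inputs have the same length, so the rows line up.

```r
# Hypothetical helper: turn a character vector of digit strings
# into an integer matrix with one row per number.
split_digits <- function(x) {
  x <- as.character(x)
  stopifnot(all(grepl("^[0-9]+$", x)))        # digits only
  stopifnot(length(unique(nchar(x))) == 1L)   # equal lengths
  do.call(rbind, lapply(strsplit(x, ""), as.integer))
}

nums <- c("1234567891234567891234567891234567891234",
          "1888678912345678912345678912345678912388",
          "1234999891234567891234567891234567891239")
m <- split_digits(nums)
dim(m)   # 3 40
```

Binding the rows with rbind avoids having to work out nrow and byrow by hand, but it does rely on the equal-length check above.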