Splitting a string into new rows in R

user3703195 · Aug 20, 2014

I have a data set like the one below:

Country Region    Molecule      Item Code   
    IND     NA       PB102      FR206985511 
   THAI     AP       PB103      BA-107603 / F000113361 / 107603
   LUXE     NA       PB105      1012701 / SGP-1012701 / F041701000
    IND     AP       PB106      AU206985211 / CA-F206985211
   THAI     HP       PB107      F034702000 / 1010701 / SGP-1010701
   BANG     NA       PB108      F000007970/25781/20009021

I want to split the string values in the Item Code column on "/" and create a new row for each entry.

For instance, the desired output will be:

Country Region Molecule      Item.Code
    IND     NA    PB102    FR206985511
   THAI     AP    PB103      BA-107603
   THAI     AP    PB103     F000113361
   THAI     AP    PB103         107603
   LUXE     NA    PB105        1012701
   LUXE     NA    PB105    SGP-1012701
   LUXE     NA    PB105     F041701000
    IND     AP    PB106    AU206985211
    IND     AP    PB106  CA-F206985211
   THAI     HP    PB107     F034702000
   THAI     HP    PB107        1010701
   THAI     HP    PB107    SGP-1010701
   BANG     NA    PB108     F000007970
   BANG     NA    PB108          25781
   BANG     NA    PB108       20009021

I tried the code below:

library(splitstackshape)
df2=concat.split.multiple(df1,"Plant.Item.Code","/", direction="long")

but got the error:

"Error: memory exhausted (limit reached?)"

When I tried strsplit(), I got the error message below:

Error in strsplit(df1$Plant.Item.Code, "/") : non-character argument

Answer

David Arenburg · Aug 20, 2014

Try the cSplit function (since you are already using @Ananda's package). Note that it will return a data.table object, so make sure you have the data.table package installed. You can convert the result back to a data.frame (if you want to) with something like setDF(df2).

library(splitstackshape)
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
df2
#     Country Region Molecule      Item.Code
#  1:     IND     NA    PB102    FR206985511
#  2:    THAI     AP    PB103      BA-107603 
#  3:    THAI     AP    PB103     F000113361 
#  4:    THAI     AP    PB103         107603
#  5:    LUXE     NA    PB105        1012701 
#  6:    LUXE     NA    PB105    SGP-1012701 
#  7:    LUXE     NA    PB105     F041701000
#  8:     IND     AP    PB106    AU206985211 
#  9:     IND     AP    PB106  CA-F206985211
# 10:    THAI     HP    PB107     F034702000 
# 11:    THAI     HP    PB107        1010701 
# 12:    THAI     HP    PB107    SGP-1010701
# 13:    BANG     NA    PB108     F000007970
# 14:    BANG     NA    PB108          25781
# 15:    BANG     NA    PB108       20009021
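
As for the strsplit() error in the question: "non-character argument" usually means the column is a factor rather than a character vector, since strsplit() only accepts character input. Below is a minimal base R sketch of the same reshaping, assuming that is the cause. The df1 here is a small hand-built subset of the example data with a column named Item.Code, not the asker's actual data frame.

```r
# Small subset of the example data. stringsAsFactors = TRUE is an assumption
# that mimics the split column having been read in as a factor.
df1 <- data.frame(
  Country   = c("IND", "THAI", "BANG"),
  Region    = c(NA, "AP", NA),
  Molecule  = c("PB102", "PB103", "PB108"),
  Item.Code = c("FR206985511",
                "BA-107603 / F000113361 / 107603",
                "F000007970/25781/20009021"),
  stringsAsFactors = TRUE
)

# strsplit() needs a character vector, so convert the factor first.
parts <- strsplit(as.character(df1$Item.Code), "/", fixed = TRUE)

# Repeat each original row once per piece, then overwrite the split column
# with the individual codes, trimming the spaces around each "/".
df2 <- df1[rep(seq_len(nrow(df1)), lengths(parts)), ]
df2$Item.Code <- trimws(unlist(parts))
rownames(df2) <- NULL
df2
```

This avoids any extra packages and returns a plain data.frame, at the cost of handling the whitespace trimming and row repetition by hand.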