Splitting a dataframe string column into multiple different columns

Tim picture Tim · Sep 5, 2013 · Viewed 103.1k times · Source

What I am trying to accomplish is splitting a column into multiple columns. I would prefer the first column to contain "F", second column "US", third "CA6" or "DL", and the fourth to be "Z13" or "U13" etc etc. My entire df follows the same pattern of X.XX.XXXX.XXX or X.XX.XXX.XXX or X.XX.XX.XXX and I know the third column is where my problem lies because of the different lengths. I have only used substr in the past and I could use that here with some if statements but would like to learn how to use stringr package and POSIX to do this (unless there is a better option). Thank you in advance.

Here is my df:

c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", 
"F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", 
"F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13"
)

Answer

A5C1D2H2I1M1N2O1R2T1 picture A5C1D2H2I1M1N2O1R2T1 · Sep 5, 2013

A very direct way is to just use read.table on your character vector:

> read.table(text = text, sep = ".", colClasses = "character")
   V1 V2  V3  V4
1   F US CLE V13
2   F US CA6 U13
3   F US CA6 U13
4   F US CA6 U13
5   F US CA6 U13
6   F US CA6 U13
7   F US CA6 U13
8   F US CA6 U13
9   F US  DL U13
10  F US  DL U13
11  F US  DL U13
12  F US  DL Z13
13  F US  DL Z13

colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )


Update (> a year later)...

Alternatively, you can use my cSplit function, like this:

cSplit(as.data.table(text), "text", ".")
#     text_1 text_2 text_3 text_4
#  1:      F     US    CLE    V13
#  2:      F     US    CA6    U13
#  3:      F     US    CA6    U13
#  4:      F     US    CA6    U13
#  5:      F     US    CA6    U13
#  6:      F     US    CA6    U13
#  7:      F     US    CA6    U13
#  8:      F     US    CA6    U13
#  9:      F     US     DL    U13
# 10:      F     US     DL    U13
# 11:      F     US     DL    U13
# 12:      F     US     DL    Z13
# 13:      F     US     DL    Z13

Or, separate from "tidyr", like this:

library(dplyr)
library(tidyr)

as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_"))
#    V_1 V_2 V_3 V_4
# 1    F  US CLE V13
# 2    F  US CA6 U13
# 3    F  US CA6 U13
# 4    F  US CA6 U13
# 5    F  US CA6 U13
# 6    F  US CA6 U13
# 7    F  US CA6 U13
# 8    F  US CA6 U13
# 9    F  US  DL U13
# 10   F  US  DL U13
# 11   F  US  DL U13
# 12   F  US  DL Z13
# 13   F  US  DL Z13