string split on last comma in R

Tags: r, string, split, comma

Jiqing Huang · Jul 24, 2014 · Viewed 8.2k times

Answer

Here's one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"
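To see which comma the pattern picks out, regexpr() reports the position of its match. The lookahead (?=[^,]+$) succeeds only where the rest of the string contains no further comma, so it can fire only at the last comma. perl = TRUE is needed because lookaheads are a PCRE feature that R's default regex engine does not support. A minimal check, comparing against the positions of every comma:

```r
x <- "UK, USA, Germany"

# Position of the comma matched by the lookahead pattern
# (perl = TRUE selects PCRE, which supports lookahead)
last_pos <- regexpr(",(?=[^,]+$)", x, perl = TRUE)
as.integer(last_pos)
## [1] 8

# Positions of every comma, for comparison
as.integer(gregexpr(",", x)[[1]])
## [1] 3 8
```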

You may also want to drop the space that follows the comma:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

Because \\s* matches zero or more spaces, this version also works when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
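If this is needed on many strings, a minimal sketch is to wrap the same ",\\s*(?=[^,]+$)" pattern in a small function that returns one row per input string. The name split_last_comma is hypothetical (it is not part of base R). Putting NA in the second column for strings that contain no comma is an assumed convention, not something the answer specifies.

```r
# Hypothetical helper (not part of base R): split each string at its last
# comma, dropping any whitespace that follows that comma.
split_last_comma <- function(x) {
  parts <- strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
  # Strings without a comma come back as a single piece; pad with NA
  # (assumed convention) so every row has exactly two columns.
  t(vapply(parts, function(p) c(p, NA_character_)[1:2], character(2)))
}

res <- split_last_comma(c("UK, USA, Germany", "UK, USA,Germany", "France"))
res
```

vapply() is used rather than sapply() so the result is always a two-column character matrix, even when only one string is passed in.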

Question

I'm not new to R, but I am relatively new to regular expressions.

A similar question can be found here.

For example, if I use

> strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK"      "USA"     "Germany"

but I want to get

[[1]]
[1] "UK, USA" "Germany"

Another example is

> strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London"     "Washington" "D.C."       "Berlin"

and I want to get

[[1]]
[1] "London, Washington, D.C." "Berlin"

Clearly, "Washington, D.C." should not be divided into two parts: the string should be split only at the last comma, not at every comma.
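Applying the pattern from the answer to this second example: the lookahead accepts only a comma followed by comma-free text up to the end of the string, so the commas inside "Washington, D.C." are left alone.

```r
x <- "London, Washington, D.C., Berlin"

# Only the final comma is followed by comma-free text up to the end,
# so it is the only split point; \\s* also consumes the space after it
strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
```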

One viable approach, I think, is to replace the last comma with something else, such as $, #, or *, and then use strsplit() to split the string on that replacement (make sure it is unique!). But I'd be happier if you can deal with the problem directly using some built-in function.
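The replace-then-split workaround described in the question can be sketched in base R as follows. Because .* is greedy, sub() with "(.*),\\s*" captures everything up to the last comma, so only that comma (and any space after it) is replaced. The placeholder "<<SPLIT>>" is an arbitrary, hypothetical choice; as the question notes, it must not occur anywhere in the input. fixed = TRUE makes strsplit() treat it as a literal string rather than a regular expression.

```r
x <- c("UK, USA, Germany", "London, Washington, D.C., Berlin")

# Hypothetical placeholder; it must not appear anywhere in the input
marker <- "<<SPLIT>>"

# Greedy (.*) runs to the last comma, so only that comma (plus any
# following whitespace) is replaced by the marker
marked <- sub("(.*),\\s*", paste0("\\1", marker), x)

# Split on the marker literally rather than as a regular expression
strsplit(marked, marker, fixed = TRUE)
```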

So how can I do that? Many thanks.