Aggregating by unique identifier and concatenating related values into a string

roody picture roody · May 16, 2013 · Viewed 45.5k times · Source

I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out.

I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to de-dupicate by the name (brand) and concatenate the multiple possible id's into a string separated by a comment.

For example:

brand            id 
RadioShack       2308
Rag & Bone       4466
Ragu             1830
Ragu             4518
Ralph Lauren     1638
Ralph Lauren     2719
Ralph Lauren     2720
Ralph Lauren     2721
Ralph Lauren     2722 

should become:

RadioShack       2308
Rag & Bone       4466
Ragu             1830,4518
Ralph Lauren     1638,2719,2720,2721,2722

How would I accomplish this?

Answer

Jilber Urbina picture Jilber Urbina · May 16, 2013

Let's call your data.frame DF

> aggregate(id ~ brand, data = DF, c)
         brand                           id
1   RadioShack                         2308
2   Rag & Bone                         4466
3         Ragu                   1830, 4518
4 Ralph Lauren 1638, 2719, 2720, 2721, 2722

Another alternative using aggregate is:

result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")

This produces the same result and now id is not a list anymore. Thanks to @Frank comment. To see the class of each column try:

> sapply(result, class)
      brand          id 
   "factor" "character"

As mentioned by @DavidArenburg in the comments, another alternative is using the toString function:

aggregate(id ~ brand, data = DF, toString)