Matching multiple columns on different data frames and getting other column as result

user976991 · Nov 8, 2012

I have two big data frames. One (df1) has this structure:

   chr    init
1  12  25289552
2   3 180418785
3   3 180434779

The other (df2) looks like this:

    V1    V2     V3
10  1     69094 medium
11  1     69094 medium
12  12 25289552 high
13  1     69095 medium
14  3 180418785 medium
15  3 180434779 low

What I'm trying to do is add column V3 of df2 to df1, matching chr against V1 and init against V2, to get the mutation info:

   chr    init  Mut
1  12  25289552 high
2   3 180418785 medium
3   3 180434779 low

I tried loading both into R and looping with match, but it doesn't work. Is there a standard way to do this? I am also open to using awk or something similar.

Answer

Jilber Urbina · Nov 8, 2012

Use merge, joining init in df1 to V2 in df2:

df1 <- read.table(text='  chr    init
1  12  25289552
2   3 180418785
3   3 180434779', header=TRUE)


df2 <- read.table(text='    V1    V2     V3
10  1     69094 medium
11  1     69094 medium
12  12 25289552 high
13  1     69095 medium
14  3 180418785 medium
15  3 180434779 low', header=TRUE)


merge(df1, df2, by.x='init', by.y='V2') # this works!
       init chr V1     V3
1  25289552  12 12   high
2 180418785   3  3 medium
3 180434779   3  3    low
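
The call above joins on position only. If the same position could occur on more than one chromosome, it may be safer to match chr against V1 as well. A minimal sketch, assuming V1 holds the chromosome and V2 the position (the name `both` is just for illustration, and the data are rebuilt here so the block runs on its own):

```r
# Same toy data as in the question.
df1 <- data.frame(chr  = c(12L, 3L, 3L),
                  init = c(25289552L, 180418785L, 180434779L))
df2 <- data.frame(V1 = c(1L, 1L, 12L, 1L, 3L, 3L),
                  V2 = c(69094L, 69094L, 25289552L, 69095L, 180418785L, 180434779L),
                  V3 = c("medium", "medium", "high", "medium", "medium", "low"))

# Join on chromosome AND position. unique() drops fully repeated df2 rows
# first, so a duplicated annotation cannot multiply rows in the result.
both <- merge(df1, unique(df2),
              by.x = c("chr", "init"), by.y = c("V1", "V2"))

# The key columns keep their df1 names; only V3 needs renaming.
names(both)[names(both) == "V3"] <- "Mut"
both
```

Note that merge sorts the result on the key columns by default, so the rows may not come back in the original df1 order; pass sort = FALSE if that matters.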

To get the output in the layout you show, reorder the columns and rename V3:

output <- merge(df1, df2, by.x='init', by.y='V2')[, c('chr', 'init', 'V3')]
colnames(output)[3] <- 'Mut'
output
  chr      init    Mut
1  12  25289552   high
2   3 180418785 medium
3   3 180434779    low
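
Since the question mentions match, here is a sketch of a loop-free version of that idea. It assumes a composite key built with paste is acceptable (`key1` and `key2` are illustrative names), and it rebuilds the toy data so it runs on its own:

```r
# Same toy data as in the question.
df1 <- data.frame(chr  = c(12L, 3L, 3L),
                  init = c(25289552L, 180418785L, 180434779L))
df2 <- data.frame(V1 = c(1L, 1L, 12L, 1L, 3L, 3L),
                  V2 = c(69094L, 69094L, 25289552L, 69095L, 180418785L, 180434779L),
                  V3 = c("medium", "medium", "high", "medium", "medium", "low"))

# One composite key per row; the ":" separator keeps pairs such as
# (1, 23) and (12, 3) from collapsing into the same string.
key1 <- paste(df1$chr, df1$init, sep = ":")
key2 <- paste(df2$V1, df2$V2, sep = ":")

# match() is vectorized: for each df1 key it returns the index of the
# first equal df2 key, or NA when there is none.
df1$Mut <- df2$V3[match(key1, key2)]
df1
```

Unlike merge, this keeps df1 in its original row order and never adds rows: positions with no entry in df2 get NA in Mut, and when df2 has repeated keys only the first one is used.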