I have two csv files like below.
CSV1
data13 data23 d main_data1;main_data2 data13 data23
data12 data22 d main_data1;main_data2 data12 data22
data11 data21 d main_data1;main_data2 data11 data21
data3 data4 d main_data2;main_data4 data3 data4
data52 data62 d main_data3 data51 data62
data51 data61 d main_data3 main_data3 data61
data7 data8 d main_data4 data7 data8
CSV2
id1 main_data1 a1 a2 a3
id2 main_data2 b1 b2 b3
id3 main_data3 c1 c2 c3
id4 main_data4 d1 d2 d3
id5 main_data5 e1 e2 e3
Now my question is, I know how to merge two CSV files when one of the columns is exactly the same in both the files. But my question is a little different. column 4 from CSV1 could contain column 2 from CSV2. I'd like to get a CSV file as below
FINAL_CSV
id1 main_data1 a1 a2 a3 data13
id2 main_data2 b1 b2 b3 data3
id3 main_data3 c1 c2 c3 main_data3
id4 main_data4 d1 d2 d3 data7
id5 main_data5 e1 e2 e3
where:
1. it matches the data from both the columns and gets corresponding rows from the first occurrence and write to the csv file.
2. When there's no match, it can leave the last column in FINAL_CSV blank or write 'NA' or anything of that sort.
3. When data in columns 4 and 5 of CSV1 match exactly, it returns that row instead of the first occurrence.
I'm totally lost on how to do this. Helping with a part of it is fine too. Any suggestions are highly appreciated.
PS- I know data from csv file should be separated by a comma, but for the sake of clarity, I preferred tabs, though the actual data is separated by commas.
EDIT: Actually, the 'main_data' can be in any column in CSV2, not in just column2. The same 'main_data' could also repeat in multiple rows, then I'd like to get all the corresponding rows.
A way with (g)awk .
awk -F, 'NR==FNR{a[$2]=$0;next}
{split($4,b,";");x=b[1]}
(x in a)&&!c[x]++{d[x]=$5}
($5 in a){d[$5]=$5}
END{n=asorti(a,e);for(i=1;i<=n;i++)print a[e[i]]","d[e[i]]}' CSV1 CSV2
Output
id1,main_data1,a1,a2,a3,data13
id2,main_data2,b1,b2,b3,data3
id3,main_data3,c1,c2,c3,main_data3
id4,main_data4,d1,d2,d3,data7
id5,main_data5,e1,e2,e3,