I can not figure out how I deal with that problem:
This is my Data:
Table1: Table2:
BRAND PRODUCT SOLD
Sony Sony ABCD 1233
Apple Sony adv 1233
Google Sony aaaa 1233
IBM Apple 123 1233
etc. Apple 345 1233
IBM 13123 1233
Is it possible to filter the query that I have a table where stands the brand and the total solds? My idea is:
Select table1.brand, sum(table2.sold) from table1
join table2
on (table1.brand LIKE '%table2.product%')
group by table.1.brand
That was my idea, but i always get an Error
The biggest problem is the Like-Operator or is there any other solution?
I see two issues: First of all, JOINs in hive only work with equality conditions, that like isn't going to work there.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.
Instead, that wants to go into a where clause.
Secondly, I also see a problem with the like statement itself: '%table2.product%' is being interpreted as literally the string '%table2.product%'. Additionally, even if this was doing what was intended, it would try to look for table2.product inside of brand, when you seem to want it the other way. To get the evaluation you intended, you need to add the wildcard to the contents of table1.brand; to accomplish this, you want to concatenate your wildcards into your expression.
table2.product LIKE concat('%',table1.brand,'%'))
By doing this, your like will evaluate for strings '%Sony%', '%Apple%'...etc instead of '%table2.product%'.
What you want is Brandon Bell's query, which I've merged into this answer:
SELECT table1.brand, SUM(table2.sold)
FROM table1, table2
WHERE table2.product LIKE concat('%', table1.brand, '%')
GROUP BY table1.brand;