Complete newbie to mySQL. So any help will be appreciated.
I have 3 tables -- carts
, users
, actions
.
carts:
+------------+-------------+-------+
| cartId | session_id | userId|
+------------+-------------+-------+
users:
+----------+-------------+
| usedId | email |
+----------+-------------+
actions:
+-------------+------------------+---- ---------+
| session_id | impressionAction | impressionId |
+-------------+------------------+-----+--------+
In carts
, there is one session_id
per line.
In users
, there is one userId
per line.
In actions
, there are multiple lines per session_id
counting for all the actions for that session.
I would like to JOIN
the three tables getting the output to be something like
+------+-------------+--------+------------------+--------------+-------+
userId | session_id | cartId | impressionAction | impressionId | email |
+------+-------------+--------+------------------+--------------+-------+
Where there will be multiple lines per userId
and session_id
; essentially a flattened file. I think if we JOIN
carts
and users
on userId
resulting in say A
and then JOIN
A
and actions' on
session_id`, we are home.
A sample expected output is:
+------------+-------------+--------+------------------+--------------+---------+
userId | session_id | cartId | impressionAction | impressionId | email |
+------------+-------------+--------+------------------+--------------+---------+
| 1234 | abc3f45 | 0001 | LOGIN | 2032 |[email protected]|
| 1234 | abc3f45 | 0001 | ADD | 4372 |[email protected]|
| 1234 | abc3f45 | 0001 | ADD | 4372 |[email protected]|
| 1234 | abc3f45 | 0001 | SENDMAIL | [email protected] |[email protected]|
| 4567 | def4rg4 | 0002 | LOGIN | 2032 |[email protected]|
| 4567 | def4rg4 | 0002 | ADD | 4372 |[email protected]|
| 4567 | def4rg4 | 0002 | REMOVE | 3210 |[email protected]|
+------------+-------------+--------+------------------+--------------+---------+**
I don't know how to JOIN 3 tables without one common key. I don't even know what type of join it is called.
Essentially, we are trying to join 3 tables with non-overlapping keys, gathering one common key through the first JOIN
and then joining the intermediate with the third one. Is this called a CROSS JOIN
? If no, is there a name?
Taken from your comment above
A USER may select many products, add them to their CART; a single USER may have multiple CARTS and at the end of the event, they can EMAIL the cart to themselves; the ACTIONS of the user are stored in the actions table
This is how I see your the structure (having in mind your data)
+---------------------+ +---------------------+ +---------------------+
| users | | carts | | actions |
+---------------------+ +---------------------+ +---------------------+
| user_id [PK] |--| | cart_id [PK] | | impression_id [PK] |
| email | |--| user_id [FK] | | action_name |
| | | product_id [FK] | |--| session_id [FK]* |
+---------------------+ | session_id [FK]* |--| | |
| | +---------------------+
+---------------------+
As you can see below, I'm joining first with carts and them with actions because only the table carts has both, user and session data.
The [FK]*
next to the session_id on carts and actions could seem as a foreign key but in this case it's not - 'cause there's no separate table for sessions where it would be placed as an PK (primary key)
You asked about join
- it is the same as inner join
. INNER JOIN creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate.
This is a possible content of the tables
+------------------------+
| users |
+------------------------+
| id | email |
+------+-----------------+
| 1 | [email protected] |
| 2 | [email protected] |
| 3 | [email protected] |
+------+-----------------+
+------------------------------------------+
| carts |
+------------------------------------------+
| id | user_id | product_id | session_id |
+------+---------+------------+------------+
| 1 | 1 | 5 | 1aaaa |
| 2 | 2 | 5 | 2ffff |
| 3 | 3 | 8 | 3ddddd |
| 4 | 1 | 5 | 1aaaaa |
| 5 | 3 | 9 | 3bbbbb |
| 6 | 1 | 6 | 1ccccc |
+------+---------+------------+------------+
+-------------------------------+
| actions |
+-------------------------------+
| id | name | session_id |
+------+-----------+------------+
| 1 | ADD | 1aaaa |
| 2 | ADD | 2ffff |
| 3 | SENDMAIL | 3ddddd |
| 4 | ADD | 3ddddd |
| 5 | SENDMAIL | 2ffff |
| 6 | ADD | 1aaaaa |
| 7 | REMOVE | 3ddddd |
| 8 | ADD | 1ccccc |
| 9 | ADD | 3bbbbb |
| 10 | SENDMAIL | 3bbbbb |
+------+-----------+------------+
As you can see, there's six products in the table carts and exactly six add
actions in the table actions. Furthermore, as you can see user with an id=1 bought three products but not at the same time, since there are two sessions; user with an id=3 as well, bought these two products in different times etc...
The SQL statement
SELECT u.user_id, c.session_id, c.cart_id, a.impression_id, a.action_name, u.email
FROM users AS u
INNER JOIN carts AS c ON c.user_id = u.user_id
INNER JOIN actions AS a ON a.session_id = c.session_id
ORDER BY u.user_id, c.session_id, c.cart_id
Results:
+---------+------------+---------+---------------+-------------+-----------------+
| user_id | session_id | cart_id | impression_id | action_name | email |
+---------+------------+---------+---------------+-------------+-----------------+
| 1 | 1aaaa | 1 | 1 | ADD | [email protected] |
| 1 | 1aaaa | 1 | 6 | ADD | [email protected] |
| 1 | 1aaaa | 4 | 1 | ADD | [email protected] |
| 1 | 1aaaa | 4 | 6 | ADD | [email protected] |
| 1 | 1cccc | 6 | 8 | ADD | [email protected] |
| 2 | 2ffff | 2 | 5 | SENDMAIL | [email protected] |
| 2 | 2ffff | 2 | 2 | ADD | [email protected] |
| 3 | 3bbbb | 5 | 9 | ADD | [email protected] |
| 3 | 3bbbb | 5 | 10 | SENDMAIL | [email protected] |
| 3 | 3dddd | 3 | 3 | SENDMAIL | [email protected] |
| 3 | 3dddd | 3 | 4 | ADD | [email protected] |
| 3 | 3dddd | 3 | 7 | REMOVE | [email protected] |
+---------+------------+---------+---------------+-------------+-----------------+
Note
: There's no guarantee for session uniqueness.
(Updated) Working SQL Fiddle
UPDATE: (Finding and deleting duplicates)
I've updated the SQL Fiddle in order to simulate duplicate records (when user added the same product within the same session). With the following statement you'll be able to retrieve those duplicated rows.
SELECT c.card_id, c.user_id, c.product_id, c.session_id, a.action_name, a.impression_id
FROM cards As c
INNER JOIN actions AS a ON a.session_id = c.session_id
GROUP BY c.user_id, c.product_id, c.session_id, a.action_name
HAVING count(*) > 1
Results:
+---------+------------+------------+------------+-------------+-----------------+
| card_id | user_id | product_id | session_id | action_name | impression_id |
+---------+------------+------------+------------+-------------+-----------------+
| 1 | 1 | 5 | 1aaaa | ADD | 1 |
| 6 | 1 | 6 | 1cccc | ADD | 8 |
+---------+------------+------------+------------+-------------+-----------------+
In the SELECT part of the statement above you may omit everything except card_id and impression_id. Deleting these two duplicates in one statement is a bit tricky since you can't modify the same table selected in a sub-query within the same query. I would avoid the tricky part in this case (which involves another inner sub-query) and would delete duplicates using separate statements as following
-- delete duplicates from cards
--
DELETE FROM WHERE card_id IN (1,6)
-- delete duplicates from actions
--
DELETE FROM WHERE card_id IN (1,8)
Even better, you could check if the user already has been added a selected product and don't add it twice.