Relational Algebra - Cartesian Product vs Natural Join?

Myles Gray picture Myles Gray · Jan 2, 2013 · Viewed 20.3k times · Source

I am studying for exams and am failing to find a solid criteria by which I can determine if the Cartesian Product x is to be used or if Natural Join |X| is to be used.

I had come up with a rough guide that:

"If you need to project an attribute that has the same name as an attribute in the table to be joined you must use x and state the table names to be projected: tableA.colname1 = tableB.colname1"

This however doesn't follow some of the solutions in my notes and my lecturer seems to use x with the above convention or |x| interchangeably.

Does anyone have a rule that can be followed to define use of one over the other?


Take for example this schema (only schema related to the question quoted for brevity):

takes(ID, course_id, sec_id, semester, year, grade)
student(ID, name, dept_name, tot_cred)

Q) Find the name of all students who took courses that were taught in either Spring 2011 or Autumn 2011.

My answer attempt:

π name(σ semester="Spring" ^ year=2011(takes ⋈ student)) ∪ π name(σ semester="Autumn" ^ year=2011(takes ⋈ student))

Actual answer:

π name(σ semester="Spring" ^ year=2011 ^ takes.ID=student.ID(takes x student)) ∪ π name(σ semester="Autumn" ^ year=2011 ^ takes.ID=student.ID(takes x student))

Can anyone provide a reason as to why?

In my mind the Natural Join would take care of the takes.ID=student.ID?

Answer

user824425 picture user824425 · Jan 2, 2013

A natural join, as I understand it, is a projected, filtered Cartesian product:

  • You take the Cartesian product, then
  • select it, so that the values in columns of the same name have the same value, and
  • project it, so that all columns have distinct names.

Under this assumption, your answer is isomorphic to the actual answer.

To see this, you might want to expand the natural join to the above sequence of operators, and float them around using the laws of relational algebra. You'll see that the projection disappears due to the projection to name, and the selection criterion is fused with the selection above. You'll end up with exactly the same tree as the actual answer, even though you never changed the meaning of your own answer!


I can think of one reason why your lecturer uses these concepts interchangeably: your lecturer wants you to understand that these concepts can be used interchangeably, because "the natural join is just a shortcut" (though that's debatable).