Clear explanation of the "theta join" in relational algebra?

LuxuryMode picture LuxuryMode · Sep 27, 2011 · Viewed 18k times · Source

I'm looking for a clear, basic explanation of the concept of theta join in relational algebra and perhaps an example (using SQL perhaps) to illustrate its usage.

If I understand it correctly, the theta join is a natural join with a condition added in. So, whereas the natural join enforces equality between attributes of the same name (and removes the duplicate?), the theta join does the same thing but adds in a condition. Do I have this right? Any clear explanation, in simple terms (for a non-mathmetician) would be greatly appreciated.

Also (sorry to just throw this in at the end, but its sort of related), could someone explain the importance or idea of cartesian product? I think I'm missing something with regard to the basic concept, because to me it just seems like a restating of a basic fact, i.e that a set of 13 X a set of 4 = 52...

Answer

onedaywhen picture onedaywhen · Sep 27, 2011

Leaving SQL aside for a moment...

A relational operator takes one or more relations as parameters and results in a relation. Because a relation has no attributes with duplicate names by definition, relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relation operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]

The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.

The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.

Consider these examples (Tutorial D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

returns the product of the relations i.e. degree of two (i.e. two attributes, X and Y) and cardinality of 6 (2 x 3 = 6 tuples).

However,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y) and cardinality of 2 (i.e. duplicate tuples removed).

I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.

Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.


SQL has an explicit product operator named CROSS JOIN which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutoral D exmaple above:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

This returns a table expression with two columns (rather than one attribute) both called Y (!!) and 6 rows i.e. this

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).