LEFT OUTER JOIN with subquery syntax

verkter picture verkter · May 25, 2014 · Viewed 69.2k times · Source

I am learning SQL trough a GalaXQL tutorial.

I can't figure out the following question (Exercise 12):

Generate a list of stars with star ids below 100 with columns "starname", "startemp", "planetname", and "planettemp". The list should have all stars, with the unknown data filled out with NULL. These values are, as usual, fictional. Calculate the temperature for a star with ((class+7)*intensity)*1000000, and a planet's temperature is calculated from the star's temperature minus 50 times orbit distance.

What is the syntax to write a LEFT OUTER JOIN query when you have sub-query items "AS" that you need to join together?

Here is what I have:

SELECT stars.name AS starname, startemp, planets.name AS planetname, planettemp 
FROM stars, planets 
LEFT OUTER JOIN (SELECT ((stars.class + 7) * stars.intensity) * 1000000 AS startemp 
                 FROM stars) 
             ON stars.starid < 100 = planets.planetid 
LEFT OUTER JOIN (SELECT (startemp - 50 * planets.orbitdistance) AS planettemp 
                 FROM planets) 
             ON stars.starid < 100

Here is the database schema (sorry, cant post the image file due to low rep):

CREATE TABLE stars (starid INTEGER PRIMARY KEY,
                    name TEXT,
                    x DOUBLE NOT NULL,
                    y DOUBLE NOT NULL,
                    z DOUBLE NOT NULL,
                    class INTEGER NOT NULL,
                    intensity DOUBLE NOT NULL);

CREATE TABLE hilight (starid INTEGER UNIQUE);

CREATE TABLE planets (planetid INTEGER PRIMARY KEY,
                      starid INTEGER NOT NULL,
                      orbitdistance DOUBLE NOT NULL,
                      name TEXT,
                      color INTEGER NOT NULL,
                      radius DOUBLE NOT NULL);

CREATE TABLE moons (moonid INTEGER PRIMARY KEY,
                    planetid INTEGER NOT NULL,
                    orbitdistance DOUBLE NOT NULL,
                    name TEXT,
                    color INTEGER NOT NULL,
                    radius DOUBLE NOT NULL);

CREATE INDEX planets_starid ON planets (starid);
CREATE INDEX moons_planetid ON moons (planetid);

Answer

Clockwork-Muse picture Clockwork-Muse · May 25, 2014

Lets build this up slowly.

First, lets see about getting just the information about stars:

SELECT name AS starName, (class + 7) * intensity * 1000000 AS starTemp 
FROM Stars
WHERE starId < 100

(this should look might familiar!)
We get a list of all stars whose starId is less than 100 (the WHERE clause), grabbing the name and calculating temperature. At this point, we don't need a disambiguating reference to source.

Next, we need to add planet information. What about an INNER JOIN (note that the actual keyword INNER is optional)?

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName
FROM Stars
INNER JOIN Planets
        ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

The ON clause is using an = (equals) condition to link planets to the star they orbit; otherwise, we'd be saying they were orbiting more than one star, which is very unusual! Each star is listed once for every planet it has, but that's expected.

...Except now we have a problem: Some of our stars from the first query disappeared! The (INNER) JOIN is causing only stars with at least one planet to be reported. But we still need to report stars without any planets! So what about a LEFT (OUTER) JOIN?

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

... And we have all the stars back, with planetName being null (and only appearing once) if there are no planets for that star. Good so far!

Now we need to add the planet temperature. Should be simple:

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName, starTemp - (50 * Planets.orbitDistance) as planetTemp
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

...except that on most RDBMSs, you'll get a syntax error stating the system can't find starTemp. What's going on? The problem is that the new column alias (name) isn't (usually) available until after the SELECT part of the statement runs. Which means we need to put in the calculation again:

SELECT Stars.name as starName, (Stars.class + 7) * Stars.intensity * 1000000 AS starTemp,
       Planets.name as planetName, 
       ((Stars.class + 7) * Stars.intensity * 1000000) - (50 * Planets.orbitDistance) as planetTemp
FROM Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId
WHERE Stars.starId < 100

(note that the db may actually be smart enough to perform the starTemp calculation only once per-line, but when writing you have to mention it twice in this context).
Well, that's slightly messy, but it works. Hopefully, you'll remember to change both references if that's necessary...

Thankfully, we can move the Stars portion of this into a subquery. We'll only have to list the calculation for starTemp once!

SELECT Stars.starName, Stars.starTemp,
       Planets.name as planetName, 
       Stars.starTemp - (50 * Planets.orbitDistance) as planetTemp
FROM (SELECT starId, name AS starName, (class + 7) * intensity * 1000000 AS starTemp 
      FROM Stars
      WHERE starId < 100) Stars
LEFT JOIN Planets
       ON Planets.starId = Stars.starId

Yeah, that looks like how I'd write it. Should work on essentially any RDBMS.

Note that the parenthesis in Stars.starTemp - (50 * Planets.orbitDistance) is only there for clarity for the reader, the meaning of the math would remain unchanged if they were removed. Regardless of how well you know operator-precedence rules, always put in parenthesis when mixing operations. This becomes especially beneficial when dealing with ORs and ANDs in JOIN and WHERE conditions - many people lose track of what's going to be effected.
Also note that the implicit-join syntax (the comma-separated FROM clause) is considered bad practice in general, or outright deprecated on some platforms (queries will still run, but the db may scold you). It also makes certain things - like LEFT JOINs - difficult to do, and increases the possibility of accidently sabotaging yourself. So please, avoid it.