How are super- and subtype relationships in ER diagrams represented as tables?

xingyu picture xingyu · Aug 20, 2012 · Viewed 17.1k times · Source

I am learning how to interpret Entity Relationship Diagrams into SQL DDL statements and I am confused by differences in notation. Consider a disjoint relationship as in the following diagram:

"Vehicle" box connects to "IsA" triangle, noted as "disjoint," which connects separately to "2WD" box and "4WD" box.

Would this be represented as:

  1. Vehicle, 2WD and 4WD tables (2WD and 4WD would point to the PK of Vehicle); or
  2. ONLY the 2WD and 4WD tables (and NO Vehicle table), both of which would duplicate whatever attributes Vehicle would have had?

I think these are other ways of writing the relationship:

"Vehicle" box connects with a thick line to "IsA" triangle, which connects with thin lines separately to "2WD" box and "4WD" box. "Vehicle" box connects to "IsA" triangle, which connects separately to "2WD" box and "4WD" box, all by thin lines.

I'm looking for a clear explanation of the difference in regard to what tables you'd end up with for each diagram.

Answer

Branko Dimitrijevic picture Branko Dimitrijevic · Aug 20, 2012

ER Notation

There are several ER notations. I'm not familiar with the one you are using, but it's clear enough you are trying to represent a subtype (aka. inheritance, category, subclass, generalization hierarchy...). This is the relational cousin of the OOP inheritance.

When doing subtyping, you are generally concerned with the following design decisions:

  • Abstract vs. concrete: Can the parent be instantiated? In your example: can a Vehicle exist without also being 2WD or 4WD?1
  • Inclusive vs. exclusive: Can more than one child be instantiated for the same parent? In your example, can Vehicle be both 2WD and 4WD?2
  • Complete vs. incomplete: Do you expect more children to be added in the future? In your example, do you expect a Bike or a Plane (etc...) could be later added to the database model?

The Information Engineering notation differentiates between inclusive and exclusive subtype relationship. IDEF1X notation, on the other hand, doesn't (directly) recognize this difference, but it does differentiate between complete and incomplete subtype (which IE doesn't).

The following diagram from the ERwin Methods Guide (Chapter 5, Subtype Relationships) illustrates the difference:

enter image description here

Neither IE nor IDEF1X directly allow specifying abstract vs. concrete parent.

Physical Representation

Unfortunately, practical databases don't directly support inheritance, so you'll need to transform this diagram to real tables. There are generally 3 approaches for doing so:

  1. Put all classes in the same table and leave child fields NULL-able. You can then have a CHECK to make sure the right subset of the fields in non-NULL.
    • Pros: No JOINing, so some queries can benefit. Can enforce parent-level keys (e.g. if you want to avoid different 2WD and 4WD vehicles having the same ID). Can easily enforce inclusive vs. exclusive children and abstract vs. concrete parent (by just varying the CHECK).
    • Cons: Some queries can be slower since they must filter-out "uninteresting" children. Depending on your DBMS, child-specific constraints can be problematic. A lot of NULLs can waste storage. Less suitable for incomplete subtyping - adding new child requires altering the existing table, which can be problematic in a production environment.
  2. Put all children in separate tables, but don't have a table for the parent (instead, repeat parent's fields and constraints in all children). Has most of the the characteristics of (3) while avoiding JOINs, at the price of lower maintainability (due to all these field and constraint repetitions) and inability to enforce parent-level keys or represent a concrete parent.
  3. Put both parent and children in separate tables.
    • Pros: Clean. No fields/constraints need to be artificially repeated. Enforces parent-level keys and easy to add child-specific constraints. Suitable for incomplete subtyping (relatively easy to add more child tables). Certain queries can benefit by only looking at "interesting" child table(s).
    • Cons: Some queries can be JOIN-heavy. Can be hard to enforce inclusive vs. exclusive children and abstract vs. concrete parent (these can be enforced declaratively if the DBMS supports circular and deferred foreign keys, but enforcing them at the application level is usually considered a lesser evil).

As you can see, the situation is less than ideal - you'll need to make compromises whatever approach you choose. The approach (3) should probably be your starting point, and only choose one of the alternatives if there is a compelling reason to do so.


1 I'm guessing this is what thickness of the line stands for in your diagrams.

2 I'm guessing this is what presence or absence of "disjoint" stands for in your diagrams.