Suppose you were setting up a database to store crash test data of various vehicles. You want to store data of crash tests for speedboats, cars, and go-karts.
You could create three separate tables: SpeedboatTests, CarTests, and GokartTests. But a lot of your columns are going to be the same in each table (for example, the employee id of the person who performed the test, the direction of the collision (front, side, rear), etc.). However, plenty of columns will be different, so you don't want to just put all of the test data in a single table because you'll have quite a few columns that will always be null for speedboats, quite a few that will always be null for cars, and quite a few that will always be null for go-karts.
Let's say you also want to store some information that isn't directly related to the tests (such as the employee id of the designer of the thing being tested). These columns don't seem right to put in a "Tests" table at all, especially because they'll be repeated for all tests on the same vehicle.
Let me illustrate one possible arrangement of tables, so you can see the questions involved.
Speedboats id | col_about_speedboats_but_not_tests1 | col_about_speedboats_but_not_tests2 Cars id | col_about_cars_but_not_tests1 | col_about_cars_but_not_tests2 Gokarts id | col_about_gokarts_but_not_tests1 | col_about_gokarts_but_not_tests2 Tests id | type | id_in_type | col_about_all_tests1 | col_about_all_tests2 (id_in_type will refer to the id column of one of the next three tables, depending on the value of type) SpeedboatTests id | speedboat_id | col_about_speedboat_tests1 | col_about_speedboat_tests2 CarTests id | car_id | col_about_car_tests1 | col_about_car_tests2 GokartTests id | gokart_id | col_about_gokart_tests1 | col_about_gokart_tests2
What is good/bad about this structure and what would be the preferred way of implementing something like this?
What if there's also some information that applies to all vehicles that you'd prefer to have in a Vehicles table? Would the CarTests table then look something like...
id | vehicle_id | ... With a Vehicles table like this: id | type | id_in_type (with id_in_type pointing to the id of either a speedboat, car, or go-kart)
This is just getting to be a royal mess it seems. How SHOULD something like this be set up?
The type
and id_in_type
design is called Polymorphic Associations. This design breaks rules of normalization in multiple ways. If nothing else, it should be a red flag that you can't declare a real foreign key constraint, because the id_in_type
may reference any of several tables.
Here's a better way of defining your tables:
Vehicles
to provide an abstract reference point for all vehicle sub-types and vehicle tests.Vehicles
.Tests
.Here's sample DDL:
CREATE TABLE Vehicles (
vehicle_id INT AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE Speedboats (
vehicle_id INT PRIMARY KEY,
col_about_speedboats_but_not_tests1 INT,
col_about_speedboats_but_not_tests2 INT,
FOREIGN KEY(vehicle_id) REFERENCES Vehicles(vehicle_id)
);
CREATE TABLE Cars (
vehicle_id INT PRIMARY KEY,
col_about_cars_but_not_tests1 INT,
col_about_cars_but_not_tests2 INT,
FOREIGN KEY(vehicle_id) REFERENCES Vehicles(vehicle_id)
);
CREATE TABLE Gokarts (
vehicle_id INT PRIMARY KEY,
col_about_gokarts_but_not_tests1 INT,
col_about_gokarts_but_not_tests2 INT,
FOREIGN KEY(vehicle_id) REFERENCES Vehicles(vehicle_id)
);
CREATE TABLE Tests (
test_id INT AUTO_INCREMENT PRIMARY KEY,
col_about_all_tests1 INT,
col_about_all_tests2 INT
);
CREATE TABLE SpeedboatTests (
test_id INT PRIMARY KEY,
vehicle_id INT NOT NULL,
col_about_speedboat_tests1 INT,
col_about_speedboat_tests2 INT,
FOREIGN KEY(test_id) REFERENCES Tests(test_id),
FOREIGN KEY(vehicle_id) REFERENCES Speedboats(vehicle_id)
);
CREATE TABLE CarTests (
test_id INT PRIMARY KEY,
vehicle_id INT NOT NULL,
col_about_car_tests1 INT,
col_about_car_tests2 INT,
FOREIGN KEY(test_id) REFERENCES Tests(test_id),
FOREIGN KEY(vehicle_id) REFERENCES Cars(vehicle_id)
);
CREATE TABLE GokartTests (
test_id INT PRIMARY KEY,
vehicle_id INT NOT NULL,
col_about_gokart_tests1 INT,
col_about_gokart_tests2 INT,
FOREIGN KEY(test_id) REFERENCES Tests(test_id),
FOREIGN KEY(vehicle_id) REFERENCES Gokarts(vehicle_id)
);
You could alternatively declare Tests.vehicle_id
which references Vehicles.vehicle_id
and get rid of the vehicle_id foreign keys in each test sub-type table, but that would permit anomalies, such as a speedboat test that references a gokart's id.