How do you model custom attributes of entities?

Michal · May 19, 2009

Let's say we have an application that should be able to store all kinds of products. Each product has at least an ID and a Name, but all other attributes can be defined by the user.

  1. E.g. the user could create a product group iPods with the attributes capacity and generation.
  2. E.g. the user could create a product group TShirts with the attributes size and color.
  3. We need to store both the definition of a product and the concrete products themselves.
  4. We want to make it easy to aggregate (GROUP BY) by product attributes, e.g. select the total capacity for each generation of iPods.
  5. The solution must not require schema changes (added requirement due to input from Bill Karwin - see his answer as well!)

How would you model your schema with respect to the above requirements?

Note: Requirement 4 is important!

Thanks everyone for contributing and discussing the approach. I have seen some solutions to this problem in the past but none of them made grouping easy for me :(

Answer

Bill Karwin · May 19, 2009

I'd recommend either the Concrete Table Inheritance or the Class Table Inheritance designs. Both designs satisfy all four of your criteria.

In Concrete Table Inheritance:

  1. Ipods are stored in table product_ipods with columns ID, Name, Capacity, Generation.
  2. Tshirts are stored in table product_tshirts with columns ID, Name, Size, Color.
  3. The definitions of the concrete product types are in the metadata (table definitions) of product_ipods and product_tshirts.
  4. SELECT SUM(Capacity) FROM product_ipods GROUP BY Generation;
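
To make the Concrete Table Inheritance layout tangible, here is a minimal sketch that builds the two tables and runs the GROUP BY from point 4. It uses Python's built-in sqlite3 module purely so it can run without a server, which is an assumption of this example and not part of the answer. The column types and the three sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# One self-contained table per product type; the common columns (id, name)
# are repeated in each table.
conn.executescript("""
CREATE TABLE product_ipods (
  id         INTEGER PRIMARY KEY,
  name       TEXT    NOT NULL,
  capacity   INTEGER NOT NULL,
  generation INTEGER NOT NULL
);
CREATE TABLE product_tshirts (
  id    INTEGER PRIMARY KEY,
  name  TEXT NOT NULL,
  size  TEXT NOT NULL,
  color TEXT NOT NULL
);
""")

# Made-up rows, only here to give the aggregate something to sum.
conn.executemany(
    "INSERT INTO product_ipods (name, capacity, generation) VALUES (?, ?, ?)",
    [("iPod A", 8, 1), ("iPod B", 16, 1), ("iPod C", 32, 2)],
)

# Requirement 4: total capacity per generation, with no joins or pivots.
capacity_by_generation = conn.execute(
    "SELECT generation, SUM(capacity) FROM product_ipods "
    "GROUP BY generation ORDER BY generation"
).fetchall()
print(capacity_by_generation)
```

Because every attribute is a real column, the aggregate is an ordinary single-table query.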

In Class Table Inheritance:

  Generic product attributes are stored in table Products with columns ID, Name.

  1. Ipods are stored in table product_ipods with columns product_id (foreign key to Products.ID), Capacity, Generation.

  2. Tshirts are stored in table product_tshirts with columns product_id (foreign key to Products.ID), Size, Color.

  3. The definitions of the concrete product types are in the metadata (table definitions) of products, product_ipods, and product_tshirts.

  4. SELECT SUM(Capacity) FROM product_ipods GROUP BY Generation;
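
A comparable sketch for Class Table Inheritance, again using Python's sqlite3 module as a stand-in database (my assumption, chosen so the example runs on its own). The helper add_ipod and the sample rows are hypothetical. It shows the shared row going into products first, the subtype row reusing that id, and the foreign key refusing a subtype row that has no parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE products (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);
CREATE TABLE product_ipods (
  product_id INTEGER PRIMARY KEY REFERENCES products(id),
  capacity   INTEGER NOT NULL,
  generation INTEGER NOT NULL
);
""")

def add_ipod(name, capacity, generation):
    """Hypothetical helper: insert the shared row, then the iPod row under the same id."""
    cur = conn.execute("INSERT INTO products (name) VALUES (?)", (name,))
    conn.execute(
        "INSERT INTO product_ipods (product_id, capacity, generation) VALUES (?, ?, ?)",
        (cur.lastrowid, capacity, generation),
    )
    return cur.lastrowid

for row in [("iPod A", 8, 1), ("iPod B", 16, 1), ("iPod C", 32, 2)]:
    add_ipod(*row)

# The aggregate only touches the subtype table, same as in the concrete design.
capacity_by_generation = conn.execute(
    "SELECT generation, SUM(capacity) FROM product_ipods "
    "GROUP BY generation ORDER BY generation"
).fetchall()

# Shared and type-specific columns are combined with a join on the common id.
catalog = conn.execute(
    "SELECT p.name, i.capacity FROM products p "
    "JOIN product_ipods i ON i.product_id = p.id ORDER BY p.id"
).fetchall()

# A subtype row without a matching products row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO product_ipods VALUES (999, 64, 3)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(capacity_by_generation, catalog, orphan_rejected)
```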


See also my answer to "Product table, many kinds of product, each product has many parameters" where I describe several solutions for the type of problem you're describing. I also go into detail on exactly why EAV is a broken design.


Re comment from @dcolumbus:

With CTI, would each row of the product_ipods be a variation with its own price?

I'd expect the price column to appear in the products table, if every type of product has a price. With CTI, the product type tables typically just have columns for attributes that pertain only to that type of product. Any attributes common to all product types get columns in the parent table.

Also, when storing order line items, would you then store the row from product_ipods as the line item?

In a line-items table, store the product id, which should be the same value in both the products table and the product_ipods table.
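
As a rough illustration of that, the sketch below adds a line_items table that references only products. The line_items columns, the order number, and storing the price as integer cents in price_cents are all assumptions made for this example, and sqlite3 is used only so it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE products (
  product_id  INTEGER PRIMARY KEY,
  name        TEXT    NOT NULL,
  price_cents INTEGER NOT NULL        -- hypothetical: price kept as integer cents
);
CREATE TABLE product_ipods (
  product_id INTEGER PRIMARY KEY REFERENCES products(product_id),
  capacity   INTEGER NOT NULL
);
CREATE TABLE line_items (             -- hypothetical layout
  order_id   INTEGER NOT NULL,
  product_id INTEGER NOT NULL REFERENCES products(product_id),
  quantity   INTEGER NOT NULL,
  PRIMARY KEY (order_id, product_id)
);
""")

# One product: the same id appears in products and in product_ipods.
cur = conn.execute(
    "INSERT INTO products (name, price_cents) VALUES ('iPod Touch', 22900)"
)
ipod_id = cur.lastrowid
conn.execute("INSERT INTO product_ipods VALUES (?, 16)", (ipod_id,))

# The line item stores just the product id, never a copy of the subtype row.
conn.execute("INSERT INTO line_items VALUES (1, ?, 2)", (ipod_id,))

# The id leads to the shared columns, and optionally to the type-specific ones.
order_lines = conn.execute("""
    SELECT li.order_id, p.name, i.capacity, li.quantity * p.price_cents
    FROM line_items li
    JOIN products p ON p.product_id = li.product_id
    LEFT JOIN product_ipods i ON i.product_id = li.product_id
    WHERE li.order_id = 1
""").fetchall()
print(order_lines)
```

The LEFT JOIN is there because a line item could just as well point at a product of another type, in which case the iPod columns simply come back as NULL.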


Re comments from @dcolumbus:

That seems so redundant to me ... in that scenario, I don't see the point of the sub-table. But even if the sub-table does make sense, what's the connecting id?

The point of the sub-table is to store columns that are not needed by all other product types.

The connecting id may be an auto-increment number. The sub-type table doesn't need to auto-increment its own id, because it can just use the value generated by the super-table.

CREATE TABLE products (
  product_id INT AUTO_INCREMENT PRIMARY KEY,
  sku VARCHAR(30) NOT NULL,
  name VARCHAR(100) NOT NULL,
  price NUMERIC(9,2) NOT NULL
);

CREATE TABLE product_ipods (
  product_id INT PRIMARY KEY,
  size TINYINT DEFAULT 16,
  color VARCHAR(10) DEFAULT 'silver',
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

INSERT INTO products (sku, name, price) VALUES ('IPODS1C1', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'silver');
INSERT INTO products (sku, name, price) VALUES ('IPODS1C2', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'black');
INSERT INTO products (sku, name, price) VALUES ('IPODS1C3', 'iPod Touch', 229.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 16, 'red');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C1', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'silver');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C2', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'black');
INSERT INTO products (sku, name, price) VALUES ('IPODS2C3', 'iPod Touch', 299.00);
INSERT INTO product_ipods VALUES (LAST_INSERT_ID(), 32, 'red');
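
For readers who want to try the id hand-off without a MySQL server, here is a hedged port of the same idea to Python's sqlite3 module. The swaps are my own assumptions: INTEGER PRIMARY KEY replaces AUTO_INCREMENT, cursor.lastrowid plays the role of LAST_INSERT_ID(), price is declared REAL instead of NUMERIC(9,2), and only a subset of the rows above is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE products (
  product_id INTEGER PRIMARY KEY,     -- SQLite assigns the id, like AUTO_INCREMENT
  sku        TEXT NOT NULL,
  name       TEXT NOT NULL,
  price      REAL NOT NULL
);
CREATE TABLE product_ipods (
  product_id INTEGER PRIMARY KEY REFERENCES products(product_id),
  size       INTEGER DEFAULT 16,
  color      TEXT DEFAULT 'silver'
);
""")

# A subset of the sample rows: (sku, price, size, color).
rows = [
    ("IPODS1C1", 229.00, 16, "silver"),
    ("IPODS1C3", 229.00, 16, "red"),
    ("IPODS2C1", 299.00, 32, "silver"),
    ("IPODS2C3", 299.00, 32, "red"),
]

for sku, price, size, color in rows:
    cur = conn.execute(
        "INSERT INTO products (sku, name, price) VALUES (?, 'iPod Touch', ?)",
        (sku, price),
    )
    # cur.lastrowid is the id the super-table just generated; the sub-table reuses it.
    conn.execute(
        "INSERT INTO product_ipods VALUES (?, ?, ?)", (cur.lastrowid, size, color)
    )

# Every sub-table row pairs up with exactly one super-table row.
matched = conn.execute(
    "SELECT COUNT(*) FROM products p JOIN product_ipods i USING (product_id)"
).fetchone()[0]

# Group by a subtype column while aggregating a column from the parent table.
by_size = conn.execute("""
    SELECT i.size, COUNT(*), SUM(p.price)
    FROM product_ipods i
    JOIN products p ON p.product_id = i.product_id
    GROUP BY i.size
    ORDER BY i.size
""").fetchall()
print(matched, by_size)
```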