Best practice for storing tags in a database?

John Dewans picture John Dewans · Aug 18, 2010 · Viewed 34k times · Source

I developed a site that uses tags (key words) in order to categorize photographs. Right now, what I have in my MySQL database is a table with the following structure:

image_id (int)
tag      (varchar(32))

Every time someone tags an image (if the tag is valid and has enough votes) it's added to the database. I think that this isn't the optimal way of doing things since now that I have 5000+ images with tags, the tags table has over 40000 entries. I fear that this will begin to affect performance (if it's not already affecting it).

I considered this other structure thinking that it'd be faster to fetch the tags associated to a particular image but then it looks horrible for when I want to get all the tags, or the most popular one for instance:

image_id (int)
tags     (text) //comma delimited list of tags for the image

Is there a correct way of doing this or are both ways more or less the same? Thoughts?

Answer

OMG Ponies picture OMG Ponies · Aug 18, 2010

Use a many-to-many table to link a TAG record to an IMAGE record:

IMAGE

DROP TABLE IF EXISTS `example`.`image`;
CREATE TABLE  `example`.`image` (
  `image_id` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`image_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

TAG

DROP TABLE IF EXISTS `example`.`tag`;
CREATE TABLE  `example`.`tag` (
 `tag_id` int(10) unsigned NOT NULL auto_increment,
 `description` varchar(45) NOT NULL default '',
 PRIMARY KEY  (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

IMAGE_TAG_MAP

DROP TABLE IF EXISTS `example`.`image_tag_map`;
CREATE TABLE  `example`.`image_tag_map` (
 `image_id` int(10) unsigned NOT NULL default '0',
 `tag_id` int(10) unsigned NOT NULL default '0',
 PRIMARY KEY  (`image_id`,`tag_id`),
 KEY `tag_fk` (`tag_id`),
 CONSTRAINT `image_fk` FOREIGN KEY (`image_id`) REFERENCES `image` (`image_id`),
 CONSTRAINT `tag_fk` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;