Migrating MySQL UTF8 to UTF8MB4 problems and questions

Banshee picture Banshee · Mar 22, 2015 · Viewed 16.4k times · Source

Im trying to convert my UTF8 MySQL 5.5.30 database to UTF8MB4. I have looked at this article https://mathiasbynens.be/notes/mysql-utf8mb4 but have some questions.

I have done these

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The last one was manually done with 62 tables, one of them gave me this warning

13:08:30 ALTER TABLE bradspelold.games CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci 101289 row(s) affected, 2 warning(s): 1071 Specified key was too long; max key length is 767 bytes 1071 Specified key was too long; max key length is 767 bytes Records: 101289 Duplicates: 0 Warnings: 2 3.016 sec

  1. Is this a problem? What could I do to fix it?

The next step is

ALTER TABLE table_name CHANGE column_name column_name
         VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  1. Im not sure about the command, why is there 2 column_name?
  2. Should I do this only on the VARCHAR(191) columns? I dont think I have any of them?
  3. Do you know of any more artickels like this that explains more id detail why and how?

Edit :

Table Show games

CREATE  TABLE `games` (
        `id` int(10) unsigned NOT NULL DEFAULT \'0\',
        `name` varchar(255) NOT NULL,
        `description` mediumtext,
        `yearPublished` datetime NOT NULL,
        `minPlayers` int(10) unsigned NOT NULL,
        `maxPlayers` int(10) unsigned NOT NULL,
        `playingTime` varchar(127) NOT NULL,
        `grade` double NOT NULL DEFAULT \'0\',
        `updated` datetime NOT NULL,
        `forumParentId` int(10) unsigned DEFAULT \'0\',
        `lastVisited` datetime DEFAULT NULL,
        `inactivatedDate` datetime DEFAULT NULL,
        `bggGrade` double DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `inactivatedDate` (`inactivatedDate`),
        KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8'

Edit 2:

    'CREATE TABLE `forum_threads` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `createrId` int(10) unsigned DEFAULT NULL,
      `replys` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostUserId` int(10) unsigned DEFAULT NULL,
      `lastPostId` int(10) unsigned DEFAULT NULL,
      `forumId` int(10) unsigned DEFAULT NULL,
      `visits` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createrNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `solved` tinyint(1) NOT NULL DEFAULT ''0'',
      `locked` tinyint(1) NOT NULL DEFAULT ''0'',
      `lockedByUserId` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lockedDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredUserId` int(10) unsigned DEFAULT NULL,
      `glued` tinyint(1) NOT NULL DEFAULT ''0'',
      `pollId` int(10) unsigned DEFAULT NULL,
      `facebookPostId` bigint(20) DEFAULT NULL,
      `facebookImportedDate` datetime DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `FK_forum_threads_1` (`forumId`),
      KEY `FK_forum_threads_2` (`pollId`),
      KEY `createdDate` (`createdDate`),
      KEY `createrId` (`createrId`),
      KEY `lastPostCreated` (`lastPostCreated`),
      CONSTRAINT `FK_forum_threads_1` FOREIGN KEY (`forumId`) REFERENCES `forum` (`id`) ON DELETE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=4306 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'

'CREATE TABLE `forum` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `threads` int(10) unsigned NOT NULL DEFAULT ''0'',
  `createrId` int(10) unsigned DEFAULT NULL,
  `lastPostUserId` int(10) unsigned DEFAULT NULL,
  `lastThreadId` int(10) unsigned DEFAULT NULL,
  `parentForumId` int(10) unsigned DEFAULT NULL,
  `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `lastThreadTitle` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `alteredUserId` int(10) unsigned DEFAULT NULL,
  `placeOrder` int(10) unsigned NOT NULL DEFAULT ''0'',
  `separator` tinyint(1) NOT NULL DEFAULT ''0'',
  `rightLevel` int(10) unsigned NOT NULL DEFAULT ''1'',
  `createChildForum` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  `createThreads` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  PRIMARY KEY (`id`),
  KEY `Index_1` (`id`,`parentForumId`)
) ENGINE=InnoDB AUTO_INCREMENT=375 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'

Answer

Rick James picture Rick James · Mar 23, 2015
  1. There are limits on the size of an INDEX. You bumped into the limit because utf8mb4 needs up to 4 bytes per character, where as utf8 needs only 3. Meanwhile the INDEX size limit is in bytes.

The 'solution' is to decide what to do about the over-sized index. (more below)

2.

ALTER TABLE t CHANGE col col ...

is the same as the more logical

ALTER TABLE t MODIFY col ...

The former allows you to change the name of the column, hence two copies of the column name when you don't need to change the name.

  1. Quite likely you had VARCHAR(255) which takes 767 bytes in utf8 (3*255+2; the "2" is the size of the length field). The equivalent in the 4-byte utf8mb4 would be (191) (4*191+2=766; not room for more than 191).

  2. I have not seen an article about it. I suspect that what I just said is most of what needs to be said.

So...

Plan A: Do you have foo VARCHAR(255) and it was utf8? Is the data in it always (now and in the future) shorter than 191 characters? If so, then simply do the ALTER.

Plan B: If you need more than 191, do you really need the INDEX? DROP INDEX may be an alternative.

Plan C: Or, you could use a "prefix" index: INDEX(foo(191)), while leaving it VARCHAR(255). Usually "prefix" indexes are useless, but you might have a use case for which it works.

To discuss this further, please provide SHOW CREATE TABLE for the table in question, and discuss the meaning of that particular field and its INDEX.