I'm changing some tables to store IP addresses as numbers rather than strings. This is simple with IPv4 where the 32 bit address can fit into an integer column. However, an IPv6 address is 128 bits.
The MySQL documentation only shows numeric types up to 64 bits ("bigint").
Should I stick with char/varchar for IPv6? (Ideally I'd like to use the same column for IPv4 and IPv6, so I'd prefer not to do this).
Is there anything better than using two bigint columns? I would prefer not to have to break the value into upper and lower /64 whenever using the address.
I'm using MariaDB 5.1 - if there's a better solution in a later version of MySQL then that would be nice to know, although not helpfully immediately.
[EDIT] Note that I'm after a recommendation for the best way to do this - it's obvious that there are various ways of doing this (including the existing string representation), but which is (in terms of performance) best? (i.e. if someone has done the analysis already, that would save me doing it, or if I'm missing something obvious, that would be great to know too).
I found myself asking this question and from all the posts I read never found any performance comparisons. So here's my attempt.
I've created the following tables, populated with 2,000,000 random ip address from 100 random networks.
CREATE TABLE ipv6_address_binary (
id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
addr BINARY(16) NOT NULL UNIQUE
);
CREATE TABLE ipv6_address_twobigints (
id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
haddr BIGINT UNSIGNED NOT NULL,
laddr BIGINT UNSIGNED NOT NULL,
UNIQUE uidx (haddr, laddr)
);
CREATE TABLE ipv6_address_decimal (
id SERIAL NOT NULL AUTO_INCREMENT PRIMARY KEY,
addr DECIMAL(39,0) NOT NULL UNIQUE
);
Then I SELECT all ip addresses for each network and record the response time. Average response time on the twobigints table is about 1 second while on the binary table it is about one-hundredth of a second.
Here are the queries.
Note:
X_[HIGH/LOW] is the most/least significant 64-bits of X
when NETMASK_LOW is 0 the AND condition is omitted as it always yields true. doesn't affect performance very much.
SELECT COUNT(*) FROM ipv6_address_twobigints
WHERE haddr & NETMASK_HIGH = NETWORK_HIGH
AND laddr & NETMASK_LOW = NETWORK_LOW
SELECT COUNT(*) FROM ipv6_address_binary
WHERE addr >= NETWORK
AND addr <= BROADCAST
SELECT COUNT(*) FROM ipv6_address_decimal
WHERE addr >= NETWORK
AND addr <= BROADCAST
Average response times:
Graph:
BINARY_InnoDB 0.0119529819489
BINARY_MyISAM 0.0139244818687
DECIMAL_InnoDB 0.017379629612
DECIMAL_MyISAM 0.0179929423332
BIGINT_InnoDB 0.782350552082
BIGINT_MyISAM 1.07809265852