Base10 to base64 url shortening

Alberto Zaccagni · Jul 8, 2010 · Viewed 10.1k times

I'm coding a URL shortener function for a project in which I'm learning PHP. Here is the code (btw I suppose that global here is not a good thing to do :P):

$alphabet = array(1 => "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                "0","1","2","3","4","5","6","7","8","9","_","-");

function shorten($id){
    global $alphabet;
    $shortenedId = "";
    while($id>0){
        $remainder = $id % 64;
        $id = $id / 64;     
        $shortenedId = $alphabet[$remainder].$shortenedId;
    }
    return $shortenedId;
}

The code is taken from this Wikipedia article and adapted to PHP. My problem is that when I pass a multiple of 64 to the function I get a wrong (for my purpose) result. For instance, 128 returns b, which is not correct; it should have been aaa, but that's too long for a 3-digit number.

Also, I'm starting to think that there's something wrong in this code: if I pass 1'000'000'000'000 as $id I get nItOq... I feel it's wrong because a URL shortening service like bit.ly returns a 6-character id when I use it, and I don't think that this algorithm is better than theirs.

So, two questions:

  • do you spot any bug in the above code?
  • how should I handle ids that are multiples of 64? Do I have to just ignore them and move on to the next one?

Answer

nathan · Jul 8, 2010

Just a couple of little tweaks needed. The main two were to make the alphabet zero-indexed rather than one-indexed, and to subtract the remainder from the id before dividing:

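function shorten($id)
{
    // A string is zero-indexed, so remainder 0 maps to 'a'.
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while ($id > 0) {
        $remainder = $id % 64;
        // Subtract the remainder first so the division is exact and $id stays an integer.
        $id = ($id - $remainder) / 64;
        // Square brackets: the {} string offset syntax was removed in PHP 8.
        $shortenedId = $alphabet[$remainder] . $shortenedId;
    }
    return $shortenedId;
}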
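
To illustrate why the original loop misbehaved: in PHP the `/` operator returns a float whenever the division is not exact, so `$id` becomes a fraction instead of reaching 0 and `while ($id > 0)` keeps running. Subtracting the remainder first makes the division exact. Below is a minimal sketch of an alternative, assuming PHP 7 or later for `intdiv()`; the name `shortenIntdiv` is hypothetical and not part of the answer:

```php
<?php
// Hypothetical variant for illustration; shortenIntdiv is not from the original answer.
function shortenIntdiv(int $id): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while ($id > 0) {
        // Prepend the character for the current base-64 digit.
        $shortenedId = $alphabet[$id % 64] . $shortenedId;
        // Integer division (PHP 7+), so $id eventually reaches exactly 0.
        $id = intdiv($id, 64);
    }
    return $shortenedId;
}

var_dump(2 / 64);               // float(0.03125): inexact "/" yields a float
echo shortenIntdiv(128), "\n";  // ca
echo shortenIntdiv(64), "\n";   // ba
```

With a zero-indexed alphabet, multiples of 64 need no special handling: they simply end in the character for digit 0.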

And here's a further modified version which... well, I just like:

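function shorten($id, $alphabet='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')
{
    // The base is derived from the alphabet, so any alphabet length works.
    $base = strlen($alphabet);
    $short = '';
    while ($id) {
        // Take the remainder, then divide exactly so $id stays an integer.
        $id = ($id - ($r = $id % $base)) / $base;
        // Square brackets: the {} string offset syntax was removed in PHP 8.
        $short = $alphabet[$r] . $short;
    }
    return $short;
}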

EDIT: sorted concatenation to be the same as the OP's
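
As a sanity check on the length concern from the question: six base-64 characters cover ids up to 64^6 - 1 = 68,719,476,735, so 1,000,000,000,000 needs seven characters. The sketch below assumes 64-bit PHP 7 or later. `shortenCopy` is a renamed copy of the second function (with `[]` string offsets), and `expandSketch` is a hypothetical inverse, not part of the answer, that assumes its input contains only alphabet characters:

```php
<?php
// Copy of the second version, renamed so this example is self-contained.
function shortenCopy(int $id, string $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-'): string
{
    $base = strlen($alphabet);
    $short = '';
    while ($id) {
        $r = $id % $base;
        $id = ($id - $r) / $base; // exact division, so the result stays an int
        $short = $alphabet[$r] . $short;
    }
    return $short;
}

// Hypothetical inverse: assumes every character of $short appears in $alphabet.
function expandSketch(string $short, string $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-'): int
{
    $base = strlen($alphabet);
    $id = 0;
    foreach (str_split($short) as $char) {
        // Shift the accumulated value one digit left, then add the new digit.
        $id = $id * $base + strpos($alphabet, $char);
    }
    return $id;
}

echo shortenCopy(128), "\n";                          // 20
echo strlen(shortenCopy(1000000000000)), "\n";        // 7
var_dump(expandSketch(shortenCopy(1000000000000)));   // int(1000000000000)
```

Note that both versions return an empty string for an id of 0, so ids would need to start at 1 or be special-cased.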