Best algorithm to find the minimum absolute difference between two numbers in an array

SexyBeast picture SexyBeast · Sep 2, 2012 · Viewed 8.1k times · Source

There is an array which can contain, say, upto 1000 elements. The range of numbers it can spawn is say 1 to 10^10. Now I have to find the minimal absolute difference between two numbers in the array. I have thought of two algorithms:

For the first one, I have defined a binarysearch function which finds the position of a to-be-inserted number in a sorted array. Now I start the sorted array with only the first number of the given array and begin iterating on the given array from the second element onwards. For each number, I find its position in the sorted array. If the number at that position is this number, then the difference is 0, it is the lowest possible one, so I exit the loop. Else, I insert the number in the sorted array at that point, then check the difference between that number and the previous and next numbers in that array. Then I store the minimum of this result and the previous result, and continue in this fashion.

Second: I sort the array using quicksort. (The range is too large, so I think radix sort won't be that efficient). Then I iterate over it, breaking out with an answer of 0 if two consecutive numbers are equal, else storing the minimum of the difference between that number and the previous number and the previous result.

Which one will be more efficient?

Is there any better algo?

Stackoverflow has a number of posts in this regard, but they didn't help much. Here's my code in Perl:

sub position {
    my @list   = @{$_[0]};
    my $target = $_[1];

    my ($low,$high) = (0, (scalar @list)-1);

    while ($low <= $high) {
        $mid = int(($high + $low)/2);

        if ( $list[$mid] == $target ) {

            return $mid;
        }
        elsif ( $target < $list[$mid] ) {

            $high = $mid - 1; 
        }
        else {

            $low = $mid + 1;
        }
    }
    $low;
}
sub max { $_[0] > $_[1] ? $_[0] : $_[1]; }
sub min { $_[0] > $_[1] ? $_[1] : $_[0]; }

$ans        = 10_000_000_000;
@numbers    = (234, 56, 1, 34...123); #given array
($max,$min) = @num[0, 0];
@sorted     = ($numbers[0]);

for ( @num[1 .. $#num] ) {
    $pos = position(\@sorted, $_);

    if ( $sorted[$pos] == $_ ) { 

        $ans = 0;
        last;
    }
    splice @sorted, $pos, 0, $_;

    if ( $#sorted == $pos ) { 

        $ans = min($_-$sorted[-2], $ans);
    }
    elsif ( 0 == $pos ) {

        $ans = min($sorted[1]-$_, $ans);
    }
    else { 

        $ans = min(min(abs($sorted[$pos-1]-$_), abs($sorted[$pos+1]-$_)), $ans);
    }
    $max = max($_, $max);
    $min = min($_, $min);
}
print "$ans\n";

Answer

amit picture amit · Sep 2, 2012

You have up to 5k elements.

Note that a sandy bridge processor has 32KB L1-Cache, assuming 4 bytes integer - it means it can contain 8192 integers.

I'd try to avoid as much as possible creating additional data (except counters and such), and do everything in place using the same array. This will make the number of cache-misses very small, and will probably outpeform any algorithm.

Thus, an in-place quicksort and than iterating over the elements in the array will probably be better then any other solution, both for being cache-efficient, while still keeping decent asymptotical complexity of O(nlogn).

Note - Although this solution will probably be more efficient (at least theoretically), the scale is still very small - and unless you are going to do this oporation a lot of times - it just doesn't worth your time over-optimizing it.


General tip: when talking about small scale problems (and up to 5000 elements fits this criteria) the big-O notation is usually not enough. The cache performance is usually the dominant factor in these problems.