Merge sort in Haskell

Giorgio · Aug 1, 2009

I am new to Haskell and I am trying to implement a few known algorithms in it.

I have implemented merge sort on strings, and I am a bit disappointed with the performance of my Haskell implementation compared to my C and Java implementations. On my machine (Ubuntu Linux, 1.8 GHz), C (gcc 4.3.3) sorts 1 000 000 strings in 1.85 s, Java (Java SE 1.6.0_14) in 3.68 s, and Haskell (GHC 6.8.2) in 25.89 s. With larger input (10 000 000 strings), C takes 21.81 s and Java takes 59.68 s, while Haskell starts swapping; I stopped the program after several minutes.

Since I am new to Haskell, I would be interested to know if my implementation can be made more time / space efficient.

Thank you in advance for any hints,
Giorgio

My implementation:

merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) = if x < y
                        then x : (merge xs (y:ys))
                        else y : (merge (x:xs) ys)

mergeSort :: [String] -> [String]
mergeSort xs = if (l < 2)
                 then xs
                 else merge h t
               where l = length xs
                     n = l `div` 2
                     s = splitAt n xs
                     h = mergeSort (fst s)
                     t = mergeSort (snd s)

Answer

Hynek -Pichi- Vychodil · Aug 1, 2009

Try this version:

mergesort :: [String] -> [String]
mergesort = mergesort' . map wrap

mergesort' :: [[String]] -> [String]
mergesort' [] = []
mergesort' [xs] = xs
mergesort' xss = mergesort' (merge_pairs xss)

merge_pairs :: [[String]] -> [[String]]
merge_pairs [] = []
merge_pairs [xs] = [xs]
merge_pairs (xs:ys:xss) = merge xs ys : merge_pairs xss

merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
 = if x > y
        then y : merge (x:xs)  ys
        else x : merge  xs    (y:ys)

wrap :: String -> [String]
wrap x = [x]

  1. Splitting the list first is a bad idea. Instead, just make a list of one-member lists. Haskell is lazy, so this will be done at the right time.
  2. Then merge pairs of lists until only one list remains.
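
To make the two steps above concrete, here is a small sketch that records every intermediate list of runs instead of returning only the final result. The helper name `passes` is hypothetical (it is not part of the answer's code), and `merge` and `mergePairs` are generalised from `String` to any `Ord` type purely for illustration.

```haskell
-- Merge two sorted lists; same logic as the merge above,
-- generalised from String to any Ord type (an assumption for this sketch).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x > y     = y : merge (x:xs) ys
  | otherwise = x : merge xs (y:ys)

-- Merge neighbouring runs two at a time; an odd run at the end is kept as-is.
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
mergePairs xss         = xss

-- Hypothetical helper: all intermediate stages, starting from the
-- one-member lists and ending when at most one run is left.
passes :: Ord a => [a] -> [[[a]]]
passes = go . map (:[])
  where
    go []  = [[]]
    go [r] = [[r]]
    go rs  = rs : go (mergePairs rs)
```

Loaded in GHCi, `passes ["d","b","a","c"]` evaluates to `[[["d"],["b"],["a"],["c"]],[["b","d"],["a","c"]],[["a","b","c","d"]]]`: the one-member lists, then the result of one round of pairwise merging, then the single sorted list.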

Edit: To whoever down-voted this answer: the merge sort implementation above is the same algorithm as the one used in GHC's Data.List.sort, except with the cmp function removed. Well, maybe the GHC authors are wrong :-/
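
For reference, here is a minimal sketch of what the same bottom-up merge sort could look like with the comparison function put back in. This is not the GHC library source; the name `mergesortBy` and the local helper names are assumptions made for illustration.

```haskell
-- Bottom-up merge sort parameterised by a comparison function.
-- Sketch only: mergesortBy, mergeAll and mergePairs are illustrative names.
mergesortBy :: (a -> a -> Ordering) -> [a] -> [a]
mergesortBy cmp = mergeAll . map (:[])
  where
    -- Keep merging neighbouring runs until a single run is left.
    mergeAll []   = []
    mergeAll [xs] = xs
    mergeAll xss  = mergeAll (mergePairs xss)

    mergePairs (xs:ys:xss) = merge xs ys : mergePairs xss
    mergePairs xss         = xss

    -- On ties the element from the left run goes first,
    -- so equal elements keep their original order.
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys) =
      case cmp x y of
        GT -> y : merge (x:xs) ys
        _  -> x : merge xs (y:ys)
```

With this, `mergesortBy compare` behaves like the `mergesort` above for strings, and something like `mergesortBy (\a b -> compare (length a) (length b))` sorts by length instead.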