Fast concatenation
A handy (and quite well-known) trick in Haskell is to turn list concatenation (an O(n) operation) into function composition, which avoids copying and is O(1):
> let list1 = [1,2,3] in list1 ++ list1
> let list2 = (1:).(2:).(3:) in list2 . list2 $ [] -- ta-dah! no copying
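The same idea is usually packaged up as a "difference list": represent a list as a function that prepends its elements to whatever comes after. A minimal sketch (the names `DList`, `fromList`, `toList` and `append` are illustrative, not from any particular library):

```haskell
-- A list represented as "prepend me to the rest".
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)

toList :: DList a -> [a]
toList f = f []

-- Appending is just composition: O(1), no traversal of the left list.
append :: DList a -> DList a -> DList a
append = (.)

main :: IO ()
main = print (toList (append (fromList [1,2,3]) (fromList [4,5,6 :: Int])))
```

The cost of building the list is only paid once, when `toList` finally supplies the terminating `[]`.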
I benchmarked this ages ago when I first heard about the trick, but morrow publishing a library that uses it made me rerun and publish my observations. Here's the test code, the classical quicksort:
> module Main
> where
>
> import Random
> import Control.Monad
> import Data.List
>
> qsort1 :: [Int] -> [Int]
> qsort1 [] = []
> qsort1 (x:xs) = qsort1 [a | a<-xs, a<x] ++ [x] ++ qsort1 [a | a<-xs, a>=x]
>
> qsort2 :: [Int] -> [Int]
> qsort2 xs = qsort xs []
>   where qsort [] = id
>         qsort (x:xs) = qsort [a | a<-xs, a<x] . (x:) . qsort [a | a<-xs, a>=x]
>
> main = do
>   l <- replicateM 500000 $ randomRIO (1,500000)
>   let l1 = qsort1 l
>   let l2 = qsort2 l
>   print (and $ zipWith (==) l1 l2)
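The asymptotic difference is easy to see in isolation. A tiny sketch (my own illustration, not part of the benchmark): left-nested `(++)` re-traverses the accumulated prefix on every append, giving quadratic work overall, while the composition-based version builds a chain of `(:)` sections and traverses each element only once when the final `[]` is supplied.

```haskell
import Data.List (foldl')

-- Left-nested (++): each step copies everything accumulated so far.
appendStyle :: Int -> [Int]
appendStyle n = foldl' (++) [] [[i] | i <- [1..n]]

-- Composition style: O(1) per "append", one traversal at the end.
composeStyle :: Int -> [Int]
composeStyle n = foldr (.) id [(i:) | i <- [1..n]] $ []

main :: IO ()
main = print (appendStyle 5 == composeStyle 5)
```

Both produce the same list; only the amount of copying differs.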
Results with ghc -O0:
% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000 13.09s user 0.28s system 99% cpu 13.420 total
True
./qsort +RTS -P -K12000000 12.81s user 0.28s system 99% cpu 13.135 total
True
./qsort +RTS -P -K12000000 13.38s user 0.31s system 99% cpu 13.709 total
COST CENTRE MODULE %time %alloc ticks bytes
qsort1 Main 35.4 46.2 109 143631183
qsort2 Main 32.5 34.3 100 106768654
main Main 26.9 17.2 83 53500026
CAF Main 5.2 2.3 16 7001237
and with ghc -O3:
% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000 9.40s user 0.30s system 99% cpu 9.720 total
True
./qsort +RTS -P -K12000000 10.58s user 0.40s system 99% cpu 11.021 total
True
./qsort +RTS -P -K12000000 10.10s user 0.36s system 99% cpu 10.501 total
COST CENTRE MODULE %time %alloc ticks bytes
qsort1 Main 35.1 46.8 74 127559904
main Main 33.6 19.1 71 52001182
qsort2 Main 31.3 34.1 66 93115220
So it seems the composition-based version is a tad faster and notably less memory-hungry. Also, GHC optimizes the composition-based version more effectively, presumably because (.) enables all sorts of rewrites that (++) doesn't.
As a side note, strangely enough, replacing the list comprehensions in the above code with something like
> let (small,big) = partition (<x) xs in qsort small ++ [x] ++ qsort big
resulted in a performance decrease of some 3s with -O3 (I didn't test without optimizations)...
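For completeness, here is what that partition-based variant looks like when written out in full (a sketch of the slower version; the function name `qsortP` is mine):

```haskell
import Data.List (partition)

-- Single-pass partitioning instead of two list comprehensions;
-- surprisingly, this benchmarked slower with -O3.
qsortP :: [Int] -> [Int]
qsortP [] = []
qsortP (x:xs) =
  let (small, big) = partition (< x) xs
  in qsortP small ++ [x] ++ qsortP big

main :: IO ()
main = print (qsortP [3,1,4,1,5,9,2,6])
```

One would expect a single `partition` pass to beat two comprehensions (one traversal of `xs` instead of two), which makes the measured slowdown all the more curious.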
Update: As folks on #haskell pointed out, I should mention that this isn't a real quicksort. Quicksort refers to the exact in-place partitioning algorithm that the imperative version uses. This is of course immaterial to the benchmark.
Update 15:01: quicksilver noted that the profiling overheads for the two implementations might be different. I ran a quick test that showed the overheads to be roughly equal: unoptimized qsort2 was about 3% faster than unoptimized qsort1 both with and without profiling.