(λblog. blog blog) (λblog. blog blog)

diverging since 2008

Fast concatenation

2008-06-23 14:23

A handy (and quite well-known) trick in haskell is to turn list concatenation (a O(n) operation) into function composition that avoids copying and is O(1):

> let list1 = [1,2,3] in list1 ++ list1
> let list2 = (1:).(2:).(3:) in list2 . list2 $ []  -- ta-dah! no copying

I benchmarked this ages ago when I first heard about the trick, but morrow publishing a a library that uses this made me rerun and publish my observations. Here's the test code, the classical quicksort:

> module Main
>     where
>     
> import Random
> import Control.Monad
> import Data.List
> 
> qsort1 :: [Int] -> [Int]
> qsort1 [] = []
> qsort1 (x:xs) = qsort1 [a | a<-xs, a<x] ++ [x] ++ qsort1 [a | a<-xs, a>=x]
> 
> qsort2 :: [Int] -> [Int]
> qsort2 xs = qsort xs []
>     where qsort [] = id
>           qsort (x:xs) = qsort [a | a<-xs, a<x] . (x:) . qsort [a | a<-xs, a>=x]
>           
> main = do
>   l <- replicateM 500000 $ randomRIO (1,500000)
>   let l1 = qsort1 l
>   let l2 = qsort2 l
>   print (and $ zipWith (==) l1 l2)

Results with ghc -O0:

% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000  13.09s user 0.28s system 99% cpu 13.420 total
True
./qsort +RTS -P -K12000000  12.81s user 0.28s system 99% cpu 13.135 total
True
./qsort +RTS -P -K12000000  13.38s user 0.31s system 99% cpu 13.709 total

COST CENTRE                    MODULE               %time %alloc  ticks     bytes
qsort1                         Main                  35.4   46.2    109 143631183
qsort2                         Main                  32.5   34.3    100 106768654
main                           Main                  26.9   17.2     83  53500026
CAF                            Main                   5.2    2.3     16   7001237

and with ghc -O3:

% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000  9.40s user 0.30s system 99% cpu 9.720 total
True
./qsort +RTS -P -K12000000  10.58s user 0.40s system 99% cpu 11.021 total
True
./qsort +RTS -P -K12000000  10.10s user 0.36s system 99% cpu 10.501 total

COST CENTRE                    MODULE               %time %alloc  ticks     bytes
qsort1                         Main                  35.1   46.8     74 127559904
main                           Main                  33.6   19.1     71  52001182
qsort2                         Main                  31.3   34.1     66  93115220

So it seems the compositive version is a tad faster and notably less memory-hungry. Also, ghc optimizes the compositive version more efficiently, presumably because (.) enables all sorts of rewrites (++) doesn't.

As a sidenote, strangely enough replacing the list comprehensions in the above code with something like

> let (small,big) = partition (<x) xs in qsort small ++ [x] ++ qsort big

ended up in a performance decrease of some 3s with -O3 (didn't test w/o optimizations)...

Update: As folks on #haskell pointed out I should mention that this isn't a real quicksort. Quicksort refers to the exact in-place partitioning algorithm that the imperative version uses. This is of course immaterial to the benchmark.

Update 15:01: quicksilver noted that the profiling overheads for the two implementations might be different. I ran a quick test that showed the overheads to be roughly equal: unoptimized qsort2 was about 3% faster than unoptimized qsort1 both with and without profiling.