(λblog. blog blog) (λblog. blog blog)

diverging since 2008

Entries from June 2008

2008-06-11 12:00
No tags
2008-06-12 01:30

Testing HsColour. Here you go: fibonacci with iso-recursive types

> data Mu f = Mu {unMu :: f (Mu f)}
> type List a = Mu ((,) a)
> tl = snd . unMu
> hd = fst . unMu
> cns = (Mu .) . (,)
> ones = Mu (1,ones)
> fibs :: List Integer
> fibs = cns 1 . cns 1 $ go fibs (tl fibs)
>        where go (Mu (x,xs)) (Mu (y,ys)) = cns (x+y) $ go xs ys

(NB. planet readers, you can't see the colours)

2008-06-15 21:00

Irssi is a powerful but sometimes baroque irc client that seems to be the de facto standard nowadays. One of it's downsides is that the default theme, binds and settings are pretty awful.

By default, meta-0..9qwer..io (the number row and the qwerty top row) access the first 20 windows. However for most IRC junkies this won't do, they need some way to access window numbers, say, up to one hundred. Nike has one solution for this but this is what I prefer:

for i in $(seq 0 9); do for j in $(seq 0 9); do
echo "/bind meta-g-$i-$j change_window $i$j"; done; done

This shell snippet generates commands to bind the sequence alt+g x y to window 10x+y. This doesn't intrude on the default binds so I can still access the first 20 windows with just one modified keypress, but still keeps windows up to 100 nicely in reach.

To handle other shortcomings of irssi I recommend nospambot (sadly offline, it seems) for ignoring noisy bouncing, nickserv for authenticating with NickServ, act for removing noise from the activity. I've also designed two themes for irssi: trax (.theme file) (I know it's kinda tacky, I was young and 1337 back then) and simplicity (.theme file) (Works well on both black-on-white and white-on-black terminals).

Tags: conf.
2008-06-15 23:00

Prelude

I attended a compilers course this spring. The final excercise was to implement from scratch a compiler for what was basically a very restricted subset of C. The compiler was to generate LLVM assembly that could then be optimized and compiled into native binaries with the (very high quality) LLVM tools. I recommend having a look at the various LLVM benchmarks: it's going to be a major player in programming language implementation in the future.

All this however is but a backdrop to what I'm going to write about. I implemented the compiler in Haskell and bumped into very interesting design questions. First of all, I decided to implement the AST (abstract syntax tree, generated by the parser) as a generalized algebraic datatype (GADT) because I had been itching to find some real use for GADTs and ASTs were the canonical example of their use. The following is a condensed version of what the finished compiler uses (preceded by some boilerplate to make this post work as literate haskell):

> {-# LANGUAGE KindSignatures, EmptyDataDecls, MultiParamTypeClasses,
>              FlexibleInstances, GADTs #-}
>
> import Control.Monad
> import Control.Monad.Fix
> import Control.Monad.Error
>
> data Expr
> data Stmt
> data Location
>
> data NoAnn a = NoAnn
> type Identifier = String
> type Op = String
>
> data AST (ann :: * -> *) t where
>     If      :: ann Stmt -> AST ann Expr -> AST ann Stmt -> AST ann Stmt
>     Block   :: ann Stmt -> [AST ann Stmt] -> AST ann Stmt
>     Assign  :: ann Stmt -> AST ann Location -> AST ann Expr -> AST ann Stmt
>
>     Var     :: ann Location -> Identifier -> AST ann Location
>     Ref     :: ann Expr -> AST ann Location -> AST ann Expr
>
>     Binary  :: ann Expr -> Op -> AST ann Expr -> AST ann Expr -> AST ann Expr
>     NumLit  :: ann Expr -> Integer -> AST ann Expr

In addition to this there were declarations, array types and more control structures, but I'm trying to keep things simple. As you can see the GADT is parametrised over an annotation type ann and a phantom type that indicates the type of the node (statement, expression, location). The annotation type, however, is of kind *->*: it takes the node type as an argument. This is useful because different node types need to have different annotations, for example expressions have a type while statements do not. All this could also be formulated as vanilla Haskell types as follows (with a simple annotation type for simplicity):

> data Stmt' ann = If' ann (Expr' ann) (Stmt' ann)
>                | Block' ann [Stmt' ann]
>                | Assign' ann (Location' ann) (Expr' ann)
>
> data Location' ann = Var' ann Identifier
>
> data Expr' ann = Binary' ann Op (Expr' ann) (Expr' ann)
>                | Ref' ann (Location' ann)
>                | NumLit' ann Integer

However getting the different annotation types to flow as smoothly would be a bit bothersome. (The primes are added in order to keep this blog post a working .lhs file.)

Attribute grammars and first solution

I had decided to implement the compiler with attribute grammars, a very elegant and powerful formalism for tree traversals (see Why attribute grammars matter for an introduction). There exist attribute grammar systems that generate Haskell code from a description of the grammar (UUAG being the most noteworthy), but I decided to go on with a hand-coded approach. UUAG compiles the attribute grammar into a bunch of haskell functions that mutually recurse to calculate the attributes. This however mixes up all the attributes into one traversal and is very obfuscated. A more straightforward way would be to actually annotate the tree with attribute fields.

More specifically, syntetic attributes (ones which can be computed from a node's childs' attributes) would be stored in the node's annotation field while inherited attributes (ones which propagate from a node down to it's leaves) would be just passed down as arguments to the traversal function. The basic structure of the traversal would be something like

> class Traverse1 a inherited synthetic where
>     traverse1 :: AST NoAnn a -> inherited -> AST synthetic a
> -- and instances for Stmt, Expr and Location

This solution however has the problem that makes the traversal monolithic. I'd like to have the different phases of the compiler (e.g typecheck, code generation) define their own simple traversals and then simply compose these in the driver. One other solution I toyed with was:

> type M = Either String   -- for example, actually it was RWST on top of Error
>
> -- an action that we can mfix over (as long as M is an instance of MonadFix)
> type Result a = a -> M a 
>
> data AG a = AG { 
>       block   :: M [a Stmt]                     -> Result (a Stmt),
>       ifthen  :: M (a Expr) -> M (a Stmt)       -> Result (a Stmt),
>       assign  :: M (a Location) -> M (a Expr)   -> Result (a Stmt),
>       
>       var     :: Identifier                     -> Result (a Location),
> 
>       ref     :: M (a Location)                 -> Result (a Expr),
>       binary  :: Op -> M (a Expr) -> M (a Expr) -> Result (a Expr)
> }
>
> compose :: AG a -> AG a -> AG a
> compose (AG b' i' a' v' r' bin')
>         (AG b  i  a  v  r  bin ) =
>     AG (f1 b' b) (f2 i' i) (f2 a' a) (f1 v' v) (f1 r' r) (f3 bin' bin)
>     where f1 m n = \x -> m x >=> n x
>           f2 m n = \x y -> m x y >=> n x y
>           f3 m n = \x y z -> m x y z >=> n x y z
> 
> class TraverseFix attr t where
>     traverseFix :: AG attr -> AST NoAnn t -> M (attr t)
>
> -- All we do in these instances is define the recursion scheme
> instance TraverseFix attr Stmt where
>     traverseFix ag (Block _ ss)   = mfix $ (block ag) (mapM (traverseFix ag) ss)
>     traverseFix ag (If _ e s)     = mfix $
>         (ifthen  ag) (traverseFix ag e) (traverseFix ag s)
>     traverseFix ag (Assign _ l e) = mfix $
>         (assign  ag) (traverseFix ag l) (traverseFix ag e)
> instance TraverseFix attr Location where
>     traverseFix ag (Var _ i) = mfix $ (var ag) i
> instance TraverseFix attr Expr where
>     traverseFix ag (Ref _ l)          = mfix $ (ref ag) (traverseFix ag l)
>     traverseFix ag (Binary _ o e1 e2) = mfix $
>         (binary ag) o (traverseFix ag e1) (traverseFix ag e2)

This lets one compose multiple attribute grammars (AGs) and take their fixpoint over the AST: this makes the order of composition irrelevant and actually allows for more expressive AGs. Also, it evades actually storing the intermediate attributes in the tree quite elegantly. All this is very true to the spirit of attribute grammars but the requirement of knowing all attributes beforehand violates the modularity I needed.

Stand by for part 2...

2008-06-18 14:42

I saw Dante 01 with Kebbish yesterday. The film was mind-blowing, something in the vein of Riddick meeting Muad'dib under the gentle guidance of Kubrick. The script would've made a top-notch scifi novella on it's own but director Marc Caro pushed it even further. Kudos for this. Additionally, the multiple references to classical literature and the detached narration from Perséphone really made my day.

As a downside the narrative flow stumbles somewhat towards the end, resorting to a slight deus ex machina by a very flat character. Also, the film is divided into four segments, labeled "Cercle 1" to "Cercle 4", obviously alluding to Dante's Divine Comedy, but the Divine Comedy relates of nine circles of hell. This and the Space Odyssey 2001-esque ending might point towards truncating the script? My sentiment here is echoed by http://www.imdb.com/title/tt0487928/board/nest/94052306, though I don't agree with the writer's view of the ending. The film is a very compelling whole however, so the non-classical story arc might very well be intentional.

Spoiler warning!

The setting is space station Dante 01 in orbit around a fiery planet named Dante. The station is a mental asylum/prison/test center that houses criminals who have agreed to submit to medical experiments in order to evade a death sentence. The movie begins with a shuttle bringing a new doctor sent by the company running the station and a new inmate (the protagonist, soon christened Saint George by the other inmates).

The movie seems to be an allegory for a new scientific and social awakening of humankind. The medical company running the station injects an experimental DNA-altering nanovirus into the inmates and seemingly expects everybody to die (weapons research?). It also seems that the company is aware of the strange gift of healing that the protagonist has, and is seeking to commercially exploit it. As usual, the good prevail and the new doctor and an opportunistic inmate are the only ones to perish. Even the inmate who dies while trying to save the station from crashing into Dante is resuscitated by the protagonist. This lack of death nicely contrasts the harsh dystopian future and the grim visual language of the film.

In a psychedelic ending Saint George exits the station in a space suit and decomposes into a double helix (DNA anyone?) that rushes towards Dante. The planet turns green and Saint George has "beaten the dragon". Technology has been turned towards general progress instead of commercial gain. The collaboration of the crew and inmates points towards the need for a different kind of society, one that doesn't force human beings into cold robots with the hand of "established procedure".

Updated 15:50: minor changes, grammar
Updated 19:10: more on patchiness

Tags: culture.
2008-06-22 22:50

As some irc acquaintances of mine already know, choosing a blog platform turned out pretty difficult for me. Actually, let me retake that, finding a blog platform that didn't suck turned out a veritable odyssey. Now it's been about a week and one national holiday since my struggles so I can hopefully recount my tale w/o resorting to tuomov-style rhetorics.

My ideal platform would be a blog compiler with some cgi to handle comments. I want to keep my posts in a darcs repo and serve static html when it's possible. Markdown support and code syntax hilighting would be bonuses. I'd like to plug in HsColour. Doesn't sound so hard right?

When discussing my plans I received two platform suggestions: blosxom and chronicle. Originally I had nanoblogger in mind but it soon turned out it just wouldn't do. I set up chronicle but started experimenting with other engines. Here's the rundown:

Blosxom

features

  • cgi, implemented in perl (ports to various other scripting languages available)

problems

  • had to push data into blosxom's datadir with a custom makefile for two reasons:
    • wants control of the blog posts
    • no formatting support, posts would have to be in html
  • using markdown was a hassle because blosxom wants some unformatted information (title etc) in the beginning of a post file
  • wants publish date to be the same as file mtime (doesn't play well with version control...), can be fixed with a plugin
  • the comment plugin really is ugly
  • many of the plugins were dead or the authors use wordpress nowadays
  • uses home-grown template system and not, for instance, Text::Template

Chronicle

features

  • a blog compiler, comments available as a very hacky cgi add-on
  • supports markdown and html

problems

  • the version in debian seems has outdated documentation and seems not to really support the comment plugin
  • have to do syntax colouring with a makefile

Nanoblogger, bashblogger, vee

  • blog compilers
  • want control of the posts, have own version-control-systemish thing
  • formatting with a shell pipe mostly
  • comment support ugly or missing

Ikiwiki

features

  • a wiki compiler, blogging available with a plugin
  • wants posts in markdown
  • (IMO ugly) git support
  • nice commenting support

problems

  • overkill? a real mess to configure at least
  • plugins for mostly everything but not for just piping blocks through a second formatter...
  • I really don't like the way ikiwiki uses git: there's one central copy, and committing to it launches a hook that makes ikiwiki pull changes into its own working copy

Wordpress

(no, I didn't try it, I have enough experience with this monster already)

problems

  • de facto standard
  • ugly as hell, wants posts written with a web cli
  • wants sql db
  • implemented in php

So here I am, back using chronicle and not that pleased with it. Ikiwiki would be perfect for the job were it not for the need to write my own plugin (I already have) and some ideological problems. Maybe I'll get around to switching to it some time in the future.

Update 2008-06-23: language

Tags: conf.
2008-06-23 14:23

A handy (and quite well-known) trick in haskell is to turn list concatenation (a O(n) operation) into function composition that avoids copying and is O(1):

> let list1 = [1,2,3] in list1 ++ list1
> let list2 = (1:).(2:).(3:) in list2 . list2 $ []  -- ta-dah! no copying

I benchmarked this ages ago when I first heard about the trick, but morrow publishing a a library that uses this made me rerun and publish my observations. Here's the test code, the classical quicksort:

> module Main
>     where
>     
> import Random
> import Control.Monad
> import Data.List
> 
> qsort1 :: [Int] -> [Int]
> qsort1 [] = []
> qsort1 (x:xs) = qsort1 [a | a<-xs, a<x] ++ [x] ++ qsort1 [a | a<-xs, a>=x]
> 
> qsort2 :: [Int] -> [Int]
> qsort2 xs = qsort xs []
>     where qsort [] = id
>           qsort (x:xs) = qsort [a | a<-xs, a<x] . (x:) . qsort [a | a<-xs, a>=x]
>           
> main = do
>   l <- replicateM 500000 $ randomRIO (1,500000)
>   let l1 = qsort1 l
>   let l2 = qsort2 l
>   print (and $ zipWith (==) l1 l2)

Results with ghc -O0:

% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000  13.09s user 0.28s system 99% cpu 13.420 total
True
./qsort +RTS -P -K12000000  12.81s user 0.28s system 99% cpu 13.135 total
True
./qsort +RTS -P -K12000000  13.38s user 0.31s system 99% cpu 13.709 total

COST CENTRE                    MODULE               %time %alloc  ticks     bytes
qsort1                         Main                  35.4   46.2    109 143631183
qsort2                         Main                  32.5   34.3    100 106768654
main                           Main                  26.9   17.2     83  53500026
CAF                            Main                   5.2    2.3     16   7001237

and with ghc -O3:

% for i in 1 2 3; do time ./qsort +RTS -P -K12000000; done
True
./qsort +RTS -P -K12000000  9.40s user 0.30s system 99% cpu 9.720 total
True
./qsort +RTS -P -K12000000  10.58s user 0.40s system 99% cpu 11.021 total
True
./qsort +RTS -P -K12000000  10.10s user 0.36s system 99% cpu 10.501 total

COST CENTRE                    MODULE               %time %alloc  ticks     bytes
qsort1                         Main                  35.1   46.8     74 127559904
main                           Main                  33.6   19.1     71  52001182
qsort2                         Main                  31.3   34.1     66  93115220

So it seems the compositive version is a tad faster and notably less memory-hungry. Also, ghc optimizes the compositive version more efficiently, presumably because (.) enables all sorts of rewrites (++) doesn't.

As a sidenote, strangely enough replacing the list comprehensions in the above code with something like

> let (small,big) = partition (<x) xs in qsort small ++ [x] ++ qsort big

ended up in a performance decrease of some 3s with -O3 (didn't test w/o optimizations)...

Update: As folks on #haskell pointed out I should mention that this isn't a real quicksort. Quicksort refers to the exact in-place partitioning algorithm that the imperative version uses. This is of course immaterial to the benchmark.

Update 15:01: quicksilver noted that the profiling overheads for the two implementations might be different. I ran a quick test that showed the overheads to be roughly equal: unoptimized qsort2 was about 3% faster than unoptimized qsort1 both with and without profiling.

2008-06-26 18:05

Today's small haskell satori: the least fixpoint operator

> fix f = let x = f x in x

(which seemed like dark magic when I first saw it) could be written a bit more intuitively as

> fix' f = let f' = f . f' in f' undefined

This captures the intuition that fix just applies a function iteratively to undefined (i.e. bottom, _|_). Actually we could use any value in the place of undefined but that would restrict the type of fix'. You see, the only reason f' really needs to be applied to anything is to appease the typechecker.

Of course the original (Haskell98) definition is more elegant and probably has better sharing characteristics too.

Update 2008-07-11: found some related musings on haskell-cafe. See also the reply

2008-06-30

Paycheck came in today so I could finally pay off my library bills. While at it, borrowed some music:

  • Squarepusher - Selection Sixteen
    • 'nuff said
  • Ceephax - Exidy Tours
    • Brother of Squarepusher, new find. IDM with gabber and d'n'b flavours
  • LFO - Sheath
    • Nicely executed dark techno with IDM influences
  • German Elektroniks - Elektro Jugend
    • Kraftwerk goes industrialish
  • Itäväylä - Itäväylä
    • Undescribeable ambient electro-boogey
  • Four Tet - Everything Ecstatic
    • Abstract, unsettling, jazzy, hyperactive something; Venetian Snares goes artistic.

Great stuff, gotta love the libraries in Helsinki!

Update 2008-07-05: Gah, forgot all about Autechre - Chiastic Slide, probably the best in this pile. Not quite up to the aggressive-disassociative madness of Autechre's Untitled, Chiastic Slide presents a calmer and warmer side of the artist w/o compromising the weirdness and smooth execution that are the cornerstone of their music.

I've also found a weird, shy love that hides behind the samples in Everything Ecstatic, but I can't shake the forlorn and melancholic vibes it sends down my spine. Definitely recommended.

Also, added juno.co.uk links on the albums I could find, enjoy the free samples!

Tags: culture.

RSS Feed