(λblog. blog blog) (λblog. blog blog)

diverging since 2008

Attribute Grammars, GADTs and MonadFix (part 2)

2008-08-24 23:30

Sorry, somehow managed to forget about writing part 2 for multiple weeks

Okay, I'll pick up where part one ended. I've added booleans and declarations in order to make type checking non-trivial ;)

> {-# LANGUAGE KindSignatures, EmptyDataDecls, MultiParamTypeClasses,
>              FlexibleInstances, GADTs #-}
>
> import Control.Monad
> import Control.Monad.Error
> import Control.Monad.RWS
> import Control.Monad.Reader
>
> data Expr
> data Stmt
> data Location
> data Decl
>
> type Identifier = String
> type Op = String
> type Symbol = String
> data Type = BoolType | IntegerType
>
> -- the empty annotation
> data NoAnn a = NoAnn
>
>
> data AST (ann :: * -> *) t where
>     If      :: ann Stmt -> AST ann Expr -> AST ann Stmt -> AST ann Stmt
>     -- note the added declaration list
>     Block   :: ann Stmt -> [AST ann Decl] -> [AST ann Stmt] -> AST ann Stmt
>     Assign  :: ann Stmt -> AST ann Location -> AST ann Expr -> AST ann Stmt
>
>     Var     :: ann Location -> Identifier -> AST ann Location
>     Ref     :: ann Expr -> AST ann Location -> AST ann Expr
>
>     Binary  :: ann Expr -> Op -> AST ann Expr -> AST ann Expr -> AST ann Expr
>     NumLit  :: ann Expr -> Integer -> AST ann Expr
>
>     -- these are new
>     BoolLit :: ann Expr -> Bool -> AST ann Expr
>     Decl    :: Identifier -> Type -> AST ann Decl

Descending from the Tower o' Ivory

I decided to partly break the attribute grammar abstraction by performing different phases inside different suitable monads. This increased coding pleasure and clarity in comparison to the generalized versions presented in the previous post.

> type M = Either String    -- aka Error, used for inter-phase communication
> type TypeM = ReaderT TEnv (Either String)   -- typecheck monad
> type CodeM = RWST SEnv () SState (Either String) -- codegen monad

The Reader monad is used for passing down symbol tables, and, in the case of code generation, the label of the current loop for use in breaking. These are the inherited attributes of the respective attribute grammars.

> data SEnv = SEnv { symbols :: [Symbol], loopend :: Maybe String}
> data TEnv = TEnv { decls :: [AST NoAnn Decl]}

Additionally code generation needs to keep counters for disambiguating variables and scopes (labels). The State monad corresponds to a top-down left-to-right flowing attribute.

> data SState = SState { var :: Integer, scope :: Integer }

Synthetic attributes are embedded in the form of the recursive traversal, no fancy tricks are needed to represent them.

As another step down the abstraction ladder I differentiated the type signatures of different phases to enforce and clarify the flow of information between the phases.

Typechecking generates an annotated AST from an unannotated one:

> data Attr a where
>     EAttr {etyp :: AST Attr Type} :: Attr Expr
>     LAttr {ltyp :: AST Attr Type} :: Attr Location
>     Empty :: Attr a
>
> class Typecheck a where
>    typecheck :: AST NoAnn a -> TypeM (AST Attr a)
>
> do_typecheck :: AST NoAnn Stmt -> M (AST Attr Stmt)
> do_typecheck ast = runReaderT (typecheck ast) (TEnv [])

Code generation takes an annotated AST and transforms it into code:

> data Code a where 
>     -- code is the body, out is the register the result is in
>     ECode {code :: String, out :: String} :: Code Expr
>     -- code fetches the reference, which is then available in ptr
>     LCode {lcode :: String, ptr :: String} :: Code Location
>     -- statement code is just a body
>     SCode :: String -> Code Stmt
>
> class Codegen a where
>    codegen :: AST Attr a -> CodeM (Code a)
>
> do_codegen :: AST Attr Stmt -> Either String String
> do_codegen ast = do 
>   (SCode code,_) <- evalRWST (codegen ast) (SEnv [] Nothing) (SState 0 0)
>   return code

Now our glorious two phase compiler (still missing the instances ;) could be written as \s -> parser s >>= do_typecheck >>= do_codegen. The do_ functions are restricted to statements on purpose: we know that the root node of the AST will be a statement.

Notice how the Code and Attr types are parametrized with the same phantom types as nodes in the AST. This is what we had in mind when designing the GADT.

Now we have managed to present different phases modularily while not straying too much from the original attribute grammar approach.

Okay, this has been sitting in my draft directory for long enough, better "realease early, release often"

Coming next: the implementation.

Update 23:35

Update 2008-08-25: The do_ functions