Attribute Grammars, GADTs and MonadFix (part 2)
Sorry, somehow managed to forget about writing part 2 for multiple weeks
Okay, I'll pick up where part one ended. I've added booleans and declarations in order to make type checking non-trivial ;)
> {-# LANGUAGE KindSignatures, EmptyDataDecls, MultiParamTypeClasses, > FlexibleInstances, GADTs #-} > > import Control.Monad > import Control.Monad.Error > import Control.Monad.RWS > import Control.Monad.Reader > > data Expr > data Stmt > data Location > data Decl > > type Identifier = String > type Op = String > type Symbol = String > data Type = BoolType | IntegerType > > -- the empty annotation > data NoAnn a = NoAnn > > > data AST (ann :: * -> *) t where > If :: ann Stmt -> AST ann Expr -> AST ann Stmt -> AST ann Stmt > -- note the added declaration list > Block :: ann Stmt -> [AST ann Decl] -> [AST ann Stmt] -> AST ann Stmt > Assign :: ann Stmt -> AST ann Location -> AST ann Expr -> AST ann Stmt > > Var :: ann Location -> Identifier -> AST ann Location > Ref :: ann Expr -> AST ann Location -> AST ann Expr > > Binary :: ann Expr -> Op -> AST ann Expr -> AST ann Expr -> AST ann Expr > NumLit :: ann Expr -> Integer -> AST ann Expr > > -- these are new > BoolLit :: ann Expr -> Bool -> AST ann Expr > Decl :: Identifier -> Type -> AST ann Decl
Descending from the Tower o' Ivory
I decided to partly break the attribute grammar abstraction by performing different phases inside different suitable monads. This increased coding pleasure and clarity in comparison to the generalized versions presented in the previous post.
> type M = Either String -- aka Error, used for inter-phase communication > type TypeM = ReaderT TEnv (Either String) -- typecheck monad > type CodeM = RWST SEnv () SState (Either String) -- codegen monad
The Reader
monad is used for passing down symbol tables,
and, in the case of code generation, the label of the current loop for
use in breaking. These are the inherited attributes of the respective
attribute grammars.
> data SEnv = SEnv { symbols :: [Symbol], loopend :: Maybe String} > data TEnv = TEnv { decls :: [AST NoAnn Decl]}
Additionally code generation needs to keep counters for disambiguating
variables and scopes (labels). The State
monad
corresponds to a top-down left-to-right flowing attribute.
> data SState = SState { var :: Integer, scope :: Integer }
Synthetic attributes are embedded in the form of the recursive traversal, no fancy tricks are needed to represent them.
As another step down the abstraction ladder I differentiated the type signatures of different phases to enforce and clarify the flow of information between the phases.
Typechecking generates an annotated AST from an unannotated one:
> data Attr a where > EAttr {etyp :: AST Attr Type} :: Attr Expr > LAttr {ltyp :: AST Attr Type} :: Attr Location > Empty :: Attr a > > class Typecheck a where > typecheck :: AST NoAnn a -> TypeM (AST Attr a) > > do_typecheck :: AST NoAnn Stmt -> M (AST Attr Stmt) > do_typecheck ast = runReaderT (typecheck ast) (TEnv [])
Code generation takes an annotated AST and transforms it into code:
> data Code a where > -- code is the body, out is the register the result is in > ECode {code :: String, out :: String} :: Code Expr > -- code fetches the reference, which is then available in ptr > LCode {lcode :: String, ptr :: String} :: Code Location > -- statement code is just a body > SCode :: String -> Code Stmt > > class Codegen a where > codegen :: AST Attr a -> CodeM (Code a) > > do_codegen :: AST Attr Stmt -> Either String String > do_codegen ast = do > (SCode code,_) <- evalRWST (codegen ast) (SEnv [] Nothing) (SState 0 0) > return code
Now our glorious two phase compiler (still missing the instances ;)
could be written as \s -> parser s >>= do_typecheck >>= do_codegen
.
The do_
functions are restricted to statements on purpose: we know
that the root node of the AST will be a statement.
Notice how the Code
and Attr
types are parametrized with the same
phantom types as nodes in the AST. This is what we had in mind when
designing the GADT.
Now we have managed to present different phases modularily while not straying too much from the original attribute grammar approach.
Okay, this has been sitting in my draft directory for long enough, better "realease early, release often"
Coming next: the implementation.
Update 23:35
Update 2008-08-25: The do_
functions