Sorry, somehow managed to forget about writing part 2 for multiple weeks
Okay, I'll pick up where part
one
ended. I've added booleans and declarations in order to make type
checking non-trivial ;)
>
>
> import Control.Monad
> import Control.Monad.Error
> import Control.Monad.RWS
> import Control.Monad.Reader
>
> data Expr
> data Stmt
> data Location
> data Decl
>
> type Identifier = String
> type Op = String
> type Symbol = String
> data Type = BoolType | IntegerType
>
>
> data NoAnn a = NoAnn
>
>
> data AST (ann :: * -> *) t where
> If :: ann Stmt -> AST ann Expr -> AST ann Stmt -> AST ann Stmt
>
> Block :: ann Stmt -> [AST ann Decl] -> [AST ann Stmt] -> AST ann Stmt
> Assign :: ann Stmt -> AST ann Location -> AST ann Expr -> AST ann Stmt
>
> Var :: ann Location -> Identifier -> AST ann Location
> Ref :: ann Expr -> AST ann Location -> AST ann Expr
>
> Binary :: ann Expr -> Op -> AST ann Expr -> AST ann Expr -> AST ann Expr
> NumLit :: ann Expr -> Integer -> AST ann Expr
>
>
> BoolLit :: ann Expr -> Bool -> AST ann Expr
> Decl :: Identifier -> Type -> AST ann Decl
Descending from the Tower o' Ivory
I decided to partly break the attribute grammar abstraction by
performing different phases inside different suitable monads. This
increased coding pleasure and clarity in comparison to the generalized
versions presented in the previous post.
> type M = Either String
> type TypeM = ReaderT TEnv (Either String)
> type CodeM = RWST SEnv () SState (Either String)
The Reader
monad is used for passing down symbol tables,
and, in the case of code generation, the label of the current loop for
use in breaking. These are the inherited attributes of the respective
attribute grammars.
> data SEnv = SEnv { symbols :: [Symbol], loopend :: Maybe String}
> data TEnv = TEnv { decls :: [AST NoAnn Decl]}
Additionally code generation needs to keep counters for disambiguating
variables and scopes (labels). The State
monad
corresponds to a top-down left-to-right flowing attribute.
> data SState = SState { var :: Integer, scope :: Integer }
Synthetic attributes are embedded in the form of the recursive
traversal, no fancy tricks are needed to represent them.
As another step down the abstraction ladder I differentiated the type
signatures of different phases to enforce and clarify the flow of
information between the phases.
Typechecking generates an annotated AST from an unannotated one:
> data Attr a where
> EAttr {etyp :: AST Attr Type} :: Attr Expr
> LAttr {ltyp :: AST Attr Type} :: Attr Location
> Empty :: Attr a
>
> class Typecheck a where
> typecheck :: AST NoAnn a -> TypeM (AST Attr a)
>
> do_typecheck :: AST NoAnn Stmt -> M (AST Attr Stmt)
> do_typecheck ast = runReaderT (typecheck ast) (TEnv [])
Code generation takes an annotated AST and transforms it into code:
> data Code a where
>
> ECode {code :: String, out :: String} :: Code Expr
>
> LCode {lcode :: String, ptr :: String} :: Code Location
>
> SCode :: String -> Code Stmt
>
> class Codegen a where
> codegen :: AST Attr a -> CodeM (Code a)
>
> do_codegen :: AST Attr Stmt -> Either String String
> do_codegen ast = do
> (SCode code,_) <- evalRWST (codegen ast) (SEnv [] Nothing) (SState 0 0)
> return code
Now our glorious two phase compiler (still missing the instances ;)
could be written as \s -> parser s >>= do_typecheck >>= do_codegen
.
The do_
functions are restricted to statements on purpose: we know
that the root node of the AST will be a statement.
Notice how the Code
and Attr
types are parametrized with the same
phantom types as nodes in the AST. This is what we had in mind when
designing the GADT.
Now we have managed to present different phases modularily while not
straying too much from the original attribute grammar approach.
Okay, this has been sitting in my draft directory for long enough,
better "realease early, release often"
Coming next: the implementation.
Update 23:35
Update 2008-08-25: The do_
functions