Layered Combinator Parsers with a Unique State
Pieter Koopman, Rinus Plasmeijer
Nijmegen, The Netherlands

Overview
• conventional parser combinators
• requirements for new combinators
• system-architecture
• new parser combinators
• separate scanner and parser
• error handling

parser combinators
• nondeterministic, list of results
    :: Parser s r :== [s] -> [ParseResult s r]
    :: ParseResult s r :== ([s],r)
• fail & yield
    fail = \ss = []
    yield r = \ss = [(ss,r)]
• recognize symbol
    satisfy :: (s->Bool) -> Parser s s
    satisfy f = p
    where p [s:ss] | f s = [(ss,s)]
          p _            = []
    symbol sym :== satisfy ((==) sym)

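The definitions above can be sketched in Python. This is a transliteration for illustration only, not the authors' Clean code; the name `yield_` is an assumption (chosen because `yield` is a Python keyword), and Python lists stand in for Clean lists.

```python
# A parser takes a list of symbols and returns a list of
# (remaining_input, result) pairs; an empty list means failure.

def fail(ss):
    """Parser that never succeeds."""
    return []

def yield_(r):
    """Parser that succeeds with r without consuming input."""
    return lambda ss: [(ss, r)]

def satisfy(pred):
    """Parser that accepts one symbol for which pred holds."""
    def p(ss):
        if ss and pred(ss[0]):
            return [(ss[1:], ss[0])]
        return []
    return p

def symbol(sym):
    """Parser that accepts exactly the symbol sym."""
    return satisfy(lambda s: s == sym)
```

For example, `symbol('a')(list("ab"))` gives one result that pairs the remaining input `['b']` with the recognized symbol `'a'`.
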
parser combinators 2
• sequence-combinators
    (<&>) infixr 6 :: (Parser s r) (r->Parser s t) -> Parser s t
    (<&>) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1
                                 , tuple <- p2 r1 ss2 ]
    (<+>) infixl 6 :: (Parser s (r->t)) (Parser s r) -> Parser s t
    (<+>) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1
                                     , (ss3,r) <- p2 ss2 ]
• choose-combinator
    (<||>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
    (<||>) p1 p2 = \ss = p1 ss ++ p2 ss

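A Python sketch of the three combinators above, again a transliteration and not the original Clean. The function names `bind`, `ap` and `alt` are assumptions standing in for the operators `<&>`, `<+>` and `<||>`.

```python
def yield_(r):
    return lambda ss: [(ss, r)]

def satisfy(pred):
    return lambda ss: [(ss[1:], ss[0])] if ss and pred(ss[0]) else []

def symbol(sym):
    return satisfy(lambda s: s == sym)

def bind(p1, f):
    """<&>: run p1, then the parser that f builds from p1's result."""
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in f(r1)(ss2)]

def ap(p1, p2):
    """<+>: p1 yields a function, p2 yields its argument."""
    return lambda ss1: [(ss3, f(r))
                        for (ss2, f) in p1(ss1)
                        for (ss3, r) in p2(ss2)]

def alt(p1, p2):
    """<||>: all results of p1 followed by all results of p2."""
    return lambda ss: p1(ss) + p2(ss)

# Pair up an 'a' and a 'b' using a curried function and ap.
pair = ap(ap(yield_(lambda a: lambda b: (a, b)), symbol('a')), symbol('b'))
```

Because `alt` concatenates both result lists, overlapping alternatives produce several results, which is the nondeterminism the slides refer to.
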
parser combinators 3
• some useful abbreviations
    (@>) infixr 7
    (@>) f p :== yield f <+> p
    (<:>) infixl 6
    (<:>) p1 p2 :== (\h t = [h:t]) @> p1 <+> p2

parser combinators 4
• Kleene star
    star p = p <:> star p <||> yield []
    plus p = p <:> star p
• parsing an identifier
    identifier :: Parser Char String
    identifier = toString @> satisfy isAlpha <:> star (satisfy isAlphanum)

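The Kleene star and the identifier parser can be sketched in Python as follows. This is an illustrative transliteration under two assumptions: `cons` stands in for `<:>`, and `"".join` plays the role of `toString` applied to the whole character list. Since Python evaluates eagerly, the recursion in `star` is wrapped in an inner function.

```python
def yield_(r):
    return lambda ss: [(ss, r)]

def satisfy(pred):
    return lambda ss: [(ss[1:], ss[0])] if ss and pred(ss[0]) else []

def ap(p1, p2):       # <+>
    return lambda ss1: [(ss3, f(r))
                        for (ss2, f) in p1(ss1)
                        for (ss3, r) in p2(ss2)]

def alt(p1, p2):      # <||>
    return lambda ss: p1(ss) + p2(ss)

def cons(p1, p2):     # <:> builds a list from a head and a tail parser
    return ap(ap(yield_(lambda h: lambda t: [h] + t), p1), p2)

def star(p):
    """Zero or more p; the longest match comes first in the result list."""
    def go(ss):
        return alt(cons(p, go), yield_([]))(ss)
    return go

def plus(p):
    """One or more p."""
    return cons(p, star(p))

identifier = ap(yield_("".join),
                cons(satisfy(str.isalpha), star(satisfy(str.isalnum))))
```

Note that `star` returns every prefix it could have stopped at, so `identifier` on `"x1 "` yields both `'x1'` and `'x'`.
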
parser combinators 5
• context sensitive parsers
  twice the same character
    doubleChar = satisfy isAlpha <&> \c -> symbol c
• arbitrary look ahead
    lookAhead = symbol 'a' +> symbol 'b'
           <||> symbol 'a' +> symbol 'c'
           <||> star (satisfy isSpace) +> symbol 'a'
           <||> symbol 'x'

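A Python sketch of the context-sensitive parser and of the first two look-ahead alternatives. The slides do not define `+>`; the helper `then_right` below assumes it means "sequence two parsers and keep the right result". All function names are assumptions made for this illustration.

```python
def satisfy(pred):
    return lambda ss: [(ss[1:], ss[0])] if ss and pred(ss[0]) else []

def symbol(sym):
    return satisfy(lambda s: s == sym)

def bind(p1, f):          # <&>
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in f(r1)(ss2)]

def alt(p1, p2):          # <||>
    return lambda ss: p1(ss) + p2(ss)

def then_right(p1, p2):   # assumed meaning of +>
    return lambda ss: [(ss3, r2)
                       for (ss2, _) in p1(ss)
                       for (ss3, r2) in p2(ss2)]

# The second parser depends on the character found by the first one.
double_char = bind(satisfy(str.isalpha), lambda c: symbol(c))

# Both alternatives start with 'a'; the list of results makes the
# parser look as far ahead as needed to pick the one that fits.
look_ahead = alt(then_right(symbol('a'), symbol('b')),
                 then_right(symbol('a'), symbol('c')))
```
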
properties of combinators
+ concise and clear parsers
+ full power of the functional programming language (fpl) available
+ context sensitive
+ arbitrary look-ahead
+ can be efficient, using continuations (IFL '98)
- no error handling (messages & recovery)
- no unique symbol tables
- separate scanner yields problems: the entire file is scanned before the parser starts

Requirements
• parse state with
  • error file
  • notion of position
  • user-defined extension, e.g. symbol table
• possibility to add separate scanner
• efficient implementation, continuations
• for programming languages we want a single result (deterministic grammar)

Uniqueness
• files and windows that should be single-threaded are unique in Clean
    fwritec :: Char *File -> *File
• data-structures can be updated destructively when they are unique
• only unique arrays can be changed

System-architecture
• replace the list of symbols by a structure containing
  • actual input
  • position
  • error administration
  • user defined part of the state
• use a type constructor class to allow multiple levels

Type constructor class
• Reading a symbol
    class PSread ps s st :: (*ps s *st) -> (s, *ps s *st)
• Copying the state is not allowed, use functions to manipulate the input
    class PSsplit ps s st :: (s, *ps s *st) -> (s, *ps s *st)
    class PSback ps s st :: (s, *ps s *st) -> (s, *ps s *st)
    class PSclear ps s st :: (s, *ps s *st) -> (s, *ps s *st)
• Minimal parser state (requires Clean 2.0)
    class ParserState ps symbol state
        | PSread, PSsplit, PSback, PSclear ps symbol state

New parser combinators
• Parsers have three arguments
  • success-continuation determines action upon success
      SuccCont :== Item FailCont State -> (Result, State)
  • fail-continuation specifies what to do if parser fails
      FailCont :== State -> (Result, State)
  • current input state
      State :== (Symbol, ParserState)

New parser combinators 2
• yield and fail, apply appropriate continuation
    yield r = \succ fail tuple = succ r fail tuple
    failComb = \succ fail tuple = fail tuple
• sequence of parsers, change continuation
    (<&>) p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
• choice, change continuations
    (<|>) p1 p2 = \succ fail tuple =
        p1 (\r f t = succ r fail (PSclear t))
           (\t2 = p2 succ fail (PSback t2))
           (PSsplit tuple)

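The continuation-passing combinators can be sketched in Python. This is a simplified illustration, not the Clean implementation: the `Input` class, its fields, and the `run` helper are assumptions, and a mutable Python object stands in for the unique parser state. The comments mark where the sketch mimics `PSsplit`, `PSclear` and `PSback`.

```python
class Input:
    """Hypothetical mutable state: text, position, saved positions."""
    def __init__(self, text):
        self.text, self.pos, self.hist = text, 0, []

def yield_(r):
    return lambda succ, fail, st: succ(r, fail, st)

def fail_comb(succ, fail, st):
    return fail(st)

def symbol(c):
    def p(succ, fail, st):
        if st.pos < len(st.text) and st.text[st.pos] == c:
            st.pos += 1
            return succ(c, fail, st)
        return fail(st)
    return p

def bind(p1, f):
    """<&>: on success of p1, continue with f(result)."""
    return lambda sc, fc, st: p1(lambda a, _f, st2: f(a)(sc, fc, st2), fc, st)

def alt(p1, p2):
    """<|>: try p1; only if it fails, go back and try p2."""
    def p(succ, fail, st):
        st.hist.append(st.pos)            # like PSsplit: remember position
        def on_success(r, _f, st2):
            st2.hist.pop()                # like PSclear: forget saved position
            return succ(r, fail, st2)
        def on_failure(st2):
            st2.pos = st2.hist.pop()      # like PSback: restore position
            return p2(succ, fail, st2)
        return p1(on_success, on_failure, st)
    return p

def run(p, text):
    """Return (result, final position); result is None on failure."""
    return p(lambda r, _f, st: (r, st.pos), lambda st: (None, st.pos), Input(text))

ab_or_ac = alt(bind(symbol('a'), lambda _: symbol('b')),
               bind(symbol('a'), lambda _: symbol('c')))
```

In contrast to `<||>`, this choice yields a single result: once the first alternative succeeds, the second is never tried.
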
string input
• a very simple instance of ParserState
    :: *StringInput symbol state =
        { si_string :: String      // string holds input
        , si_pos    :: Int         // index of current char
        , si_hist   :: [Int]       // to remember old positions
        , si_state  :: state       // user-defined extension
        , si_error  :: ErrorState
        }
    instance PSread StringInput Char state
    where PSread si=:{si_string,si_pos}
            = (si_string.[si_pos], {si & si_pos = si_pos+1})
    instance PSsplit StringInput Char state
    where PSsplit (c,si=:{si_pos,si_hist})
            = (c, {si & si_hist = [si_pos:si_hist]})
    instance PSback StringInput Char state
    where PSback (_,si=:{si_string,si_hist=[h:t]})
            = (si_string.[h-1], {si & si_pos = h, si_hist = t})

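A Python sketch of the string-based parser state. It is a simplification under stated assumptions: the state does not carry the current symbol alongside it as the Clean pair does, the `clear` method is a guess at what `PSclear` does (the slide does not show it), and in-place mutation plays the role that uniqueness typing makes safe in Clean.

```python
class StringInput:
    """Hypothetical Python counterpart of the StringInput record."""

    def __init__(self, text):
        self.text = text   # si_string: the input
        self.pos = 0       # si_pos: index of the next char to read
        self.hist = []     # si_hist: stack of remembered positions

    def read(self):
        """Like PSread: return the char at pos and advance."""
        c = self.text[self.pos]
        self.pos += 1
        return c

    def split(self):
        """Like PSsplit: remember the current position."""
        self.hist.append(self.pos)

    def back(self):
        """Like PSback: return to the most recently remembered position."""
        self.pos = self.hist.pop()

    def clear(self):
        """Assumed PSclear: drop the most recently remembered position."""
        self.hist.pop()
```

A choice combinator would call `split` before trying its first alternative, `clear` when that alternative succeeds, and `back` when it fails.
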
Separate scanner and parser
• sometimes it is convenient to have a separate scanner, e.g. to implement the offside rule
• task of scanner and parser is similar, so use the same combinators
• due to the type constructor class we can nest parser states

a simple scanner
• use of combinators doesn’t change
• produces tokens (algebraic datatype)
    scanner = skipSpace +>
        (   generateOffsideToken
        <|> satisfy isAlpha <:> star (satisfy isAlphanum) <@ testReserved o toString
        <|> plus (satisfy isDigit) <@ IntToken o to_number 0
        <|> symbol '=' <@ K EqualToken
        <|> symbol '(' <@ K OpenToken
        <|> symbol ')' <@ K CloseToken
        )

generating offside tokens
• use an ordinary parse function
    generateOffsideToken
      = pAcc getCol     <&> \col ->     // get current column
        pAcc getOffside <&> \os_col ->  // get offside position
        handleOS col os_col
    where
      handleOS col os_col
        | EndGroupGenerated os_col
            | col < os_col = pApp popOffside (yield EndOfGroupToken)
                           = pApp ClearEndGroup failComb
        | col <= os_col    = pApp SetEndGroup (yield EndOfDefToken)
                           = failComb

Parser state for nesting
• parser state contains scanner and its state
    :: *NestedInput token state = E. .ps sym scanState:
        { ni_scanSt  :: (ps sym scanState)
        , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)
        , ni_buffer  :: [token]
        , ni_history :: [[token]]
        , ni_state   :: state
        }
• can be nested to any depth
• we can, but do not have to, use this

Parser state for nesting 2
[Diagram; labels: NestedInput, ScanState, *File, *ErrorState, *OffsideState, scanner, *HashTable]

Parser state for nesting 3
• apply scanner to read token
    instance PSread NestedInput token state
    where PSread ni=:{ni_scanner,ni_scanSt}
            # (tok,state) = ni_scanner ni_scanSt
            = (tok, {ni & ni_scanSt = state})
• here, we ignored the buffer
• define instances for other functions in class ParserState

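The layering can be sketched in Python: the token-level state holds a scanner and the scanner's own character-level state, and produces a token only when the parser asks for one. Everything here is an assumption made for illustration, including the token shapes and the hand-written `scan` function, which stands in for a scanner built from combinators. As on the slide, the buffer and history are ignored.

```python
class CharInput:
    """Hypothetical character-level state used by the scanner."""
    def __init__(self, text):
        self.text, self.pos = text, 0

def scan(chars):
    """Hypothetical scanner: skip spaces, then return one token."""
    t = chars.text
    while chars.pos < len(t) and t[chars.pos].isspace():
        chars.pos += 1
    if chars.pos >= len(t):
        return ("EOF", None)
    start = chars.pos
    if t[start].isdigit():
        while chars.pos < len(t) and t[chars.pos].isdigit():
            chars.pos += 1
        return ("Int", int(t[start:chars.pos]))
    if t[start].isalpha():
        while chars.pos < len(t) and t[chars.pos].isalnum():
            chars.pos += 1
        return ("Ident", t[start:chars.pos])
    chars.pos += 1
    return ("Sym", t[start])

class NestedInput:
    """Token-level state that wraps a scanner and the scanner's state."""
    def __init__(self, scan_state, scanner):
        self.scan_state = scan_state   # plays the role of ni_scanSt
        self.scanner = scanner         # plays the role of ni_scanner

    def read(self):
        """Like PSread: run the scanner on demand for the next token."""
        return self.scanner(self.scan_state)
```

Since tokens are produced on demand, the scanner never runs ahead of the parser; after the first `read` on `"x = 42"` only the first character has been consumed.
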
error handling
• general error correction is difficult
  • correct simple errors
  • skip to new definition otherwise
• good error messages:
  • location: position in file
  • what are we parsing: stack of contexts
    Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =

error handling 2
• basic error generation
    parseError expected val
      = \succ fail (t,ps)
      = let msg = toString expected +++ " expected instead of " +++ toString t
        in  succ val fail (PSerror msg (PSread ps))
• useful primitives
    wantSymbol sym = symbol sym <|> parseError sym sym
    want p msg value = p <|> parseError msg value
    skipToSymbol sym = symbol sym
                   <|> parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym

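A Python sketch of `parseError` and `wantSymbol` in continuation style. It is an illustration under assumptions: the `Input` class with its `errors` list stands in for the error administration in the parser state, and `alt` omits the split/back bookkeeping because `symbol` fails without consuming input.

```python
class Input:
    """Hypothetical state: text, position, and collected error messages."""
    def __init__(self, text):
        self.text, self.pos, self.errors = text, 0, []

    def current(self):
        return self.text[self.pos] if self.pos < len(self.text) else "<eof>"

def symbol(c):
    def p(succ, fail, st):
        if st.current() == c:
            st.pos += 1
            return succ(c, fail, st)
        return fail(st)
    return p

def alt(p1, p2):
    """Simplified <|>: no backtracking needed for single-symbol parsers."""
    return lambda succ, fail, st: p1(succ, lambda st2: p2(succ, fail, st2), st)

def parse_error(expected, val):
    """Record a message, skip the offending symbol, succeed with val."""
    def p(succ, fail, st):
        st.errors.append(f"{expected} expected instead of {st.current()}")
        if st.pos < len(st.text):
            st.pos += 1
        return succ(val, fail, st)
    return p

def want_symbol(sym):
    return alt(symbol(sym), parse_error(sym, sym))

def run(p, text):
    """Return (result, list of error messages)."""
    st = Input(text)
    result = p(lambda r, _f, s: r, lambda s: None, st)
    return result, st.errors
```

Parsing continues after the error as if the expected symbol had been present, so one mistake does not abort the whole parse.
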
Parser
• Parsing expressions
    pExpression = "Expression" ::>
            BV @> match mBasicValue
        <|> pIdentifier
        <|> symbol CaseToken +> pDeterCase @> pCompoundExpression
                <+  wantSymbol OfToken
                <+> star pCaseAlt
                <+  skipToSymbol EndOfGroupToken
        <|> symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken

identifiers in hashtable
• use a parse-function
• hashtable is user defined state in ParserState
    pIdentifier = match mIdentToken <&> \ident =
        pAccSt (putNameInHashTable ident) <@ \name =
        {app_symb = UnknownSymbol name, app_args = []}
• the function pAccSt applies a function to the user defined state

limitations of this approach
• syntax specified by parse functions
• grammar is not a datastructure
• no detection of left recursion: runtime error instead of nice message
• no automatic left-factoring: do it by hand, or runtime overhead
    p1 = p <&> q1 <|> p <&> q2
    p2 = p <&> (q1 <|> q2)

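The runtime overhead of the unfactored form can be made visible with a small Python sketch that counts how often the shared prefix parser `p` runs. It uses the list-of-results combinators for brevity; the counter and the concrete parsers (`'a'` for `p`, `'b'` and `'c'` for `q1` and `q2`) are assumptions chosen for the example.

```python
def satisfy(pred):
    return lambda ss: [(ss[1:], ss[0])] if ss and pred(ss[0]) else []

def symbol(sym):
    return satisfy(lambda s: s == sym)

def bind(p1, f):     # <&>
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in f(r1)(ss2)]

def alt(p1, p2):     # choice; all results of both alternatives
    return lambda ss: p1(ss) + p2(ss)

# Continuations that ignore the prefix result, for simplicity.
q1 = lambda _r: symbol('b')
q2 = lambda _r: symbol('c')

def unfactored(p):   # p1 = p <&> q1 <|> p <&> q2
    return alt(bind(p, q1), bind(p, q2))

def factored(p):     # p2 = p <&> (q1 <|> q2)
    return bind(p, lambda r: alt(q1(r), q2(r)))

def count_p_runs(make_parser, text):
    """Return (results, number of times the prefix parser ran)."""
    counter = [0]
    def p(ss):
        counter[0] += 1
        return symbol('a')(ss)
    return make_parser(p)(list(text)), counter[0]
```

On the input `"ac"` both forms give the same result, but the unfactored parser runs `p` twice and the factored one runs it once.
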
discussion
• old advantages
  • concise, fpl-power, arbitrary look ahead, context sensitive
• new advantages
  • unique and extendable parser state
  • one or more layers
  • decent error handling, simple error correction can be added
  • still efficient, overhead < 2
  • non-determinism only when needed