
Layered Combinator Parsers with a Unique State

Pieter Koopman, Rinus Plasmeijer. Nijmegen, The Netherlands.


Presentation Transcript


  1. Layered Combinator Parsers with a Unique State
     Pieter Koopman, Rinus Plasmeijer
     Nijmegen, The Netherlands

  2. Overview
     • conventional parser combinators
     • requirements new combinators
     • system-architecture
     • new parser combinators
     • separate scanner and parser
     • error handling

  3. parser combinators
     • Non-deterministic, list of results
         :: Parser s r :== [s] -> [ParseResult s r]
         :: ParseResult s r :== ([s],r)
     • fail & yield
         fail = \ss = []
         yield r = \ss = [(ss,r)]
     • recognize symbol
         satisfy :: (s->Bool) -> Parser s s
         satisfy f = p
         where p [s:ss] | f s = [(ss,s)]
               p _            = []
         symbol sym :== satisfy ((==) sym)
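
The list-of-successes representation above can be sketched outside Clean as well. The following is a minimal Python transliteration, not the authors' code: the names `fail_p` and `yield_p` are chosen here to avoid Python keywords, and strings stand in for symbol lists. A parser takes the remaining input and returns a list of (rest, result) pairs, where the empty list means failure.

```python
# A parser maps the remaining input to a list of (rest, result) pairs.
# An empty list means failure; several entries mean several possible parses.

def fail_p(ss):
    """Parser that never succeeds."""
    return []

def yield_p(r):
    """Parser that succeeds with r without consuming any input."""
    return lambda ss: [(ss, r)]

def satisfy(f):
    """Accept one symbol when the predicate f holds for it."""
    def p(ss):
        if ss and f(ss[0]):
            return [(ss[1:], ss[0])]
        return []
    return p

def symbol(sym):
    """Accept exactly the symbol sym."""
    return satisfy(lambda s: s == sym)
```

For example, `symbol("a")("abc")` gives one result with `"bc"` left over, while `symbol("a")("xbc")` gives the empty list.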

  4. parser combinators 2
     • sequence-combinators
         (<&>) infixr 6 :: (Parser s r) (r->Parser s t) -> Parser s t
         (<&>) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1
                                       , tuple    <- p2 r1 ss2 ]
         (<+>) infixl 6 :: (Parser s (r->t)) (Parser s r) -> Parser s t
         (<+>) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1
                                           , (ss3,r) <- p2 ss2 ]
     • choose-combinator
         (<||>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
         (<||>) p1 p2 = \ss = p1 ss ++ p2 ss
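
As a rough Python counterpart of the three combinators on this slide (a sketch under the same list-of-successes encoding; the function names `bind`, `ap` and `alt` are invented here for `<&>`, `<+>` and `<||>`), the list comprehensions carry over almost literally:

```python
def yield_p(r):
    """Succeed with r without consuming input."""
    return lambda ss: [(ss, r)]

def symbol(sym):
    """Accept exactly the symbol sym."""
    return lambda ss: [(ss[1:], ss[0])] if ss and ss[0] == sym else []

def bind(p1, p2):
    """<&>: run p1, then hand its result to p2 to obtain the next parser."""
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in p2(r1)(ss2)]

def ap(p1, p2):
    """<+>: p1 yields a function, p2 yields its argument."""
    return lambda ss1: [(ss3, f(r)) for (ss2, f) in p1(ss1) for (ss3, r) in p2(ss2)]

def alt(p1, p2):
    """<||>: all results of p1 followed by all results of p2."""
    return lambda ss: p1(ss) + p2(ss)

# "a" followed by "b", with the two results concatenated.
ab = ap(ap(yield_p(lambda x: lambda y: x + y), symbol("a")), symbol("b"))
```

Because `alt` concatenates the result lists, `alt(symbol("a"), symbol("a"))("a")` returns the same parse twice. That is the non-determinism the later slides try to restrict.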

  5. parser combinators 3
     • some useful abbreviations
         (@>) infixr 7
         (@>) f p :== yield f <+> p
         (<:>) infixl 6
         (<:>) p1 p2 :== (\h t = [h:t]) @> p1 <+> p2

  6. parser combinators 4
     • Kleene star
         star p = p <:> star p <||> yield []
         plus p = p <:> star p
     • parsing an identifier
         identifier :: Parser Char String
         identifier = toString @> satisfy isAlpha <:> star (satisfy isAlphanum)
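
A hedged Python sketch of `star`, `plus` and `identifier` follows. Python evaluates eagerly, so the recursive definition `star p = p <:> star p <||> yield []` cannot be copied as is. The recursion is moved inside the parser function instead, which is an adaptation made for this sketch and not something the slides prescribe.

```python
def satisfy(f):
    """Accept one symbol when the predicate f holds for it."""
    return lambda ss: [(ss[1:], ss[0])] if ss and f(ss[0]) else []

def star(p):
    """Zero or more p, longest match first (p <:> star p <||> yield [])."""
    def run(ss):
        longer = [(ss3, [h] + t) for (ss2, h) in p(ss) for (ss3, t) in run(ss2)]
        return longer + [(ss, [])]
    return run

def plus(p):
    """One or more p (p <:> star p)."""
    return lambda ss: [(ss3, [h] + t)
                       for (ss2, h) in p(ss)
                       for (ss3, t) in star(p)(ss2)]

def identifier(ss):
    """A letter followed by letters or digits, returned as a string."""
    first = satisfy(str.isalpha)
    rest = star(satisfy(str.isalnum))
    return [(ss3, h + "".join(t)) for (ss2, h) in first(ss) for (ss3, t) in rest(ss2)]
```

Note that `identifier("x1 y")` returns both the full identifier `"x1"` and the shorter prefix `"x"`. Every prefix is a valid parse in the list-of-successes model.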

  7. parser combinators 5
     • context-sensitive parsers: twice the same character
         doubleChar = satisfy isAlpha <&> \c -> symbol c
     • arbitrary look ahead
         lookAhead =  symbol 'a' +> symbol 'b'
                 <||> symbol 'a' +> symbol 'c'

  8. parser combinators 5
     • context-sensitive parsers: twice the same character
         doubleChar = satisfy isAlpha <&> \c -> symbol c
     • arbitrary look ahead
         lookAhead =  symbol 'a' +> symbol 'b'
                 <||> symbol 'a' +> symbol 'c'
                 <||> star (satisfy isSpace) +> symbol 'a'
                 <||> symbol 'x'
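
The two examples on this slide can be tried in a small Python sketch. The slides do not define `+>`; the sketch assumes it runs both parsers in sequence and keeps only the right-hand result, and names it `then`. All other names are transliterations.

```python
def satisfy(f):
    return lambda ss: [(ss[1:], ss[0])] if ss and f(ss[0]) else []

def symbol(sym):
    return satisfy(lambda s: s == sym)

def bind(p1, p2):
    """<&>: the second parser may depend on the first result."""
    return lambda ss: [t for (rest, r) in p1(ss) for t in p2(r)(rest)]

def then(p1, p2):
    """Assumed meaning of +>: run p1, discard its result, then run p2."""
    return bind(p1, lambda _: p2)

def alt(p1, p2):
    """<||>: collect the results of both alternatives."""
    return lambda ss: p1(ss) + p2(ss)

# Context sensitivity: the second parser is built from the character just read.
double_char = bind(satisfy(str.isalpha), lambda c: symbol(c))

# Look-ahead: both alternatives start with 'a'; the second symbol decides.
look_ahead = alt(then(symbol("a"), symbol("b")),
                 then(symbol("a"), symbol("c")))
```

Since each alternative restarts from the same input, no left-factoring is needed for `look_ahead` to accept both `"ab"` and `"ac"`.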

  9. properties of combinators
     + concise and clear parsers
     + full power of fpl available
     + context sensitive
     + arbitrary look-ahead
     + can be efficient, continuations (IFL '98)
     - no error handling (messages & recovery)
     - no unique symbol tables
     - separate scanner yields problems: scan entire file before parser starts

  10. Requirements
      • parse state with
          • error file
          • notion of position
          • user-defined extension, e.g. symbol table
      • possibility to add separate scanner
      • efficient implementation, continuations
      • for programming languages we want a single result (deterministic grammar)

  11. Uniqueness
      • files and windows that should be single-threaded are unique in Clean
          fwritec :: Char *File -> *File
      • data-structures can be updated destructively when they are unique
      • only unique arrays can be changed

  12. System-architecture
      • replace the list of symbols by a structure containing
          • actual input
          • position
          • error administration
          • user defined part of the state
      • use a type constructor class to allow multiple levels

  13. Type constructor class
      • Reading a symbol
          class PSread ps s st :: (*ps s *st) -> (s, *ps s *st)
      • Copying the state is not allowed, use functions to manipulate the input
          class PSsplit ps s st :: (s, *ps s *st) -> (s, *ps s *st)
          class PSback  ps s st :: (s, *ps s *st) -> (s, *ps s *st)
          class PSclear ps s st :: (s, *ps s *st) -> (s, *ps s *st)
      • Minimal parser state, requires Clean 2.0
          class ParserState ps symbol state
              | PSread, PSsplit, PSback, PSclear ps symbol state

  14. New parser combinators
      • Parsers have three arguments
          • success-continuation determines action upon success
              SuccCont :== Item FailCont State -> (Result, State)
          • fail-continuation specifies what to do if parser fails
              FailCont :== State -> (Result, State)
          • current input state
              State :== (Symbol, ParserState)

  15. New parser combinators 2
      • yield and fail, apply appropriate continuation
          yield r  = \succ fail tuple = succ r fail tuple
          failComb = \succ fail tuple = fail tuple
      • sequence of parsers, change continuation
          (<&>) p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
      • choice, change continuations
          (<|>) p1 p2 = \succ fail tuple =
              p1 (\r f t = succ r fail (PSclear t))
                 (\t2    = p2 succ fail (PSback t2))
                 (PSsplit tuple)
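
The continuation-passing combinators can be approximated in Python as follows. This is a sketch under a simplifying assumption: the state is an immutable `(text, pos)` tuple, so `PSsplit`, `PSback` and `PSclear` are not needed and the choice combinator simply restarts from the saved tuple. The helper `run` and the names `seq` and `alt` are invented for the sketch.

```python
# A parser takes (succ, fail, state) and returns a final result.
#   succ(item, fail, state)  is called when the parser succeeds
#   fail(state)              is called when the parser fails

def yield_p(r):
    """Succeed with r; the state is passed on unchanged."""
    return lambda succ, fail, st: succ(r, fail, st)

def fail_p(succ, fail, st):
    """Always invoke the fail-continuation."""
    return fail(st)

def symbol(sym):
    def p(succ, fail, st):
        text, pos = st
        if pos < len(text) and text[pos] == sym:
            return succ(sym, fail, (text, pos + 1))
        return fail(st)
    return p

def seq(p1, p2):
    """<&>: when p1 succeeds, continue with the parser p2 builds from its result."""
    return lambda succ, fail, st: p1(lambda a, _f, st2: p2(a)(succ, fail, st2), fail, st)

def alt(p1, p2):
    """<|>: try p1; only when it fails, run p2 from the saved state."""
    return lambda succ, fail, st: p1(lambda r, _f, st2: succ(r, fail, st2),
                                     lambda _st: p2(succ, fail, st),
                                     st)

def run(p, text):
    """Hypothetical driver: returns (result, position reached)."""
    return p(lambda r, _f, st: (r, st[1]), lambda st: (None, st[1]), (text, 0))

ab_or_ac = alt(seq(symbol("a"), lambda _: symbol("b")),
               seq(symbol("a"), lambda _: symbol("c")))
```

Once `p1` succeeds, `alt` passes the original fail-continuation onward, so the second alternative is never revisited. The result is a single parse and not a list of parses.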

  16. string input
      • a very simple instance of ParserState
          :: *StringInput symbol state =
              { si_string :: String      // string holds input
              , si_pos    :: Int         // index of current char
              , si_hist   :: [Int]       // to remember old positions
              , si_state  :: state       // user-defined extension
              , si_error  :: ErrorState
              }
          instance PSread StringInput Char state
          where PSread si=:{si_string,si_pos}
                  = (si_string.[si_pos],{si & si_pos = si_pos+1})
          instance PSsplit StringInput Char state
          where PSsplit (c,si=:{si_pos,si_hist})
                  = (c,{si & si_hist = [si_pos:si_hist]})
          instance PSback StringInput Char state
          where PSback (_,si=:{si_string,si_hist=[h:t]})
                  = (si_string.[h-1],{si & si_pos = h, si_hist = t})
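
To illustrate how the position history supports backtracking without copying the state, here is a small Python analogue of `StringInput`. It is a sketch only: a mutable object stands in for the unique record, the user state and error fields are omitted, `clear` is assumed to drop the most recent saved position, and returning `None` at end of input is an addition the slide does not show.

```python
class StringInput:
    """Mutable input state with a stack of saved positions."""

    def __init__(self, text):
        self.text = text   # string that holds the input
        self.pos = 0       # index of the next character to read
        self.hist = []     # saved positions, most recent last

    def read(self):
        """Return the next character and advance; None at end of input."""
        if self.pos >= len(self.text):
            return None
        c = self.text[self.pos]
        self.pos += 1
        return c

    def split(self):
        """Remember the current position before trying an alternative."""
        self.hist.append(self.pos)

    def back(self):
        """The alternative failed: return to the most recently saved position."""
        self.pos = self.hist.pop()

    def clear(self):
        """The alternative succeeded: forget the saved position."""
        self.hist.pop()
```

A choice combinator would call `split` before the first alternative, `clear` when it succeeds, and `back` before starting the second alternative.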

  17. Separate scanner and parser
      • sometimes it is convenient to have a separate scanner, e.g. to implement the offside rule
      • task of scanner and parser is similar, so use the same combinators
      • due to the type constructor class we can nest parser states

  18. a simple scanner
      • use of combinators doesn’t change
      • produces tokens (algebraic datatype)
          scanner = skipSpace +>
              (   generateOffsideToken
              <|> satisfy isAlpha <:> star (satisfy isAlphanum) <@ testReserved o toString
              <|> plus (satisfy isDigit) <@ IntToken o to_number 0
              <|> symbol '=' <@ K EqualToken
              <|> symbol '(' <@ K OpenToken
              <|> symbol ')' <@ K CloseToken
              )

  19. generating offside tokens
      • use an ordinary parse function
          generateOffsideToken
              =   pAcc getCol     <&> \col ->     // get current column
                  pAcc getOffside <&> \os_col ->  // get offside position
                  handleOS col os_col
          where
              handleOS col os_col
                  | EndGroupGenerated os_col
                      | col < os_col  = pApp popOffside (yield EndOfGroupToken)
                                      = pApp ClearEndGroup failComb
                  | col <= os_col     = pApp SetEndGroup (yield EndOfDefToken)
                                      = failComb

  20. Parser state for nesting
      • parser state contains scanner and its state
          :: *NestedInput token state
              = E. .ps sym scanState:
                  { ni_scanSt  :: (ps sym scanState)
                  , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)
                  , ni_buffer  :: [token]
                  , ni_history :: [[token]]
                  , ni_state   :: state
                  }
      • can be nested to any depth
      • we can, but do not have to, use this

  21. Parser state for nesting 2
      [Figure: layered parser state. Labels: NestedInput, ScanState, *File, *ErrorState, *OffsideState, scanner, *HashTable]

  22. Parser state for nesting 3
      • apply scanner to read token
          instance PSread NestedInput token state
          where PSread ni=:{ni_scanner,ni_scanSt}
                  # (tok,state) = ni_scanner ni_scanSt
                  = (tok,{ni & ni_scanSt = state})
      • here, we ignored the buffer
      • define instances for other functions in class ParserState
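
The layering can be illustrated with a short Python sketch in which the parser's `read` runs the scanner on demand, so the file is not scanned in full before parsing starts. The regular-expression scanner, the token shapes (plain strings) and the class names are assumptions of this sketch. As on the slide, the token buffer and history are left out.

```python
import re

class StringInput:
    """Lowest layer: characters with a current position."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

# Hypothetical token syntax: integers, identifiers, and the symbols = ( ).
TOKEN = re.compile(r"\s*(\d+|[A-Za-z]\w*|[=()])")

def scanner(si):
    """Skip white space and return the next token, or None when none is left."""
    m = TOKEN.match(si.text, si.pos)
    if m is None:
        return None
    si.pos = m.end()
    return m.group(1)

class NestedInput:
    """Upper layer: its symbols are the tokens produced by a scanner."""
    def __init__(self, scan_state, scan):
        self.scan_state = scan_state   # state of the layer below
        self.scan = scan               # function: lower state -> token

    def read(self):
        # Run the scanner just far enough to produce one token.
        return self.scan(self.scan_state)
```

Since `NestedInput` only needs a state and a function that reads from it, the lower layer could itself be another `NestedInput`, which mirrors the claim that states can be nested to any depth.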

  23. error handling
      • general error correction is difficult
          • correct simple errors
          • skip to new definition otherwise
      • Good error messages:
          • location: position in file
          • what are we parsing: stack of contexts
          Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =

  24. error handling 2
      • basic error generation
          parseError expected val = \succ fail (t,ps) =
              let msg = toString expected +++ " expected instead of " +++ toString t
              in  succ val fail (PSerror msg (PSread ps))
      • useful primitives
          wantSymbol sym = symbol sym <|> parseError sym sym
          want p msg value = p <|> parseError msg value
          skipToSymbol sym = symbol sym
                         <|> parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym
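
The idea behind `parseError` and `wantSymbol` is that an error is recorded in the state and parsing then continues as if the expected value had been found. A minimal Python sketch, assuming an immutable state `(text, pos, errors)` and a message format invented for the example:

```python
def symbol(sym):
    def p(succ, fail, st):
        text, pos, errs = st
        if pos < len(text) and text[pos] == sym:
            return succ(sym, fail, (text, pos + 1, errs))
        return fail(st)
    return p

def seq(p1, p2):
    return lambda succ, fail, st: p1(lambda a, _f, st2: p2(a)(succ, fail, st2), fail, st)

def alt(p1, p2):
    return lambda succ, fail, st: p1(lambda r, _f, st2: succ(r, fail, st2),
                                     lambda _st: p2(succ, fail, st),
                                     st)

def parse_error(expected, val):
    """Record a message, skip the offending symbol, and succeed with val."""
    def p(succ, fail, st):
        text, pos, errs = st
        found = text[pos] if pos < len(text) else "end of input"
        msg = f"{pos}: {expected} expected instead of {found}"
        return succ(val, fail, (text, min(pos + 1, len(text)), errs + (msg,)))
    return p

def want_symbol(sym):
    """Like symbol, but report an error and carry on when sym is missing."""
    return alt(symbol(sym), parse_error(sym, sym))

def run(p, text):
    """Hypothetical driver: returns (result, position, recorded errors)."""
    return p(lambda r, _f, st: (r, st[1], st[2]),
             lambda st: (None, st[1], st[2]),
             (text, 0, ()))

# "(" then "a" then ")", where both parentheses are wanted.
parens = seq(want_symbol("("), lambda _: seq(symbol("a"), lambda _: want_symbol(")")))
```

On the input `"(a="` the parser still succeeds, and the error tuple holds one message saying that `)` was expected where `=` was found.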

  25. Parser
      • Parsing expressions
          pExpression = "Expression" ::>
                  BV @> match mBasicValue
              <|> pIdentifier
              <|> symbol CaseToken +>
                      pDeterCase @> pCompoundExpression <+ wantSymbol OfToken
                                 <+> star pCaseAlt      <+ skipToSymbol EndOfGroupToken
              <|> symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken

  26. identifiers in hashtable
      • use a parse-function
      • hashtable is user defined state in ParserState
          pIdentifier = match mIdentToken <&> \ident =
                        pAccSt (putNameInHashTable ident) <@ \name =
                        {app_symb = UnknownSymbol name, app_args = []}
      • the function pAccSt applies a function to the user defined state

  27. limitations of this approach
      • syntax specified by parse functions
      • grammar is not a datastructure
      • no detection of left recursion: runtime error instead of nice message
      • no automatic left-factoring: do it by hand, or runtime overhead
          p1 = p <&> q1 <|> p <&> q2
          p2 = p <&> (q1 <|> q2)
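
The runtime overhead of the unfactored form `p1` can be made visible by counting how often the shared prefix parser runs. The Python sketch below uses simplified deterministic parsers that return `(rest, result)` or `None`. The counter, the helper names and this encoding are assumptions made for the illustration.

```python
calls = {"p": 0}

def symbol(sym):
    return lambda ss: (ss[1:], ss[0]) if ss and ss[0] == sym else None

def counted(p):
    """Wrap p so that every invocation is counted."""
    def run(ss):
        calls["p"] += 1
        return p(ss)
    return run

def then(p1, p2):
    """Run p1, then p2 on the rest; keep the result of p2."""
    def run(ss):
        r1 = p1(ss)
        return None if r1 is None else p2(r1[0])
    return run

def alt(p1, p2):
    """Try p1; when it fails, retry p2 from the same input."""
    def run(ss):
        r = p1(ss)
        return r if r is not None else p2(ss)
    return run

p = counted(symbol("a"))
q1, q2 = symbol("b"), symbol("c")
unfactored = alt(then(p, q1), then(p, q2))   # p1 on the slide
factored = then(p, alt(q1, q2))              # p2 on the slide

def count_p_calls(parser, text):
    """Return the parse result together with the number of times p ran."""
    calls["p"] = 0
    result = parser(text)
    return result, calls["p"]
```

On the input `"ac"` the unfactored parser runs `p` twice and the factored one runs it once, while both give the same result.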

  28. discussion
      • old advantages
          • concise, fpl-power, arbitrary look ahead, context sensitive
      • new advantages
          • unique and extendable parser state
          • one or more layers
          • decent error handling, simple error correction can be added
          • still efficient, overhead < 2
          • non-determinism only when needed
