
HaskLex - The Haskell Lexer Senior Seminar Final Project



  1. HaskLex - The Haskell Lexer, Senior Seminar Final Project. Chris Lattner, April 2000

  2. Outline
     • Goals
     • Design
     • Example internal code
     • Demonstration!

  3. HaskLex Goals
     • Write a lexer in Haskell!
     • Provide a usable regex dialect
       • *, +, ?, |, (), [] are required; ! and - are bonuses.
     • Provide “adequate” performance
       • Matches must be very fast. Compiling a lexer may be slow, though we would prefer that it not be.
     • Provide enough functionality to be useful
       • Metric: can an assembler be written with it?

  4. HaskLex Design
     • Four modules layered on top of each other
     • Each may be used independently
     • DFA provides the ADT and match operations
     • NFA provides a higher-level abstraction
     • RegEx parses regular expression constructs
     • Lexer ties it all together

  5. DFA Data Structure
     Previously presented:
     • An FA is a list of nodes
     • Each node contains:
       • A “finality” number
       • A list of transitions
     • Ugly to read
     • Easy to implement
     Structures:
         type FANode = (Int, [Int])
         newtype FA = F [FANode]
     Example functions: addNode, matchDFA, emptyFA, nukeNode, addFATransition, removeDeadStates
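The representation on this slide can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each transition list holds one entry per character code (256 entries, indexed with `ord`) and that -1 marks both "no transition" and "not final", which is what the match code on slide 12 suggests. The names `rowTo`, `digitsDFA`, and `step` are hypothetical.

```haskell
import Data.Char (ord)

-- Types as given on the slide.
type FANode = (Int, [Int])   -- (finality number, transitions indexed by char code)
newtype FA = F [FANode]

-- Hypothetical helper: a 256-entry row that sends every character in cs to
-- the target node and blocks (-1) on everything else.
rowTo :: Int -> String -> [Int]
rowTo target cs = [ if toEnum c `elem` cs then target else -1 | c <- [0 .. 255] ]

-- A two-node DFA for [0-9]+ (assumption: finality -1 means "not final").
-- Node 0 is the start node; node 1 is final with finality number 1.
digitsDFA :: FA
digitsDFA = F [ (-1, rowTo 1 ['0' .. '9'])
              , ( 1, rowTo 1 ['0' .. '9']) ]

-- Successor of node n on character ch, or -1 if there is none.
-- Only valid for characters with codes below 256.
step :: FA -> Int -> Char -> Int
step (F nodes) n ch = snd (nodes !! n) !! ord ch
```

Indexing a list with `!!` is linear in the index, which fits the slide's "ugly to read, easy to implement" trade-off; an array would be the usual choice if lookup speed mattered.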

  6. NFA Data Structure
     • Identical definition to the DFA
     • Extra transitions on the transition lists are considered to be “lambda” transitions (λ)
     • An unlimited number of λ transitions may come from any given node
     • May be converted to a DFA with the buildDFA function
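One way to read "extra transitions" is that the first 256 entries of a node's list are the per-character transitions and anything after them is a λ target. Under that assumption (the slides do not spell out the layout), the λ-closure that a conversion such as buildDFA would presumably need can be sketched like this. `lambdaTargets`, `lambdaClosure`, and `sampleNFA` are hypothetical names.

```haskell
type FANode = (Int, [Int])
newtype FA = F [FANode]

-- Assumption: entries 0..255 are character transitions; the rest are
-- lambda transition targets.
lambdaTargets :: FANode -> [Int]
lambdaTargets (_, ts) = drop 256 ts

-- All nodes reachable from the given nodes using only lambda transitions,
-- in order of discovery (depth-first).
lambdaClosure :: FA -> [Int] -> [Int]
lambdaClosure (F nodes) = go []
  where
    go seen [] = reverse seen
    go seen (n : ns)
      | n `elem` seen = go seen ns
      | otherwise     = go (n : seen) (lambdaTargets (nodes !! n) ++ ns)

-- Three nodes with no character transitions at all:
-- node 0 has lambda edges to 1 and 2, node 1 to 2, node 2 has none.
sampleNFA :: FA
sampleNFA = F [ (-1, noChars ++ [1, 2])
              , (-1, noChars ++ [2])
              , ( 1, noChars) ]
  where noChars = replicate 256 (-1)
```

With these definitions, `lambdaClosure sampleNFA [0]` visits nodes 0, 1, and 2, while `lambdaClosure sampleNFA [2]` contains only node 2.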

  7. RegEx Data Structure
     • A simple string!
     • Primitives recognized:
       • “x” - Literal chars
       • “.” - Any character
       • “[a-z]” - Char classes
       • “(aa)*” - Grouping
     • Postfix operators:
       • “ab” - Juxtaposition
       • “x*” - Kleene closure
       • “x+” - Repetition
       • “x?” - Optionalization
       • “x!” - Inversion
     • Infix operators:
       • “a|b” - Alternation
       • “a-b” - Subtraction
     • Char class provides an escaping mechanism
     • buildNFA converts a regex to an NFA
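To make the operator semantics concrete, here is a small sketch of the dialect as an abstract syntax tree with a matcher. It deliberately swaps in a different technique, Brzozowski derivatives, rather than the NFA construction that buildNFA performs, because derivatives handle inversion and subtraction directly. The `RegEx` type and every function name are hypothetical, and two readings are assumed: "x!" is language complement and "a-b" is language difference.

```haskell
-- Hypothetical AST for the dialect on the slide.
data RegEx
  = Empty               -- matches nothing (internal)
  | Eps                 -- matches only the empty string (internal)
  | Lit Char            -- "x"
  | AnyChar             -- "."
  | Class String        -- "[a-z]", expanded to its member characters
  | Seq RegEx RegEx     -- "ab"
  | Alt RegEx RegEx     -- "a|b"
  | Star RegEx          -- "x*"
  | Not RegEx           -- "x!"  (assumed: complement)
  | Sub RegEx RegEx     -- "a-b" (assumed: difference)
  deriving (Eq, Show)

-- "x+" and "x?" expressed through the other constructors.
plus, opt :: RegEx -> RegEx
plus r = Seq r (Star r)
opt r  = Alt r Eps

-- Does the expression accept the empty string?
nullable :: RegEx -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Lit _)   = False
nullable AnyChar   = False
nullable (Class _) = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True
nullable (Not a)   = not (nullable a)
nullable (Sub a b) = nullable a && not (nullable b)

-- The derivative with respect to c: what remains to be matched after c.
deriv :: Char -> RegEx -> RegEx
deriv _ Empty      = Empty
deriv _ Eps        = Empty
deriv c (Lit x)    = if c == x then Eps else Empty
deriv _ AnyChar    = Eps
deriv c (Class cs) = if c `elem` cs then Eps else Empty
deriv c (Seq a b)
  | nullable a     = Alt (Seq (deriv c a) b) (deriv c b)
  | otherwise      = Seq (deriv c a) b
deriv c (Alt a b)  = Alt (deriv c a) (deriv c b)
deriv c (Star a)   = Seq (deriv c a) (Star a)
deriv c (Not a)    = Not (deriv c a)
deriv c (Sub a b)  = Sub (deriv c a) (deriv c b)

-- Whole-string match: take derivatives character by character.
matches :: RegEx -> String -> Bool
matches r = nullable . foldl (flip deriv) r

-- One reading of the float pattern from the example lexer:
-- ([-]?[0-9]*[.][0-9]*) minus the lone ".".
floatRe :: RegEx
floatRe = Sub (Seq (opt (Lit '-')) (Seq (Star digit) (Seq (Lit '.') (Star digit))))
              (Lit '.')
  where digit = Class ['0' .. '9']
```

The sketch does no simplification, so the expression grows with each character. That is fine for illustration but is not how a fast lexer would match.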

  8. Lexer Data Structure
     Contains:
     • Composite DFA
     • Mapping of “finality” states back to user-defined tokens
     • First entry of the map (a list) is the error token
     Uses all other modules
     Structures:
         type Token a = (String, a)
         newtype Lexer a = Lexr (FA, [a])
     Example functions: compileLexer, lexFirstToken, lexIntoList, measureLexer, serializeLexer
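The mapping from finality numbers back to tokens might look like the sketch below. The slide only says that the first entry of the list is the error token; the indexing scheme here (finality -1 selects entry 0, finality n selects entry n + 1) is an assumption made for illustration, and `tokenFor` and `toToken` are hypothetical helpers.

```haskell
type FANode = (Int, [Int])
newtype FA = F [FANode]

-- Types as given on the slide.
type Token a = (String, a)
newtype Lexer a = Lexr (FA, [a])

-- Assumption: finality -1 (no match) selects the error token at index 0,
-- and finality n >= 0 selects the user token at index n + 1.
tokenFor :: Lexer a -> Int -> a
tokenFor (Lexr (_, toks)) fin = toks !! (fin + 1)

-- Hypothetical helper: turn a (length, finality) match result into a token
-- plus the unconsumed remainder of the input.
toToken :: Lexer a -> (Int, Int) -> String -> (Token a, String)
toToken lexer (len, fin) input = ((lexeme, tokenFor lexer fin), rest)
  where (lexeme, rest) = splitAt len input
```

Keeping the DFA free of user types and storing tokens in a side list means the automaton modules never need to know what a token is.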

  9. Example Lexer

         testLexer = compileLexer [
             ("[ \n\t]+", TokIgnore),                  -- Ignore whitespace
             (";[^\n]*", TokIgnore),                   -- Ignore comments...
             ("if?", TokIf),                           -- Recognize keywords
             ("t(hen)?", TokThen),
             ("e(lse)?", TokElse),
             ("w(hile)?", TokWhile),
             ("do?", TokDo),
             ("[+]", TokPlus),                         -- Recognize operators
             ("[-]?[0-9]+", TokInt),                   -- Recognize numbers
             ("[-]?[0-9]*[.][0-9]*-[.]", TokFloat),
             ("[a-zA-Z_][a-zA-Z0-9_]+", TokVar)        -- Recognize variables
           ] TokError                                  -- Error token

  10. Lexer Usage
     • Many functions are available for using the lexer:
         lexIntoList :: Eq a => Lexer a -> String -> [Token a]
       • Lex a string into a list of tokens
         lexFirstToken :: Lexer a -> String -> Token a
       • Lex only the first token from the string
         lexFile :: Eq a => Lexer a -> FilePath -> IO [Token a]
       • Lex an entire file into a list of tokens
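The relationship between the single-token and whole-string functions can be sketched as a loop that repeatedly takes the first token and drops its lexeme from the input. The HaskLex library itself is not reproduced here, so `toyFirst` stands in for `lexFirstToken lexer`, and `lexAllWith` and the `Tok` type are hypothetical. The sketch also assumes that the `Eq` constraint exists so the loop can recognise the error token and stop.

```haskell
import Data.Char (isAlpha, isDigit)

type Token a = (String, a)

-- Hypothetical token type, in the spirit of the example lexer.
data Tok = TokIgnore | TokInt | TokVar | TokError
  deriving (Eq, Show)

-- Lex a whole string given an error token and a first-token function.
-- Stops after emitting the error token, or when no input was consumed.
lexAllWith :: Eq a => a -> (String -> Token a) -> String -> [Token a]
lexAllWith errTok firstToken = go
  where
    go "" = []
    go input
      | tok == errTok || null lexeme = [(lexeme, tok)]
      | otherwise                    = (lexeme, tok) : go rest
      where
        (lexeme, tok) = firstToken input
        rest          = drop (length lexeme) input

-- Toy stand-in for `lexFirstToken lexer`: spaces, digit runs, letter runs.
toyFirst :: String -> Token Tok
toyFirst s
  | not (null sp) = (sp, TokIgnore)
  | not (null ds) = (ds, TokInt)
  | not (null ls) = (ls, TokVar)
  | otherwise     = ("", TokError)
  where
    sp = takeWhile (== ' ') s
    ds = takeWhile isDigit s
    ls = takeWhile isAlpha s
```

A caller that does not care about whitespace or comments could then filter the result, for example `filter ((/= TokIgnore) . snd) (lexAllWith TokError toyFirst "x 42")`.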

  11. But what about performance?
     • Problem: the lexer is slow to compile
       • Must parse regexes, build the NFA, and reduce it to a DFA
     • Why not save the finished product?
         serializeLexer :: Show a => Lexer a -> String -> IO ()
       • Writes the compiled lexer to a file, allowing you to “import” it for later use
     • Result: “Compile” is very fast!
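The `Show a` constraint on serializeLexer suggests that the compiled lexer is written out in its `show` form. A minimal sketch of that idea, using derived `Show` and `Read` instances to round-trip the structure, is given below. It is an assumption about the approach, not the project's implementation: `encodeLexer`, `decodeLexer`, `saveLexer`, and `loadLexer` are hypothetical names, and the slide's “import” may instead mean emitting Haskell source that is compiled in.

```haskell
type FANode = (Int, [Int])
newtype FA = F [FANode]
  deriving (Eq, Show, Read)

newtype Lexer a = Lexr (FA, [a])
  deriving (Eq, Show, Read)

-- Pure encode/decode pair built on the derived instances.
encodeLexer :: Show a => Lexer a -> String
encodeLexer = show

decodeLexer :: Read a => String -> Lexer a
decodeLexer = read

-- File wrappers around the pure functions.
saveLexer :: Show a => Lexer a -> FilePath -> IO ()
saveLexer lexer path = writeFile path (encodeLexer lexer)

loadLexer :: Read a => FilePath -> IO (Lexer a)
loadLexer path = fmap decodeLexer (readFile path)
```

Loading skips regex parsing, NFA construction, and DFA reduction entirely; the remaining cost is parsing the saved text.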

  12. Internal DFA Match Code

         import Data.Char (ord)

         -- Returns (number of characters matched, finality of the final node).
         matchDFALengthState :: FA -> String -> (Int, Int)
         matchDFALengthState (F dfa) = matchDFAh (dfa!!0) dfa

         matchDFAh :: FANode -> [FANode] -> String -> (Int, Int)
         matchDFAh (final, _) _ "" = (0, final)         -- Empty string...
         matchDFAh (final, transitions) dfa (s:str)
           | blocked   = (0, final)
           | otherwise = (sLen+1, sFin)
           where
             -- trans is the transition for the 's' character
             trans = transitions!!(ord s)
             -- Result of matching the rest of the string from the next node
             (sLen, sFin) = matchDFAh (dfa!!trans) dfa str
             -- Stop here if there is no transition, or if going on ends with finality -1
             blocked = trans == (-1) || sFin == (-1)
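The match code above recurses once per input character before it can return anything. As a variation (not the author's code), the same longest-match result can be computed with an accumulator that remembers the last final node seen. The sketch assumes 256-entry transition rows and -1 for "blocked" and "not final"; `matchLongest`, `rowTo`, and `digitsDFA` are hypothetical names.

```haskell
import Data.Char (ord)

type FANode = (Int, [Int])
newtype FA = F [FANode]

-- Longest match with an accumulator: returns (length, finality) of the
-- longest prefix ending on a final node, or (0, finality of the start node)
-- if there is none. Only valid for characters with codes below 256.
matchLongest :: FA -> String -> (Int, Int)
matchLongest (F dfa) = go 0 (0, fst (dfa !! 0)) 0
  where
    -- node: current node index; best: last (length, finality) recorded at a
    -- final node; len: number of characters consumed so far.
    go :: Int -> (Int, Int) -> Int -> String -> (Int, Int)
    go _ best _ [] = best
    go node best len (c : cs)
      | next == (-1) = best
      | otherwise    = len' `seq` best' `seq` go next best' len' cs
      where
        next  = snd (dfa !! node) !! ord c
        len'  = len + 1
        fin   = fst (dfa !! next)
        best' = if fin /= (-1) then (len', fin) else best

-- Hypothetical helper and sample DFA for [0-9]+ with finality number 1.
rowTo :: Int -> String -> [Int]
rowTo target cs = [ if toEnum c `elem` cs then target else -1 | c <- [0 .. 255] ]

digitsDFA :: FA
digitsDFA = F [ (-1, rowTo 1 ['0' .. '9'])
              , ( 1, rowTo 1 ['0' .. '9']) ]
```

For example, `matchLongest digitsDFA "42;"` consumes the two digits and stops at the semicolon, reporting a match of length 2 with finality 1.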

  13. Conclusions
     • Finite automata can be useful [but let someone else implement them!]
     • Writing a library isn’t as cool as writing an application
     • It is possible to write complex programs in Haskell, though painful at times [stack overflows]
     • Compiled Haskell [GHC] would probably solve many problems
