HaskLex - The Haskell Lexer
Senior Seminar Final Project
Chris Lattner
April 2000

Outline
• Goals
• Design
• Example internal code
• Demonstration!
Chris Lattner - Senior Seminar - April 2000

HaskLex Goals
• Write a lexer in Haskell!
• Provide a usable regex dialect
  • *, +, ?, |, (), [] are required. ! and - are bonuses.
• Provide “adequate” performance
  • Matches must be very fast; compiling a lexer can be slow, but we prefer it not to be
• Provide enough functionality to be useful
  • Metric: Can an assembler be written with it?

HaskLex Design
• Four modules layered on top of each other
• Each may be used independently
  • DFA provides ADT and match operations
  • NFA provides higher level abstraction
  • RegEx parses regular expression constructs
  • Lexer ties it all together

DFA Data Structure
• Previously presented:
  • FA is a list of nodes
  • Each node contains:
    • “Finality” number
    • List of transitions
  • Ugly to read
  • Easy to implement
• Structures:
    type FANode = (Int, [Int])
    newtype FA = F [FANode]
• Example functions:
  • addNode, matchDFA, emptyFA, nukeNode, addFATransition, removeDeadStates

NFA Data Structure
• Identical definition to DFA
• Extra transitions on transition lists are considered to be “lambda” (λ) transitions
• Unlimited number of transitions may come from any given node
• May be converted to a DFA with the buildDFA function
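
The slides do not show how buildDFA works. As a rough illustration of converting an NFA with lambda transitions into a DFA, here is a minimal sketch of the textbook subset construction. It assumes a simplified edge-list NFA and a Data.Map-based DFA rather than the list-of-nodes representation from the slides; every name in it (NFA, DFA, closure, step, subsetConstruction, acceptsDFA) is hypothetical.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type State    = Int
type StateSet = Set.Set State

-- Hypothetical NFA representation (not the HaskLex one).
data NFA = NFA
  { nfaEdges   :: [(State, Char, State)]  -- labelled transitions
  , nfaLambdas :: [(State, State)]        -- lambda transitions
  , nfaStart   :: State
  , nfaFinal   :: StateSet
  }

-- Hypothetical DFA whose states are sets of NFA states.
data DFA = DFA
  { dfaStart :: StateSet
  , dfaTrans :: Map.Map (StateSet, Char) StateSet
  , dfaFinal :: Set.Set StateSet
  }

-- All states reachable from the given set through lambda transitions alone.
closure :: NFA -> StateSet -> StateSet
closure nfa states
  | next == states = states
  | otherwise      = closure nfa next
  where
    next = Set.union states
             (Set.fromList [ t | (s, t) <- nfaLambdas nfa, s `Set.member` states ])

-- One DFA step: follow every edge labelled c, then close over lambdas.
step :: NFA -> StateSet -> Char -> StateSet
step nfa states c =
  closure nfa (Set.fromList
    [ t | (s, c', t) <- nfaEdges nfa, c' == c, s `Set.member` states ])

-- Worklist version of the subset construction.
subsetConstruction :: NFA -> DFA
subsetConstruction nfa = DFA start trans finals
  where
    start         = closure nfa (Set.singleton (nfaStart nfa))
    alphabet      = Set.toList (Set.fromList [ c | (_, c, _) <- nfaEdges nfa ])
    (seen, trans) = go [start] (Set.singleton start) Map.empty
    finals        = Set.filter
                      (\s -> not (Set.null (Set.intersection s (nfaFinal nfa))))
                      seen

    go [] visited acc = (visited, acc)
    go (s : todo) visited acc = go (new ++ todo) visited' acc'
      where
        moves    = [ (c, t) | c <- alphabet, let t = step nfa s c, not (Set.null t) ]
        acc'     = foldr (\(c, t) -> Map.insert (s, c) t) acc moves
        new      = Set.toList (Set.fromList
                     [ t | (_, t) <- moves, not (t `Set.member` visited) ])
        visited' = foldr Set.insert visited new

-- Run the resulting DFA over a whole string.
acceptsDFA :: DFA -> String -> Bool
acceptsDFA dfa = go (dfaStart dfa)
  where
    go s []       = s `Set.member` dfaFinal dfa
    go s (c : cs) = case Map.lookup (s, c) (dfaTrans dfa) of
                      Nothing -> False
                      Just s' -> go s' cs

-- Example NFA for "a?b": state 0 reaches state 1 on 'a' or on lambda.
optAthenB :: NFA
optAthenB = NFA [(0, 'a', 1), (1, 'b', 2)] [(0, 1)] 0 (Set.singleton 2)
```

With this sketch, acceptsDFA (subsetConstruction optAthenB) accepts "ab" and "b" and rejects "a". Missing map entries play the role of the blocked (-1) transitions used in the slides' DFA.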

RegEx Data Structure
• A simple string!
• Primitives recognized:
  • “x” - Literal chars
  • “.” - Any character
  • “[a-z]” - Char classes
  • “(aa)*” - Grouping
• Postfix operators:
  • “ab” - Juxtaposition
  • “x*” - Kleene closure
  • “x+” - Repetition
  • “x?” - Optionalization
  • “x!” - Inversion
• Infix operators:
  • “a|b” - Alternation
  • “a-b” - Subtraction
• Char class provides escaping mechanism
• buildNFA converts a regex to an NFA
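
To make the operator list concrete, here is a small sketch of what the dialect's operators could mean. It is not how HaskLex works (HaskLex keeps the regex as a string and converts it with buildNFA); it swaps in a hypothetical AST and a Brzozowski-derivative matcher because derivatives handle inversion and subtraction directly. It assumes “x!” means "every string x does not match" and “a-b” means "matches a but not b"; the slides name these operators without defining them.

```haskell
-- Hypothetical AST covering the operators listed on the slide.
data RegEx
  = Empty              -- matches nothing
  | Eps                -- matches only the empty string
  | Lit Char           -- "x"
  | AnyChar            -- "."
  | Class [Char]       -- "[a-z]", already expanded to its members
  | Seq RegEx RegEx    -- "ab"
  | Alt RegEx RegEx    -- "a|b"
  | Star RegEx         -- "x*"
  | Not RegEx          -- "x!"  (assumed: complement)
  | Sub RegEx RegEx    -- "a-b" (assumed: a but not b)

-- "x+" and "x?" expressed through the core constructors.
plus, opt :: RegEx -> RegEx
plus r = Seq r (Star r)
opt  r = Alt r Eps

-- Does the expression match the empty string?
nullable :: RegEx -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Lit _)   = False
nullable AnyChar   = False
nullable (Class _) = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True
nullable (Not a)   = not (nullable a)
nullable (Sub a b) = nullable a && not (nullable b)

-- The derivative: what remains to be matched after consuming c.
deriv :: Char -> RegEx -> RegEx
deriv _ Empty      = Empty
deriv _ Eps        = Empty
deriv c (Lit x)    = if c == x then Eps else Empty
deriv _ AnyChar    = Eps
deriv c (Class xs) = if c `elem` xs then Eps else Empty
deriv c (Seq a b)
  | nullable a     = Alt (Seq (deriv c a) b) (deriv c b)
  | otherwise      = Seq (deriv c a) b
deriv c (Alt a b)  = Alt (deriv c a) (deriv c b)
deriv c (Star a)   = Seq (deriv c a) (Star a)
deriv c (Not a)    = Not (deriv c a)
deriv c (Sub a b)  = Sub (deriv c a) (deriv c b)

-- Whole-string match: take derivatives character by character.
matches :: RegEx -> String -> Bool
matches r = nullable . foldl (flip deriv) r

-- "[0-9]*[.][0-9]*-[.]": digits around a dot, excluding the lone ".".
floatBody :: RegEx
floatBody = Sub (Seq digits (Seq (Lit '.') digits)) (Lit '.')
  where digits = Star (Class ['0' .. '9'])
```

Under these assumptions, matches floatBody accepts "1.5", ".5" and "1." but rejects ".". The derivative terms are never simplified here, so this is only suitable for short inputs.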

Lexer Data Structure
• Contains:
  • Composite DFA
  • Mapping of “finality” states back to user-defined tokens
  • First entry of map (list) is error token
• Uses all other modules
• Structures:
    type Token a = (String, a)
    newtype Lexer a = Lexr (FA, [a])
• Example functions:
  • compileLexer, lexFirstToken, lexIntoList, measureLexer, serializeLexer

Example Lexer
testLexer = compileLexer [
    ("[ \n\t]+", TokIgnore),                 -- Ignore whitespace
    (";[^\n]*", TokIgnore),                  -- Ignore comments...
    ("if?", TokIf),                          -- Recognize keywords
    ("t(hen)?", TokThen),
    ("e(lse)?", TokElse),
    ("w(hile)?", TokWhile),
    ("do?", TokDo),
    ("[+]", TokPlus),                        -- Recognize operators
    ("[-]?[0-9]+", TokInt),                  -- Recognize numbers
    ("[-]?[0-9]*[.][0-9]*-[.]", TokFloat),
    ("[a-zA-Z_][a-zA-Z0-9_]+", TokVar)       -- Recognize variables
  ] TokError                                 -- Error token
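
The example refers to token constructors that the slide does not declare. One plausible declaration is sketched below; the type name Tok is an assumption, while the constructor names are taken from the example. Deriving Eq and Show matches the Eq a and Show a constraints that appear in the lexIntoList and serializeLexer signatures.

```haskell
-- Hypothetical token type for the example lexer; the slides do not show it.
data Tok
  = TokError    -- returned when nothing matches
  | TokIgnore   -- whitespace and comments
  | TokIf
  | TokThen
  | TokElse
  | TokWhile
  | TokDo
  | TokPlus
  | TokInt
  | TokFloat
  | TokVar
  deriving (Eq, Show)
```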

Lexer Usage
• Many functions available to use lexer:
    lexIntoList :: Eq a => Lexer a -> String -> [Token a]
  • Lex a string into a list of tokens
    lexFirstToken :: Lexer a -> String -> Token a
  • Lex only the first token from the string
    lexFile :: Eq a => Lexer a -> FilePath -> IO [Token a]
  • Lex an entire file into a list of tokens
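
The slides do not show how lexIntoList is built. The sketch below illustrates one way a whole-string lexer could be driven by a first-token function of the shape String -> Token a (the shape of lexFirstToken once it is applied to a lexer). The names lexAll, toyFirstToken and ToyTok are hypothetical, and toyFirstToken is a hand-written stand-in, not a compiled HaskLex lexer. Treating a zero-length match as a one-character error lexeme is also an assumption.

```haskell
import Data.Char (isDigit, isSpace)

-- Token type as given on the "Lexer Data Structure" slide.
type Token a = (String, a)

-- Hypothetical driver: peel tokens off the front until the input is empty.
-- A zero-length match consumes one character so the loop always advances.
lexAll :: (String -> Token a) -> String -> [Token a]
lexAll _ "" = []
lexAll firstToken input =
  case firstToken input of
    ("", tok)     -> (take 1 input, tok) : lexAll firstToken (drop 1 input)
    (lexeme, tok) -> (lexeme, tok) : lexAll firstToken (drop (length lexeme) input)

-- Stand-in token type and first-token function for the demonstration.
data ToyTok = ToyInt | ToySpace | ToyError
  deriving (Eq, Show)

toyFirstToken :: String -> Token ToyTok
toyFirstToken s
  | not (null digits) = (digits, ToyInt)
  | not (null spaces) = (spaces, ToySpace)
  | otherwise         = ("", ToyError)
  where
    digits = takeWhile isDigit s
    spaces = takeWhile isSpace s
```

For example, lexAll toyFirstToken "12 x3" yields [("12",ToyInt),(" ",ToySpace),("x",ToyError),("3",ToyInt)].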

But what about performance?
• Problem: Lexer is slow to compile
  • Must parse regexes, build NFA, reduce to DFA
• Why not save the finished product?
    serializeLexer :: Show a => Lexer a -> String -> IO ()
  • Writes the compiled lexer to a file, allowing you to “import” it for later use.
• Result: “Compile” is very fast!
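
The slides do not say what serializeLexer writes. Since the result can be “imported”, one plausible reading is that it emits Haskell source. The sketch below follows that assumption: it renders a compiled lexer as the text of a module using show. The functions renderLexerModule and writeLexerModule, and the module and value names passed to them, are hypothetical; the types are copied from the slides.

```haskell
-- Types as given on the slides.
type FANode = (Int, [Int])
newtype FA = F [FANode]
newtype Lexer a = Lexr (FA, [a])

-- Hypothetical: render the lexer as Haskell source text.
-- A real generated module would also need to import the lexer library
-- and the user's token type; that part is omitted here.
renderLexerModule :: Show a => String -> String -> Lexer a -> String
renderLexerModule modName valName (Lexr (F nodes, toks)) = unlines
  [ "module " ++ modName ++ " where"
  , ""
  , valName ++ " = Lexr (F " ++ show nodes ++ ", " ++ show toks ++ ")"
  ]

-- Hypothetical: write the rendered module to a file.
writeLexerModule :: Show a => Lexer a -> FilePath -> IO ()
writeLexerModule lexer path =
  writeFile path (renderLexerModule "SavedLexer" "savedLexer" lexer)
```

Loading such a module skips regex parsing, NFA construction and DFA reduction entirely, which is consistent with the slide's claim that “compile” becomes very fast.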

Internal DFA Match Code
matchDFALengthState :: FA -> String -> (Int, Int)
matchDFALengthState (F dfa) = matchDFAh (dfa!!0) dfa

matchDFAh :: FANode -> [FANode] -> String -> (Int, Int)
matchDFAh (final, _) _ "" = (0, final)   -- Empty string...
matchDFAh (final, transitions) dfa (s:str)
  | blocked   = (0, final)
  | otherwise = (sLen+1, sFin)
  where -- trans is the transition for the 's' character
        trans = transitions!!(ord s)
        (sLen, sFin) = matchDFAh (dfa!!trans) dfa str
        blocked = trans == (-1) || sFin == (-1)
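
To try the match code in isolation, the sketch below wraps it in a self-contained module with a tiny hand-built DFA for “[0-9]+”. The matching functions and types are the ones from the slides. The helper row, the value digitsDFA, the 256-entry table size, and the use of -1 as the finality of a non-accepting node are assumptions inferred from the code, not stated in the source.

```haskell
import Data.Char (ord)
import Data.Maybe (fromMaybe)

-- Types as given on the "DFA Data Structure" slide.
type FANode = (Int, [Int])   -- (finality number, transition table)
newtype FA = F [FANode]

-- Match code as shown on the slide.
matchDFALengthState :: FA -> String -> (Int, Int)
matchDFALengthState (F dfa) = matchDFAh (dfa !! 0) dfa

matchDFAh :: FANode -> [FANode] -> String -> (Int, Int)
matchDFAh (final, _) _ "" = (0, final)
matchDFAh (final, transitions) dfa (s : str)
  | blocked   = (0, final)
  | otherwise = (sLen + 1, sFin)
  where
    -- trans is the target node for the character s, or -1 if there is none
    trans        = transitions !! ord s
    -- evaluated lazily, so dfa !! (-1) is never forced when trans is -1
    (sLen, sFin) = matchDFAh (dfa !! trans) dfa str
    blocked      = trans == (-1) || sFin == (-1)

-- Hypothetical helper: a 256-entry table that is blocked (-1) everywhere
-- except on the listed characters. Characters above code 255 would index
-- past the end of the table.
row :: [(Char, Int)] -> [Int]
row edges = [ fromMaybe (-1) (lookup (toEnum i) edges) | i <- [0 .. 255] ]

-- Hypothetical DFA for "[0-9]+": node 0 is the start (finality -1),
-- node 1 is accepting with finality number 1.
digitsDFA :: FA
digitsDFA = F [ (-1, row onDigit), (1, row onDigit) ]
  where onDigit = [ (d, 1) | d <- ['0' .. '9'] ]
```

Here matchDFALengthState digitsDFA "12a" evaluates to (2, 1): two characters matched, ending in finality 1. On "abc" it evaluates to (0, -1), meaning no match. Because the result is assembled on the way back out of the recursion, a match falls back to the enclosing node whenever the deeper path ends in a non-final state.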

Conclusions
• Finite automata can be useful [but let someone else implement them!]
• Writing a library isn’t as cool as writing an application
• It is possible to write complex programs in Haskell, though painful at times [stack overflows]
• Compiled Haskell [GHC] would probably solve many problems