Andrew Casey • Laurie Hendren • McGill University MetaLexer: A Modular Lexical Specification Language www.sable.mcgill.ca/metalexer
Why MetaLexer? • Why is it relevant to AOSD? • What are the challenges?
Structure of a Compiler Front-End • Scanner: regular expressions + state (flex, jflex, ...) • Parser: context-free grammars + actions/attributes (yacc, bison, Polyglot, JastAdd, ...)
Given a front-end specification for a language (e.g. Java), what is the current method to implement a front-end for an extension of that language (e.g. AspectJ)? • Grammar rules for extension
Desired Modular MetaLexer Approach • Grammar rules for extension • Lexical rules for extension
We also want to be able to combine lexical specifications for diverse languages. • Java + HTML • Java + Aspects (AspectJ) • Java + SQL • MATLAB + Aspects (AspectMatlab)
Would like to be able to reuse and extend lexical specification modules • Nested C-style comments • Javadoc comments • Floating-point constants • URL • regular expressions • …
First, let’s understand the traditional lexer tools (lex, flex, jflex). • programmer specifies regular expressions + actions • tools generate a finite automaton-based implementation • states are used to handle different language contexts
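A minimal sketch of the state-based approach described above, written in Python rather than in lex/flex/JFlex syntax. The rule table, state names and token names are illustrative assumptions, and the sketch takes the first rule that matches, which simplifies the longest-match behaviour of the real tools.

```python
import re

# Each state owns an ordered list of (regex, action) rules. An action
# returns (token_or_None, next_state). All names here are hypothetical.
RULES = {
    "INITIAL": [
        (re.compile(r"\s+"), lambda text: (None, "INITIAL")),
        (re.compile(r"[A-Za-z_]\w*"), lambda text: (("ID", text), "INITIAL")),
        (re.compile(r'"'), lambda text: (None, "STRING")),
    ],
    "STRING": [
        (re.compile(r'[^"]+'), lambda text: (("STR", text), "STRING")),
        (re.compile(r'"'), lambda text: (None, "INITIAL")),
    ],
}

def lex(source):
    """Scan source, switching states from inside the rule actions."""
    state, pos, tokens = "INITIAL", 0, []
    while pos < len(source):
        for pattern, action in RULES[state]:
            m = pattern.match(source, pos)
            if m:
                token, state = action(m.group())
                if token is not None:
                    tokens.append(token)
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos} in state {state}")
    return tokens
```

Note that the state changes live inside the actions, so the lexical rules and the context-switching logic are tangled together in one table.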
Current (ugly) method for extending JFlex specifications: copy & modify • Copy the JFlex specification. • Insert new scanner rules into the copy. • Order of rules matters! • Introduce new states and action logic for converting between states. What is needed instead: • A principled way of weaving new rules into existing rules. • A modular and abstract notion of state and of changing between states.
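To see why rule order matters when rules are inserted into a copied specification, here is a small Python sketch of the usual lex-family disambiguation: the longest match wins, and ties go to the rule listed first. The helper `first_token` and the rule lists are hypothetical and only illustrate the effect.

```python
import re

def first_token(rules, text):
    """Longest match wins; on a tie the earliest rule in the list wins."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strictly longer only, so an earlier rule keeps a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

base = [("ID", r"[a-z]+")]
keyword_first = [("ASPECT", r"aspect")] + base   # new rule inserted before ID
keyword_last = base + [("ASPECT", r"aspect")]    # new rule appended after ID
```

With `keyword_first`, the input `aspect` is scanned as the keyword; with `keyword_last`, the identifier rule shadows it and the keyword rule never fires. In both cases `aspects` is an identifier because the longer match wins.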
JFlex Lexing Structure • Specification in one file. • Lexing rules associated with a state. • Changing states associated with action code.
MetaLexer Structure • Each component specified in its own file. • Layout specified in its own file. • Components define lexing rules associated with a state and produce meta-tokens. • Layout defines transitions between components; state changes are made by the meta-lexer.
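The split between components and layout can be modelled roughly as follows. This is a simplified Python sketch, not MetaLexer syntax: the table shapes, the meta-token names and the flat transition map are assumptions made for illustration, using a key=value input as the example.

```python
import re

# Component = ordered rules: (regex, token name or None, meta-token or None).
# Components only emit tokens and meta-tokens; they never change state.
COMPONENTS = {
    "key": [
        (r"[^=\n]+", "KEY", None),
        (r"=", "ASSIGN", "ASSIGN"),
        (r"\n", None, None),
    ],
    "value": [
        (r"[^\n]+", "VALUE", None),
        (r"\n", None, "LINE_END"),
    ],
}

# Layout: (current component, meta-token) -> next component.
LAYOUT = {("key", "ASSIGN"): "value", ("value", "LINE_END"): "key"}

def scan(source, start="key"):
    component, pos, tokens = start, 0, []
    while pos < len(source):
        for pattern, token, meta in COMPONENTS[component]:
            m = re.compile(pattern).match(source, pos)
            if m:
                if token:
                    tokens.append((token, m.group()))
                # The layout, not the rule, decides which component is next.
                component = LAYOUT.get((component, meta), component)
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos} in {component}")
    return tokens
```

Because the transitions live only in `LAYOUT`, either component could be swapped for another that emits the same meta-tokens without touching the rest.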
Scanning a properties file • Layout: Properties • Components: Key, Value, Util_Patterns
MetaLexer is implemented and available: • www.sable.mcgill.ca/metalexer • Figure: properties.mll, key.mlc, value.mlc and util_patterns.mlc → MetaLexer → properties.jflex
Key problems to solve: • How to implement the meta-token lexer? • How to allow for insertion of new components, replacing of components, and adding new embeddings (MetaLexer transitions)? • How to insert new patterns into components at specific points?
Implementing the meta-token lexer • Recognize a meta-pattern, i.e. when to go to a new component and when to return. • Recognize the matching suffix.
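One possible reading of the two bullets above, sketched in Python: the meta-lexer watches the stream of meta-tokens, enters a guest component when the recent meta-tokens end with an embedding's start pattern, and returns when they end with its end pattern. The `Embedding` record, the stack of components and the example meta-token names are assumptions for illustration, not the actual MetaLexer implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Embedding:
    host: str     # component in which the embedding can begin
    guest: str    # component to switch to
    start: tuple  # meta-tokens that trigger entry into the guest
    end: tuple    # meta-tokens that trigger return to the host

def ends_with(history, pattern):
    """True if the meta-token history has the given pattern as a suffix."""
    return len(pattern) > 0 and tuple(history[-len(pattern):]) == pattern

def run_meta_lexer(meta_tokens, embeddings, base):
    """Return the active component after each meta-token is consumed."""
    # Each frame: (component, embedding that opened it, meta-tokens seen in it).
    stack = [(base, None, [])]
    trace = []
    for meta in meta_tokens:
        component, opened_by, history = stack[-1]
        history.append(meta)
        if opened_by is not None and ends_with(history, opened_by.end):
            stack.pop()  # end pattern seen: return to the host
        else:
            for emb in embeddings:
                if emb.host == component and ends_with(history, emb.start):
                    stack.append((emb.guest, emb, []))  # enter the guest
                    break
        trace.append(stack[-1][0])
    return trace

# Hypothetical embedding: an aspect body nested in Java code.
ASPECT_BODY = Embedding("java", "aspect", ("ASPECT", "LBRACE"), ("RBRACE",))
```

For the meta-token stream `ASPECT, LBRACE, RBRACE` this yields the component trace `java, aspect, java`; a lone `LBRACE` does not trigger the embedding because the suffix does not match the full start pattern.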
Implementing MetaLexer layout inheritance • Layouts can inherit other layouts. • The %inherit directive is placed at the location at which the inherited transition rules (embeddings) should be inserted. • Each %inherit directive can be followed by: • %unoption • %replace • %unembed • new embeddings
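The effect of layout inheritance can be approximated with a small Python sketch. It models only the ideas of dropping an inherited embedding, replacing a component, and adding new embeddings; the function `inherit_layout`, its parameters and the dictionary shape are hypothetical, options are not modelled, and the exact semantics of the real directives may differ.

```python
def inherit_layout(parent, unembed=(), replace=None, new=()):
    """Build a child layout's embedding list from a parent layout.

    parent  : list of dicts with keys "name", "host", "guest"
    unembed : names of inherited embeddings to drop
    replace : mapping from old component name to new component name
    new     : embeddings placed after the inherited ones
    """
    replace = replace or {}
    inherited = []
    for emb in parent:
        if emb["name"] in unembed:
            continue  # dropped in the child layout
        inherited.append({
            "name": emb["name"],
            "host": replace.get(emb["host"], emb["host"]),
            "guest": replace.get(emb["guest"], emb["guest"]),
        })
    return inherited + list(new)

# Hypothetical parent layout and an extension of it.
java_layout = [
    {"name": "javadoc", "host": "java", "guest": "javadoc_comment"},
    {"name": "string", "host": "java", "guest": "string"},
]
aspectj_layout = inherit_layout(
    java_layout,
    unembed={"string"},
    replace={"java": "aspectj"},
    new=[{"name": "pointcut", "host": "aspectj", "guest": "pointcut"}],
)
```

The parent list is left untouched, so several child layouts can be derived from the same base.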
Weaving in inherited component • Original component • New component adds some rules and inherits the original component. • Woven output
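Weaving can be pictured as splicing the original component's rules into the new component's rule list at a chosen point. The Python sketch below assumes a placeholder object `INHERIT` marking that point; the placeholder and the rule tuples are hypothetical and do not reflect MetaLexer's concrete syntax.

```python
INHERIT = object()  # hypothetical marker: "inherited rules go here"

def weave(original_rules, new_rules):
    """Return the woven rule list, preserving the order of both inputs."""
    woven = []
    for rule in new_rules:
        if rule is INHERIT:
            woven.extend(original_rules)  # splice inherited rules in place
        else:
            woven.append(rule)
    return woven

original = [("ID", r"[a-z]+"), ("NUM", r"[0-9]+")]
extension = [("ASPECT", r"aspect"), INHERIT, ("OTHER", r".")]
woven = weave(original, extension)
```

Here the keyword rule lands before the inherited identifier rule and the catch-all rule lands after everything else, which is the kind of ordering control that copy-and-modify leaves to manual editing.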
Results: • Applied to three projects with complex scanners: • AspectJ (abc and extensions) • Matlab (Annotations and AspectMatlab extensions) • MetaLexer
Using MetaLexer for an extensible front end for McLab • PLDI 2011 Tutorial on McLab!
MetaLexer scanner implemented in MetaLexer • 1st version of MetaLexer written in JFlex, with one lexer for components and one for layouts. • 2nd version implemented in MetaLexer, with many components shared between the component lexer and the layout lexer.
Related Work • Ad-hoc systems with separate scanner/LALR parser: Polyglot, JastAdd, abc • Recursive-descent scanner/parser: ANTLR and systems using ANTLR • Scannerless systems: Rats! (PEGs) • Integrated systems: Copper (modified LALR parser which communicates with DFA-based scanner)
Conclusions • MetaLexer allows one to specify modular and extensible scanners suitable for any system that works with JFlex. • Two main ideas: meta-lexing and component/layout inheritance. • Used in large projects such as abc, McLab and MetaLexer itself. • Available at: www.sable.mcgill.ca/metalexer