Automatic Generation of Language-based Tools: The LISA Approach Marjan Mernik FACULTY OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE UNIVERSITY OF MARIBOR
Outline of the Presentation • How to specify a programming language? • Formal methods for programming language definition • LISA compiler/interpreter generator • Language-based tools generated by LISA
How to specify a programming language? • Using natural language • Advantages: • descriptions are understandable, • accessible to a wide variety of users. • Disadvantages: • lack of clarity, • ambiguities, • multiple possible interpretations.
How to specify a programming language? • Using a formal method • Advantages: • syntax and semantics are defined precisely and unambiguously, • compilers or interpreters can be generated automatically, • serves as a tool for programming language design • Disadvantages: • requires detailed knowledge of the formal method
Formal methods for programming language definition • Lexicon (regular definitions, FSA) • Syntax (BNF) • Semantics (axiomatic, attribute grammars, denotational, algebraic, structural operational/natural, action, abstract state machines, etc.)
Formal methods for programming language definition • Compilers or interpreters can be generated automatically: • Attribute grammars: Synthesizer Generator • Denotational: PSG • Algebraic: ASF+SDF • Structural operational/Natural: Centaur • Action: ASD • Abstract state machines: Gem-Mex
Formal methods for programming language definition • From formal language definitions many other language-based tools can be automatically generated, such as: • syntax-directed editors, • type checkers, • dataflow analyzers, • partial evaluators, • debuggers, • test case generators, • animators, etc.
Formal methods for programming language definition • To generate such tools, one of the following applies: • the core language definitions have to be augmented, • only a part of the formal language definition is enough for automatic tool generation, or • implicit information must be extracted from the formal language definition
Formal methods for programming language definition • Automatic generation is possible whenever a tool can be built from a fixed part and a variable part, where the variable (language-dependent) part can be derived systematically from the language specification (Table 1).
| Generated tool | Formal specification | Fixed part | Variable part |
|---|---|---|---|
| Lexer | regular definitions | algorithm that interprets the action table | action table: State × Σ → State |
| Parser (LR) | BNF | algorithm that interprets the action and goto tables | action table: State × T → Action; goto table: State × (T ∪ N) → State |
| Evaluator | attribute grammar | tree-walk algorithm | semantic functions |
| Language-knowledgeable editor | regular definitions (extracted from AG) | matching algorithm | same as lexer |
Table 1
| Generated tool | Formal specification | Fixed part | Variable part |
|---|---|---|---|
| FSA visualization | regular definitions (extracted from AG) | FSA layout algorithm | same as lexer |
| Syntax tree visualization | BNF (extracted from AG) | syntax tree layout algorithm | syntax tree |
| Dependency graph visualization | extracted from AG | DG layout algorithm | dependency graph |
| Semantic evaluator animation | extracted from AG | semantic tree layout algorithm | decorated syntax tree & semantic functions |
Table 1 (cont.)
Regular definitions • An example – arithmetic expressions, e.g. (23+2)*3:
integer [0-9]+
operator + | *
separator ( | )
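Regular definitions (variable part) • FSA for the example: state 0 is the start state; 0..9 leads to state 1 (tInteger), which loops on 0..9; +,* lead to state 2 (tOperator); (,) lead to state 3 (tSeparator); \t, \n, ' ' lead to state 4 (tIgnore), which loops on \t, \n, ' ' • action table: State × Σ → State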
Regular definitions (variable part)
void initAutomata() {
    // no transitions by default
    for (int i = 0; i <= maxState; i++)
        for (int j = 0; j < 256; j++)
            automata[i][j] = noEdge;
    // integer: [0-9]+
    for (int i = '0'; i <= '9'; i++)
        automata[0][i] = automata[1][i] = 1;
    // operator: + | *
    automata[0]['+'] = automata[0]['*'] = 2;
    // separator: ( | )
    automata[0]['('] = automata[0][')'] = 3;
    // white space, recognized and later ignored
    automata[0]['\n'] = automata[0][' '] = automata[0]['\t'] = 4;
    automata[4]['\n'] = automata[4][' '] = automata[4]['\t'] = 4;
    // token kind recognized in each state
    finite[0] = tLexError;
    finite[1] = tInteger;
    finite[2] = tOperator;
    finite[3] = tSeparator;
    finite[4] = tIgnore;
}
Regular definitions (fixed part)
Token nextToken() {
    int currentState = startState;
    string lexem;
    int startColumn = column;
    int startRow = row;
    do {
        int tempState = getNextState(currentState, peek());
        if (tempState != noEdge) {
            // a transition exists: extend the lexeme
            currentState = tempState;
            lexem += (char)read();
        } else {
            if (isFiniteState(currentState)) {
                Token token(lexem, startColumn, startRow, getFiniteState(currentState), eof());
                // white space is skipped: scan the next token instead
                if (token.getToken() == tIgnore) return nextToken();
                else return token;
            } else {
                return Token("", startColumn, startRow, tLexError, eof());
            }
        }
    } while (true);
}
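Put together, the variable part (initAutomata) and the fixed part (nextToken) form a complete table-driven scanner. The sketch below is one self-contained way to assemble them, assuming a string input instead of a character stream. The class Lexer, the helper tokenize, the tEOF marker and the error recovery (consume one character and report tLexError) are hypothetical additions for illustration, not part of LISA.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Token kinds named after the slides; tEOF is a hypothetical end-of-input marker.
enum TokenKind { tLexError, tInteger, tOperator, tSeparator, tIgnore, tEOF };

struct Token {
    std::string lexem;
    TokenKind kind;
};

class Lexer {  // hypothetical wrapper around the slides' two functions
public:
    explicit Lexer(std::string input) : in_(std::move(input)) { initAutomata(); }

    // Fixed part: follow transitions as long as possible (longest match),
    // then classify the lexeme by its final state; white space is skipped.
    Token nextToken() {
        while (true) {
            if (pos_ >= in_.size()) return {"", tEOF};
            int state = kStart;
            std::string lexem;
            while (pos_ < in_.size()) {
                int next = automata_[state][static_cast<unsigned char>(in_[pos_])];
                if (next == kNoEdge) break;
                state = next;
                lexem += in_[pos_++];
            }
            if (state == kStart)  // no edge out of the start state: consume one char
                return {std::string(1, in_[pos_++]), tLexError};
            if (finite_[state] != tIgnore) return {lexem, finite_[state]};
            // tIgnore: loop and scan the next token
        }
    }

private:
    static constexpr int kStates = 5, kStart = 0, kNoEdge = -1;
    int automata_[kStates][256];
    TokenKind finite_[kStates];
    std::string in_;
    std::size_t pos_ = 0;

    // Variable part: the transition table derived from the regular definitions.
    void initAutomata() {
        for (auto& row : automata_)
            for (int& cell : row) cell = kNoEdge;
        for (int c = '0'; c <= '9'; ++c) automata_[0][c] = automata_[1][c] = 1;
        automata_[0]['+'] = automata_[0]['*'] = 2;
        automata_[0]['('] = automata_[0][')'] = 3;
        for (char c : {'\n', ' ', '\t'})
            automata_[0][static_cast<int>(c)] = automata_[4][static_cast<int>(c)] = 4;
        finite_[0] = tLexError; finite_[1] = tInteger; finite_[2] = tOperator;
        finite_[3] = tSeparator; finite_[4] = tIgnore;
    }
};

// Hypothetical convenience helper: collect all tokens of a string.
std::vector<Token> tokenize(const std::string& text) {
    Lexer lexer(text);
    std::vector<Token> tokens;
    for (Token t = lexer.nextToken(); t.kind != tEOF; t = lexer.nextToken())
        tokens.push_back(t);
    return tokens;
}
```

For example, tokenize("(23+2)*3") yields the seven lexemes (, 23, +, 2, ), * and 3. White space reaches state 4 and is dropped, just as the tIgnore branch of nextToken() does; only the table in initAutomata would change for a different language.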
BNF • An example – arithmetic expressions, e.g. (23+2)*3:
E ::= T EE
EE ::= + T EE | ε
T ::= F TT
TT ::= * F TT | ε
F ::= ( E ) | #integer
LL(1) Parser (fixed part) • Each grammar construct is translated (T) into parser code:
• nonterminal A: call A;
• alternatives 1 | 2 | ... | n: if token IN FIRST(1) then T(1) elseif token IN FIRST(2) then T(2) ... elseif token IN FIRST(n) then T(n) else error
• sequence 1 2 ... n: T(1); T(2); ...; T(n)
• terminal a: if token = 'a' then nextToken else error
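The FIRST sets that select an alternative are themselves derivable from the BNF, so they belong to what a generator computes rather than what a user writes. The following is a minimal sketch computing them by fixed-point iteration for the expression grammar; the map-based grammar representation and the use of "" for ε are assumptions made for this illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Expression grammar from the BNF slide; an empty right-hand side is ε.
using Symbol = std::string;
using Rhs = std::vector<Symbol>;
const std::map<Symbol, std::vector<Rhs>> grammar = {
    {"E",  {{"T", "EE"}}},
    {"EE", {{"+", "T", "EE"}, {}}},
    {"T",  {{"F", "TT"}}},
    {"TT", {{"*", "F", "TT"}, {}}},
    {"F",  {{"(", "E", ")"}, {"#integer"}}},
};

bool isNonterminal(const Symbol& s) { return grammar.count(s) > 0; }

// FIRST of a symbol sequence, given the current FIRST sets; "" stands for ε.
std::set<Symbol> firstOfSeq(const Rhs& seq, const std::map<Symbol, std::set<Symbol>>& first) {
    std::set<Symbol> result;
    for (const Symbol& s : seq) {
        if (!isNonterminal(s)) { result.insert(s); return result; }  // terminal ends the scan
        const auto& fs = first.at(s);
        for (const Symbol& t : fs)
            if (!t.empty()) result.insert(t);
        if (!fs.count("")) return result;  // s is not nullable: stop here
    }
    result.insert("");  // every symbol was nullable, or the sequence is empty
    return result;
}

// Fixed-point iteration: keep adding symbols until no FIRST set changes.
std::map<Symbol, std::set<Symbol>> computeFirst() {
    std::map<Symbol, std::set<Symbol>> first;
    for (const auto& entry : grammar) first[entry.first];  // start with empty sets
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [nt, alts] : grammar)
            for (const Rhs& alt : alts)
                for (const Symbol& t : firstOfSeq(alt, first))
                    changed |= first[nt].insert(t).second;
    }
    return first;
}
```

For this grammar it yields FIRST(E) = FIRST(T) = FIRST(F) = {(, #integer}, FIRST(EE) = {+, ε} and FIRST(TT) = {*, ε}. A complete LL(1) generator would also compute FOLLOW sets to decide when an ε-alternative applies; the hand-written EE() simply falls through to ε whenever the token is not +.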
LL(1) Parser (variable part)
bool EE() {
    // EE ::= + T EE | ε
    if (scanner->currentToken().getLexem() == "+") {
        scanner->nextToken();
        return T() && EE();
    }
    return true;  // ε-alternative
}
bool F() {
    // F ::= ( E ) | #integer
    if (scanner->currentToken().getToken() == Scanner::tInteger) {
        scanner->nextToken();
        return true;
    } else if (scanner->currentToken().getLexem() == "(") {
        scanner->nextToken();
        bool zac = E();
        if (zac && scanner->currentToken().getLexem() == ")") {
            scanner->nextToken();
            return true;
        } else return false;
    } else return false;
}
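With the remaining nonterminals filled in the same way, the variable part becomes a complete recognizer. The sketch below is self-contained: the Scanner class is a hypothetical, minimal stand-in for the slides' scanner (integers, single-character symbols, and an empty string at end of input), and parse() additionally requires that the whole input is consumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical minimal scanner: integers, one-character symbols, "" at end.
class Scanner {
public:
    explicit Scanner(const std::string& s) : src_(s) { nextToken(); }
    const std::string& current() const { return tok_; }
    bool currentIsInteger() const {
        return !tok_.empty() && std::isdigit(static_cast<unsigned char>(tok_[0]));
    }
    void nextToken() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        tok_.clear();
        if (pos_ >= src_.size()) return;
        if (std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                tok_ += src_[pos_++];
        } else {
            tok_ = src_[pos_++];
        }
    }

private:
    std::string src_, tok_;
    std::size_t pos_ = 0;
};

// One function per nonterminal of E ::= T EE, EE ::= + T EE | ε,
// T ::= F TT, TT ::= * F TT | ε, F ::= ( E ) | #integer.
class Parser {
public:
    explicit Parser(const std::string& s) : sc_(s) {}
    bool parse() { return E() && sc_.current().empty(); }  // all input consumed

private:
    Scanner sc_;
    bool E() { return T() && EE(); }
    bool EE() {
        if (sc_.current() == "+") { sc_.nextToken(); return T() && EE(); }
        return true;  // ε-alternative
    }
    bool T() { return F() && TT(); }
    bool TT() {
        if (sc_.current() == "*") { sc_.nextToken(); return F() && TT(); }
        return true;  // ε-alternative
    }
    bool F() {
        if (sc_.currentIsInteger()) { sc_.nextToken(); return true; }
        if (sc_.current() == "(") {
            sc_.nextToken();
            if (E() && sc_.current() == ")") { sc_.nextToken(); return true; }
        }
        return false;
    }
};
```

Parser("(23+2)*3").parse() returns true, while Parser("(1+2").parse() returns false because F() never sees the closing parenthesis.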
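SLR(1) Parser (variable part) • F: State × T → Action (s = shift, rk = reduce by production k, a = accept) • G: State × (T ∪ N) → State • terminals: i ( ) + - #; nonterminals: S E T

| State | F (action) | G (goto) |
|---|---|---|
| 0 | s on i, ( | E → 1, T → 2, i → 3, ( → 4 |
| 1 | s on +, -, # | + → 6, - → 7, # → 5 |
| 2 | r2 on i ( ) + - # | |
| 3 | r5 on i ( ) + - # | |
| 4 | s on i, ( | E → 10, T → 2, i → 3, ( → 4 |
| 5 | a | |
| 6 | s on i, ( | T → 8, i → 3, ( → 4 |
| 7 | s on i, ( | T → 9, i → 3, ( → 4 |
| 8 | r3 on i ( ) + - # | |
| 9 | r4 on i ( ) + - # | |
| 10 | s on ), +, - | ) → 11, + → 6, - → 7 |
| 11 | r6 on i ( ) + - # | |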
SLR(1) Parser (fixed part)
PUSH(stack, 0)
current := nextToken
while true do
    if F(TOPV(stack), current) = "s" then {shift}
        T := G(TOPV(stack), current)
        PUSH(stack, T)
        current := nextToken
    elseif F(TOPV(stack), current) = "r k" then {reduce by the k-th production}
        for j = 1 to SIZE(k)
            POP(stack)
        T := G(TOPV(stack), LHS(k))
        PUSH(stack, T)
    elseif F(TOPV(stack), current) = "a" then
        accept
    else
        error
    endif
enddo
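The driver needs only F, G, and the size and left-hand side of each production. The following sketch transcribes the SLR(1) table into C++. The grammar (1: S ::= E #, 2: E ::= T, 3: E ::= E + T, 4: E ::= E - T, 5: T ::= i, 6: T ::= ( E )) is an assumption inferred from the table, since the slides do not list it. Tokens are single characters, '\0' marks end of input, and state 5 accepts regardless of lookahead; slrParse and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed grammar (inferred from the table, not stated on the slides):
// 1: S ::= E #  2: E ::= T  3: E ::= E + T  4: E ::= E - T  5: T ::= i  6: T ::= ( E )
struct Production { char lhs; int size; };
const std::map<int, Production> productions = {
    {2, {'E', 1}}, {3, {'E', 3}}, {4, {'E', 3}}, {5, {'T', 1}}, {6, {'T', 3}}};

// G: State x (T u N) -> State, transcribed from the table.
const std::map<std::pair<int, char>, int> G = {
    {{0, 'E'}, 1}, {{0, 'T'}, 2}, {{0, 'i'}, 3}, {{0, '('}, 4},
    {{1, '+'}, 6}, {{1, '-'}, 7}, {{1, '#'}, 5},
    {{4, 'E'}, 10}, {{4, 'T'}, 2}, {{4, 'i'}, 3}, {{4, '('}, 4},
    {{6, 'T'}, 8}, {{6, 'i'}, 3}, {{6, '('}, 4},
    {{7, 'T'}, 9}, {{7, 'i'}, 3}, {{7, '('}, 4},
    {{10, ')'}, 11}, {{10, '+'}, 6}, {{10, '-'}, 7}};

bool isTerminal(char t) { return std::string("i()+-#").find(t) != std::string::npos; }

// F: State x T -> Action. Reduce rows reduce on every terminal (as in the table);
// shifts are exactly the terminal entries of G; state 5 accepts.
std::string F(int state, char t) {
    static const std::map<int, int> reduceBy = {{2, 2}, {3, 5}, {8, 3}, {9, 4}, {11, 6}};
    if (state == 5) return "a";
    auto r = reduceBy.find(state);
    if (r != reduceBy.end()) return "r" + std::to_string(r->second);
    if (isTerminal(t) && G.count({state, t})) return "s";
    return "error";
}

// Fixed part: the shift/reduce loop of the slide.
bool slrParse(const std::string& input) {
    std::vector<int> stack{0};
    std::size_t pos = 0;
    auto nextToken = [&]() { return pos < input.size() ? input[pos++] : '\0'; };
    char current = nextToken();
    while (true) {
        std::string action = F(stack.back(), current);
        if (action == "s") {                       // shift
            stack.push_back(G.at({stack.back(), current}));
            current = nextToken();
        } else if (action[0] == 'r') {             // reduce by production k
            const Production& p = productions.at(std::stoi(action.substr(1)));
            for (int j = 0; j < p.size; ++j) stack.pop_back();
            auto g = G.find({stack.back(), p.lhs});
            if (g == G.end()) return false;
            stack.push_back(g->second);
        } else {
            return action == "a";                  // accept or error
        }
    }
}
```

For instance, slrParse("(i-i)+i#") returns true, and slrParse("i+#") returns false because state 6 has no entry for #.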
Attribute Grammars

| Productions (p ∈ P) | Semantic functions (f_{p,a}, a ∈ S(X0)) |
|---|---|
| E ::= T EE | E.val = EE.val; EE.inval = T.val |
| EE0 ::= + T EE1 | EE0.val = EE1.val; EE1.inval = EE0.inval + T.val |
| EE0 ::= ε | EE0.val = EE0.inval |
| T ::= F TT | T.val = TT.val; TT.inval = F.val |
| TT0 ::= * F TT1 | TT0.val = TT1.val; TT1.inval = TT0.inval * F.val |
| TT0 ::= ε | TT0.val = TT0.inval |
| F ::= ( E ) | F.val = E.val |
| F ::= Integer | F.val = Str2Int(Integer.lexVal) |
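Attribute Grammars
bool EE(int inVal, int &val) {
    // EE0 ::= + T EE1 : EE1.inval = EE0.inval + T.val, EE0.val = EE1.val
    if (scanner->currentToken().getLexem() == "+") {
        scanner->nextToken();
        int tempVal;
        bool ok = T(tempVal);
        return ok && EE(inVal + tempVal, val);
    } else {
        // EE0 ::= ε : EE0.val = EE0.inval
        val = inVal;
        return true;
    }
}
bool T(int &val) {
    // T ::= F TT : TT.inval = F.val, T.val = TT.val
    int tempVal;
    bool ok = F(tempVal);
    return ok && TT(tempVal, val);
}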
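Every inherited attribute in this grammar depends only on the parent and on siblings to its left, so all attributes can be evaluated during a single left-to-right recursive descent: inherited attributes (inval) become value parameters and synthesized attributes (val) become reference parameters, as in EE and T. The sketch below completes the remaining functions into a runnable evaluator; the Evaluator class and its embedded scanner are hypothetical, and std::stoi stands in for Str2Int.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical evaluator: inherited attributes are passed in as arguments,
// synthesized attributes are returned through references.
class Evaluator {
public:
    explicit Evaluator(const std::string& s) : src_(s) { nextToken(); }
    // Sets result and returns true if the whole input is a valid expression.
    bool evaluate(int& result) { return E(result) && tok_.empty(); }

private:
    std::string src_, tok_;
    std::size_t pos_ = 0;

    // Minimal scanner: integers, one-character symbols, "" at end of input.
    void nextToken() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        tok_.clear();
        if (pos_ >= src_.size()) return;
        if (std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                tok_ += src_[pos_++];
        } else {
            tok_ = src_[pos_++];
        }
    }
    bool isInteger() const {
        return !tok_.empty() && std::isdigit(static_cast<unsigned char>(tok_[0]));
    }

    // E ::= T EE : EE.inval = T.val, E.val = EE.val
    bool E(int& val) { int t; return T(t) && EE(t, val); }
    // EE0 ::= + T EE1 : EE1.inval = EE0.inval + T.val;  EE0 ::= ε : EE0.val = EE0.inval
    bool EE(int inVal, int& val) {
        if (tok_ == "+") { nextToken(); int t; return T(t) && EE(inVal + t, val); }
        val = inVal;
        return true;
    }
    // T ::= F TT : TT.inval = F.val, T.val = TT.val
    bool T(int& val) { int f; return F(f) && TT(f, val); }
    // TT0 ::= * F TT1 : TT1.inval = TT0.inval * F.val;  TT0 ::= ε : TT0.val = TT0.inval
    bool TT(int inVal, int& val) {
        if (tok_ == "*") { nextToken(); int f; return F(f) && TT(inVal * f, val); }
        val = inVal;
        return true;
    }
    // F ::= ( E ) : F.val = E.val;  F ::= Integer : F.val = Str2Int(Integer.lexVal)
    bool F(int& val) {
        if (isInteger()) { val = std::stoi(tok_); nextToken(); return true; }
        if (tok_ == "(") {
            nextToken();
            if (E(val) && tok_ == ")") { nextToken(); return true; }
        }
        return false;
    }
};
```

Evaluator("(23+2)*3").evaluate(v) sets v to 75, and on "2+3*4" it sets v to 14, because TT accumulates the product before EE adds it.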
LISA ver. 1 • Developed in 1994 • Mernik, Korbar, Žumer. LISA: A Tool for Automatic Language Implementation, ACM SIGPLAN Notices, Vol. 30, No. 4, pp. 71–79, 1995.
LISA ver. 2.0 • LISA ver. 2.0 (joint work with M. Lenič, E. Avdičaušević, V. Žumer) • started in summer 1997 • finished in summer 2000 • Incremental language development • Educational tool
LISA ver. 2.0 • LISA also generates other language-based tools: • editors, • inspectors, • visualizers/animators (Slovene-Portugal project)
LISA ver. 2.0 • LISA tool demonstration • More info: http://marcel.uni-mb.si/lisa