Compilers, Chapter 4: Top-Down Parsing (1)

Compilers, Chapter 4: Top-Down Parsing (1) · Department of Computer Engineering, Soonchunhyang University · 2018-10-29 · Sang-ho Ha

Outline
• Top-down parsing
• Recursive-descent parsing

Top-down parsing
A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation. Such an algorithm is called top-down because the implied traversal of the parse tree is a preorder traversal.
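The link between leftmost derivations and preorder traversal can be seen in a tiny recursive-descent parser that logs each production as soon as it chooses it. The sketch below is only an illustration: the grammar S → ( S ) S | ε is an example chosen here (it is not on the slides), and the names `derive`, `record` and `trace` are hypothetical.

```c
#include <string.h>

/* Example grammar chosen for illustration:  S -> ( S ) S | e   (e = epsilon)
   Each call of S() records the production it applies BEFORE expanding its
   children, so the log lists productions in preorder, which is exactly the
   order in which a leftmost derivation applies them. */

static const char *input;   /* remaining input */
static int ok;              /* cleared on a syntax error */
char trace[1024];           /* productions applied, each ended by ';' */

static void record(const char *prod)
{
    if (strlen(trace) + strlen(prod) < sizeof trace)   /* never overflow */
        strcat(trace, prod);
}

static void S(void)
{
    if (*input == '(') {              /* lookahead '(' selects S -> ( S ) S */
        record("S->(S)S;");
        input++;                      /* match '(' */
        S();
        if (*input == ')')
            input++;                  /* match ')' */
        else
            ok = 0;
        S();
    } else {                          /* any other lookahead selects S -> e */
        record("S->e;");
    }
}

/* Returns 1 if s is a sentence of S; trace holds the productions used. */
int derive(const char *s)
{
    input = s;
    ok = 1;
    trace[0] = '\0';
    S();
    return ok && *input == '\0';
}
```

For example, `derive("()")` returns 1 and leaves `trace` as `S->(S)S;S->e;S->e;`, which matches the leftmost derivation S ⇒ ( S ) S ⇒ ( ) S ⇒ ( ).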

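Two kinds of top-down parsers
• Predictive parsers
  - use a few (usually one) lookahead tokens to decide in advance the structure of the lower part of the tree
  - this approach is weak because the parser must commit to a prediction before it can see the rest of the program structure
• Backtracking parsers
  - solve the lookahead problem by backing up and making a different choice whenever a decision made while building the tree turns out to be wrong
  - this approach is inefficient (exponential time in general)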
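A backtracking parser can be sketched by saving the input position before trying an alternative and restoring it when the alternative fails. The grammar below (S → c A d, A → a b | a) is a common textbook illustration chosen here, not taken from the slides, and `accepts` and `match_char` are hypothetical names.

```c
#include <stddef.h>

/* Example grammar (a common textbook illustration, not from these slides):
       S -> c A d
       A -> a b | a
   Each procedure returns 1 on success. When an alternative fails, the
   parser restores the saved input position (backtracks) and tries the
   next alternative. */

static const char *in;   /* input string */
static size_t pos;       /* current position in the input */

static int match_char(char c)
{
    if (in[pos] == c) { pos++; return 1; }
    return 0;
}

static int A(void)
{
    size_t save = pos;                        /* where A started */
    if (match_char('a') && match_char('b'))
        return 1;                             /* A -> a b worked */
    pos = save;                               /* backtrack */
    if (match_char('a'))
        return 1;                             /* A -> a worked */
    pos = save;
    return 0;
}

static int S(void)
{
    return match_char('c') && A() && match_char('d');
}

/* Returns 1 if s is a sentence of S. */
int accepts(const char *s)
{
    in = s;
    pos = 0;
    return S() && in[pos] == '\0';
}
```

On `cad`, A first tries `a b`, fails at `d`, rewinds and succeeds with `a`. For this particular grammar, backtracking inside A is enough; in general a failure later in the parse may force the parser to revisit earlier choices, which is where the exponential worst case comes from.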

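Top-down parsers
• Several methods have been developed to overcome the lookahead problem of predictive parsers.
• Recursive descent, a predictive method, is simple and well suited to writing parsers by hand.
• However, top-down parsing is inherently weak, so it is not well suited to machine-generated parsers.
• The more powerful bottom-up parsing methods should be considered instead (Chapter 5).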

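Recursive-descent parsing
• A simple, elegant idea
• Use the grammar rules as a guide for writing procedure code
  - each nonterminal corresponds to a procedure
  - on the right-hand side (RHS) of a rule:
    - each terminal corresponds to matching a token
    - each nonterminal corresponds to a call of its associated procedure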

Example: recursive-descent parsing
Grammar rule: factor → ( exp ) | number
Code:
    void factor(void)
    {
        if (token == number)
            match(number);
        else {
            match('(');
            exp();
            match(')');
        }
    }

    void match(TokenType expectedToken)
    {
        if (token == expectedToken)
            token = getToken();   /* advance to the next token */
        else
            error();
    }
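The slide's `factor` and `match` depend on a scanner and an `exp` procedure that are not shown. The sketch below fills them in so the recognizer runs on a string. Assumptions: tokens are single characters except `NUMBER` (a run of digits), `getToken` reads from a string, `error` only sets a flag, and `exp` is simplified to the hypothetical rule exp → factor { + factor }; the names `expr`, `recognize` and `failed` are also hypothetical.

```c
#include <ctype.h>

/* Tokens: NUMBER for a run of digits; any other character stands for itself. */
enum { ENDINPUT = 0, NUMBER = 256 };

static const char *src;   /* remaining input */
static int token;         /* current lookahead token */
static int failed;        /* set by error() */

static int getToken(void)
{
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;   /* consume the whole number */
        return NUMBER;
    }
    return *src ? *src++ : ENDINPUT;
}

static void error(void) { failed = 1; }

static void match(int expectedToken)
{
    if (token == expectedToken) token = getToken();
    else error();
}

static void factor(void);

/* Assumed simplified rule, just to make the example complete:
   exp -> factor { + factor } */
static void expr(void)
{
    factor();
    while (token == '+') { match('+'); factor(); }
}

/* factor -> ( exp ) | number, written as on the slide */
static void factor(void)
{
    if (failed) return;              /* stop recursing after the first error */
    if (token == NUMBER)
        match(NUMBER);
    else {
        match('(');
        expr();
        match(')');
    }
}

/* Returns 1 if the whole string is a valid exp. */
int recognize(const char *s)
{
    src = s;
    failed = 0;
    token = getToken();              /* load the first lookahead token */
    expr();
    return !failed && token == ENDINPUT;
}
```

The `if (failed) return;` guard matters because `match` does not advance on an error: without it, an input such as `+` would make `factor` and `expr` call each other forever on the same lookahead.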

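Example: recursive-descent parsing
• A recursive-descent parser can also compute values or build a syntax tree while it parses.
Grammar rule: factor → ( exp ) | number
    int factor(void)
    {
        if (token == number) {
            int temp = atoi(tokStr);   /* convert the token text before match() advances */
            match(number);
            return temp;
        } else {
            match('(');
            int temp = exp();
            match(')');
            return temp;
        }
    }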

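What if a rule is left-recursive?
exp → exp addop term | term
    void exp(void)
    {
        if (token == ??) {   /* which lookahead should select the first alternative? */
            exp();
            addop();
            term();
        } else
            term();
    }
What is wrong with this code?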
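One way to see the problem is a small experiment. The sketch below is a naive translation of the left-recursive rule that always takes the first alternative when the lookahead could start a term (a digit here, an assumption). A depth counter, added only for the demonstration, cuts off the recursion; the names `exp_left`, `consumed`, `depth` and `MAX_DEPTH` are hypothetical.

```c
/* Naive translation of  exp -> exp addop term | term  that picks the first
   alternative whenever the lookahead is a digit (i.e. could start a term).
   exp_left() calls itself before consuming any input, so the lookahead
   never changes. The depth guard exists only for this demonstration. */

#define MAX_DEPTH 1000

static const char *src;   /* lookahead position: never advanced below */
int depth;                /* number of nested exp_left() calls reached */

static int exp_left(void)
{
    if (++depth > MAX_DEPTH)
        return 0;                       /* give up: no progress is possible */
    if (*src >= '0' && *src <= '9') {   /* choose exp -> exp addop term */
        return exp_left();              /* same lookahead, same choice again */
        /* addop(); term();  would never be reached */
    }
    return 1;                           /* placeholder for exp -> term */
}

/* Returns how many characters were consumed before the guard tripped. */
int consumed(const char *s)
{
    src = s;
    depth = 0;
    exp_left();
    return (int)(src - s);
}
```

`consumed("1+2")` returns 0 while `depth` reaches `MAX_DEPTH + 1`: the parser keeps calling itself without ever reading a token. A real parser has no such guard, so the recursion only ends when the stack overflows.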

EBNF is the solution
exp → term { addop term }
    void exp(void)
    {
        term();
        while (token == '+' || token == '-') {   /* token is an addop */
            addop();
            term();
        }
    }

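What if a rule is right-recursive?
exp → term [ addop exp ]
    void exp(void)
    {
        term();
        if (token == '+' || token == '-') {   /* token is an addop */
            addop();
            exp();
        }
    }
No problem! The recursive call to exp() is made only after term() and addop() have consumed input, so the recursion always terminates.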
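Termination is not the whole story, though: if values are computed directly from the right-recursive rule, the operators group to the right. The sketch below assumes, for brevity, that a term is just an unsigned integer with no spaces; `eval_right`, `exp_right` and `number` are hypothetical names.

```c
#include <ctype.h>

/* Evaluator written from the right-recursive rule
       exp -> term [ addop exp ]
   with term simplified (assumption) to a single unsigned integer.
   The parse terminates, but a - b - c is grouped as a - (b - c). */

static const char *p;   /* current position in the input */

static int number(void)
{
    int v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static int exp_right(void)
{
    int left = number();             /* term */
    if (*p == '+' || *p == '-') {    /* optional: addop exp */
        char op = *p++;
        int right = exp_right();     /* recursion only after input was consumed */
        return op == '+' ? left + right : left - right;
    }
    return left;
}

int eval_right(const char *s)
{
    p = s;
    return exp_right();
}
```

`eval_right("10-3-2")` gives 9, that is 10 - (3 - 2), instead of the conventional (10 - 3) - 2 = 5. Addition hides the problem because it is associative (`"1+2+3"` gives 6 either way).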

Example: recursive-descent parsing
• Grammar:
    exp → exp addop term | term
    addop → + | -
    term → term mulop factor | factor
    mulop → *
    factor → ( exp ) | number
• Write a recursive-descent parser that evaluates expressions in this grammar or builds their syntax trees.

exp exp addop term | term addop  + | - term  term multop factor | factor multop  * factor  ( exp ) | number Example int exp () // compute values { var temp : integer; temp := term(); while (token = ‘+’ or token = ‘-’ ) do case token of ‘+’ : match(+); temp := temp + term(); ‘-’ : match(-); temp := temp – term(); end case; } return temp; } 연산의 좌결합 규칙이 유지되는가?

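Simple integer arithmetic calculator (1)
    /* EBNF:
       <exp>    -> <term> { <addop> <term> }
       <addop>  -> + | -
       <term>   -> <factor> { <mulop> <factor> }
       <mulop>  -> *
       <factor> -> ( <exp> ) | number
    */
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>

    int token;   /* current lookahead character */
    int exp(void);
    int term(void);
    int factor(void);

    void error(void) { fprintf(stderr, "Error\n"); exit(1); }

    void match(int expectedToken)
    {
        if (token == expectedToken) token = getchar();   /* advance */
        else error();
    }

    int main(void)
    {
        int result;
        token = getchar();    /* load the first lookahead character */
        result = exp();
        if (token == '\n')    /* the whole line must have been consumed */
            printf("Result = %d\n", result);
        else
            error();
        return 0;
    }

    int exp(void)
    {
        int temp = term();
        while (token == '+' || token == '-') {
            switch (token) {
            case '+': match('+'); temp += term(); break;
            case '-': match('-'); temp -= term(); break;
            }
        }
        return temp;
    }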

Simple integer arithmetic calculator (2)
    int factor(void)
    {
        int temp = 0;
        if (token == '(') {
            match('(');
            temp = exp();
            match(')');
        } else if (isdigit(token)) {
            ungetc(token, stdin);   /* push the first digit back ... */
            scanf("%d", &temp);     /* ... and read the whole number */
            token = getchar();      /* reload the lookahead */
        } else
            error();
        return temp;
    }

    int term(void)
    {
        int temp = factor();
        while (token == '*') {
            match('*');
            temp *= factor();
        }
        return temp;
    }
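The calculator above reads from standard input, which makes it awkward to call from other code. The following variant keeps the same grammar and procedure structure but reads from a string and reports errors through a flag instead of exiting; both changes are assumptions made for this sketch, and `calculate`, `calc_exp`, `next` and `failed` are hypothetical names. Like the original, it does not skip spaces.

```c
#include <ctype.h>

/* Same grammar as the slides:
       exp -> term { addop term }   term -> factor { * factor }
       factor -> ( exp ) | number
   but reading from a string (assumption) instead of stdin. */

static const char *src;   /* next unread character */
static int token;         /* current lookahead character, '\0' at the end */
static int failed;        /* set on the first syntax error */

static void next(void) { token = *src ? (unsigned char)*src++ : '\0'; }

static void match(int expected)
{
    if (token == expected) next();
    else failed = 1;
}

static int calc_exp(void);

static int factor(void)
{
    int temp = 0;
    if (token == '(') {
        match('(');
        temp = calc_exp();
        match(')');
    } else if (isdigit(token)) {
        while (isdigit(token)) {         /* read the whole number */
            temp = temp * 10 + (token - '0');
            next();
        }
    } else
        failed = 1;                      /* no recursion here, so errors terminate */
    return temp;
}

static int term(void)
{
    int temp = factor();
    while (token == '*') {
        match('*');
        temp *= factor();
    }
    return temp;
}

static int calc_exp(void)
{
    int temp = term();
    while (token == '+' || token == '-') {
        if (token == '+') { match('+'); temp += term(); }
        else              { match('-'); temp -= term(); }
    }
    return temp;
}

/* Evaluates s; *ok becomes 0 on a syntax error or leftover characters. */
int calculate(const char *s, int *ok)
{
    src = s;
    failed = 0;
    next();                              /* load the first lookahead */
    int result = calc_exp();
    *ok = !failed && token == '\0';
    return result;
}
```

For example, `calculate("2*(3+4)-5", &ok)` returns 9 with `ok` set to 1, while `"(1+2"` leaves `ok` at 0 because the closing parenthesis is missing.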

Example
• How should the parser construct the syntax tree for an expression (grammar as on the previous slides)?
(Figure: the syntax tree for 3 + 4 + 5. The root is +; its left child is the + node for 3 + 4, and its right child is 5.)

exp exp addop term | term addop  + | - term  term multop factor | factor multop  * factor  ( exp ) | number Example: 수식 구문트리 typedef enum {Plus,Minus,Times} OpKind; typedef enum {OpK,ConstK} ExpKind; typedef struct streenode { ExpKind kind; OpKind op; struct streenode *lchild,*rchild; int val; } STreeNode; typedef STreeNode *SyntaxTree; syntaxTree exp () // construct syntax tree { var temp, newtemp: syntaxTree; temp := term(); while (token = + or token = - ) do newtemp := makeOpNode(token); match(token); leftChild(newtemp) := temp; rightChild(newtemp) := term(); temp := newtemp; } return temp; } 3 + 4 + 5 + 5 + 4 3

Problems with recursive-descent parsers
• Recursive-descent parsers are
  - quite powerful, but still ad hoc
  - best suited to small, carefully designed languages

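Problems with recursive-descent parsers
• What if the alternatives α, β, … of a rule begin with nonterminals? The parser can then no longer choose an alternative by comparing the lookahead with a leading terminal; it must know which tokens can begin each alternative (its First set).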
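A small sketch of what choosing by First sets looks like in code. The grammar is hypothetical, invented so that every alternative of A starts with a nonterminal, and the First sets are worked out by hand; `parse_A` and `in_set` are hypothetical names.

```c
#include <string.h>

/* Hypothetical grammar in which every alternative starts with a nonterminal:
       A -> B x | C y
       B -> a | b
       C -> c | d
   A() cannot compare the lookahead with a leading terminal, so it tests
   membership in First(B) = {a, b} and First(C) = {c, d} instead. */

static const char *p;   /* current input position */
static int failed;

/* strchr would also find the terminating '\0', so exclude it explicitly. */
static int in_set(char c, const char *set) { return c != '\0' && strchr(set, c) != NULL; }

static void match(char c) { if (*p == c) p++; else failed = 1; }

static void B(void) { if (in_set(*p, "ab")) p++; else failed = 1; }
static void C(void) { if (in_set(*p, "cd")) p++; else failed = 1; }

static void A(void)
{
    if (in_set(*p, "ab")) {          /* lookahead in First(B): use A -> B x */
        B();
        match('x');
    } else if (in_set(*p, "cd")) {   /* lookahead in First(C): use A -> C y */
        C();
        match('y');
    } else
        failed = 1;
}

/* Returns 1 if s is a sentence of A. */
int parse_A(const char *s)
{
    p = s;
    failed = 0;
    A();
    return !failed && *p == '\0';
}
```

The decision in `A` works only because First(B) and First(C) are disjoint; if they overlapped, one token of lookahead could not tell the alternatives apart.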

EBNF version of the TINY grammar
EBNF:
    program → stmt-sequence
    stmt-sequence → statement { ; statement }
    statement → if-stmt | repeat-stmt | assign-stmt | read-stmt | write-stmt
    if-stmt → if exp then stmt-sequence [ else stmt-sequence ] end
    repeat-stmt → repeat stmt-sequence until exp
    assign-stmt → identifier := exp
    read-stmt → read identifier
    write-stmt → write exp
    exp → simple-exp [ comparison-op simple-exp ]
    comparison-op → < | =
    simple-exp → term { addop term }
    addop → + | -
    term → factor { mulop factor }
    mulop → * | /
    factor → ( exp ) | number | identifier
BNF version (for comparison):
    program → stmt-sequence
    stmt-sequence → stmt-sequence ; statement | statement
    statement → if-stmt | repeat-stmt | assign-stmt | read-stmt | write-stmt
    if-stmt → if exp then stmt-sequence end | if exp then stmt-sequence else stmt-sequence end
    repeat-stmt → repeat stmt-sequence until exp
    assign-stmt → identifier := exp
    read-stmt → read identifier
    write-stmt → write exp
    exp → simple-exp comparison-op simple-exp | simple-exp
    comparison-op → < | =
    simple-exp → simple-exp addop term | term
    addop → + | -
    term → term mulop factor | factor
    mulop → * | /
    factor → ( exp ) | number | identifier

TINY parser
• Write a recursive-descent parser that builds a syntax tree.

Recall: TINY syntax tree structure
    typedef enum {StmtK, ExpK} NodeKind;
    typedef enum {IfK, RepeatK, AssignK, ReadK, WriteK} StmtKind;
    typedef enum {OpK, ConstK, IdK} ExpKind;

    /* ExpType is used for type checking */
    typedef enum {Void, Integer, Boolean} ExpType;

    #define MAXCHILDREN 3

    typedef struct treeNode {
        struct treeNode * child[MAXCHILDREN];
        struct treeNode * sibling;
        int lineno;
        NodeKind nodekind;
        union { StmtKind stmt; ExpKind exp; } kind;
        union { TokenType op; int val; char * name; } attr;
        ExpType type;   /* for type checking of exps */
    } TreeNode;

Recall: sample.tny
    read x;
    if 0 < x then
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact
    end

Recursive-descent code for the TINY parser (building the syntax tree)
statement → if-stmt | repeat-stmt | assign-stmt | read-stmt | write-stmt
    TreeNode * statement(void)
    {
        TreeNode * t = NULL;
        switch (token) {
        case IF     : t = if_stmt();     break;
        case REPEAT : t = repeat_stmt(); break;
        case ID     : t = assign_stmt(); break;
        case READ   : t = read_stmt();   break;
        case WRITE  : t = write_stmt();  break;
        default :
            syntaxError("unexpected token -> ");
            printToken(token, tokenString);
            token = getToken();
            break;
        } /* end case */
        return t;
    }

Recursive-descent code for the TINY parser (2)
if-stmt → if exp then stmt-sequence [ else stmt-sequence ] end
    TreeNode * if_stmt(void)
    {
        TreeNode * t = newStmtNode(IfK);
        match(IF);
        if (t != NULL) t->child[0] = exp();
        match(THEN);
        if (t != NULL) t->child[1] = stmt_sequence();
        if (token == ELSE) {
            match(ELSE);
            if (t != NULL) t->child[2] = stmt_sequence();
        }
        match(END);
        return t;
    }
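`if_stmt` relies on `stmt_sequence`, whose EBNF rule statement { ; statement } becomes a loop that chains statements through the `sibling` field. The sketch below shows only that idea on a cut-down subset of TINY. Assumptions: just `read id` and `write id` statements, tokens supplied as an array ending in `ENDFILE` instead of coming from a scanner, one-letter identifiers, and a simplified node type; `parse_tokens` and the token names `SEMI` and `ENDFILE` are hypothetical here.

```c
#include <stdlib.h>

/* Assumed token and node types for a tiny subset of TINY. */
typedef enum { READ, WRITE, ID, SEMI, ENDFILE } TokenType;
typedef enum { ReadK, WriteK } StmtKind;

typedef struct treeNode {
    struct treeNode *sibling;   /* next statement in the sequence */
    StmtKind kind;
    char name;                  /* one-letter identifier, for brevity */
} TreeNode;

static const TokenType *toks;   /* token stream, ended by ENDFILE */
static const char *names;       /* names[i] is the identifier when toks[i] == ID */
static int pos, failed;

static void match(TokenType expected)
{
    if (toks[pos] == expected) pos++;
    else failed = 1;
}

/* statement -> read id | write id   (subset only) */
static TreeNode *statement(void)
{
    if (toks[pos] != READ && toks[pos] != WRITE) { failed = 1; return NULL; }
    TreeNode *t = calloc(1, sizeof *t);
    if (t == NULL) abort();
    t->kind = (toks[pos] == READ) ? ReadK : WriteK;
    pos++;                              /* match READ or WRITE */
    t->name = names[pos];
    match(ID);
    return t;
}

/* stmt-sequence -> statement { ; statement }: the { } loop links each
   new statement to the previous one through its sibling pointer. */
static TreeNode *stmt_sequence(void)
{
    TreeNode *t = statement();          /* head of the sibling list */
    TreeNode *last = t;
    while (!failed && toks[pos] == SEMI) {
        match(SEMI);
        TreeNode *q = statement();
        if (q == NULL) break;
        last->sibling = q;              /* t != NULL here, or failed would be set */
        last = q;
    }
    return t;
}

/* Parses a whole token array; *ok becomes 0 on any syntax error. */
TreeNode *parse_tokens(const TokenType *ts, const char *ns, int *ok)
{
    toks = ts; names = ns; pos = 0; failed = 0;
    TreeNode *tree = stmt_sequence();
    *ok = !failed && toks[pos] == ENDFILE;
    return tree;
}
```

Because `;` separates statements rather than terminating them, a sequence such as `read x ;` followed by end of input is rejected: after the semicolon the parser expects another statement.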
