1 / 24

컴파일러 입문

컴파일러 입문. 제 2 장 형식 언어. Language. ◈ Basic definitions (1) alphabet - a finite set of symbols. - ex) T1 = { ㄱ , ㄴ , ㄷ ,..., ㅎ , ㅏ , ㅑ , … , ㅡ , ㅣ } T2 = {A,B,C, … ,Z} T3 = {begin,integer, … ,end}

lael
Download Presentation

컴파일러 입문

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 컴파일러 입문 제 2 장 형식 언어

  2. Language ◈ Basic definitions (1) alphabet - a finite set of symbols. - ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ} T2 = {A,B,C, … ,Z} T3 = {begin,integer, … ,end} (2) string(or sentence, word) - a sequence of symbols from some alphabet T. (3) length - the number of symbols in the string. - denoted by |ω| 꼭 기억해야 할 세 가지 개념 1. 언어의 정의 2. 문법의 정의 및 개념 3. 인식기의 의미

  3. (4) empty string - a string consisting of no symbols. - denoted by ε or λ. (5) T* denotes the set of all strings of symbols over the alphabet T, including the empty string. T+ = T* - {ε} ☞ T* : T star T+ : T dagger (6) Language is any set of strings over an alphabet.(Text p.35) (or A Language L over the alphabet T is a subset of T*.) L ⊆ T*

  4. ◈ Two problems 1) How do we represent a language ? If the language is finite, the answer is easy. If the language is infinite, we are faced with the problem of finding a finite representation for the language. - Set description - Grammar : Generating Scheme - Recognizer : Recognition Scheme 2) Does there exist a finite representation for every language ? No ! - This is not always possible.

  5. Text p.40 ~ 41 ◈ More definitions (1) concatenation - u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm - u • v를 보통 uv로 표기. - uε= u = εu - ∀u,v ∈ T*, uv ∈ T*. - |uv| = |u| + |v| (2) anrepresents n a's. a0 = ε (3) the reversal of a string ω, denoted ωR is the string ω written in reverse order: i.e., if ω = a1a2...an then ωR = anan-1...a1. (ωR)R=ω

  6. (4) language product LL' = {xy| x ∈ L and y ∈ L'} (5) The powers of a language L are defined recursively by: L0 = {ε} Ln = LLn-1 for n 1. (6) L* : reflexivetransitiveclosure = L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… = (7) L+ : transitive closure = L1 ∪ L2 ∪... ∪ Ln ∪ ... = L* - L0

  7. Grammar ◈ Language • 문장(sentence)들을 원소로 갖는 집합 • 언어를 어떻게 표현할 것인가 ? ◈ Grammar • terminal : 정의된 언어의 알파벳 • nonterminal : • 스트링을 생성하는 데 사용되는 중간 과정의 심볼 • 언어의 구조를 정의하는데 사용 • grammar symbol (V)

  8. Text p.45 ◈ G = (VN, VT, P, S) • VN : a finite set of nonterminal symbols • VT : a finite set of terminal symbols VN VT =  , VN∪ VT = V • P : a finite set of production rules α β, α∈ V+, β∈ V* lhsrhs • S : start symbol(sentence symbol)

  9. Text p.45 ◈ [예] G = ( {S, A}, {a, b}, P, S ) [예 8] • P : S aAS S a A SbA A ba A SS => S aAS | a A SbA | ba | SS

  10. ◈ Derivation 1. => : "directly produce" or "directly derive" if α β∈ P and  , δ∈ V* then  αδ =>  βδ 2. => * : Suppose α1,α2,...,αn ∈ V* and α1 =>α2 =>… => αn, then α1 =>* αn (zero or more derivations) 3. => + : one or more derivations. cf) : production rule에서 사용. "may be replaced by" => : derivation할 때 사용한다.

  11. ◈ L(G) : Language generated by grammar G L(G) = { ω | S =>* ω, ω ∈ VT*} ☞ ω is a sentential form of G if S =>* ω and ω ∈ V*. ω is a sentence of G if S =>* ω and ω ∈ VT*. P : S aA | bB | ε A bS B aS S =>* abba 유도 과정 S => aA (생성규칙 S aA) => abS (생성규칙 A bS) => abbB (생성규칙 S bB) => abbaS (생성규칙 B aS) => abba (생성규칙 S )

  12. Text p.49 generation Grammar Language design ◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS L (G1) = { an | n  1 } ◈ Language design

  13. ◈ G3 = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa L(G3) = { anbncn | n  1 }

  14. (===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b ex4) A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa

  15. ◈ Grammar Design L = { an | n  0 }일 때 문법 : A aA | ε (<===) GrammarDesign ex1) L1= { anbn | n  0 } ex2) L2 = { 0i1j | i  j, i,j  1 } ex3) Constructs of Conventional PL

  16. 1) 파스칼 언어의 상수정의 부분 : 상수정의 부분은 CONST라는 예약어로 시작하며 하나의 상수 정의는 a=b의 형태를 갖는다. 여기서, a는 identifier를 b는 상수를 나타내는 terminal 심벌이다. 상수정의부분은 선택적이며 각각의 상수정의는 ;으로 구분한다. 다음은 상수정의 부분의 예이다. CONST ON = TRUE; OFF = FALSE; EPSILON = 1.0E-10;

  17. 2) C 언어의 정수선언 부분 : 정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의 identifier를 나타낸다. 그리고;으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;과 같다. ※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다.

  18. Text p.54 ◈ In order to prove that a grammar generates a language L i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. [예16] G = ( { S } , { ( , ) } , {S -> (S)S |ε}, S ) <=> All strings of balanced parentheses. proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.

  19. (=>) Every sentence derivable from S is balanced. (i.e., S =>* ω, ω: balanced) By induction on the number of steps in a derivation. i) n = 1 일 때, S => ε, ε is surely balanced. ii) Suppose that all derivations of fewer than n steps produce balanced sentences. iii) Consider a leftmost derivation of exactly n steps. S => (S)S =>* (x)S =>* (x)y By the hypothesis x,y : balanced. Thus (x)y balanced.

  20. (<=) Every balanced string is derivable from S. By induction on the length of a string. i) |ω| = 0, S => ε (the empty string is derivable from S.) ii) Assume that every balanced string of length less than 2n is derived from S. iii) Consider a balanced string ω of length 2n. Let (x) : shortest prefix of ω being balanced. Thus ω = (x)y, where x,y : balanced. Since |x|, |y|<2n, they are derivable from S by inductive hypothesis. Thus S => (S)S =>* (x)S =>* (x)y = ω Therefore, (x)y is also derivable from S.

  21. Chomsky Hierarchy ◈ Noam Chomsky ◈ According to the form of the productions. α -> β ∈ P - Type 0 : No restrictions(unrestricted grammar) - Type 1 : Context-sensitive grammar(CSG).  -> β, |  |  | β| - Type 2 : Context-free grammar(CFG). A -> , where A : nonterminal,  ∈ V*. - Type 3 : Regular grammar(RG). A -> tB or A -> t, (right-linear) A -> Bt or A -> t, (left-linear) where, A, B: nonterminal, t ∈ VT*.

  22. ◈ REL (Recursively Enumerable Language) CSL (Context Sensitive Language) CFL (Context Free Language) RL (Regular Language) ◈ Examples of Formal Language  simple matching language : Lm = {anbn | n ≥ 0} CFL  double matching language : Ldm = {anbncn | n ≥ 0} CSL  mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL  palindrome language : Lr = {ω | ω = ωR } CFL  parenthesis language : Lp = {ω | ω: balenced parenthesis} CFL

  23. regular language context-free language context-sensitive language unrestricted language ◈ The Chomsky Hierarchy of Languages

  24. Grammar Language Recognizer ◈ Languages & Recognizers Type 0 (unrestricted) recursively enumerable set Turing machine context-sensitive language linear bounded automata type 1 context-sensitive context-free language pushdown automata type 2 context-free type 3 (regular) regular language finiteautomata

More Related