2.1 Language

꼭 기억해야 할 세 가지 개념 1. 언어의 정의 2. 문법의 정의 및 개념 3. 인식기의 의미 2.1 Language • Basic definitions (1) alphabet • a finite set of symbols. • ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ} T2 = {A,B,C, … ,Z} T3 = {auto, break, case, … , while} (2) string(or sentence, word) • a sequence of symbols from some alphabet T. (3) length • the number of symbols in the string. • denoted by |ω|

(4) empty string • a string consisting of no symbols. • denoted by ε or λ. (5) T* denotes the set of all strings of symbols over the alphabet T, including the empty string. T+ = T* - {ε} T* : T star T+ : T dagger (6) Language is any set of strings over an alphabet.(Text p.40) (or A Language L over the alphabet T is a subset of T*.) L ⊆ T*

(ωR)R=ω • More definitions (1) concatenation • u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm • u • v를 보통 uv로 표기. • uε= u = εu • ∀u,v ∈ T*, uv ∈ T*. • |uv| = |u| + |v| (2) anrepresents n a's. a0 = ε (3) the reversalof a string ω, denoted ωR is the string ω written in reverse order: i.e., if ω = a1a2...an then ωR = anan-1...a1.

(4) language product LL' = {xy| x ∈ L and y ∈ L'} (5) The powersof a language L are defined recursively by: L0 = {ε} Ln = LLn-1 for n  1. (6) L* : reflexivetransitiveclosure = L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… = (7) L+ : transitive closure = L1 ∪ L2 ∪... ∪ Ln ∪ ... = L* - L0

2.2 Grammar • Language • 문장(sentence)들을 원소로 갖는 집합 • 언어를 어떻게 표현할 것인가 ? • Grammar • terminal : 정의된 언어의 알파벳 • nonterminal : • 스트링을 생성하는 데 사용되는 중간 과정의 심볼 • 언어의 구조를 정의하는데 사용 • grammar symbol (V)

G = (VN, VT, P, S) • VN : a finite set of nonterminal symbols • VT : a finite set of terminal symbols VN∩VT =  , VN∪ VT = V • P : a finite set of production rules α → β, α∈ V+, β∈ V* lhs rhs • S : start symbol(sentence symbol)

[예] G = ( {S, A}, {a, b}, P, S )Text p.47 [예제 2.8] • P : S → aAS S → a A → SbA A → ba A → SS ⇒ S → aAS | a A → SbA | ba | SS

Inroduction to FL theory * * + • Derivation 1. ⇒: "directly produce" or "directly derive" if α → β∈ P and  , δ∈ V* then  αδ ⇒ βδ 2. ⇒: Suppose α1,α2,...,αn ∈ V* and α1 ⇒α2 ⇒… ⇒αn, then α1⇒ αn (zero or more derivations) 3. ⇒ : one or more derivations. cf) → : production rule에서 사용. “may be replaced by” ⇒ : derivation할 때 사용한다.

* * * * • L(G) : Language generated by grammar G L(G) = {ω | S ⇒ ω, ω ∈ VT*} ☞ ω is a sentential form of G if S ⇒ ω and ω ∈ V*. ω is a sentence of G if S ⇒ ω and ω ∈ VT*. P : S → aA | bB | ε A → bS B → aS S ⇒abba 유도 과정 S ⇒aA (생성규칙 S → aA) ⇒abS (생성규칙 A → bS) ⇒abbB (생성규칙 S → bB) ⇒abbaS (생성규칙 B → aS) ⇒abba (생성규칙 S → )

G1 = ( {S}, {a}, P, S )을 이용하여 L(G1) P : S → a | aS L (G1) = { an | n  1 } • Language design

G = ( {A, B, C}, {a, b, c}, P, A) P : A → abc A → aBbc Bb → bB Bc → Cbcc bC → Cb aC → aaB aC → aa L(G) = { anbncn | n  1 }

(===>) ex1) S → 0S1 | 01 ex2) S → aSb | c ex3) A → aB B → bB | b ex4) A → abc A → aBbc Bb → bB Bc → Cbcc bC → Cb aC → aaB aC → aa

Grammar Design • L = { an | n  0 } 일 때 문법 : A → aA | ε • L = { an | n  1 } 일 때 문법 : A → aA | a • Embedded production A → aAb ex1) L1 = { anbn | n  0 } ex2) L2 = { 0i1j | i  j, i,j  1 } ex3) Constructs of Conventional PL

1) 파스칼 언어의 상수 정의 부분: • 상수정의 부분은 CONST라는 예약어로 시작하며 하나의 상수 정의는 a=b의 형태를 갖는다. 여기서, a는 identifier를 b는 상수를 나타내는 terminal심벌이다. 상수정의부분은 선택적이며 각각의 상수정의는; 으로 구분한다. • 다음은 상수정의 부분의 예이다. CONST ON = TRUE; OFF = FALSE; EPSILON = 1.0E-10;

2) C 언어의 정수 선언 부분: • 정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의 identifier를 나타낸다. • 그리고; 으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;과 같다. ※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다.

In order to prove that a grammar generates a language L i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. 교과서 55쪽 [예제 2.16] proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S. G = ( { S }, { ( , ) }, {S → (S)S |ε}, S ) ⇔ All strings of balanced parentheses.

* * * (=>) Every sentence derivable from S is balanced. (i.e., S ⇒ ω, ω: balanced) By induction on the number of steps in a derivation. i) n = 1일 때, S ⇒ ε, ε is surely balanced. ii) Suppose that all derivations of fewer than n steps produce balanced sentences. iii) Consider a leftmost derivation of exactly n steps. S ⇒ (S)S ⇒ (x)S ⇒ (x)y By the hypothesis x, y : balanced. Thus (x)y balanced.

* * (<=) Every balanced string is derivable from S. By induction on the length of a string. i) |ω| = 0, S ⇒ ε (the empty string is derivable from S.) ii) Assume that every balanced string of length less than 2n is derived from S. iii) Consider a balanced string ω of length 2n. Let (x) : shortest prefix of ω being balanced. Thus ω = (x)y, where x, y : balanced. Since |x|, |y | < 2n, they are derivable from S by inductive hypothesis. Thus S ⇒ (S)S ⇒ (x)S ⇒ (x)y = ω Therefore, (x)y is also derivable from S.

2.3 Chomsky Hierarchy • Noam Chomsky • According to the form of the productions. α → β ∈ P • Type 0 : No restrictions(unrestricted grammar) • Type 1 : Context-sensitive grammar(CSG). → β, |  |  | β| • Type 2 : Context-free grammar(CFG). A →, where A : nonterminal,  ∈ V*. • Type 3 : Regular grammar(RG). A → tB or A → t, (right-linear) A → Bt or A → t, (left-linear) where, A, B : nonterminal, t ∈ VT*.

REL (Recursively Enumerable Language) CSL (Context Sensitive Language) CFL (Context Free Language) RL (Regular Language) • Examples of Formal Language • simple matching language : Lm = {anbn | n ≥ 0} CFL • double matching language : Ldm = {anbncn | n ≥ 1} CSL • mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL • palindrome language : Lr = {ω | ω = ωR } CFL • parenthesis language : Lp = {ω | ω: balanced parenthesis} CFL

The Chomsky Hierarchy of Languages

Languages & Recognizers

2.1 Language

2.1 Language

Presentation Transcript

2.1

2.1

2.1

Topic 2.1 The Language of Symbol

2.1

2.1

E.2.1

2.1

2.1

2.1

2.1

Output 2.1 产出 2.1

2.1 The definition of language

§ 2.1

2.1

2.1

Lesson 2.1

Wilson 2.1

2.1

2.1

2.1

2.1