An Introduction to Parsing Algorithms

Opening question: how can we determine whether code written in a programming language is syntactically correct?

Leftmost Derivations
• A leftmost derivation of a string may, at each step, rewrite only the leftmost variable of the current sentential form.
• aaBA ⇒ aaBa is NOT part of a leftmost derivation, because it rewrites A while the variable B lies to its left.
• If w ∈ L(G), then w has a leftmost derivation (Theorem 4.1.1).
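A minimal Python sketch of the definition, assuming single-character symbols in which uppercase letters are variables and every other character is a terminal. The function names are hypothetical, and the rules A → a and B → b are made up just to exercise the aaBA example; the sketch checks whether each step of a derivation rewrites the leftmost variable.

```python
def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter) in form, or -1."""
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def is_leftmost_step(x, y, rules):
    """True if y follows from x by rewriting the leftmost variable of x."""
    k = leftmost_variable(x)
    if k < 0:
        return False  # x is a terminal string: nothing left to rewrite
    u, var, v = x[:k], x[k], x[k + 1:]
    return any(lhs == var and u + rhs + v == y for lhs, rhs in rules)


def is_leftmost_derivation(forms, rules):
    """Check every consecutive pair of a derivation w0 => w1 => ... => wn."""
    return all(is_leftmost_step(a, b, rules) for a, b in zip(forms, forms[1:]))


rules = [("A", "a"), ("B", "b")]  # hypothetical rules for the aaBA example
print(is_leftmost_step("aaBA", "aaBa", rules))  # False: A is not leftmost
print(is_leftmost_step("aaBA", "aabA", rules))  # True: B is rewritten
```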

Ambiguity
• A grammar G is ambiguous if some w ∈ L(G) has two distinct leftmost derivations.
• The grammar S → aS | Sa | a is ambiguous, because aa has two leftmost derivations:
  S ⇒ aS ⇒ aa
  S ⇒ Sa ⇒ aa
• A language is inherently ambiguous if every grammar that generates it is ambiguous.
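To make the definition concrete, the sketch below counts the leftmost derivations of a string by brute force. It assumes the grammar has no λ-rules and no cycles of unit rules, so sentential forms never shrink and the search is finite; the function name is illustrative, not from the source. A count greater than 1 for some w proves the grammar ambiguous, but a count of 1 for the strings you happen to test does not prove it unambiguous.

```python
def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter) in form, or -1."""
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def count_leftmost_derivations(rules, form, w):
    """Number of leftmost derivations form =>* w.

    Assumes a λ-free grammar without cycles of unit rules."""
    k = leftmost_variable(form)
    if k < 0:
        return 1 if form == w else 0
    # Prune: forms never shrink, and terminals left of k can no longer change.
    if len(form) > len(w) or not w.startswith(form[:k]):
        return 0
    u, var, v = form[:k], form[k], form[k + 1:]
    return sum(count_leftmost_derivations(rules, u + rhs + v, w)
               for lhs, rhs in rules if lhs == var)


G = [("S", "aS"), ("S", "Sa"), ("S", "a")]
print(count_leftmost_derivations(G, "S", "aa"))  # 2, so G is ambiguous
```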

Example 4.1.2
The grammar S → bS | Sb | a generates the language b*ab*. It is ambiguous, because bab has two leftmost derivations:
  S ⇒ bS ⇒ bSb ⇒ bab
  S ⇒ Sb ⇒ bSb ⇒ bab
The language b*ab* can also be generated by the unambiguous grammars
  S → bS | aA        S → bS | A
  A → bA | λ         A → Ab | a
There is a one-to-one correspondence between derivation trees and leftmost derivations (and likewise between derivation trees and rightmost derivations).
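A quick sanity check of Example 4.1.2 (a sketch, not a proof): enumerate every terminal string of length at most n that each grammar derives and compare the result with the regular expression b*ab*. Here λ is written as the empty string, the helper name is hypothetical, and the bound n = 5 is an arbitrary choice.

```python
import itertools
import re


def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter) in form, or -1."""
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def words_up_to(rules, start, n):
    """Terminal strings of length <= n with a leftmost derivation from start.

    Terminals are never erased, so forms with more than n terminals are
    dropped. The search is finite for the grammars below, whose sentential
    forms contain at most one variable."""
    seen, frontier, words = {start}, [start], set()
    while frontier:
        next_frontier = []
        for form in frontier:
            k = leftmost_variable(form)
            if k < 0:
                words.add(form)
                continue
            for lhs, rhs in rules:
                if lhs != form[k]:
                    continue
                new = form[:k] + rhs + form[k + 1:]
                terminals = sum(1 for c in new if not c.isupper())
                if terminals <= n and new not in seen:
                    seen.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    return words


n = 5
expected = {"".join(t) for m in range(n + 1)
            for t in itertools.product("ab", repeat=m)
            if re.fullmatch(r"b*ab*", "".join(t))}
ambiguous = [("S", "bS"), ("S", "Sb"), ("S", "a")]
unambiguous_1 = [("S", "bS"), ("S", "aA"), ("A", "bA"), ("A", "")]  # "" is λ
unambiguous_2 = [("S", "bS"), ("S", "A"), ("A", "Ab"), ("A", "a")]
for g in (ambiguous, unambiguous_1, unambiguous_2):
    print(words_up_to(g, "S", n) == expected)  # True for all three
```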

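Graph of a Grammar
G: S → aS | bB | λ
   B → aB | bS | bC
   C → aC | λ
The nodes of the graph are the sentential forms of G; there is an edge from u to v when v is obtained from u by rewriting the leftmost variable of u with one rule.
[Figure: the graph of G drawn as a tree rooted at S, down to depth 3; the first level is aS, bB, λ, and the terminal string bb appears at two different nodes.]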
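The graph can be unfolded level by level in a few lines of Python. This is a sketch under the same conventions as above (uppercase letters are variables, the empty string stands for λ); `leftmost_children` and `levels` are hypothetical helpers. Because it builds a tree rather than a graph, repeated sentential forms show up as separate nodes, which is why bb is counted twice at depth 3.

```python
def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter) in form, or -1."""
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def leftmost_children(rules, form):
    """Sentential forms obtained by rewriting the leftmost variable of form."""
    k = leftmost_variable(form)
    if k < 0:
        return []  # terminal strings are leaves
    return [form[:k] + rhs + form[k + 1:] for lhs, rhs in rules if lhs == form[k]]


def levels(rules, start, depth):
    """Nodes of the graph unfolded as a tree, one list per level."""
    result = [[start]]
    for _ in range(depth):
        result.append([child for form in result[-1]
                       for child in leftmost_children(rules, form)])
    return result


# Grammar of this slide; "" stands for λ.
G = [("S", "aS"), ("S", "bB"), ("S", ""),
     ("B", "aB"), ("B", "bS"), ("B", "bC"),
     ("C", "aC"), ("C", "")]
tree = levels(G, "S", 3)
print(tree[1])              # ['aS', 'bB', '']
print(tree[2])              # ['aaS', 'abB', 'a', 'baB', 'bbS', 'bbC']
print(tree[3].count("bb"))  # 2
```

The two bb nodes come from bbS (using S → λ) and bbC (using C → λ), so bb has two leftmost derivations and this grammar is ambiguous.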

Breadth-First Top-Down Traversal

Example
Given the grammar AE:
V = {S, A, T}
Σ = {b, +, (, )}
P: 1. S → A
   2. A → T
   3. A → A + T
   4. T → b
   5. T → (A)
analyze the string (b + b).

[Figure: breadth-first top-down search tree for p = (b+b), rooted at S, together with the successive contents of the queue Q.]

Breadth-First Top-Down Parsing Algorithm
input: context-free grammar G = (V, Σ, P, S)
       string p ∈ Σ*
       queue Q
1. initialize T with root S
   INSERT(S, Q)
2. repeat
   2.1. q ≔ REMOVE(Q)
   2.2. i ≔ 0
   2.3. done ≔ false
        Let q = uAv, where A is the leftmost variable in q.
   2.4. repeat
          2.4.1. if there is no A rule numbered greater than i then done ≔ true
          2.4.2. if not done then
                   Let A → w be the first A rule with number greater than i,
                   and let j be the number of this rule.
                   2.4.2.1. if uwv ∉ Σ* and the terminal prefix of uwv matches a prefix of p then
                              2.4.2.1.1. INSERT(uwv, Q)
                              2.4.2.1.2. Add node uwv to T. Set a pointer from uwv to q.
                            end if
                 end if
          2.4.3. i ≔ j
        until done or p = uwv
   until EMPTY(Q) or p = uwv
3. if p = uwv then accept else reject
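The algorithm translates fairly directly into Python. In this sketch (not the book's code) the queue holds sentential forms and the dictionary `parent` stores the pointers of the tree T, so an accepted string can be traced back to its leftmost derivation. Two checks are additions that are not in the pseudocode: forms already in T are not queued again, and forms with more terminals than p are discarded, which is safe because terminals are never erased. With grammar AE these checks keep the search finite even when p is rejected.

```python
from collections import deque

# Grammar AE; rules are numbered 1..5 by their position in the list.
# Uppercase letters are variables, every other character is a terminal.
AE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def leftmost_variable(form):
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def terminal_prefix(form):
    k = leftmost_variable(form)
    return form if k < 0 else form[:k]


def count_terminals(form):
    return sum(1 for c in form if not c.isupper())


def derivation(parent, node):
    """Follow the pointers of the search tree T back to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def bfs_top_down(rules, start, p):
    """Return a leftmost derivation of p as a list of forms, or None."""
    queue = deque([start])
    parent = {start: None}  # the search tree T
    while queue:
        q = queue.popleft()
        k = leftmost_variable(q)  # every queued form contains a variable
        u, var, v = q[:k], q[k], q[k + 1:]
        for lhs, w in rules:  # rules for var, in increasing rule number
            if lhs != var:
                continue
            uwv = u + w + v
            if uwv == p:
                parent[uwv] = q
                return derivation(parent, uwv)
            if (leftmost_variable(uwv) >= 0
                    and p.startswith(terminal_prefix(uwv))
                    and count_terminals(uwv) <= len(p)  # added pruning
                    and uwv not in parent):             # added: skip repeats
                parent[uwv] = q
                queue.append(uwv)
    return None


print(bfs_top_down(AE, "S", "(b+b)"))
# ['S', 'A', 'T', '(A)', '(A+T)', '(T+T)', '(b+T)', '(b+b)']
```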

Depth-First Top-Down Traversal

[Figure: depth-first top-down search tree for p = (b+b) with grammar AE, each node annotated with its stack entry [q, i], from [S, 1] and [A, 2] down to the accepted string (b+b).]

Depth-First Top-Down Algorithm
input: context-free grammar G = (V, Σ, P, S)
       string p ∈ Σ*
       stack S
1. PUSH([S, 0], S)
2. repeat
   2.1. [q, i] ≔ POP(S)
   2.2. dead-end ≔ false
   2.3. repeat
          Let q = uAv, where A is the leftmost variable in q.
          2.3.1. if u is not a prefix of p then dead-end ≔ true
          2.3.2. if there are no A rules numbered greater than i then dead-end ≔ true
          2.3.3. if not dead-end then
                   Let A → w be the first A rule with number greater than i,
                   and let j be the number of this rule.
                   2.3.3.1. PUSH([q, j], S)
                   2.3.3.2. q ≔ uwv
                   2.3.3.3. i ≔ 0
                 end if
        until dead-end or q ∈ Σ*
   until q = p or EMPTY(S)
3. if q = p then accept else reject
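A Python sketch of the depth-first version that keeps the stack of pairs [q, i] from the pseudocode. Because A → A + T is left-recursive, the plain algorithm can descend forever for some inputs: with p = (b)+b it keeps expanding forms such as (A+T+T), whose terminal prefix "(" always matches p. The sketch therefore adds one check that is not in the pseudocode (an assumption of this example): it abandons any form with more terminals than p. Names and the grammar encoding are hypothetical.

```python
# Grammar AE; rule numbers are list positions starting at 1.
AE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def leftmost_variable(form):
    """Index of the leftmost variable (uppercase letter), or -1 if none."""
    return next((k for k, c in enumerate(form) if c.isupper()), -1)


def count_terminals(form):
    return sum(1 for c in form if not c.isupper())


def dfs_top_down(rules, start, p):
    """Depth-first top-down parser: True if p is derivable from start."""
    stack = [(start, 0)]  # entries [q, i]: form q and last rule number tried
    while stack:
        q, i = stack.pop()
        while True:
            k = leftmost_variable(q)
            if k < 0:
                break  # q is in Σ*: compare it with p below
            u, var, v = q[:k], q[k], q[k + 1:]
            # first rule for var whose number is greater than i
            j = next((n for n, (lhs, _) in enumerate(rules, 1)
                      if lhs == var and n > i), None)
            dead_end = (not p.startswith(u) or j is None
                        # added check: terminals are never erased,
                        # so q may not contain more of them than p
                        or count_terminals(q) > len(p))
            if dead_end:
                break
            stack.append((q, j))          # where to resume on backtracking
            q = u + rules[j - 1][1] + v   # rewrite the leftmost variable
            i = 0
        if q == p:
            return True
    return False


print(dfs_top_down(AE, "S", "(b+b)"))  # True
print(dfs_top_down(AE, "S", "(b)+b"))  # True
```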

[Figure: depth-first top-down search tree for p = (b)+b with grammar AE, annotated with the stack entries [q, i]; the search explores forms such as (A+T+T) and ((A)+T).]

Bottom-Up Algorithms
• Reduction: given w, find the strings w′ such that w′ ⇒ w. Each such w′ is a reduction of w.
• Pattern-matching scheme: decompose w as w = uv and compare the suffixes of u with the right-hand sides of the rules.
• A match occurs when u = u₁q and there is a rule A → q; w then reduces to u₁Av.

Reduction of (A+T)

u        v        Rule          Reduction
λ        (A+T)
(        A+T)
(A       +T)      S → A         (S+T)
(A+      T)
(A+T     )        A → A + T     (A)
                  A → T         (A+A)
(A+T)    λ
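The pattern-matching scheme is short enough to write out in full. This sketch (hypothetical function name, grammar AE hard-coded, λ-rules assumed absent) tries every split w = uv and every rule, and applied to (A+T) it finds the same three reductions as the table.

```python
# Grammar AE; uppercase letters are variables.
AE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def reductions(rules, w):
    """All reductions of w found by the pattern-matching scheme.

    For each split w = uv, a rule A -> q matches when q is a suffix of u,
    i.e. u = u1 q; the reduction is then u1 A v. Rules with an empty
    right-hand side are skipped (assumption: the grammar is λ-free)."""
    found = []
    for split in range(len(w) + 1):
        u, v = w[:split], w[split:]
        for lhs, rhs in rules:
            if rhs and u.endswith(rhs):
                rule = lhs + " → " + rhs
                found.append((rule, u[:len(u) - len(rhs)] + lhs + v))
    return found


for rule, reduced in reductions(AE, "(A+T)"):
    print(rule, reduced)
# S → A (S+T)
# A → T (A+A)
# A → A+T (A)
```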

Breadth-First Bottom-Up Traversal
[Figure: breadth-first bottom-up search tree for p = (b+b) with grammar AE (P: 1. S → A, 2. A → T, 3. A → A + T, 4. T → b, 5. T → (A)), rooted at (b+b) and ending at S, together with the successive contents of the queue.]

Breadth-First Bottom-Up Parser
input: context-free grammar G = (V, Σ, P, S)
       string p ∈ Σ*
       queue Q
1. initialize T with root p
   INSERT(p, Q)
2. repeat
   q ≔ REMOVE(Q)
   2.1. for each rule A → w in P do
          2.1.1. if q = uwv with v ∈ Σ* then
                   2.1.1.1. INSERT(uAv, Q)
                   2.1.1.2. Add node uAv to T. Set a pointer from uAv to q.
                 end if
        end for
   until q = S or EMPTY(Q)
3. if q = S then accept else reject
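A Python sketch of the breadth-first bottom-up parser. It reads step 2.1.1 as applying the rule at every position where q = uwv with v ∈ Σ*, so each reduction undoes one step of a rightmost derivation, and following the pointers from S back to p lists a rightmost derivation of p. It assumes no rule has an empty right-hand side and skips strings already in the tree, which is an addition to the pseudocode.

```python
from collections import deque

# Grammar AE; uppercase letters are variables.
AE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def bfs_bottom_up(rules, start, p):
    """Breadth-first bottom-up parser; returns a derivation of p or None.

    Assumes no rule has an empty right-hand side."""
    queue = deque([p])
    parent = {p: None}  # tree T rooted at p: pointer from uAv to q
    while queue:
        q = queue.popleft()
        if q == start:
            path = []
            while q is not None:  # follow the pointers from S back to p
                path.append(q)
                q = parent[q]
            return path
        for lhs, w in rules:
            # every occurrence q = u w v whose right context v is in Σ*
            for pos in range(len(q) - len(w) + 1):
                v = q[pos + len(w):]
                if q[pos:pos + len(w)] == w and not any(c.isupper() for c in v):
                    reduced = q[:pos] + lhs + v
                    if reduced not in parent:  # added check: skip repeats
                        parent[reduced] = q
                        queue.append(reduced)
    return None


print(bfs_bottom_up(AE, "S", "(b+b)"))
# ['S', 'A', 'T', '(A)', '(A+T)', '(A+b)', '(T+b)', '(b+b)']
```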

[Figure: depth-first bottom-up search for p = (b+b) with grammar AE, annotated with the stack entries [u, i, v].]

Depth-First Bottom-Up Parsing Algorithm
input: context-free grammar G = (V, Σ, P, S) with nonrecursive start symbol
       string p ∈ Σ*
       stack S
1. PUSH([λ, 0, p], S)
2. repeat
   2.1. [u, i, v] ≔ POP(S)
   2.2. dead-end ≔ false
   2.3. repeat
          Find the first j > i such that rule number j satisfies
            i) A → w with u = qw and A ≠ S, or
            ii) S → w with u = w and v = λ
          2.3.1. if there is such a j then
                   2.3.1.1. PUSH([u, j, v], S)
                   2.3.1.2. u ≔ qA
                   2.3.1.3. i ≔ 0
                 end if
          2.3.2. if there is no such j and v ≠ λ then
                   2.3.2.1. shift(u, v)
                   2.3.2.2. i ≔ 0
                 end if
          2.3.3. if there is no such j and v = λ then dead-end ≔ true
        until (u = S) or dead-end
   until (u = S) or EMPTY(S)
3. if EMPTY(S) then reject else accept
Here shift(u, v) moves the first symbol of v to the end of u.

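AE: V = {S, A, T}, Σ = {b, +, (, )}
P: 1. S → A   2. A → T   3. A → A + T   4. T → b   5. T → (A)

u        i    v        Action
λ        0    (b+b)    shift
(        0    b+b)     shift
(b       0    +b)      reduce by rule 4, push [(b, 4, +b)]
(T       0    +b)      reduce by rule 2, push [(T, 2, +b)]
(A       0    +b)      shift
(A+      0    b)       shift
(A+b     0    )        reduce by rule 4, push [(A+b, 4, )]
(A+T     0    )        reduce by rule 2, push [(A+T, 2, )]
(A+A     0    )        shift
(A+A)    0    λ        dead end, pop [(A+T, 2, )]
(A+T     2    )        reduce by rule 3, push [(A+T, 3, )]
(A       0    )        shift
(A)      0    λ        reduce by rule 5, push [(A), 5, λ]
T        0    λ        reduce by rule 2, push [T, 2, λ]
A        0    λ        reduce by rule 1, push [A, 1, λ]
S        0    λ        accept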
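A sketch of the depth-first bottom-up parser in Python, assuming grammar AE (whose start symbol is nonrecursive, as the algorithm requires) and no λ-rules. Besides the verdict it returns the configurations (u, i, v) it examines, with the empty string standing for λ, so its output can be compared row by row with the trace above. The helper `first_rule` is a hypothetical name for the "find the first j > i" step.

```python
# Grammar AE; rule numbers are list positions starting at 1.
AE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def first_rule(rules, start, u, v, i):
    """Smallest rule number j > i satisfying condition i) or ii)."""
    for j, (lhs, w) in enumerate(rules, 1):
        if j <= i:
            continue
        if lhs != start and u.endswith(w):       # i) A -> w, u = qw, A != S
            return j
        if lhs == start and u == w and v == "":  # ii) S -> w, u = w, v = λ
            return j
    return None


def dfs_bottom_up(rules, start, p):
    """Return (accepted, trace); trace lists the configurations (u, i, v)."""
    stack = [("", 0, p)]  # "" plays the role of λ
    trace = []
    while stack:
        u, i, v = stack.pop()
        while True:
            trace.append((u, i, v))
            j = first_rule(rules, start, u, v, i)
            if j is not None:
                stack.append((u, j, v))             # resume with rules after j
                lhs, w = rules[j - 1]
                u, i = u[:len(u) - len(w)] + lhs, 0  # u = qw becomes qA
            elif v:
                u, v, i = u + v[0], v[1:], 0         # shift(u, v)
            else:
                break                                # dead end
            if u == start:
                trace.append((u, i, v))
                return True, trace
    return False, trace


accepted, trace = dfs_bottom_up(AE, "S", "(b+b)")
print(accepted)  # True
for u, i, v in trace:
    print(u or "λ", i, v or "λ")
```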

Bibliographic Notes
• Ambiguity: Floyd [1962], Cantor [1962], Chomsky and Schützenberger [1963].
• Inherently ambiguous languages: Harrison [1978], Ginsburg and Ullian [1966].
• Depth-first: Denning, Dennis, and Qualitz [1978].
• Classic reference: Knuth, "The Art of Computer Programming, Vol. I: Fundamental Algorithms".
