PATR II Compiler

PATR II Compiler. Prolog Aufbaukurs (advanced Prolog course), summer semester 2000, Heinrich-Heine-Universität Düsseldorf, Christof Rumpf.

Presentation Transcript


  1. PATR II Compiler
     Prolog Aufbaukurs (advanced Prolog course), summer semester 2000
     Heinrich-Heine-Universität Düsseldorf
     Christof Rumpf

  2. Notational conventions
     • Instantiation mode of arguments
       • Blue: input arguments
       • Red: output arguments
     • Cut
       • red cut !
       • green cut !
     • Predicate definitions
       • complete
       • to be continued

  3. Directives

     % external resources
     :- [tokenize].               % load tokenizer

     % operators
     :- op(510, xfy, : ).         % attr:val
     :- op(600, xfx, ===).        % path equation
     :- op(1100,xfx,'--->').      % syntax rule, lexical entry
     :- op(1200,xfx,'::').        % description annotation
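To see what the operator declarations buy, the following sketch writes one compiled lexical entry as an ordinary term and takes it apart. It assumes a modern ISO-style Prolog such as SWI-Prolog rather than the Arity Prolog used on the slides, and it leaves `:` at that system's default priority, where it is already a right-associative infix operator. `demo/3` is a hypothetical helper, not part of the compiler.

```prolog
:- op(600,  xfx, ===).      % path equation
:- op(1100, xfx, '--->').   % syntax rule, lexical entry
:- op(1200, xfx, '::').     % description annotation

% demo(-Functor, -Head, -Body): decompose a lexical entry written with
% the operators above. Illustrative only; not part of the slides.
demo(Functor, Head, Body) :-
    Entry = (FS ---> uther :: FS:cat === 'NP'),
    Entry =.. [Functor, Head, Body].
```

Calling `demo(F, H, B)` binds `F` to `::`, `H` to the `--->` term and `B` to the path equation, which shows that `::` binds loosest and `:` binds tightest.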

  4. 3 compiler components
     • Tokenizer
       • Input: PATR II grammar
       • Output: token lines
     • Preprocessor
       • Input: token lines
       • Output: token sentences
     • Syntax compiler
       • Input: token sentences
       • Output: Prolog clauses

     compile_grammar(File):-
         clear_grammar,
         tokenize_file(File),
         read_sentences,
         compile_sentences.

  5. Tokenizer input

     ; Shieb1.ptr
     ; Sample grammar one from Shieber 1986

     ; Grammar Rules
     ; ------------------------------------------------------------

     Rule {sentence formation}
       S --> NP VP:
      <S head> = <VP head>
      <VP head subject> = <NP head>.

     Rule {trivial verb phrase}
       VP --> V:
      <VP head> = <V head>.

     ; Lexicon
     ; ----------------------------------------------------------------

     Word uther:
      <cat> = NP
      <head agreement gender> = masculine
      <head agreement person> = third
      <head agreement number> = singular.

  6. Tokenizer output = preprocessor input

     line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
     line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1), ...
     line(3,[ ]).
     line(4,[ ]).
     line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
     line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$), ...
     line(7,[ ]).
     line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
     line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
     line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1), ...
     line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1), ...
     line(12,[b(1)]).
     line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
     line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
     line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1),...
     ...
     line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$),...
     line(42,[eof]).

  7. Preprocessor output = compiler input

     The preprocessor removes comments and blanks, and joins each sentence, which is terminated by a period and may span several lines, into a single token list. The compiler proper can then concentrate on the essentials.

     sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...
     sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...
     sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
     sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
     sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
     sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
     sentence(42,42,[eof]).
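The transformation described on this slide can also be sketched without the database, as a relation between a list of token lines and a list of sentences. This is a simplified illustration, not the course code: it assumes SWI-Prolog (for `append/2`), represents token contents as atoms instead of Arity `$...$` strings, ignores line numbers and the `eof` token, and expects the input to end with a period. The names `sentences/2`, `strip_lines/2`, `strip_line/2` and `split_sentences/2` are hypothetical.

```prolog
:- use_module(library(lists)).   % append/2, append/3

% sentences(+Lines, -Sentences)
% Lines is a list of token lists, one per source line.
sentences(Lines, Sentences) :-
    strip_lines(Lines, Stripped),
    append(Stripped, Tokens),            % concatenate all lines
    split_sentences(Tokens, Sentences).

strip_lines([], []).
strip_lines([L|Ls], [S|Ss]) :-
    strip_line(L, S),
    strip_lines(Ls, Ss).

% strip_line(+Line, -Tokens): drop blanks and everything from ';' on.
strip_line([], []).
strip_line([o(';')|_], []) :- !.
strip_line([b(_)|T], S) :- !, strip_line(T, S).
strip_line([H|T], [H|S]) :- strip_line(T, S).

% split_sentences(+Tokens, -Sentences): cut the stream at each o('.').
split_sentences([], []).
split_sentences(Tokens, [S|Ss]) :-
    append(S, [o('.')|Rest], Tokens), !,
    split_sentences(Rest, Ss).
```

Unlike the version on the slides, this sketch keeps tokens that follow a period on the same line and treats them as the start of the next sentence.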

  8. Preprocessor: main loop

     read_sentences:-
         abolish(cnt/1),
         write('preprocessing...'), nl,
         repeat,
           count(I),
           read_sentence(N,M,S),
           assert(sentence(N,M,S)),
           put(13), tab(3), write(I), write(' sentences preprocessed'),
         S = [eof], !, nl.

     read_sentence(N,M,S):-
         retract(line(N,L)),
         read_sentence(L,N,M,S), !.

     Backtracking: as long as S = [eof] fails, execution returns to repeat.
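The loop calls `count/1`, which the slides do not define. Since the loop resets the counter with `abolish(cnt/1)`, one plausible definition keeps the running total in a `cnt/1` fact. The sketch below is an assumption about that missing helper, written for a modern Prolog in which `cnt/1` is declared dynamic; `reset_count/0` is a hypothetical stand-in for the `abolish/1` call.

```prolog
:- dynamic cnt/1.

% count(-I): increment the stored counter and return its new value.
count(I) :-
    retract(cnt(N)), !,
    I is N + 1,
    assertz(cnt(I)).
count(1) :-
    assertz(cnt(1)).        % first call: no counter stored yet

% reset_count: forget the counter (the role abolish(cnt/1) plays on the slide).
reset_count :-
    retractall(cnt(_)).
```

Because the first clause commits with a cut, `count/1` leaves no choice point, so a failing `S = [eof]` falls straight back to `repeat`.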

  9. Preprocessor: reading a sentence

     read_sentence([eof],N,N,[eof]):- !.           % end of file
     read_sentence([o($.$)|_],N,N,[]):- !.         % end of sentence
     read_sentence([o($;$)|_],N,M,S):- !,          % skip comment
         N1 is N+1,
         retract(line(N1,L)),                      % next line
         read_sentence(L,N1,M,S).
     read_sentence([],N,M,S):- !,                  % end of line
         N1 is N+1,
         retract(line(N1,L)),                      % next line
         read_sentence(L,N1,M,S).
     read_sentence([b(_)|T1],N,M,T2):- !,          % skip blanks
         read_sentence(T1,N,M,T2).
     read_sentence([H|T1],N,M,[H|T2]):-            % collect tokens
         read_sentence(T1,N,M,T2).

  10. Compiler: main loop

     compile_sentences:-
         abolish(cnt/1),
         write('compiling...'), nl,
         retract(sentence(N,M,S)),
           compile_sentence((N,M),C,S,[]),
           assert(C),
           count(I),
           put(13), tab(3), write(I), write(' sentences compiled'),
         S = [eof], !, nl.

     Backtracking: as long as S = [eof] fails, retract/1 is retried and delivers the next sentence.

  11. Compiler: sentence types

     % compile_sentence(Position,Clause,Sentence,Rest)
     compile_sentence(_,C) --> [eof], !, {C = finished}.
     compile_sentence(_,C) --> syntax_rule(C), !.
     compile_sentence(_,C) --> lex_entry(C), !.
     compile_sentence(_,C) --> template(C), !.
     compile_sentence(P,_,_,_):-
         P = (N,M), nl,
         write(' error in sentence between lines '),
         write(N), write(' and '), write(M), nl,
         fail.

  12. Syntax rules

     syntax_rule(C) -->
         rs('Rule'), !,
         syntax_rule_cont(C).

     syntax_rule_cont((Expansion :: Descr)) -->
         rule_name,
         sr_expansion(Expansion,Sugar),
         rs(:), !,
         sr_path_equations(Equations,Sugar),
         {sr_sugar_cats(Sugar,Equations,Descr)}.

  13. Reserved symbols

     rs(=)      --> [o($=$)], !.
     rs(:)      --> [o($:$)], !.
     rs(<)      --> [o($<$)], !.
     rs(>)      --> [o($>$)], !.
     rs('{')    --> [o(${$)], !.
     rs('}')    --> [o($}$)], !.
     rs('Rule') --> [u($Rule$)], !.
     rs('Word') --> [u($Word$)], !.
     rs('Let')  --> [u($Let$)], !.
     rs('be')   --> [l($be$)], !.
     rs('-->')  --> [o($-$),o($-$),o($>$)], !.

     Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).

  14. Further terminal symbols

     uatom(A) --> [u(S)], {atom_string(A,S)}.
     latom(A) --> [l(S)], {atom_string(A,S)}.
     satom(A) --> [s(S)], {atom_string(A,S)}.
     int(I)   --> [i(I)].

     atom(A) --> uatom(A), !.
     atom(A) --> latom(A), !.
     atom(A) --> satom(A), !.

     atomic(A) --> atom(A), !.
     atomic(A) --> int(A), !.

  15. Rule names

     Rule names are skipped and are not carried over into the Prolog representation of the rules.

     rule_name -->
         rs('{'), !,                            % start of rule name
         curley_braces_terminated_string.
     rule_name --> [].                          % rule names are optional

     curley_braces_terminated_string -->
         rs('}'), !.                            % end of rule name
     curley_braces_terminated_string -->
         [_],                                   % read any symbol
         curley_braces_terminated_string.
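A self-contained variant of this grammar fragment makes the skipping behaviour easy to try with `phrase/3`. It is a sketch under two assumptions: tokens carry atoms rather than Arity `$...$` strings, and the braces are matched directly instead of through `rs//1`. The name `skip_to_closing_brace//0` is hypothetical.

```prolog
% rule_name//0: skip an optional {...} rule name.
rule_name --> [o('{')], !, skip_to_closing_brace.
rule_name --> [].                         % rule names are optional

skip_to_closing_brace --> [o('}')], !.    % end of rule name
skip_to_closing_brace --> [_], skip_to_closing_brace.
```

A query such as `phrase(rule_name, [o('{'),l(sentence),l(formation),o('}'),u('S')], Rest)` leaves only the tokens after the closing brace in `Rest`; without a leading brace the input is returned unchanged.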

  16. Rule expansion

     sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
         sr_lhs(LHS,LSugar),
         rs('-->'),
         sr_rhs(RHS,RSugar).

     sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
     sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).

     ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) -->
         fsd(FSD,Sugar),
         ne_fsd_seq(FSDs,Sugars).
     ne_fsd_seq(FSD,[Sugar]) --> fsd(FSD,Sugar).

     fsd(Var,(FSD,Var)) --> uatom(FSD).

  17. Syntax rules: path equations

     sr_path_equations((E,Es),Sugar) -->
         sr_path_equation(E,Sugar),
         sr_path_equations(Es,Sugar).
     sr_path_equations(E,Sugar) -->
         sr_path_equation(E,Sugar).

     sr_path_equation((LHS === RHS),Sugar) -->
         sr_path(LHS,Sugar),
         rs(=),
         sr_val(RHS,Sugar).

     sr_val(V,Sugar) --> sr_path(V,Sugar).
     sr_val(V,_) --> atomic(V).

  18. Syntax rules: paths

     sr_path(Var,Sugar) -->
         rs(<), fsd(FSD), rs(>),
         {member((FSD,Var),Sugar)}, !.
     sr_path(Var:P,Sugar) -->
         rs(<), fsd(FSD), ne_feature_seq(P), rs(>),
         {member((FSD,Var),Sugar)}, !.

     ne_feature_seq(F) --> feature(F).
     ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).

     fsd(FSD) --> uatom(FSD).
     feature(F) --> atomic(F).
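How a path such as `<VP head subject>` becomes the term `Var:head:subject` can be tried in isolation with the following sketch. It is a simplification rather than the course code: tokens carry atoms instead of Arity strings, features are restricted to lower-case tokens `l(F)`, and the terminals are matched directly instead of through `rs//1`. The names `path//2` and `features//1` are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2

% path(-Path, +Sugar): parse <FSD f1 ... fn> into Var:f1:...:fn.
% Sugar pairs each category symbol with its variable, e.g. ('VP', C).
path(Var, Sugar) -->
    [o('<'), u(FSD), o('>')],
    { member((FSD,Var), Sugar) }, !.
path(Var:P, Sugar) -->
    [o('<'), u(FSD)], features(P), [o('>')],
    { member((FSD,Var), Sugar) }, !.

% features(-P): a non-empty feature sequence, nested to the right.
features(F)   --> [l(F)].
features(F:P) --> [l(F)], features(P).
```

The lookup in `Sugar` is what ties the category symbol written in the grammar to the Prolog variable that stands for the corresponding feature structure.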

  19. Syntactic sugar

     sr_sugar_cats([(Cat,Var)|Sugar],Equations,
                   ((Var:cat === Cat),Descr)):-
         sr_sugar_cats(Sugar,Equations,Descr).
     sr_sugar_cats([],Descr,Descr).

     Explicit form:

     Rule {sentence formation}
       X0 --> X1 X2:
      <X0 cat> = S
      <X1 cat> = NP
      <X2 cat> = VP
      <X0 head> = <X2 head>
      <X2 head subject> = <X1 head>.

     Sugared form:

     Rule {sentence formation}
       S --> NP VP:
      <S head> = <VP head>
      <VP head subject> = <NP head>.
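The desugaring step can be run on its own. The sketch below repeats `sr_sugar_cats/3` from the slide and applies it to the sugar list that `S --> NP VP` would produce, together with a single equation. It assumes a modern Prolog such as SWI-Prolog, where `:` is already an infix operator, so only `===` is declared; `example/1` is a hypothetical helper.

```prolog
:- op(600, xfx, ===).   % path equation, as on the directives slide

sr_sugar_cats([(Cat,Var)|Sugar], Equations, ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

% example(-Descr): desugar S --> NP VP with the single equation
% <S head> = <VP head>. Illustrative only.
example(Descr) :-
    Sugar = [('S',A), ('NP',_B), ('VP',C)],
    sr_sugar_cats(Sugar, (A:head === C:head), Descr).
```

The resulting description starts with one `cat` equation per category symbol and ends with the equations that were written explicitly in the rule.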

  20. Lexical entries

     lex_entry(C) -->
         rs('Word'), !,
         lex_entry_cont(C).

     lex_entry_cont((FS ---> L :: Descr)) -->
         lexeme(L),
         rs(:), !,
         lex_definition(FS, Descr).

     lexeme(L) --> atom(L).

  21. Lexicon: feature structures

     lex_definition(FS,(LDef,LDefs)) -->
         lexdef(FS,LDef),
         lex_definition(FS,LDefs).
     lex_definition(FS,LDef) -->
         lexdef(FS,LDef).

     lexdef(FS,LDef) --> template_name(FS,LDef), !.
     lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.

  22. Lexicon: path equations

     lex_path_equation(FS, (LHS === RHS)) -->
         lex_path(FS, LHS),
         rs(=), !,
         lex_val(FS, RHS).

     lex_path(FS,FS:P) -->
         rs(<), ne_feature_seq(P), rs(>), !.

     lex_val(FS,V) --> lex_path(FS,V).
     lex_val(_,V) --> atomic(V).

  23. Templates

     template(C) -->
         rs('Let'), !,
         template_cont(C).

     template_cont((N :- TDef)) -->
         template_name(FS,N),
         rs('be'),
         template_definition(FS,TDef),
         {assert(template(N))}.

  24. Templates: head & body

     template_name(FS,N) -->
         atom(A),
         {N =.. [A,FS]}.

     template_definition(FS,TDef) -->
         lex_definition(FS,TDef).

  25. Deleting a grammar

     clear_templates:-
         template(T),
         T =.. [F,_],
         abolish(F/1),
         fail.
     clear_templates:-
         abolish(template/1).

     clear_grammar:-
         abolish('::'/2),
         abolish(line/2),
         abolish(sentence/3),
         clear_templates.

  26. Compiler output

     A ---> B , C ::
         A : cat === 'S',
         B : cat === 'NP',
         C : cat === 'VP',
         A : head === C : head,
         C : head : subject === B : head.

     A ---> uther ::
         A : cat === 'NP',
         A : head : agreement : gender === masculine,
         A : head : agreement : person === third,
         A : head : agreement : number === singular.

  27. Resources
     • Grammars (PATR II / Prolog)
       • shieb1.ptr / shieb1.ari
       • shieb2.ptr / shieb2.ari
       • shieb3.ptr / shieb3.ari
       • shieb4.ptr / shieb4.ari
     • Tokens
       • shieb1.tok (tokenizer)
       • shieb1.snt (preprocessor)
     • PATR II interpreter
       • patrlcl.ari: left-corner with linking
       • patrlclc.ari: left-corner with linking and syntax trees
       • patr-ii.ari: DCG
     • PATR II compiler
       • patrcomp.ari
       • patr-ii.ari: DCG

  28. Open problems and extensions
     • Syntactic sugar of the form VP_1 --> VP_2 X
     • Lexical rules
     • Templates in syntax rules
     • Negation and disjunction
     • Default inheritance (priority union)
     • ...

  29. References
     • Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.
     • Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison Wesley.
     • Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.
