PATR II Compiler
Advanced Prolog Course, Summer Semester 2000
Heinrich-Heine-Universität Düsseldorf
Christof Rumpf
Notational Conventions
• Instantiation mode of arguments
  • Blue: input arguments
  • Red: output arguments
• Cut
  • red cut !
  • green cut !
• Predicate definitions
  • complete
  • to be continued
Directives

    % external resources
    :- [tokenize].                % load tokenizer

    % operators
    :- op(510,  xfy, : ).         % attr:val
    :- op(600,  xfx, ===).        % path equation
    :- op(1100, xfx, '--->').     % syntax rule, lexical entry
    :- op(1200, xfx, '::').       % description annotation
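To illustrate what the four operator declarations make possible, the following sketch reads a compiled rule as an ordinary Prolog term and takes it apart. It assumes SWI-Prolog rather than the Arity Prolog of the course and leaves out the tokenizer directive. The helpers clause_parts/4 and first_equation/2 are hypothetical and exist only for this illustration.

```prolog
% Sketch for SWI-Prolog (assumption); same priorities as in the directives.
:- op(510,  xfy, :).
:- op(600,  xfx, ===).
:- op(1100, xfx, '--->').
:- op(1200, xfx, '::').

% clause_parts(+Clause, -LHS, -RHS, -Descr)   (hypothetical helper)
% '::' binds weakest, so it is the principal functor of a compiled clause.
clause_parts((LHS ---> RHS :: Descr), LHS, RHS, Descr).

% first_equation(+Descr, -Equation)           (hypothetical helper)
% A description is a comma-separated sequence of path equations.
first_equation((E, _), E) :- !.
first_equation(E, E).

demo(RHS, Path, Value) :-
    Clause = (A ---> B, C :: A:cat === 'S', B:cat === 'NP', A:head === C:head),
    clause_parts(Clause, _LHS, RHS, Descr),
    first_equation(Descr, Path === Value).
```

Calling demo(RHS, Path, Value) should bind RHS to the conjunction of the two daughter variables and Value to 'S'. The comma (priority 1000) groups below '--->' (1100), which in turn groups below '::' (1200).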
3 Compiler Components
• Tokenizer
  • Input: PATR II grammar
  • Output: token lines
• Preprocessor
  • Input: token lines
  • Output: token sentences
• Syntax compiler
  • Input: token sentences
  • Output: Prolog clauses

    compile_grammar(File):-
        clear_grammar,
        tokenize_file(File),
        read_sentences,
        compile_sentences.
Tokenizer Input

    ; Shieb1.ptr
    ; Sample grammar one from Shieber 1986


    ; Grammar Rules
    ; ------------------------------------------------------------

    Rule {sentence formation}
      S --> NP VP:
        <S head> = <VP head>
        <VP head subject> = <NP head>.

    Rule {trivial verb phrase}
      VP --> V:
        <VP head> = <V head>.

    ; Lexicon
    ; ----------------------------------------------------------------
    Word uther:
        <cat> = NP
        <head agreement gender> = masculine
        <head agreement person> = third
        <head agreement number> = singular.
Tokenizer Output = Preprocessor Input

    line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
    line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1), ...
    line(3,[ ]).
    line(4,[ ]).
    line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
    line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$), ...
    line(7,[ ]).
    line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
    line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
    line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1), ...
    line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1), ...
    line(12,[b(1)]).
    line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
    line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
    line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1),...
    ...
    line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$),...
    line(42,[eof]).
Preprocessor Output = Compiler Input

The preprocessor removes comments and blanks and merges each sentence, which is terminated by a period and may span several lines, into a single token list. The compiler proper can then concentrate on the essentials. The first two arguments of sentence/3 are the first and last source line of the sentence.

    sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...
    sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...
    sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
    sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
    sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
    sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),l($cat$),o($>$),o($=$),...
    sentence(42,42,[eof]).
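The preprocessing step described above can also be sketched without the database, as a pure transformation on lists. This is not the course implementation: it assumes SWI-Prolog, tokens that carry atoms instead of Arity $...$ strings, and the hypothetical predicate names lines_sentences/2, strip_line/2 and split_sentences/2. Line numbers and the eof token are left out for brevity.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% lines_sentences(+Lines, -Sentences)
% Lines is a list of token lists; Sentences is a list of token lists
% without blanks, comments and terminating periods.
lines_sentences(Lines, Sentences) :-
    maplist(strip_line, Lines, Stripped),
    append(Stripped, Tokens),            % concatenate all lines
    split_sentences(Tokens, Sentences).

% strip_line(+Line, -Tokens): drop blanks, cut the line at a comment sign.
strip_line([], []).
strip_line([o(';')|_], []) :- !.
strip_line([b(_)|T], S) :- !, strip_line(T, S).
strip_line([H|T], [H|S]) :- strip_line(T, S).

% split_sentences(+Tokens, -Sentences): split at each period token.
split_sentences([], []) :- !.
split_sentences(Tokens, [S|Ss]) :-
    append(S, [o('.')|Rest], Tokens), !, % first period ends the sentence
    split_sentences(Rest, Ss).
split_sentences(Tokens, [Tokens]).       % unterminated remainder kept as is

demo(Sentences) :-
    Lines = [ [o(';'), b(1), u('Grammar'), b(1), u('Rules')],
              [u('Rule'), b(1), o('{'), l(trivial), o('}')],
              [b(2), u('VP'), b(1), o('-'), o('-'), o('>'), b(1), u('V'), o(':')],
              [b(1), o('<'), u('VP'), b(1), l(head), o('>'), b(1), o('=')],
              [b(1), o('<'), u('V'), b(1), l(head), o('>'), o('.')] ],
    lines_sentences(Lines, Sentences).
```

In the demo, the comment line disappears and the remaining four lines end up in one sentence. Because comments are cut off per line before the sentences are split, a period inside a comment (as in "; Shieb1.ptr") cannot end a sentence.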
Preprocessor: Main Loop

    read_sentences:-
        abolish(cnt/1),
        write('preprocessing...'), nl,
        repeat,
          count(I),
          read_sentence(N,M,S),
          assert(sentence(N,M,S)),
          put(13), tab(3), write(I), write(' sentences preprocessed'),
        S = [eof], !, nl.

    read_sentence(N,M,S):-
        retract(line(N,L)),
        read_sentence(L,N,M,S), !.

Backtracking: the goals between repeat and S = [eof] are executed again and again until the end-of-file sentence has been read.
Preprocessor: Reading a Sentence

    read_sentence([eof],N,N,[eof]):- !.         % end of file
    read_sentence([o($.$)|_],N,N,[]):- !.       % end of sentence
    read_sentence([o($;$)|_],N,M,S):- !,        % skip comment
        N1 is N+1,
        retract(line(N1,L)),                    % next line
        read_sentence(L,N1,M,S).
    read_sentence([],N,M,S):- !,                % end of line
        N1 is N+1,
        retract(line(N1,L)),                    % next line
        read_sentence(L,N1,M,S).
    read_sentence([b(_)|T1],N,M,T2):- !,        % skip blanks
        read_sentence(T1,N,M,T2).
    read_sentence([H|T1],N,M,[H|T2]):-          % collect tokens
        read_sentence(T1,N,M,T2).
Compiler: Main Loop

    compile_sentences:-
        abolish(cnt/1),
        write('compiling...'), nl,
        retract(sentence(N,M,S)),
          compile_sentence((N,M),C,S,[]),
          assert(C),
          count(I),
          put(13), tab(3), write(I), write(' sentences compiled'),
        S = [eof], !, nl.

Backtracking: when S = [eof] fails, Prolog backtracks into retract/1, which fetches the next sentence.
Compiler: Sentence Types

    % compile_sentence(Position,Clause,Sentence,Rest)
    compile_sentence(_,C) --> [eof], !, {C = finished}.
    compile_sentence(_,C) --> syntax_rule(C), !.
    compile_sentence(_,C) --> lex_entry(C), !.
    compile_sentence(_,C) --> template(C), !.
    compile_sentence(P,_,_,_):-
        P = (N,M), nl,
        write(' error in sentence between lines '),
        write(N), write(' and '), write(M), nl,
        fail.
Syntax Rules

    syntax_rule(C) -->
        rs('Rule'), !,
        syntax_rule_cont(C).

    syntax_rule_cont((Expansion :: Descr)) -->
        rule_name,
        sr_expansion(Expansion,Sugar),
        rs(:), !,
        sr_path_equations(Equations,Sugar),
        {sr_sugar_cats(Sugar,Equations,Descr)}.
Reserved Symbols

    rs(=)      --> [o($=$)], !.
    rs(:)      --> [o($:$)], !.
    rs(<)      --> [o($<$)], !.
    rs(>)      --> [o($>$)], !.
    rs('{')    --> [o(${$)], !.
    rs('}')    --> [o($}$)], !.
    rs('Rule') --> [u($Rule$)], !.
    rs('Word') --> [u($Word$)], !.
    rs('Let')  --> [u($Let$)], !.
    rs('be')   --> [l($be$)], !.
    rs('-->')  --> [o($-$),o($-$),o($>$)], !.

Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).
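The alternative mentioned above might look roughly as follows. This is only a sketch: apart from colon, all nonterminal names are invented here, and the tokens carry atoms rather than Arity $...$ strings so that the code loads in SWI-Prolog (assumption).

```prolog
% One nonterminal per reserved symbol (names other than colon are hypothetical).
colon   --> [o(':')].
equals  --> [o('=')].
langle  --> [o('<')].
rangle  --> [o('>')].
lbrace  --> [o('{')].
rbrace  --> [o('}')].
arrow   --> [o('-'), o('-'), o('>')].
kw_rule --> [u('Rule')].
kw_word --> [u('Word')].
kw_let  --> [u('Let')].
kw_be   --> [l(be)].

% Usage: recognise the token sequence for "<cat> =".
demo :-
    phrase((langle, [l(cat)], rangle, equals),
           [o('<'), l(cat), o('>'), o('=')]).
```

Each nonterminal has exactly one clause, so the cuts of the rs//1 version are not needed here. The price is a larger number of predicate names.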
Further Terminal Symbols

    uatom(A) --> [u(S)], {atom_string(A,S)}.
    latom(A) --> [l(S)], {atom_string(A,S)}.
    satom(A) --> [s(S)], {atom_string(A,S)}.
    int(I)   --> [i(I)].

    atom(A) --> uatom(A), !.
    atom(A) --> latom(A), !.
    atom(A) --> satom(A), !.

    atomic(A) --> atom(A), !.
    atomic(A) --> int(A), !.
Rule Names

Rule names are skipped and are not carried over into the Prolog representation of the rules.

    rule_name -->
        rs('{'), !,                       % start of rule name
        curley_braces_terminated_string.
    rule_name --> [].                     % rule names are optional

    curley_braces_terminated_string -->
        rs('}'), !.                       % end of rule name
    curley_braces_terminated_string -->
        [_],                              % read any symbol
        curley_braces_terminated_string.
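The effect of skipping a rule name can be tried out in isolation with phrase/3. The sketch below assumes SWI-Prolog and atom-valued tokens, and it inlines the two brace tokens instead of calling rs//1. The shorter name skip_to_close//0 is a hypothetical stand-in for curley_braces_terminated_string//0.

```prolog
% Self-contained variant of rule_name//0 (sketch, atom-valued tokens assumed).
rule_name --> [o('{')], !, skip_to_close.   % start of rule name
rule_name --> [].                           % rule names are optional

skip_to_close --> [o('}')], !.              % end of rule name
skip_to_close --> [_], skip_to_close.       % skip any other token

% With a rule name: everything up to and including '}' is consumed.
demo_named(Rest) :-
    phrase(rule_name,
           [o('{'), l(sentence), l(formation), o('}'), u('S')],
           Rest).

% Without a rule name: nothing is consumed.
demo_unnamed(Rest) :-
    phrase(rule_name, [u('S')], Rest).
```

Both queries leave Rest = [u('S')], so the rule expansion that follows sees the same input whether or not the grammar writer supplied a name.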
Rule Expansion

    sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
        sr_lhs(LHS,LSugar),
        rs('-->'),
        sr_rhs(RHS,RSugar).

    sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
    sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).

    ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) -->
        fsd(FSD,Sugar),
        ne_fsd_seq(FSDs,Sugars).
    ne_fsd_seq(FSD,[Sugar]) -->
        fsd(FSD,Sugar).

    fsd(Var,(FSD,Var)) --> uatom(FSD).
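To see what the expansion rules produce for "S --> NP VP", the following sketch runs them on a hand-written token list. It assumes SWI-Prolog and atom-valued tokens, inlines the arrow tokens instead of calling rs//1, and folds sr_lhs//2 and sr_rhs//2 into sr_expansion//2 to stay short.

```prolog
:- op(1100, xfx, '--->').

% Each category name is replaced by a fresh variable; the pair
% (Name,Var) is remembered in the sugar list.
sr_expansion((LHS ---> RHS), [LSugar|RSugar]) -->
    fsd(LHS, LSugar),
    [o('-'), o('-'), o('>')],
    ne_fsd_seq(RHS, RSugar).

ne_fsd_seq((FSD,FSDs), [Sugar|Sugars]) -->
    fsd(FSD, Sugar),
    ne_fsd_seq(FSDs, Sugars).
ne_fsd_seq(FSD, [Sugar]) -->
    fsd(FSD, Sugar).

fsd(Var, (FSD,Var)) --> [u(FSD)].

demo(Expansion, Sugar) :-
    Tokens = [u('S'), o('-'), o('-'), o('>'), u('NP'), u('VP')],
    phrase(sr_expansion(Expansion, Sugar), Tokens).
```

The query demo(E, Sugar) should yield a term of the shape A ---> B, C together with the list [('S',A), ('NP',B), ('VP',C)]. The sugar list is what later allows a path such as <VP head> to be resolved to the variable C.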
Syntax Rules: Path Equations

    sr_path_equations((E,Es),Sugar) -->
        sr_path_equation(E,Sugar),
        sr_path_equations(Es,Sugar).
    sr_path_equations(E,Sugar) -->
        sr_path_equation(E,Sugar).

    sr_path_equation((LHS === RHS),Sugar) -->
        sr_path(LHS,Sugar),
        rs(=),
        sr_val(RHS,Sugar).

    sr_val(V,Sugar) --> sr_path(V,Sugar).
    sr_val(V,_)     --> atomic(V).
Syntax Rules: Paths

    sr_path(Var,Sugar) -->
        rs(<), fsd(FSD), rs(>),
        {member((FSD,Var),Sugar)}, !.
    sr_path(Var:P,Sugar) -->
        rs(<), fsd(FSD), ne_feature_seq(P), rs(>),
        {member((FSD,Var),Sugar)}, !.

    ne_feature_seq(F) -->
        feature(F).
    ne_feature_seq(F:P) -->
        feature(F),
        ne_feature_seq(P).

    fsd(FSD) --> uatom(FSD).

    feature(F) --> atomic(F).
Syntactic Sugar

    sr_sugar_cats([(Cat,Var)|Sugar],Equations,
                  ((Var:cat === Cat),Descr)):-
        sr_sugar_cats(Sugar,Equations,Descr).
    sr_sugar_cats([],Descr,Descr).

Without syntactic sugar:

    Rule {sentence formation}
      X0 --> X1 X2:
        <X0 cat> = S
        <X1 cat> = NP
        <X2 cat> = VP
        <X0 head> = <X2 head>
        <X2 head subject> = <X1 head>.

With syntactic sugar:

    Rule {sentence formation}
      S --> NP VP:
        <S head> = <VP head>
        <VP head subject> = <NP head>.
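The desugaring step can be tried on its own. The sketch below assumes SWI-Prolog, repeats sr_sugar_cats/3 together with the two operators it needs, and feeds it a hand-built sugar list and equation sequence for the sentence formation rule. The predicate demo/1 is hypothetical.

```prolog
:- op(510, xfy, :).
:- op(600, xfx, ===).

% For every (Cat,Var) pair, prefix the equations with Var:cat === Cat.
sr_sugar_cats([(Cat,Var)|Sugar], Equations,
              ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

demo(Descr) :-
    Sugar     = [('S',S), ('NP',NP), ('VP',VP)],
    Equations = (S:head === VP:head, VP:head:subject === NP:head),
    sr_sugar_cats(Sugar, Equations, Descr).
```

The resulting description consists of three cat equations followed by the two original path equations, in the same order as the body of the compiled sentence formation rule.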
Lexical Entries

    lex_entry(C) -->
        rs('Word'), !,
        lex_entry_cont(C).

    lex_entry_cont((FS ---> L :: Descr)) -->
        lexeme(L),
        rs(:), !,
        lex_definition(FS, Descr).

    lexeme(L) --> atom(L).
Lexicon: Feature Structures

    lex_definition(FS,(LDef,LDefs)) -->
        lexdef(FS,LDef),
        lex_definition(FS,LDefs).
    lex_definition(FS,LDef) -->
        lexdef(FS,LDef).

    lexdef(FS,LDef) --> template_name(FS,LDef), !.
    lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.
Lexicon: Path Equations

    lex_path_equation(FS, (LHS === RHS)) -->
        lex_path(FS, LHS),
        rs(=), !,
        lex_val(FS, RHS).

    lex_path(FS,FS:P) -->
        rs(<), ne_feature_seq(P), rs(>), !.

    lex_val(FS,V) --> lex_path(FS,V).
    lex_val(_,V)  --> atomic(V).
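As a worked example, the sketch below parses the tokens of "<head agreement gender> = masculine" into a path equation. It assumes SWI-Prolog and atom-valued tokens, inlines the reserved symbols, and restricts features and values to lower-case atoms, which is narrower than the atomic//1 of the course code.

```prolog
:- op(510, xfy, :).
:- op(600, xfx, ===).

lex_path_equation(FS, (LHS === RHS)) -->
    lex_path(FS, LHS),
    [o('=')], !,
    lex_val(FS, RHS).

% In the lexicon every path starts at the same feature structure FS.
lex_path(FS, FS:P) -->
    [o('<')], ne_feature_seq(P), [o('>')], !.

lex_val(FS, V) --> lex_path(FS, V).
lex_val(_, V)  --> [l(V)].              % simplification: lower-case atoms only

ne_feature_seq(F)   --> [l(F)].
ne_feature_seq(F:P) --> [l(F)], ne_feature_seq(P).

demo(FS, Equation) :-
    Tokens = [o('<'), l(head), l(agreement), l(gender), o('>'),
              o('='), l(masculine)],
    phrase(lex_path_equation(FS, Equation), Tokens).
```

The query demo(FS, Eq) should bind Eq to FS:head:agreement:gender === masculine. If the '=' token is missing from the input, lex_path_equation//2 fails, and in the full compiler the sentence would then be reported as an error.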
Templates

    template(C) -->
        rs('Let'), !,
        template_cont(C).

    template_cont((N :- TDef)) -->
        template_name(FS,N),
        rs('be'),
        template_definition(FS,TDef),
        {assert(template(N))}.
Templates: Head & Body

    template_name(FS,N) -->
        atom(A),
        {N =.. [A,FS]}.

    template_definition(FS,TDef) -->
        lex_definition(FS,TDef).
Deleting a Grammar

    clear_templates:-
        template(T),
        T =.. [F,_],
        abolish(F/1),
        fail.
    clear_templates:-
        abolish(template/1).

    clear_grammar:-
        abolish('::'/2),
        abolish(line/2),
        abolish(sentence/3),
        clear_templates.
Compiler Output

    A ---> B , C ::
        A : cat === 'S',
        B : cat === 'NP',
        C : cat === 'VP',
        A : head === C : head,
        C : head : subject === B : head.

    A ---> uther ::
        A : cat === 'NP',
        A : head : agreement : gender === masculine,
        A : head : agreement : person === third,
        A : head : agreement : number === singular.
Resources
• Grammars (PATR II / Prolog)
  • shieb1.ptr / shieb1.ari
  • shieb2.ptr / shieb2.ari
  • shieb3.ptr / shieb3.ari
  • shieb4.ptr / shieb4.ari
• Tokens
  • shieb1.tok (tokenizer)
  • shieb1.snt (preprocessor)
• PATR II interpreter
  • patrlcl.ari: left-corner with linking
  • patrlclc.ari: left-corner with linking and syntax trees
  • patr-ii.ari: DCG
• PATR II compiler
  • patrcomp.ari
  • patr-ii.ari: DCG
Open Problems and Extensions
• Syntactic sugar of the form VP_1 VP_2 X
• Lexical rules
• Templates in syntax rules
• Negation and disjunction
• Default inheritance (priority union)
• ...
References
• Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.
• Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison-Wesley.
• Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.