Natural Language Processing CS480/580
Levels of Linguistic Analysis
• Phonology: recognizing speech sounds
• Morphology: analysis of word forms (e.g., adding "s" to make a plural)
• Syntax: sentence structure
• Semantics: meaning
• Pragmatics: the relation of language to context
Tokenization
• The input string is broken into words, punctuation is removed, and the result is represented as a sequence of words (tokens).
• E.g., “How are you today?” is converted to [how, are, you, today].
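Tokenize.pl

% lower_case(+Code, -Lower): map A-Z to a-z; leave other codes unchanged.
lower_case(A, B) :- A >= 65, A =< 90, !, B is A + 32.
lower_case(A, A).

% tokenize(+Codes, -Tokens): split a list of character codes into word atoms.
tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

% punctuation_mark(+Code): any code that is not a letter or a digit.
punctuation_mark(A) :- A =< 47.
punctuation_mark(A) :- A >= 58, A =< 64.
punctuation_mark(A) :- A >= 91, A =< 96.
punctuation_mark(A) :- A >= 123.

% grab_word(+Codes, -Word, -Rest): read one word up to the next space (code 32),
% dropping punctuation and lower-casing letters.
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).

% The query assumes that double-quoted text denotes a list of character codes.
?- tokenize("This is CS480/580 course", X).
X = [this, is, cs480580, course].

?- name(john, X).
X = [106, 111, 104, 110].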
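In SWI-Prolog 7 and later, double-quoted text is a string object by default rather than a list of character codes, so the query above needs a conversion step there. The following is a minimal sketch of the same idea that takes an atom as input; it assumes an ISO-style atom_codes/2, and the names tokens_of/2, split_words/2, next_word/3 and word_code/2 are illustrative, not part of Tokenize.pl. Unlike the slide's version it skips the empty words produced by consecutive spaces.

```prolog
% Sketch only: predicate names are illustrative, not from Tokenize.pl.

% tokens_of(+Atom, -Tokens): lower-case word tokens of Atom, punctuation dropped.
tokens_of(Atom, Tokens) :-
    atom_codes(Atom, Codes),
    split_words(Codes, Tokens).

% split_words(+Codes, -Tokens): read words one at a time, skipping empty ones.
split_words(Codes, Tokens) :-
    next_word(Codes, WordCodes, Rest),
    (   WordCodes == []
    ->  Tokens = More
    ;   atom_codes(Token, WordCodes),
        Tokens = [Token|More]
    ),
    (   Rest == []
    ->  More = []
    ;   split_words(Rest, More)
    ).

% next_word(+Codes, -WordCodes, -Rest): codes of one word, up to a space (32).
next_word([], [], []).
next_word([32|Rest], [], Rest) :- !.
next_word([C|Cs], Word, Rest) :-
    (   word_code(C, L)
    ->  Word = [L|Word1]      % keep letters and digits
    ;   Word = Word1          % drop everything else
    ),
    next_word(Cs, Word1, Rest).

% word_code(+Code, -Kept): succeeds for letters and digits, lower-casing A-Z.
word_code(C, L) :- C >= 65, C =< 90, !, L is C + 32.   % A-Z -> a-z
word_code(C, C) :- C >= 97, C =< 122, !.               % a-z
word_code(C, C) :- C >= 48, C =< 57.                   % 0-9
```

For example, tokens_of('How are you today?', T) gives T = [how, are, you, today]. One design difference: atom_codes/2 always builds an atom, whereas name/2 would turn an all-digit word into a number.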
Template System
• Templates are stored sentence patterns.
• Each template is accompanied by a translation schema.
• E.g., [X, is, a, Y] is translated to Y(X).
• process([X, is, a, Y]) :- Fact =.. [Y, X], assert(Fact).
• process([is, X, a, Y]) :- Query =.. [Y, X], call(Query).
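The translation schema rests on the "univ" operator =.., which converts between a term and a list holding its functor and arguments. The sketch below shows the two templates in isolation. It deviates from the slide in one labeled way: facts are stored inside a wrapper predicate known/1 (a hypothetical name) instead of being asserted directly, so that a question about a category nobody has mentioned simply fails. In SWI-Prolog with default settings, calling a predicate that has never been defined raises an error instead.

```prolog
% Sketch only: known/1 is an illustrative wrapper, not part of the slides.
:- dynamic known/1.

% "X is a Y" is stored as the fact Y(X).
process([X, is, a, Y]) :- !,
    Fact =.. [Y, X],          % e.g. Fact = dog(fido)
    assertz(known(Fact)).

% "is X a Y" succeeds if Y(X) has been stored.
process([is, X, a, Y]) :-
    Query =.. [Y, X],
    known(Query).
```

After process([fido, is, a, dog]), the goal process([is, fido, a, dog]) succeeds and process([is, fido, a, cat]) fails.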
Template.pl

% Tokenizer helpers (same as in Tokenize.pl).
grab_word([32|A], [], A) :- !.
grab_word([], [], []).
grab_word([A|B], C, D) :- punctuation_mark(A), !, grab_word(B, C, D).
grab_word([D|A], [E|B], C) :- grab_word(A, B, C), lower_case(D, E).

punctuation_mark(A) :- A =< 47.
punctuation_mark(A) :- A >= 58, A =< 64.
punctuation_mark(A) :- A >= 91, A =< 96.
punctuation_mark(A) :- A >= 123.

lower_case(A, B) :- A >= 65, A =< 90, !, B is A + 32.
lower_case(A, A).

% Input and output of strings as lists of character codes.
write_str([A|B]) :- put(A), write_str(B).
write_str([]).

read_str_aux(-1, []) :- !.    % end of file
read_str_aux(10, []) :- !.    % line feed
read_str_aux(13, []) :- !.    % carriage return
read_str_aux(A, [A|B]) :- read_str(B).

% Read one sentence, tokenize it, and pass it to the template matcher.
do_one_sentence :- write('>'), read_str(A), tokenize(A, B), process(B).

% Store a fact or rule and confirm.
note(A) :- asserta(A), write('OK'), nl.

read_atom(A) :- read_str(B), name(A, B).

% Top-level loop (failure-driven).
start :- write('TEMPLATE.PL at your service.'), nl,
         write('Terminate by pressing Break.'), nl,
         repeat, do_one_sentence, fail.

% Answer a yes/no question by calling the corresponding goal.
check(A) :- call(A), !, write('Yes.'), nl.
check(_) :- write('Not as far as I know.'), nl.

read_num(A) :- read_str(B), name(A, B).
% remove_s(+Word, -Stem): strip a final "s"; fails if Word does not end in "s".
remove_s(A, C) :- name(A, B), remove_s_list(B, D), name(C, D).

read_str(B) :- get0(A), read_str_aux(A, B).

remove_s_list([115], []).    % 115 is the character code of "s"
remove_s_list([A|B], [A|C]) :- remove_s_list(B, C).

% Templates, tried in order; the last clause catches everything else.
process([B, is, a, A]) :- !, C =.. [A, B], note(C).
process([A, is, an, B]) :- !, process([A, is, a, B]).
process([is, B, a, A]) :- !, C =.. [A, B], check(C).
process([is, A, an, B]) :- !, process([is, A, a, B]).
process([A, are, B]) :- !, remove_s(A, D), remove_s(B, C),
                        F =.. [C, E], G =.. [D, E], note((F :- G)).
process([does, B, A]) :- !, C =.. [A, B], check(C).
process([A, B]) :- \+ remove_s(A, _), remove_s(B, C), !,
                   D =.. [C, A], note(D).
process([A, B]) :- remove_s(A, C), \+ remove_s(B, _), !,
                   E =.. [B, D], F =.. [C, D], note((E :- F)).
process(_) :- write('I do not understand.'), nl.

tokenize([], []) :- !.
tokenize(A, [B|E]) :- grab_word(A, C, D), name(B, C), tokenize(D, E).

?- start.
TEMPLATE.PL at your service.
Terminate by pressing Break.
>CS480 is a course.
OK
>is CS480 a course?
Yes.
>is cs471 a course?
Not as far as I know.
>cs471 is a course.
OK
>is cs471 a course?
Yes.
Generative Grammars
• Templates are inadequate for describing human language: the previous example accepts only a few fixed patterns such as "X is a Y".
• Sentences can be embedded in other sentences without limit:
• John arrived
• Max said John arrived
• Bill claimed Max said John arrived
• Mary thought Bill claimed Max said John arrived
• Chomsky’s suggestion: treat syntax as a problem in set theory, i.e., give a finite description of an infinite set.
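The four example sentences can be captured by two recursive clauses, which is one way to see how a finite description covers an unbounded set. This is a minimal sketch over plain word lists; the predicate names sentence/1, person/1 and report_verb/1 are illustrative and do not come from the slides.

```prolog
% Sketch only: predicate names are illustrative.

% sentence(+Words): Words is "X arrived" or "X <verb> <sentence>".
sentence([Name, arrived]) :-
    person(Name).
sentence([Name, Verb|Rest]) :-
    person(Name),
    report_verb(Verb),
    sentence(Rest).           % the embedded sentence

person(john).
person(max).
person(bill).
person(mary).

report_verb(said).
report_verb(claimed).
report_verb(thought).
```

The goal sentence([mary, thought, bill, claimed, max, said, john, arrived]) succeeds, and so would any deeper embedding, while sentence([john, said]) fails because no complete sentence follows the verb.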
Context Free Grammars
• Phrase Structure Rules
• S → NP VP
• NP → Det N
• N → N PP
• N → N N
• PP → P NP
• VP → IV
• VP → TV NP
• VP → DV NP NP
• Lexical Entries
• N → book, cow, course, …
• P → in, on, with, …
• Det → the, every, …
• IV → ran, hid, …
• TV → likes, hit, …
• DV → gave, showed
[Photo: Noam Chomsky]
Context-Free Derivations
• S ⇒ NP VP ⇒ Det N VP ⇒ the N VP ⇒ the kid VP ⇒ the kid IV ⇒ the kid ran
• Penn TreeBank bracketing notation (Lisp-like)
• (S (NP (Det the) (N kid)) (VP (IV ran)))
• Theorem: A sequence has a derivation if and only if it has a parse tree
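The bracketed notation maps directly onto a Prolog term, and reading the leaves of that term from left to right gives back the derived word sequence. The sketch below assumes that words are atoms at the leaves and that category labels are functors; yield/2 and its helpers are illustrative names, not from the slides.

```prolog
% Sketch only: yield/2, yield/3 and yield_all/3 are illustrative names.

% yield(+Tree, -Words): the words at the leaves of Tree, left to right.
yield(Tree, Words) :-
    yield(Tree, Words, []).

% yield(+Tree, -Words, ?Tail): Words is the leaves of Tree followed by Tail.
yield(Leaf, [Leaf|Tail], Tail) :-
    atom(Leaf), !.
yield(Tree, Words, Tail) :-
    Tree =.. [_Category|Children],
    yield_all(Children, Words, Tail).

% yield_all(+Trees, -Words, ?Tail): leaves of all Trees, in order.
yield_all([], Words, Words).
yield_all([T|Ts], Words, Tail) :-
    yield(T, Words, Mid),
    yield_all(Ts, Mid, Tail).
```

With the tree from the slide written as s(np(det(the), n(kid)), vp(iv(ran))), the goal yield(Tree, W) gives W = [the, kid, ran].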
A simple Parser

% Each predicate takes the list of input words and returns the words
% that remain after its phrase has been consumed.
sentence(A, C) :- noun_phrase(A, B), verb_phrase(B, C).
noun_phrase(A, C) :- determiner(A, B), noun(B, C).
verb_phrase(A, C) :- verb(A, B), noun_phrase(B, C).
verb_phrase(A, C) :- verb(A, B), sentence(B, C).

determiner([the|A], A).
determiner([a|A], A).

noun([dog|A], A).
noun([cat|A], A).
noun([boy|A], A).
noun([girl|A], A).

verb([chased|A], A).
verb([saw|A], A).
verb([said|A], A).
verb([believed|A], A).

?- sentence([the, cat, saw, the, dog], []).
true .
?- sentence([the, dog, saw, the, dog], []).
true .
?- sentence([a, dog, chased, the, cat], []).
true .
?- sentence([that, dog, chased, the, cat], []).
false.
Definite Clause Grammar (DCG)
• A Prolog notation that provides an easy way to write grammar rules.
• E.g., sentence --> noun_phrase, verb_phrase.
• This is equivalent to the rule:
• sentence(X,Z) :- noun_phrase(X,Y), verb_phrase(Y,Z).
• Also, noun --> [dog]. or noun --> [dog]; [cat]; [boy]; [girl].
• or verb --> [gives, up]. where “gives up” is a single verb.
• The sentence rule is queried through the predicate sentence/2, e.g., sentence([the, dog, chased, the, cat],[]).
• Try sentence([A,B,C,D,E],[]) or sentence([the, A, B, C, cat|E],[]).
• Non-terminal symbols can also take arguments, e.g., sentence(N) --> noun_phrase(N), verb_phrase(N).
Parser2.pl based on DCG

sentence --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
verb_phrase --> verb, noun_phrase.
verb_phrase --> verb, sentence.

determiner --> [the].
determiner --> [a].

noun --> [dog]; [cat]; [boy]; [girl].
verb --> [chased]; [saw]; [said]; [believed].
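DCG rules are usually run through the standard predicate phrase/2, and an extra argument on each non-terminal can return the parse tree. The sketch below is a variant of the Parser2.pl grammar, not the slide's own code: the tree functors (s, np, vp, det, n, v) and the lexicon predicates is_determiner/1, is_noun/1 and is_verb/1 are illustrative names.

```prolog
% Sketch only: tree functors and is_* lexicon predicates are illustrative.
sentence(s(NP, VP))    --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(D, N))  --> determiner(D), noun(N).
verb_phrase(vp(V, NP)) --> verb(V), noun_phrase(NP).
verb_phrase(vp(V, S))  --> verb(V), sentence(S).

% Each lexical rule consumes one word and checks it against the lexicon.
determiner(det(W)) --> [W], { is_determiner(W) }.
noun(n(W))         --> [W], { is_noun(W) }.
verb(v(W))         --> [W], { is_verb(W) }.

is_determiner(the).
is_determiner(a).

is_noun(dog).
is_noun(cat).
is_noun(boy).
is_noun(girl).

is_verb(chased).
is_verb(saw).
is_verb(said).
is_verb(believed).
```

The query phrase(sentence(T), [the, cat, saw, the, dog]) binds T to s(np(det(the), n(cat)), vp(v(saw), np(det(the), n(dog)))), which is the Prolog-term counterpart of the bracketed tree notation.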
Grammatical Features
• How to handle agreement (in tense and number) between the noun and the verb? The grammar below handles number by passing it along as an argument N.

sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).    % the embedded sentence has its own number

determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].

noun(singular) --> [dog];[cat];[boy];[girl].
noun(plural) --> [dogs];[cats];[boys];[girls].

verb(singular) --> [chases];[sees];[says];[believes].
verb(plural) --> [chase];[see];[say];[believe].
?- sentence(plural, [the, dogs, A, B, C],[]).
A = chase, B = a, C = dog ;
A = chase, B = a, C = cat ;
A = chase, B = a, C = boy ;
A = chase, B = a, C = girl ;
A = chase, B = the, C = dog
Morphology
• How to generate plural nouns from singular?
• How to generate third person singular verbs from plural verbs?
• Mostly by adding "s".
sentence(N) --> noun_phrase(N), verb_phrase(N).
noun_phrase(N) --> determiner(N), noun(N).
verb_phrase(N) --> verb(N), noun_phrase(_).
verb_phrase(N) --> verb(N), sentence(_).

determiner(singular) --> [a].
determiner(_) --> [the].
determiner(plural) --> [].

% Lexical rules consume one word X and check it with morph/2.
noun(N) --> [X], { morph(noun(N),X) }.
verb(N) --> [X], { morph(verb(N),X) }.

morph(noun(singular),dog).          % Singular nouns
morph(noun(singular),cat).
morph(noun(singular),boy).
morph(noun(singular),girl).
morph(noun(singular),child).

morph(noun(plural),children).       % Irregular plural nouns
morph(noun(plural),X) :-            % Rule for regular plural nouns
    remove_s(X,Y),
    morph(noun(singular),Y).

morph(verb(plural),chase).          % Plural verbs
morph(verb(plural),see).
morph(verb(plural),say).
morph(verb(plural),believe).

morph(verb(singular),X) :-          % Rule for singular verbs
    remove_s(X,Y),
    morph(verb(plural),Y).

% remove_s(+X,-X1) [lifted from TEMPLATE.PL]
% removes final S from X giving X1,
% or fails if X does not end in S.
remove_s(X,X1) :-
    name(X,XList),
    remove_s_list(XList,X1List),
    name(X1,X1List).

remove_s_list([115],[]).            % 115 is the character code of "s"
remove_s_list([Head|Tail],[Head|NewTail]) :-
    remove_s_list(Tail,NewTail).
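The remove_s/2 approach recognizes an inflected word that is already known, since name/2 needs its first argument bound. For the generation direction asked about on the Morphology slide, one possible sketch uses atom_concat/3, which can both append and strip a suffix. The predicates add_s/2, singular_noun/1, irregular_plural/2, plural_noun/2, base_verb/1 and third_singular/2 are illustrative names and a small illustrative lexicon, not the slide's code.

```prolog
% Sketch only: all predicate names and the lexicon are illustrative.

% add_s(?Stem, ?Word): Word is Stem followed by "s".
add_s(Stem, Word) :-
    atom_concat(Stem, s, Word).

singular_noun(dog).
singular_noun(cat).
singular_noun(boy).
singular_noun(girl).
singular_noun(child).

irregular_plural(child, children).

% plural_noun(?Singular, ?Plural): irregular forms first, then the "s" rule.
plural_noun(Singular, Plural) :-
    irregular_plural(Singular, Plural).
plural_noun(Singular, Plural) :-
    singular_noun(Singular),
    \+ irregular_plural(Singular, _),   % block "childs"
    add_s(Singular, Plural).

base_verb(chase).
base_verb(see).
base_verb(say).
base_verb(believe).

% third_singular(?Base, ?Form): third person singular by adding "s".
third_singular(Base, Form) :-
    base_verb(Base),
    add_s(Base, Form).
```

Here plural_noun(dog, P) gives P = dogs, plural_noun(S, cats) gives S = cat, and plural_noun(child, P) gives only P = children. Because the lexicon is enumerated before the suffix is applied, the same clauses serve for both generation and recognition.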