Chapter 4 Natural Language Processing
NLP
• Language translation / multilingual translation
• Language understanding
• Figure 14.5 p. 365: Interaction among components
• Figure 14.6 p. 366: A speech waveform
Parse tree: [S [NP John] [VP [V saw] [NP [DET the] [N boy] [PP in the park]] [PP with a telescope]]]
John saw the boy in the park with a telescope.
Figure 14.5: More Interaction among Components
Parse tree: [S [NP John] [VP [V saw] [NP [DET the] [N boy] [PP in the park] [PP with a dog]]]]
John saw the boy in the park with a dog.
Figure 14.5: More Interaction among Components
(Parse tree figure)
John saw the boy in the park with a statue.
Figure 14.5: More Interaction among Components
The cat scares all the birds away.
A cat's cares are few.
Both sentences begin with the same sound sequence: k a t s k a r s
Figure 14.6: Local Ambiguity in a Speech Problem
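To make this local ambiguity concrete, here is a minimal Python sketch that lists every way to split the phoneme string into words. The four-entry phonetic lexicon is hypothetical and uses the slide's rough transcription, not a real pronunciation dictionary. The slides' own code is Turbo Prolog; Python is used here only for illustration.

```python
# Hypothetical phonetic lexicon built from the slide's rough transcription;
# a real recognizer would use a full pronunciation dictionary.
LEXICON = {
    ("k", "a", "t"): "cat",
    ("k", "a", "t", "s"): "cat's",
    ("s", "k", "a", "r", "s"): "scares",
    ("k", "a", "r", "s"): "cares",
}


def segmentations(phones):
    """Return every way to split a phoneme sequence into lexicon words."""
    if not phones:
        return [[]]                      # one way to segment nothing: no words
    results = []
    for end in range(1, len(phones) + 1):
        word = LEXICON.get(tuple(phones[:end]))
        if word is not None:             # this prefix is a word, so segment the rest
            for rest in segmentations(phones[end:]):
                results.append([word] + rest)
    return results


print(segmentations("k a t s k a r s".split()))
# [['cat', 'scares'], ["cat's", 'cares']]
```

Both readings survive, so a speech system has to use later context to choose between them.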
The Problem: English sentences are incomplete descriptions of the information that they are intended to convey:
Some dogs are outside.
Some dogs are on the lawn.
Three dogs are on the lawn.
Rover, Tripp, and Spot are on the lawn.
I called Lynda to ask her to the movies. She said she'd love to go.
(Implied but not said: She was home when I called. She answered the phone. I actually asked her.)
The Good Side: Language allows speakers to be as vague or precise as they like. It also allows speakers to leave out things they believe their hearers already know.
The Problem: The same expression means different things in different contexts:
Where's the water? (in a chemistry lab, it must be pure)
Where's the water? (when you are thirsty, it must be potable)
Where's the water? (dealing with a leaky roof, it can be filthy)
The Good Side: Language lets us communicate about an infinite world using a finite (and thus learnable) number of symbols.
The Problem: No natural language program can be complete, because new words, expressions, and meanings can be generated quite freely:
I'll fax it to you.
The Good Side: Language can evolve as the experiences that we want to communicate about evolve.
The Problem: There are lots of ways to say the same thing:
Mary was born on October 11.
Mary's birthday is October 11.
The Good Side: When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.
Figure 15.1: Features of Language That Make It Both Difficult and Useful
NLP Problems
• Figure 15.1 p. 378
• English sentences are incomplete descriptions of the information that they are intended to convey.
• The same expression means different things in different contexts.
• No natural language program can be complete, because new words, expressions, and meanings can be generated quite freely.
• There are lots of ways to say the same thing.
NLP Problems
1) Processing written text
• uses lexical, syntactic, and semantic knowledge of the language
• requires real-world information
2) Processing spoken language
• uses all of the knowledge above, plus additional knowledge about phonology
• must handle the ambiguities that arise in speech
Steps in NLP
1) Morphological Analysis
2) Syntactic Analysis
3) Semantic Analysis
4) Discourse Integration
5) Pragmatic Analysis
• The boundaries between these five phases are often fuzzy.
1. Morphological Analysis
• Individual words are analyzed into their components.
• Nonword tokens, such as punctuation, are separated from the words.
• Example: "I want to print Bill's .init file." gives Bill (proper noun) + 's (possessive suffix), and .init (file extension).
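A minimal Python sketch of this step for the example sentence follows. The regular-expression token classes and the "capitalized but not sentence-initial means proper noun" rule are illustrative assumptions. A real morphological analyzer would consult a lexicon rather than patterns alone.

```python
import re

# Illustrative token patterns (an assumption, not the textbook's analyzer).
# Alternatives are tried in order at each position; whitespace is skipped.
TOKEN_RE = re.compile(r"""
      (?P<EXT>\.[A-Za-z]+)(?=\s|$)   # file extension such as .init
    | (?P<POSS>'s)\b                 # possessive suffix
    | (?P<WORD>[A-Za-z]+)            # ordinary word
    | (?P<PUNCT>[^\sA-Za-z])         # any other non-space character
""", re.VERBOSE)


def analyze(sentence):
    """Split a sentence into (token, tag) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(sentence):
        tag, text = m.lastgroup, m.group()
        # Crude heuristic: a capitalized word that is not sentence-initial
        # is treated as a proper noun.
        if tag == "WORD" and text[0].isupper() and tokens:
            tag = "PROPER_NOUN"
        tokens.append((text, tag))
    return tokens


for token, tag in analyze("I want to print Bill's .init file."):
    print(f"{token:6} {tag}")
```

On the example, "Bill" is tagged PROPER_NOUN, "'s" POSS, ".init" EXT, and the final "." PUNCT, so the possessive and the extension are separated from the words they attach to.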
2. Syntactic Analysis
• Linear sequences of words are transformed into structures that show how the words relate to each other.
• An English syntactic analyzer rejects word sequences that violate the rules of English grammar, e.g. "Boy the go to store the."
2. Syntactic Analysis
• Example of syntactic analysis: Figure 15.2 p. 382 (reference markers such as RM2 and RM5)
• A knowledge base fragment: Figure 15.3 p. 383 (nodes include User073, F1, Printing, File_Structure, Waiting, Mental Event, Physical Event, Animate, Event)
• Partial meaning for a sentence: Figure 15.4 p. 384
Syntax: The dog bites the man.
Applying the rules
Parse Tree: The man bites the dog.
Parse Tree: The dog likes a man.
Internal Representation
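One way to build parse trees like these is top-down recursive descent. The Python sketch below uses the grammar from the "Natural Language Processing using prolog" slide (sentence, noun phrase, verb phrase). The six-word lexicon is an assumption chosen for these example sentences. Generators let the parser backtrack into the next rule when an earlier choice fails.

```python
# Grammar from the "Natural Language Processing using prolog" slide;
# the lexicon is a small assumed one for these example sentences.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "DET", "a": "DET", "dog": "N", "man": "N",
           "bites": "V", "likes": "V"}


def expand(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` can derive words[i:next_index]."""
    if symbol not in GRAMMAR:                      # word category: match one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:                    # try each rule, in order
        for children, j in expand_all(rhs, words, i):
            yield (symbol, *children), j


def expand_all(symbols, words, i):
    """Yield (subtrees, next_index) for a sequence of symbols, left to right."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], words, i):
        for rest, k in expand_all(symbols[1:], words, j):
            yield [first, *rest], k


def parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    words = sentence.lower().rstrip(".").split()
    for tree, end in expand("S", words, 0):
        if end == len(words):
            return tree
    return None


def bracket(tree):
    """Render a tree as labelled brackets."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return f"[{label} {kids[0]}]"
    return f"[{label} " + " ".join(bracket(k) for k in kids) + "]"


print(bracket(parse("The dog bites the man.")))
# [S [NP [DET the] [N dog]] [VP [V bites] [NP [DET the] [N man]]]]
```

The nested tuples returned by `parse` serve as a simple internal representation. A scrambled input such as "Dog the bites man the." returns None.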
Syntactic Processing (2)
• Top-down Parsing
  • Begin with the start symbol and apply the grammar rules forward until the symbols at the terminals of the tree correspond to the components of the sentence being parsed.
• Bottom-up Parsing
  • Begin with the sentence to be parsed and apply the grammar rules backward until a single tree has been produced whose terminals are the words of the sentence and whose top node is the start symbol.
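The bottom-up strategy can be sketched as a shift-reduce parser. In this minimal Python version, words are shifted onto a stack as their categories, and grammar rules are applied backward (reduced) whenever their right-hand side is on top of the stack. A depth-first search backtracks over shift/reduce choices. The rules follow the noun phrase / verb phrase grammar from the Prolog slide, and the lexicon is an assumption.

```python
# Rules are (left-hand side, right-hand side) pairs; the lexicon is assumed.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]
LEXICON = {"the": "DET", "a": "DET", "dog": "N", "man": "N",
           "bites": "V", "likes": "V"}


def shift_reduce(sentence):
    """Bottom-up parse by depth-first search over shift/reduce choices.

    Returns the list of actions that reduces the sentence to S, or None.
    """
    words = sentence.lower().rstrip(".").split()
    if any(w not in LEXICON for w in words):
        return None

    def search(stack, i, actions):
        if i == len(words) and stack == ("S",):
            return actions                          # whole sentence reduced to S
        for lhs, rhs in RULES:                      # reduce: apply a rule backward
            n = len(rhs)
            if stack[-n:] == rhs:
                found = search(stack[:-n] + (lhs,), i,
                               actions + [f"reduce {' '.join(rhs)} -> {lhs}"])
                if found is not None:
                    return found
        if i < len(words):                          # shift: push next word's category
            cat = LEXICON[words[i]]
            return search(stack + (cat,), i + 1,
                          actions + [f"shift {words[i]} ({cat})"])
        return None                                 # dead end: backtrack

    return search((), 0, [])


for action in shift_reduce("The dog bites the man."):
    print(action)
```

The early reduction of "bites" to a one-word VP leads to a dead end once "the man" is still unread. The search then backtracks and shifts the object first, ending with "reduce NP VP -> S".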
Transition Network: The man bites the dog.
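A plain transition network can be sketched as a table of states whose arcs are labelled with word categories. A sentence is accepted if the arcs lead from the start state to a final state. The state names, arcs, and lexicon below are hypothetical and chosen to accept sentences like "The man bites the dog."

```python
# Hypothetical transition network: arcs are labelled with word categories.
LEXICON = {"the": "DET", "a": "DET", "man": "N", "dog": "N",
           "bites": "V", "likes": "V"}
ARCS = {
    "S0": {"DET": "S1", "N": "S2"},
    "S1": {"N": "S2"},              # after a determiner, a noun must follow
    "S2": {"V": "S3"},
    "S3": {"DET": "S4", "N": "S5"},
    "S4": {"N": "S5"},
}
FINAL = {"S3", "S5"}                # may stop after the verb or after the object


def traverse(sentence):
    """Follow one arc per word; return the visited states, or None if rejected."""
    state, path = "S0", ["S0"]
    for word in sentence.lower().rstrip(".").split():
        category = LEXICON.get(word)
        state = ARCS.get(state, {}).get(category)
        if state is None:           # no arc for this word's category
            return None
        path.append(state)
    return path if state in FINAL else None


print(traverse("The man bites the dog."))
# ['S0', 'S1', 'S2', 'S3', 'S4', 'S5']
```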
ATN: Augmented Transition Network
• Similar to a finite-state machine, but arcs can call other networks and set registers
• Figure 15.8 p. 392: An ATN network
• Figure 15.9 p. 393: An ATN grammar in list form
• Sentence: "The long file has printed."
  S network: S --NP--> Q1 --AUX--> Q3 --V--> Q4 (F, halt)
  NP network: NP --Det--> Q6, Q6 --Adj--> Q6 (loop), Q6 --N--> Q7 (F)
• Result (p. 394):
  (S DCL (NP (FILE (LONG) DEFINITE))
         HAS
         (VP PRINTED))
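The two networks above can be sketched in Python as follows. The S network pushes into the NP network, and the registers (determiner, adjectives, subject, auxiliary, verb) are held in local variables. This is a minimal sketch: the five-word lexicon is hypothetical, and arcs are followed deterministically rather than with the backtracking a full ATN interpreter would provide.

```python
# Hypothetical lexicon: word -> (category, root form used in the output).
LEXICON = {
    "the": ("DET", "DEFINITE"),
    "a": ("DET", "INDEFINITE"),
    "long": ("ADJ", "LONG"),
    "file": ("N", "FILE"),
    "has": ("AUX", "HAS"),
    "printed": ("V", "PRINTED"),
}


def cat(words, i):
    """Return (category, root) of words[i], or (None, None) if past the end or unknown."""
    if i < len(words):
        return LEXICON.get(words[i], (None, None))
    return (None, None)


def np_network(words, i):
    """NP --Det--> Q6, Q6 --Adj--> Q6 (loop), Q6 --N--> Q7 (final).
    Returns (structure, next_index), or None if the network fails."""
    c, det = cat(words, i)
    if c != "DET":
        return None
    i += 1                                  # now in state Q6
    adjs = []                               # register collecting adjectives
    while cat(words, i)[0] == "ADJ":       # Adj arc loops back to Q6
        adjs.append(cat(words, i)[1])
        i += 1
    c, noun = cat(words, i)
    if c != "N":
        return None
    i += 1                                  # now in final state Q7
    mods = "".join(f" ({a})" for a in adjs)
    return f"(NP ({noun}{mods} {det}))", i


def s_network(sentence):
    """S --NP--> Q1 --AUX--> Q3 --V--> Q4 (final, halt)."""
    words = sentence.lower().rstrip(".").split()
    result = np_network(words, 0)           # push into the NP sub-network
    if result is None:
        return None
    subj, i = result                        # subject register; sentence type is DCL
    c, aux = cat(words, i)
    if c != "AUX":
        return None
    c, verb = cat(words, i + 1)
    if c != "V" or i + 2 != len(words):     # halt at Q4 only if the input is used up
        return None
    return f"(S DCL {subj} {aux} (VP {verb}))"


print(s_network("The long file has printed."))
# (S DCL (NP (FILE (LONG) DEFINITE)) HAS (VP PRINTED))
```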
3. Semantic Analysis
• Meanings are assigned to the structures created by the syntactic analyzer.
• A mapping is made between the syntactic structures and objects in the task domain.
• If no mapping is possible, the structure is rejected, e.g. "Colorless green ideas sleep furiously."
• The semantic analyzer must do two things:
  1) It must map individual words into appropriate objects in the knowledge base or database.
  2) It must create the correct structures to correspond to the way the meanings of the individual words combine with each other.
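Both tasks, and the rejection case, are illustrated in the Python sketch below. Each word is mapped to a knowledge-base object, and the result is rejected if an adjective or verb requires a type the noun's concept does not have. The tiny knowledge base, its type labels, and the output structure are all hypothetical. The clause is assumed to be already parsed into subject, verb, and adjectives, and adverbs such as "furiously" are ignored.

```python
# Hypothetical knowledge base: nouns map to concepts with types, adjectives
# to properties with the type they apply to, verbs to events with the type
# their subject (agent) must have.
CONCEPTS = {
    "dog": ("Dog", {"physical", "animate"}),
    "man": ("Man", {"physical", "animate"}),
    "ideas": ("Idea", {"abstract"}),
}
PROPERTIES = {"green": ("Green", "physical"), "colorless": ("Colorless", "physical")}
EVENTS = {"sleep": ("Sleeping", "animate"), "sleeps": ("Sleeping", "animate")}


def interpret(subject, verb, adjectives=()):
    """Map a parsed subject-verb clause into the KB, or return None if no mapping exists."""
    if subject not in CONCEPTS or verb not in EVENTS:
        return None                                   # word has no object in the KB
    concept, types = CONCEPTS[subject]
    props = []
    for adj in adjectives:
        if adj not in PROPERTIES:
            return None
        prop, required = PROPERTIES[adj]
        if required not in types:                     # e.g. an idea cannot be green
            return None
        props.append(prop)
    event, agent_type = EVENTS[verb]
    if agent_type not in types:                       # e.g. an idea cannot sleep
        return None
    return {"event": event, "agent": {"isa": concept, "properties": props}}


print(interpret("dog", "sleeps", ["green"]))
print(interpret("ideas", "sleep", ["colorless", "green"]))  # None: rejected
```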
Figure: Result of the syntactic analysis of the sentence "I want to print Bill's .init file."
The result of the semantic analysis is shown in the figure.
The final result of the pragmatic analysis is the UNIX command that tells UNIX to print the desired file:
lpr /wsmith/stuff.init
4. Discourse Integration
• The meaning of an individual sentence may depend on the sentences that precede it and may influence the meanings of the sentences that follow it.
• Example: in "John wanted it.", the meaning of "it" depends on the previous sentence.
• In the printing example, the current user, who typed the word "I", is User068 (Susan_Black).
• "Bill's .init file" resolves to F1, a file in the /wsmith/ directory.
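The reference resolution in the printing example can be sketched as follows. "I" is bound to whoever is typing, and "Bill's .init file" is looked up in a knowledge base. User068, Susan_Black, F1, /wsmith/ and stuff.init come from the slides. The dictionary layout, the field names, and the assumption that "Bill" is User073 are hypothetical.

```python
# Hypothetical knowledge base. User068 (Susan_Black), F1, /wsmith/ and
# stuff.init come from the slides; field names and Bill = User073 are assumptions.
USERS = {
    "User068": {"first_name": "Susan", "full_name": "Susan_Black"},
    "User073": {"first_name": "Bill"},
}
FILES = {
    "F1": {"owner": "User073", "directory": "/wsmith/",
           "name": "stuff", "extension": "init"},
}


def resolve(meaning, current_user):
    """Replace discourse-dependent references with knowledge-base objects."""
    resolved = dict(meaning)
    if meaning["agent"] == "I":                      # "I" is whoever typed the sentence
        resolved["agent"] = current_user
    owner_name = meaning["object"]["owner"]
    owners = [u for u, info in USERS.items() if info["first_name"] == owner_name]
    files = [f for f, info in FILES.items()
             if info["owner"] in owners
             and info["extension"] == meaning["object"]["extension"]]
    if len(owners) != 1 or len(files) != 1:
        raise ValueError("reference is ambiguous or unresolved")
    resolved["object"] = files[0]
    return resolved


partial = {"event": "Printing", "agent": "I",
           "object": {"owner": "Bill", "extension": "init"}}
print(resolve(partial, current_user="User068"))
# {'event': 'Printing', 'agent': 'User068', 'object': 'F1'}
```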
5. Pragmatic Analysis
• The structure representing what was said is reinterpreted to determine what was actually meant.
• Example: "Do you know what time it is?" should be understood as a request to be told the time, not as a yes/no question.
• Part of understanding is deciding what to do as a result.
• Representing the intended meaning: Figure 15.5 p. 385
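For the printing example, the pragmatic step reinterprets "the user wants F1 printed" as a request and decides to issue the UNIX command from the slides, lpr /wsmith/stuff.init. The Python sketch below is minimal. The meaning dictionary layout and the single hard-coded rule are assumptions for illustration.

```python
# Hypothetical file table; the path and command come from the slides' lpr example.
FILES = {"F1": {"directory": "/wsmith/", "name": "stuff", "extension": "init"}}


def pragmatic(meaning):
    """Reinterpret 'agent wants X' as a request to do X, and choose an action."""
    if meaning["type"] == "want" and meaning["wanted"]["event"] == "Printing":
        f = FILES[meaning["wanted"]["object"]]
        return f"lpr {f['directory']}{f['name']}.{f['extension']}"
    return None          # no action known for this meaning


statement = {"type": "want", "agent": "User068",
             "wanted": {"event": "Printing", "object": "F1"}}
print(pragmatic(statement))   # lpr /wsmith/stuff.init
```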
TURBO PROLOG
ftp://172.28.80.6/older/DosProgram/TPROLOG
Alt + Enter: Big screen
F1: Help
F2: Save
F3: Load
F6: Next/Switch
F8: Previous goal
F9: Compile
F10: Step (for trace) / End
Alt + T: Trace on/off
Setup, Window size, Edit: use the arrow keys to adjust the size
TURBO PROLOG
Use the examples in the EXAMPLE directory to practice programming. Start with EX03EX01.PRO:

predicates
  likes(symbol,symbol)
clauses
  /* FACTS */
  likes(ellen, tennis).
  likes(john, football).
  likes(tom, baseball).
  likes(eric, swimming).
  likes(mark, tennis).
  /* RULES ("if" and ":-" mean the same thing) */
  likes(bill, Activity) if likes(tom, Activity).
  likes(mark, Activity) :- likes(ellen, Activity).
PROLOG.HELP
ARITHMETIC
Arithmetic operators: +, -, *, /, mod, div
Relational operators: >, <, =, >=, <=, <>, ><
Functions: sin, cos, tan, arctan, ln, log, exp, sqrt, round, trunc, abs
Examples: 1 + 2 = 2 + 1, X = 5/2, X = 5 mod 2, 5 <> 9
PREDEFINED DOMAINS
char     1-byte characters
integer  2-byte integer numbers
real     8-byte floating-point numbers
symbol   strings inserted in the internal symbol table
string   sequences of characters, e.g. "hello world\n"
SUMMARY OF PROGRAM SECTIONS
CONSTANTS
  const1 = definition
  const2 = definition
[GLOBAL] DOMAINS
  dom [,dom] = [reference] declaration1; declaration2
  listdom = dom*
  dom = <basisdom>
[GLOBAL] DATABASE [ - <databasename> ]
  [determ] pred1(....)
  pred2(.....)
GLOBAL PREDICATES
  [determ|nondeterm] pred1(.........) -(i,i,o,..)(i,o,i,..) [ language c|pascal|fortran ] [ as "name" ]
  pred2(........)
PREDICATES
  [determ|nondeterm] pred1(.........)
  pred2(........)
CLAUSES
  p(....):-p1(...), p2(.....), ... .
  p(....):-p1(...), p2(.....), ... .
include "filename"   Include a file during compilation.
MISCELLANEOUS
random(RealVariable) (real) - (o)
random(MaxValue,RandomInt) (integer,integer) - (i,o)
sound(Duration,Frequency) (integer,integer) - (i,i)
beep
date(Year,Month,Day) (integer,integer,integer) - (o,o,o) (i,i,i)
time(Hours,Minutes,Seconds,Hundredths) (integer,integer,integer,integer) - (o,o,o,o) (i,i,i,i)
trace(on/off) (string) - (i) (o)
ERROR & BREAK CONTROL
trap(PredicateCall,ExitCode,PredicateToCallOnError)
exit
exit(ExitCode) (integer) - (i)
  If the program exits to DOS, the DOS errorlevel will contain the value given to the exit predicate.
break(on/off) (string) - (i) (o)
EDITOR
display(String) (string) - (i)
edit(InputString,OutputString) (string,string) - (i,o)
edit(InputString,OutputString,Headstr,Headstr2,Msg,Pos,Helpfilename,EditMode,Indent,Insert,TextMode,RetPos,RetStatus)
  (string,string,string,string,string,integer,string,integer,integer,integer,integer,integer,integer) - (i,o,i,i,i,i,i,i,i,i,i,o,o)
  If the user saves the text from the editor, Headstr2 will be used as the file name.
editmsg(InputString,OutputString,Headstr,Headstr2,Msg,Pos,Helpfilename,RetStatus)
  (string,string,string,string,string,integer,string,integer) - (i,o,i,i,i,i,i,o)
WINDOW SYSTEM
makewindow(WindowNo,ScrAtt,FrameAtt,Framestr,Row,Column,Height,Width) (integer,integer,integer,string,integer,integer,integer,integer)
shiftwindow(WindowNo) (integer) - (i) (o)
gotowindow(WindowNo) (integer) - (i)
resizewindow(StartRow,NoOfRows,StartCol,NoOfCols) (integer,integer,integer,integer) - (i,i,i,i)
colorsetup(Main_Frame) (integer) - (i)
INPUT
readln(StringVariable) (string) - (o)
readint(IntgVariable) (integer) - (o)
readreal(RealVariable) (real) - (o)
readchar(CharVariable) (char) - (o)
keypressed
unreadchar(CharToBePushedBack) (char) - (i)
readterm(Domain, Variable) (DomainName,Domain) - (i,_)
OUTPUT
write( Variable|Constant * )
nl
writef( FormatString, Variable|Constant* )
In the format string, the following options are recognized after a percent sign:
%d   Normal decimal number (chars and integers)
%u   Unsigned integer (chars and integers)
%R   Database reference number (database reference numbers)
%X   Long hexadecimal number (strings, database reference numbers)
%x   Hexadecimal number (chars and integers)
%s   String (symbols and strings)
%c   Char (chars and integers)
%g   Reals in the shortest possible format (default for reals)
%e   Reals in exponential notation
%f   Reals in fixed notation
%lf  Only for C compatibility (fixed reals)
\n   newline
\t   tab
\nnn character with code nnn
Natural Language Processing using Prolog
sentence :- noun_phrase, verb_phrase.
noun_phrase :- det, noun.
noun_phrase :- noun.
verb_phrase :- verb, noun_phrase.
verb_phrase :- verb.
Examples: The cat eats the fish. A man likes an apple.
EX13EX04.pro / NLP.pro
domains
  sentence = s(noun_phrase,verb_phrase)
  noun_phrase = noun(noun) ; noun_phrase(detrm,noun)
  noun = string
  verb_phrase = verb(verb) ; verb_phrase(verb,noun_phrase)
  verb = string
  detrm = string
predicates
  s_sentence(string,sentence)
  s_noun_phrase(string,string,noun_phrase)
  s_verb_phrase(string,verb_phrase)
  d(string)
  n(string)
  v(string)
  start
goal
  start.

Sample run:
Please enter a sentence > Bill eats apple
EX13EX04.pro / NLP.pro (cont.)
clauses
  /* Read one line and try to parse it as a sentence. */
  start :-
      write("\n Please enter a sentence > "),
      readln(Str),
      s_sentence(Str, s(_,_)).

  /* sentence -> noun_phrase verb_phrase */
  s_sentence(Str, s(N_Phrase,V_Phrase)) :-
      s_noun_phrase(Str, Rest, N_Phrase),
      s_verb_phrase(Rest, V_Phrase).

  /* noun_phrase -> determiner noun */
  s_noun_phrase(Str, Rest, noun_phrase(Detr,Noun)) :-
      fronttoken(Str, Detr, Rest1),
      d(Detr),
      fronttoken(Rest1, Noun, Rest),
      n(Noun).
  /* noun_phrase -> noun */
  s_noun_phrase(Str, Rest, noun(Noun)) :-
      fronttoken(Str, Noun, Rest),
      n(Noun).

  /* verb_phrase -> verb noun_phrase (the noun phrase must use up the input) */
  s_verb_phrase(Str, verb_phrase(Verb,N_Phrase)) :-
      fronttoken(Str, Verb, Rest1),
      v(Verb),
      s_noun_phrase(Rest1, "", N_Phrase).
  /* verb_phrase -> verb (nothing may follow the verb) */
  s_verb_phrase(Str, verb(Verb)) :-
      fronttoken(Str, Verb, ""),
      v(Verb).
EX13EX04.pro / NLP.pro (cont.)
Sample inputs: the cat likes fish / a man takes a bus
(Type determiners in lowercase; the d facts below are lowercase strings, so "The" would not match.)
  /* determiners */
  d("the"). d("a"). d("an").
  /* nouns */
  n("Bill"). n("dog"). n("cat"). n("fish"). n("ant"). n("apple"). n("man"). n("bus").
  /* verbs */
  v("is"). v("eats"). v("likes"). v("takes").
The End