A* Search
• Uses the evaluation function f(n) = g(n) + h(n), where n is a node.
• g is a cost function: the total cost incurred so far from the initial state to node n.
  • For the 8-puzzle, each move has equal cost.
• h is a heuristic that estimates the remaining cost from n to the goal (e.g., the Hamming distance: the number of misplaced tiles).
A* Pseudocode

create the open list of nodes, initially containing only our starting node
create the closed list of nodes, initially empty
while (we have not reached our goal) {
    consider the best node in the open list (the node with the lowest f value)
    if (this node is the goal) {
        then we're done
    }
    else {
        move the current node to the closed list and consider all of its successors
        for (each successor) {
            if (this successor is in the closed list and our current g value is lower) {
                update the successor with the new, lower g value
                change the successor's parent to our current node
            }
            else if (this successor is in the open list and our current g value is lower) {
                update the successor with the new, lower g value
                change the successor's parent to our current node
            }
            else if (this successor is not in either the open or closed list) {
                add the successor to the open list and set its g value
            }
        }
    }
}
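As a concrete illustration, here is a minimal Python sketch of the same search applied to the 8-puzzle with the Hamming-distance heuristic. The function and variable names are mine, and a dictionary of best-known g values stands in for the explicit open/closed list bookkeeping above; treat it as a sketch, not the canonical implementation.

    import heapq

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank tile

    def hamming(state):
        """Heuristic h(n): number of misplaced tiles (blank excluded)."""
        return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

    def successors(state):
        """Yield states reachable by sliding one tile into the blank."""
        i = state.index(0)
        row, col = divmod(i, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < 3 and 0 <= c < 3:
                j = r * 3 + c
                s = list(state)
                s[i], s[j] = s[j], s[i]
                yield tuple(s)

    def astar(start):
        """Return the number of moves to the goal, or None if none is found."""
        open_list = [(hamming(start), 0, start)]   # entries are (f, g, state)
        best_g = {start: 0}                        # lowest g seen for each state
        while open_list:
            f, g, state = heapq.heappop(open_list)
            if state == GOAL:
                return g
            if g > best_g.get(state, float("inf")):
                continue                           # stale entry; a cheaper path was already found
            for succ in successors(state):
                new_g = g + 1                      # each move has equal cost
                if new_g < best_g.get(succ, float("inf")):
                    best_g[succ] = new_g
                    heapq.heappush(open_list, (new_g + hamming(succ), new_g, succ))
        return None

    print(astar((1, 2, 3, 4, 5, 6, 0, 7, 8)))  # -> 2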
Go over HW 3. Nim demo: http://www.math.uri.edu/~bkaskosz/flashmo/marien.html
Example of statistical language models: n-grams
• Estimates the probability distribution of a word w given the n-1 words that came before it in the sequence: P(w | w1, w2, …, wn-1)
• Purpose: to guess the next word from the previous words, e.g., to disambiguate:
  • "Students have access to a list of course requirements"
  • "Would you like a drink of [garbled]?"
  • "He loves going to the [bark]."
N-grams: applications throughout natural language processing:
• text classification
• speech recognition
• machine translation
• intelligent spell-checking
• handwriting recognition
• playing "Jeopardy!"
• What is P(bark | he loves going to the)?
• What is P(park | he loves going to the)?
• Can estimate from a large corpus:
  P(w | w1, …, wn-1) = frequency(w1 … wn-1 w) / frequency(w1 … wn-1)
• Example: use Google
Problem! The web doesn't give us enough examples to get good statistics. One solution:
• Approximate P(w | w1, …, wn-1) by using a small n (e.g., n = 2: bigrams).
• Bigram example: P(bark | the) vs. P(park | the) (calculate using Google)
• Trigram: P(bark | to the) vs. P(park | to the)
Now we can calculate the probability of an utterance:
P(he loves going to the bark) ≈ P(he | <s>) P(loves | he) P(going | loves) P(to | going) P(the | to) P(bark | the) P(</s> | bark)
<s> = sentence start marker, </s> = sentence end marker
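A small sketch of this chain-of-bigrams calculation, assuming a dictionary bigram_prob that maps (previous word, word) pairs to estimated probabilities; the name and the illustration values below are mine, not estimates from any real corpus.

    def sentence_probability(bigram_prob, sentence):
        """Multiply bigram probabilities along the sentence, including <s> and </s>."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for w1, w2 in zip(tokens, tokens[1:]):
            p *= bigram_prob.get((w1, w2), 0.0)   # unseen bigram -> probability 0
        return p

    # Made-up probabilities, purely for illustration:
    probs = {("<s>", "he"): 0.1, ("he", "loves"): 0.2, ("loves", "</s>"): 0.5}
    print(sentence_probability(probs, "he loves"))   # 0.1 * 0.2 * 0.5 = 0.01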
Mini-example (adapted from Jurafsky & Martin, 2000)

Corpus 1 (Class 1):
<s> I am he as you are he </s>

Corpus 2 (Class 2):
<s> I am the Walrus </s>
<s> I am the egg man </s>

Class 1 bigram probabilities (examples):
P(I | <s>) = 1, P(am | I) = 1, P(are | you) = 1

Class 2 bigram probabilities (examples):
P(man | egg) = 1, P(egg | the) = .5, P(the | am) = .67

New sentence 1: "They are the egg man"
New sentence 2: "Goo goo g'joob"
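A sketch of how such class-conditional bigram estimates could be computed from the two mini corpora; the function names are mine, and the probabilities are plain maximum-likelihood ratios count(w1 w2) / count(w1) with no smoothing.

    from collections import Counter

    def bigram_model(sentences):
        """Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1)."""
        bigrams, contexts = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            bigrams.update(zip(tokens, tokens[1:]))
            contexts.update(tokens[:-1])          # every token can be a context except the final </s>
        return lambda w1, w2: bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0

    class1 = bigram_model(["I am he as you are he"])
    class2 = bigram_model(["I am the Walrus", "I am the egg man"])

    print(class1("you", "are"))    # 1.0
    print(class2("the", "egg"))    # 0.5
    print(class2("<s>", "They"))   # 0.0: unseen bigram

Plugging either model into the chain-of-bigrams calculation above gives both new sentences a probability of zero, since each contains bigrams unseen in either corpus; smoothing (discussed later in these notes) addresses this.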
N-gram approximation to Shakespeare (Jurafsky and Martin, 2000)
• Trained unigram, bigram, trigram, and quadrigram models on the complete corpus of Shakespeare's works (including punctuation).
• Used these models to generate random sentences, choosing each next unigram/bigram/trigram/quadrigram probabilistically.
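A toy sketch of this kind of generation for the bigram case (the names are mine, and the tiny training corpus below stands in for Shakespeare): each word is sampled in proportion to how often it followed the previous word in the training text.

    import random
    from collections import defaultdict

    def train_bigram_counts(sentences):
        """counts[w1][w2] = number of times w2 followed w1 in the training sentences."""
        counts = defaultdict(lambda: defaultdict(int))
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            for w1, w2 in zip(tokens, tokens[1:]):
                counts[w1][w2] += 1
        return counts

    def generate(counts):
        """Sample a sentence word by word, each word conditioned on the previous one."""
        word, output = "<s>", []
        while True:
            followers = counts[word]
            word = random.choices(list(followers), weights=list(followers.values()))[0]
            if word == "</s>":
                return " ".join(output)
            output.append(word)

    counts = train_bigram_counts(["I am he as you are he",
                                  "I am the Walrus",
                                  "I am the egg man"])
    print(generate(counts))   # e.g., "I am the egg man" or "I am he as you are he"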
Unigram model • To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have. • Every enter now severally so, let • Hill he late speaks; or! a more to leg less first you enter • Are where exeunt and sighs have rise excellency took of...Sleep knave we. near; vile like.
Bigram model • What means, sir. I confess she? then all sorts, he is trim, captain. • Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow. • What we, hath got so she that I rest and sent to scold and nature bankrupt, nor the first gentleman? • Thou whoreson chops. Consumption catch your dearest friend, well, and I know where many mouths upon my undoing all but be, how soon, then; we’ll execute upon my love’s bonds and we do you will?
Trigram model • Sweet prince, Falstaff shall die. Harry of Monmouth’s grave. • This shall forbid it should be branded, if renown made it empty. • Indeed the duke; and had a very good friend. • Fly, and will rid me these news of price. Therefore the sadness of parting, as they say, ‘tis done.
Quadrigram model • King Henry. What! I will go seek the traitor Gloucester. Exeunt some of the watch. A great banquet serv’d in; • Will you not tell me who I am? • Indeed the short and long. Marry, ‘tis a noble Lepidus. • Enter Leonato’s brother Antonio, and the rest, but seek the weary beds of people sick.
From Cavnar and Trenkle, "N-Gram-Based Text Categorization" (1994)
• An early paper, but it clearly lays out the main ideas of n-gram text classification.
• Categorization of USENET newsgroups:
  • by language
  • by topic
N-grams (in this paper)
• An n-gram here is an N-character slice of a string (rather than an N-word slice).
• Examples (using the paper's convention of padding a word with underscores): the word "TEXT" yields the bigrams _T, TE, EX, XT, T_ and the trigrams _TE, TEX, EXT, XT_, T__.
Advantages of using character n-grams versus word n-grams
• Less sensitive to errors (e.g., in OCR'd documents)
• Helps deal with the limited-statistics problem (some words might not appear in the document)
Frequency distribution of n-grams
• Zipf's law: frequency(n-gram) ∝ 1 / rank(n-gram)
• Also true for words
Generate profile of document
• A document's profile is its list of n-grams ranked by frequency, most frequent first (the paper keeps roughly the top 300).
• Can also do this for an entire category by putting all n-grams from all of the category's documents into a single "bag of n-grams".
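A minimal sketch of profile generation under these assumptions; the names are mine, and the padding and cutoff choices roughly follow the paper but are not exact.

    from collections import Counter

    def ngram_profile(text, n_values=(1, 2, 3, 4, 5), top_k=300):
        """Rank the most frequent character n-grams of the text, most frequent first."""
        counts = Counter()
        for word in text.lower().split():
            padded = f"_{word}_"                        # pad word boundaries with underscores
            for n in n_values:
                for i in range(len(padded) - n + 1):
                    counts[padded[i:i + n]] += 1
        return [gram for gram, _ in counts.most_common(top_k)]

    profile = ngram_profile("the quick brown fox jumps over the lazy dog")
    print(profile[:10])   # the document's most frequent character n-grams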
Measure profile distance
• Given a profile for an entire category (e.g., "cryptography"), we can calculate the distance from a new document to that category by comparing their profiles.
• For each n-gram in the document profile, calculate how "out of place" its rank is compared with its rank in the category profile; the distance is the sum of these out-of-place values.
Classifying a document
• To classify document D, calculate its distance from each category and choose the category with the minimum distance (which must be below some threshold distance).
• If no category is below the threshold distance, then the class of D is "not known".
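A sketch of the out-of-place distance and the resulting classifier, assuming profiles are ranked lists like those produced by ngram_profile above; the penalty used here for n-grams missing from the category profile is my own choice of a fixed maximum value.

    def out_of_place_distance(doc_profile, cat_profile):
        """Sum, over the document's n-grams, of how far each one's rank differs from
        its rank in the category profile; n-grams absent from the category profile
        receive a fixed maximum penalty."""
        cat_rank = {gram: r for r, gram in enumerate(cat_profile)}
        max_penalty = len(cat_profile)
        total = 0
        for r, gram in enumerate(doc_profile):
            total += abs(r - cat_rank[gram]) if gram in cat_rank else max_penalty
        return total

    def classify(doc_profile, category_profiles, threshold):
        """Pick the category whose profile is closest; report 'not known' above the threshold."""
        distances = {cat: out_of_place_distance(doc_profile, prof)
                     for cat, prof in category_profiles.items()}
        best_cat = min(distances, key=distances.get)
        return best_cat if distances[best_cat] <= threshold else "not known"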
Cavnar and Trenkle's results
• Used newsgroup FAQs as "category" documents from which to "learn" n-gram models.
• Results were reported as a confusion matrix (not reproduced here).
Smoothing
• Needed to overcome the problem of sparse data: even in a large corpus, we can get zero probability for valid bigrams.
• Laplace smoothing (or "add-one" smoothing): add 1 to all the bigram counts, giving P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + V), where V is the vocabulary size.
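A brief sketch of add-one smoothing on the same mini corpora used earlier; the names are mine, and this is one simple way to realize the formula above, not the only one.

    from collections import Counter

    def laplace_bigram_model(sentences):
        """Add-one-smoothed bigrams: P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + V)."""
        bigrams, contexts, vocab = Counter(), Counter(), set()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            vocab.update(tokens)
            contexts.update(tokens[:-1])
            bigrams.update(zip(tokens, tokens[1:]))
        V = len(vocab)
        return lambda w1, w2: (bigrams[(w1, w2)] + 1) / (contexts[w1] + V)

    p = laplace_bigram_model(["I am he as you are he",
                              "I am the Walrus",
                              "I am the egg man"])
    print(p("the", "egg"))    # seen bigram: discounted below the unsmoothed 0.5 estimate
    print(p("<s>", "They"))   # unseen bigram: small, but no longer zero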