
A* Search

A* Search uses the evaluation function f(n) = g(n) + h(n), where n is a node. g is a cost function: the total cost incurred so far from the initial state to node n (for the 8-puzzle, each move has equal cost). h is a heuristic that estimates the cost to the goal (e.g., Hamming distance).

kura

Presentation Transcript


  1. A* Search • Uses evaluation function f(n) = g(n) + h(n), where n is a node. • g is a cost function • Total cost incurred so far from initial state at node n • For 8-puzzle, each move has equal cost • h is a heuristic that estimates cost to goal

  2. (Hamming distance)
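For the 8-puzzle, the Hamming-distance heuristic counts the tiles that are not on their goal square. A minimal sketch, assuming a board is stored as a 9-tuple in row-major order with 0 for the blank; the names `GOAL` and `hamming` and the goal layout are illustrative choices, not taken from the slides:

```python
# Hypothetical goal layout: tiles 1-8 in row-major order, 0 marks the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def hamming(state, goal=GOAL):
    """Count the tiles (ignoring the blank) that are not on their goal square."""
    return sum(1 for tile, target in zip(state, goal)
               if tile != 0 and tile != target)

# Tiles 7 and 8 are each one square away from home, so two tiles are misplaced.
print(hamming((1, 2, 3, 4, 5, 6, 0, 7, 8)))
```

Because every misplaced tile needs at least one move, this count never overestimates the remaining cost.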

  3. A* Pseudocode

     create the open list of nodes, initially containing only our starting node
     create the closed list of nodes, initially empty
     while (we have not reached our goal) {
         consider the best node in the open list (the node with the lowest f value)
         if (this node is the goal) {
             then we're done
         }
         else {
             move the current node to the closed list and consider all of its successors
             for (each successor) {
                 if (this successor is in the closed list and our current g value is lower) {
                     update the successor with the new, lower, g value
                     change the successor's parent to our current node
                 }
                 else if (this successor is in the open list and our current g value is lower) {
                     update the successor with the new, lower, g value
                     change the successor's parent to our current node
                 }
                 else this successor is not in either the open or closed list {
                     add the successor to the open list and set its g value
                 }
             }
         }
     }
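The pseudocode can be sketched in Python roughly as follows. This is one possible rendering, not the course's reference implementation: it keeps the open list in a `heapq` priority queue and, instead of updating entries in place, pushes a fresh entry whenever a lower g value is found and skips stale ones. The function name `a_star`, the `neighbors` and `h` callables, and the small example graph are all assumptions made for illustration.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Return (cost, path) for a cheapest path, or None if the goal is unreachable.

    neighbors(n) yields (successor, step_cost) pairs; h(n) estimates cost to goal.
    """
    open_heap = [(h(start), start)]   # open list, ordered by f = g + h
    g = {start: 0}                    # cheapest known cost from start to each node
    parent = {start: None}
    closed = set()
    while open_heap:
        _, node = heapq.heappop(open_heap)   # best node: lowest f value
        if node == goal:
            path = []
            while node is not None:          # walk parent links back to start
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        if node in closed:
            continue                         # stale entry, already expanded
        closed.add(node)
        for succ, step_cost in neighbors(node):
            new_g = g[node] + step_cost
            if succ not in g or new_g < g[succ]:
                g[succ] = new_g              # new or lower g value
                parent[succ] = node
                closed.discard(succ)         # reopen if it was closed
                heapq.heappush(open_heap, (new_g + h(succ), succ))
    return None

# Hypothetical weighted graph and an admissible heuristic table.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
         "C": [("D", 1)], "D": []}
h_table = {"A": 3, "B": 2, "C": 1, "D": 0}

print(a_star("A", "D", lambda n: graph[n], h_table.get))
# prints (4, ['A', 'B', 'C', 'D'])
```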

  4. Go over HW 2

  5. Go over HW 3 Nim Demo: http://www.math.uri.edu/~bkaskosz/flashmo/marien.html

  6. Presentations on Wednesday: Nathan Nifong, Matthaus Litteken

  7. Example of statistical language models: n-grams • Estimates the probability distribution of a word w, given the n-1 words that have come before it in the sequence: P(w | w1, w2, …, wn-1) • Purpose: to guess the next word from the previous words, in order to disambiguate: • “Students have access to a list of course requirements” • “Would you like a drink of [garbled]?” • “He loves going to the [bark].”

  8. N-grams: Applications throughout natural language processing: • text classification • speech recognition • machine translation • intelligent spell-checking • handwriting recognition • playing “Jeopardy!”

  9. What is P(bark | he loves going to the)? • What is P(park | he loves going to the)? • Can estimate from a large corpus: • P(w | w1, …, wn-1) = frequency of “w1 … wn-1 w” divided by frequency of “w1 … wn-1” • Example: Use Google
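The frequency-ratio estimate can be tried on a toy corpus. The sketch below is illustrative only: the token list and the helper names `ngram_counts` and `cond_prob` are made up for this example, and a real estimate would need a far larger corpus.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cond_prob(tokens, history, word):
    """P(word | history) = count(history followed by word) / count(history)."""
    history = tuple(history)
    num = ngram_counts(tokens, len(history) + 1)[history + (word,)]
    den = ngram_counts(tokens, len(history))[history]
    return num / den if den else 0.0

# Hypothetical toy corpus, already tokenized.
corpus = ("he loves going to the park she loves going to the park "
          "he fears the bark").split()

print(cond_prob(corpus, ["to", "the"], "park"))   # "to the" is always followed by "park"
print(cond_prob(corpus, ["to", "the"], "bark"))   # "to the bark" never occurs
```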

  10. Problem! The Web doesn’t give us enough examples to get good statistics. One solution: • Approximate P(w | w1, …, wn-1) by using a small n (e.g., n = 2: bigrams). • Bigram example: P(bark | the) vs. P(park | the) (calculate using Google) • Trigram: P(bark | to the) vs. P(park | to the)

  11. Typically, bigrams are used:

  12. Now can calculate probability of utterance: P(he loves going to the bark) ≈ P(he | <s>) × P(loves | he) × P(going | loves) × P(to | going) × P(the | to) × P(bark | the) × P(</s> | bark), where <s> = sentence start marker and </s> = sentence end marker
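The product of bigram probabilities can be computed directly from counts. This is a minimal sketch under stated assumptions: the three training sentences are invented for the example, and `train_bigrams` and `sentence_prob` are hypothetical helper names. No smoothing is applied, so any unseen bigram makes the whole product zero.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count contexts and bigrams, adding <s> and </s> markers to each sentence."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_prob(sentence, contexts, bigrams):
    """Multiply P(word | previous word) over the whole sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0                            # unseen context
        p *= bigrams[(prev, word)] / contexts[prev]
    return p

# Hypothetical toy training data.
toy = ["he loves going to the park",
       "he loves the bark",
       "she loves going to the park"]
contexts, bigrams = train_bigrams(toy)

print(sentence_prob("he loves going to the park", contexts, bigrams))
print(sentence_prob("he loves going to the bark", contexts, bigrams))
```

On this toy data the "park" sentence scores twice as high as the "bark" sentence (8/27 versus 4/27), which is the kind of comparison the disambiguation examples rely on.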

  13. Text classification using N-grams

  14. Mini-example (adapted from Jurafsky & Martin, 2000)
      Corpus 1 (Class 1): / Corpus 2 (Class 2):
        <s> I am he as you are he </s>
        <s> I am the Walrus </s>
        <s> I am the egg man </s>
      Class 1 Bigram Probabilities (examples): / Class 2 Bigram Probabilities (examples):
        P(I | <s>) = 1       P(am | I) = 1
        P(man | egg) = 1     P(are | you) = 1
        P(egg | the) = .5    P(the | am) = .67

  15. Mini-example (adapted from Jurafsky & Martin, 2000)
      Corpus 1 (Class 1): / Corpus 2 (Class 2):
        <s> I am he as you are he </s>
        <s> I am the Walrus </s>
        <s> I am the egg man </s>
      Class 1 Bigram Probabilities (examples): / Class 2 Bigram Probabilities (examples):
        P(I | <s>) = 1       P(am | I) = 1
        P(man | egg) = 1     P(are | you) = 1
        P(egg | the) = .5    P(the | am) = .67
      New sentence 1: “They are the egg man”

  16. Mini-example (adapted from Jurafsky & Martin, 2000)
      Corpus 1 (Class 1): / Corpus 2 (Class 2):
        <s> I am he as you are he </s>
        <s> I am the Walrus </s>
        <s> I am the egg man </s>
      Class 1 Bigram Probabilities (examples): / Class 2 Bigram Probabilities (examples):
        P(I | <s>) = 1       P(am | I) = 1
        P(man | egg) = 1     P(are | you) = 1
        P(egg | the) = .5    P(the | am) = .67
      New sentence 1: “They are the egg man”
      New sentence 2: “Goo goo g’joob”

  17. N-gram approximation to Shakespeare (Jurafsky and Martin, 2000) • Trained unigram, bigram, trigram, and quadrigram models on the complete corpus of Shakespeare’s works (including punctuation). • Used these models to generate random sentences by choosing each new unigram/bigram/trigram/quadrigram probabilistically
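Random generation from a bigram model can be sketched in a few lines. The example below is an illustration rather than the procedure used for the Shakespeare experiment: it trains on three short sentences borrowed from the mini-example, the names `train` and `generate` are hypothetical, and the `max_len` cutoff is an added safeguard against long loops.

```python
import random
from collections import defaultdict

def train(sentences):
    """Map each word to the list of words observed right after it."""
    followers = defaultdict(list)
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            followers[prev].append(word)   # repeats preserve the bigram counts
    return followers

def generate(followers, rng, max_len=20):
    """Start at <s> and repeatedly sample the next word given the current one."""
    word, out = "<s>", []
    while len(out) < max_len:
        word = rng.choice(followers[word])  # sampled in proportion to bigram counts
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

model = train(["I am he as you are he", "I am the Walrus", "I am the egg man"])
print(generate(model, random.Random(0)))
```

Sampling from a list that contains repeated followers is a simple way to draw in proportion to bigram frequency without computing probabilities explicitly.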

  18. Unigram model • To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have. • Every enter now severally so, let • Hill he late speaks; or! a more to leg less first you enter • Are where exeunt and sighs have rise excellency took of...Sleep knave we. near; vile like.

  19. Bigram model • What means, sir. I confess she? then all sorts, he is trim, captain. • Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow. • What we, hath got so she that I rest and sent to scold and nature bankrupt, nor the first gentleman? • Thou whoreson chops. Consumption catch your dearest friend, well, and I know where many mouths upon my undoing all but be, how soon, then; we’ll execute upon my love’s bonds and we do you will?

  20. Trigram model • Sweet prince, Falstaff shall die. Harry of Monmouth’s grave. • This shall forbid it should be branded, if renown made it empty. • Indeed the duke; and had a very good friend. • Fly, and will rid me these news of price. Therefore the sadness of parting, as they say, ‘tis done.

  21. Quadrigram model • King Henry. What! I will go seek the traitor Gloucester. Exeunt some of the watch. A great banquet serv’d in; • Will you not tell me who I am? • Indeed the short and long. Marry, ‘tis a noble Lepidus. • Enter Leonato’s brother Antonio, and the rest, but seek the weary beds of people sick.

  22. From Cavnar and Trenkle, “N-gram-based text categorization” (1994) • Early paper, but clearly lays out main ideas of n-gram text classification. • Categorization of USENET newsgroups • by language • by topic

  23. Categorization requirements

  24. N-grams (in this paper) • N-character slice (rather than N-word slice) • Examples:
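Character n-grams can be extracted with a sliding window over the word. In the sketch below, padding each word with a single underscore on both sides to mark word boundaries is an assumption made for illustration (the paper's exact padding scheme may differ), and `char_ngrams` is a hypothetical helper name.

```python
def char_ngrams(word, n):
    """Return all n-character slices of the word, with '_' marking its boundaries."""
    padded = "_" + word + "_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("TEXT", 2))   # ['_T', 'TE', 'EX', 'XT', 'T_']
print(char_ngrams("TEXT", 3))   # ['_TE', 'TEX', 'EXT', 'XT_']
```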

  25. Advantages of using character n-grams versus word n-grams • Less sensitive to errors (e.g., in OCR documents) • Helps deal with limited statistics problem (some words might not appear in document)

  26. Frequency distribution of n-grams • Zipf’s law: Frequency(n-gram) is approximately proportional to 1 / rank(n-gram) • Also true for words

  27. Generate profile of document • Can also do this for an entire category by putting all n-grams from all category documents in a single “bag of n-grams”
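A profile can be built by counting character n-grams and keeping the most frequent ones in rank order. This is a rough sketch, not the paper's exact procedure: the function name `profile`, the underscore padding, the range of n, the cutoff `top_k`, and the alphabetical tie-break are all assumptions chosen for illustration.

```python
from collections import Counter

def profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n), best first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"                 # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the order is stable.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top_k]

print(profile("banana bandana", max_n=2, top_k=3))   # ['a', '_', 'an']
```

A category profile follows from the same function by passing the concatenated text of all documents in that category.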

  28. Observations

  29. Measure profile distance • Given profile for entire category (e.g., “cryptography”), can calculate distance from a new document to that category by comparing their profiles. • For each n-gram in document profile, calculate how “out of place” it is in rank compared with its rank in the category profile.
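The out-of-place measure can be sketched as a sum of rank differences. Here profiles are assumed to be lists of n-grams ordered from most to least frequent, and an n-gram missing from the category profile is charged a maximum penalty equal to the category profile's length; that penalty value and the name `out_of_place` are assumptions for this example.

```python
def out_of_place(doc_profile, cat_profile):
    """Sum, over the document's n-grams, of how far each rank is from its category rank."""
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)        # assumed cost for an n-gram the category lacks
    return sum(abs(rank - cat_rank[gram]) if gram in cat_rank else max_penalty
               for rank, gram in enumerate(doc_profile))

doc = ["th", "he", "in", "er"]
cat = ["th", "in", "he", "an"]
# "th" matches (0), "he" and "in" are each one rank off (1 + 1), "er" is missing (4).
print(out_of_place(doc, cat))   # 6
```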

  30. Classifying document • To classify document D, calculate its distance from each category, and choose the category with minimum distance (must be below some threshold distance). • If no category is below threshold distance, then class of D is “not known”.
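The decision rule can be sketched as a minimum-distance search with a cutoff. The tiny category profiles, the threshold values, and the names `out_of_place` and `classify` below are hypothetical; the distance function repeats the rank-difference idea so that the example is self-contained.

```python
def out_of_place(doc_profile, cat_profile):
    """Rank-difference distance; missing n-grams cost the category profile's length."""
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    return sum(abs(rank - cat_rank[gram]) if gram in cat_rank else len(cat_profile)
               for rank, gram in enumerate(doc_profile))

def classify(doc_profile, category_profiles, threshold):
    """Pick the closest category, or "not known" if none is below the threshold."""
    best_name, best_dist = None, None
    for name, cat_profile in category_profiles.items():
        dist = out_of_place(doc_profile, cat_profile)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist is None or best_dist >= threshold:
        return "not known"
    return best_name

# Hypothetical four-entry profiles; real profiles hold hundreds of n-grams.
categories = {"english": ["th", "he", "in", "er"],
              "spanish": ["de", "la", "en", "el"]}

print(classify(["he", "th", "er", "in"], categories, threshold=10))   # english
print(classify(["zz", "qq"], categories, threshold=5))                # not known
```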

  31. Cavnar and Trenkle’s Results • Used newsgroup FAQs as “category” documents from which to “learn” n-gram models • Results (Confusion Matrix)

  32. Smoothing • Needed to overcome problem of sparse data. E.g., even in a large corpus, can get zero probability for valid bigrams. • Laplace smoothing (or “add-one” smoothing): Add 1 to all the bigram counts
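Add-one smoothing can be sketched as follows. Adding 1 to every bigram count means the denominator must grow by the vocabulary size V so that the probabilities for each context still sum to 1. The training sentences and the names `train` and `laplace_prob` are illustrative assumptions, and V is taken here to be the number of distinct tokens that can follow a context (all words plus </s>).

```python
from collections import Counter

def train(sentences):
    """Return context counts, bigram counts, and the set of possible next tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])              # everything except <s> can be "next"
    return contexts, bigrams, vocab

def laplace_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing: (count + 1) / (context count + V)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

contexts, bigrams, vocab = train(["I am the Walrus", "I am the egg man"])

print(laplace_prob("the", "egg", contexts, bigrams, vocab))   # seen bigram: 2/9
print(laplace_prob("the", "man", contexts, bigrams, vocab))   # unseen bigram: 1/9, not 0
```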
