
CPSC 7373: Artificial Intelligence Lecture 13: Natural Language Processing

Explore NLTK and the Brown corpus to build a word-based unigram model with Laplace smoothing. Calculate word probabilities, apply search tricks to reassemble shuffled text strips, and delve into information theory for code breaking.





Presentation Transcript


  1. CPSC 7373: Artificial Intelligence, Lecture 13: Natural Language Processing
  Jiang Bian, Fall 2012, University of Arkansas at Little Rock

  2. NLP Assignment 2
  • NLTK + Brown corpus
  • Word-based unigram model + Laplace smoothing, e.g.:
  • P("this") = 0.00424935611437
  • log(P("this")) = -5.46
  • P("het") = 8.2575905837e-07
  • log(P("het")) = -14.01
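  A minimal sketch of how such a model could be built with NLTK and the Brown corpus. The function names (`P`, `logP`) are illustrative, not the assignment's required interface, and the numbers in the comments are only approximate:

  ```python
  from collections import Counter
  from math import log

  from nltk.corpus import brown   # run nltk.download('brown') once beforehand

  # Count lowercased word tokens over the entire Brown corpus.
  counts = Counter(w.lower() for w in brown.words())
  N = sum(counts.values())   # total number of tokens
  V = len(counts)            # vocabulary size

  def P(word):
      """Laplace-smoothed unigram probability: (count + 1) / (N + V)."""
      return (counts[word.lower()] + 1) / (N + V)

  def logP(word):
      return log(P(word))

  print(P("this"), logP("this"))   # roughly 4.2e-3 and -5.46
  print(P("het"), logP("het"))     # roughly 8.3e-7 and -14.01 (unseen word
                                   # gets only the +1 smoothing mass)
  ```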

  3. Step 1
  • First run on the last row:
  • |hi| | |in| | | t| | | | |ye| |ar| |s | | |. |
  • (-17.015945081500426, [3, 6, 0, 15, 11, 13, 18], 'in this year. ')
  • (-17.015945081500426, [3, 6, 0, 15, 18, 11, 13], 'in this . year')
  • (-17.015945081500426, [3, 18, 6, 0, 15, 11, 13], 'in. this year')
  • (-17.015945081500426, [3, 18, 11, 13, 6, 0, 15], 'in. year this ')
  • (-17.015945081500426, [6, 0, 15, 3, 18, 11, 13], ' this in. year')
  • (-17.015945081500426, [11, 13, 18, 3, 6, 0, 15], 'year. in this ')
  • (-17.015945081500426, [18, 3, 6, 0, 15, 11, 13], '. in this year')
  • Other tricks?
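  One way Step 1 could be implemented is to brute-force the orderings of a small set of candidate strips against the last row only, scoring each arrangement with the unigram model sketched above. This is only a hedged sketch: the strip representation (each strip as a list of fixed-width column strings, one per row), the helper names, and the punctuation handling are assumptions, not the assignment's actual code.

  ```python
  from itertools import permutations

  def score_text(text):
      """Sum of log unigram probabilities over whitespace-separated words.
      Note: tokens like 'year.' keep their trailing period here; the real
      assignment may strip punctuation before looking words up."""
      return sum(logP(w) for w in text.split())

  def rank_orderings(strips, indices, row=-1):
      """Try every ordering of the chosen strip indices on one row (the last
      row by default) and rank the resulting strings by log probability."""
      candidates = []
      for perm in permutations(indices):
          text = ''.join(strips[i][row] for i in perm)
          candidates.append((score_text(text), list(perm), text))
      candidates.sort(reverse=True)
      return candidates

  # e.g., rank_orderings(strips, [3, 6, 0, 15, 11, 13, 18]) would surface
  # tied candidates like those on the slide.
  ```

  The ties on the slide are expected: a unigram model assigns the same score to any permutation of the same words, which is one motivation for the "other tricks" question (e.g., adding word-order or character-level evidence to break ties).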

  4. Step 2
  • Based on: (4.074449568896497e-08, [3, 6, 0, 15, 11, 13, 18], 'in this year. ')
  • Collapse [3, 6, 0, 15, 11, 13, 18] into one fixed string ('in this year. ')
  • Randomly pick 3 strips from the rest [1, 2, 4, 5, 7, 8, 9, 10, 12, 14, 16, 17]; and
  • Randomly pick another row (e.g., the 4th row)
  • |he| |ea|of|ho| m| t|et|ha| | t|od|ds|e |ki| c|t |ng|br|
  • (-27.422343645594154, [3, 6, 0, 15, 11, 13, 18, 2, 14, 17], 'of the code breaking')
  • Recursion… until…
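  A rough sketch of the recursion described on this slide, reusing `logP`/`score_text` and the strip layout assumed in the previous sketches: keep the best ordering found so far as a fixed prefix, try a few randomly chosen leftover strips against one or more rows, keep the highest-scoring extension, and recurse until no strips remain. Every name and parameter here is illustrative, not the assignment's prescribed solution.

  ```python
  import random
  from itertools import permutations

  def extend(strips, fixed, remaining, rows, pick=3):
      """Greedily grow the fixed strip ordering a few strips at a time."""
      if not remaining:
          return fixed
      chosen = random.sample(remaining, min(pick, len(remaining)))
      best_score, best_order = float('-inf'), fixed
      for perm in permutations(chosen):
          order = fixed + list(perm)
          # Score the candidate ordering across the rows we are looking at.
          text = ' '.join(''.join(strips[i][r] for i in order) for r in rows)
          s = score_text(text)
          if s > best_score:
              best_score, best_order = s, order
      left = [i for i in remaining if i not in best_order]
      return extend(strips, best_order, left, rows, pick)

  # e.g., starting from the Step 1 result and scoring over two rows:
  # fixed = [3, 6, 0, 15, 11, 13, 18]
  # rest = [i for i in range(len(strips)) if i not in fixed]
  # order = extend(strips, fixed, rest, rows=[3, -1])
  ```

  Because this sketch only ever appends strips after the fixed block and relies on random picks, it may need several restarts (or a smarter insertion strategy) before it reaches a full, sensible reconstruction; the slide leaves that part open as "Recursion… until…".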

  5. Final
  Claude Shannon founded information theory, which is the basis of probabilistic language models and of the code breaking methods that you would use to solve this problem, with the paper titled "A Mathematical Theory of Communication," published in this year.
