Explore NLTK and the Brown corpus to build a word-based unigram model with Laplace smoothing, compute word probabilities, apply search tricks to reconstruct shredded text, and connect the result to information theory and code breaking.
CPSC 7373: Artificial Intelligence
Lecture 13: Natural Language Processing
Jiang Bian, Fall 2012
University of Arkansas at Little Rock
NLP Assignment 2 • NLTK + Brown corpus • Word-based unigram model with Laplace smoothing, e.g.: • P(“this”) = 0.00424935611437 • log(P(“this”)) = -5.46 • P(“het”) = 8.2575905837e-07 • log(P(“het”)) = -14.01
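A minimal sketch of how such a model could be built, assuming the Brown corpus has been downloaded once via nltk.download('brown'); the helper names laplace_prob and logprob are illustrative, not part of any starter code:

```python
import math
from collections import Counter

from nltk.corpus import brown  # requires a one-time nltk.download('brown')

# Word frequencies over the whole Brown corpus (lowercased).
counts = Counter(w.lower() for w in brown.words())
total = sum(counts.values())
vocab_size = len(counts)

def laplace_prob(word):
    """P(word) with add-one (Laplace) smoothing, so an unseen
    word such as 'het' still gets a small nonzero probability."""
    return (counts[word.lower()] + 1) / (total + vocab_size)

def logprob(word):
    """Natural-log probability, convenient for summing over words."""
    return math.log(laplace_prob(word))

print(laplace_prob("this"), logprob("this"))  # roughly 4.2e-3 and -5.5
print(laplace_prob("het"), logprob("het"))    # tiny, roughly -14 in log space
```

Exact values depend on tokenization and smoothing details, but the two probes above should land close to the numbers on the slide.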
Step 1 • First run on the last row: • |hi| | |in| | | t| | | | |ye| |ar| |s | | |. | • (-17.015945081500426, [3, 6, 0, 15, 11, 13, 18], 'in this year. ') • (-17.015945081500426, [3, 6, 0, 15, 18, 11, 13], 'in this . year') • (-17.015945081500426, [3, 18, 6, 0, 15, 11, 13], 'in. this year') • (-17.015945081500426, [3, 18, 11, 13, 6, 0, 15], 'in. year this ') • (-17.015945081500426, [6, 0, 15, 3, 18, 11, 13], ' this in. year') • (-17.015945081500426, [11, 13, 18, 3, 6, 0, 15], 'year. in this ') • (-17.015945081500426, [18, 3, 6, 0, 15, 11, 13], '. in this year') • Other tricks?
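All seven candidates tie at -17.0159 because a unigram model is blind to word order: any arrangement of the same words scores identically, which is what the extra tricks must compensate for. One way to reproduce such a ranking is to brute-force the orderings of the non-blank strips and score each concatenation with the unigram model. A sketch reusing the logprob helper from the previous sketch, with strip contents transcribed from the slide:

```python
from itertools import permutations

# Non-blank 2-character strips of the last row, keyed by column index
# (blank columns are omitted here).
strips = {0: 'hi', 3: 'in', 6: ' t', 11: 'ye', 13: 'ar', 15: 's ', 18: '. '}

def score_order(order):
    """Unigram log-probability of the text formed by concatenating
    the strips in the given column order."""
    text = ''.join(strips[i] for i in order)
    # Naive tokenization; a real run would split off the trailing '.',
    # so exact scores will differ from the slide's.
    return sum(logprob(w) for w in text.split())

# Brute-force all 7! = 5040 orderings and print the top candidates.
ranked = sorted(((score_order(p), list(p), ''.join(strips[i] for i in p))
                 for p in permutations(strips)),
                reverse=True)
for cand in ranked[:7]:
    print(cand)
```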
Step 2 • Based on: (4.074449568896497e-08, [3, 6, 0, 15, 11, 13, 18], 'in this year. '), the Step 1 winner expressed as a raw probability (e^-17.0159 ≈ 4.07e-08) • Collapse [3, 6, 0, 15, 11, 13, 18] into one fixed string ('in this year. ') • Randomly pick 3 strips from the rest [1, 2, 4, 5, 7, 8, 9, 10, 12, 14, 16, 17]; and • Randomly pick another row (e.g., the 4th row) • |he| |ea|of|ho| m| t|et|ha| | t|od|ds|e |ki| c|t |ng|br| • (-27.422343645594154, [3, 6, 0, 15, 11, 13, 18, 2, 14, 17], 'of the code breaking') • Recurse… until every strip has been placed.
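A sketch of that collapse-and-extend move, under the same assumptions as the sketches above (best_extension is an illustrative name; a fuller search would also try placing the new strips before or around the fixed block, and would restart on unlucky draws):

```python
import random
from itertools import permutations

def best_extension(fixed_order, remaining, row_strips, k=3):
    """One Step-2 move: randomly pick k leftover columns, try every
    ordering of them appended after the already-fixed block, and keep
    the highest-scoring candidate on the chosen row."""
    picks = random.sample(sorted(remaining), k)
    best = None
    for perm in permutations(picks):
        order = fixed_order + list(perm)
        text = ''.join(row_strips[i] for i in order)
        score = sum(logprob(w) for w in text.split())
        if best is None or score > best[0]:
            best = (score, order, text)
    return best

# The 4th row, transcribed from the slide (columns 1 and 9 are blank).
row4 = {0: 'he', 1: ' ', 2: 'ea', 3: 'of', 4: 'ho', 5: ' m', 6: ' t',
        7: 'et', 8: 'ha', 9: ' ', 10: ' t', 11: 'od', 12: 'ds', 13: 'e ',
        14: 'ki', 15: ' c', 16: 't ', 17: 'ng', 18: 'br'}
fixed = [3, 6, 0, 15, 11, 13, 18]   # 'in this year. ' from Step 1
rest = [1, 2, 4, 5, 7, 8, 9, 10, 12, 14, 16, 17]
print(best_extension(fixed, rest, row4))
# A lucky draw of columns {2, 14, 17} reproduces the slide's winner:
# [3, 6, 0, 15, 11, 13, 18, 2, 14, 17] -> 'of the code breaking'.
```

Collapsing each winner into the fixed block and repeating places a few more strips per round until the whole message is reassembled.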
Final Claude Shannon founded information theory, which is the basis of probabilistic language models and of the code breaking methods that you would use to solve this problem, with the paper titled "A Mathematical Theory of Communication," published in this year.