CS626-449: Speech, NLP and the Web/Topics in AI


Presentation Transcript


  1. CS626-449: Speech, NLP and the Web/Topics in AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture-14: Probabilistic parsing; sequence labeling, PCFG

  2. Recap: Two Views of NLP • Classical View: Layered Processing; Various Ambiguities (already discussed) • Statistical/Machine Learning View

  3. What is the output of an ML-NLP System (1/2) • Option 1: A set of rules, e.g., • If the word to the left of the verb is a noun and has the animacy feature, then it is the likely agent of the action denoted by the verb. • The child broke the toy (child is the agent) • The window broke (window is not the agent; it is inanimate)
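To make Option 1 concrete, here is a minimal sketch (not from the lecture) of the animacy heuristic as code; the ANIMATE lexicon and the find_agent name are illustrative inventions.

    # Toy sketch of the rule on this slide; the lexicon and function name are invented.
    ANIMATE = {"child", "boy", "girl", "dog"}  # hypothetical animacy lexicon

    def find_agent(tokens, verb_index):
        """Return the word left of the verb if it is an animate noun, else None."""
        left = tokens[verb_index - 1] if verb_index > 0 else None
        return left if left in ANIMATE else None

    print(find_agent(["the", "child", "broke", "the", "toy"], 2))  # -> child
    print(find_agent(["the", "window", "broke"], 2))               # -> None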

  4. What is the output of an ML-NLP System (2/2) • Option 2: a set of probability values • P(agent | word is to the left of verb and has animacy) > P(object | word is to the left of verb and has animacy) > P(instrument | word is to the left of verb and has animacy), etc.
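A minimal sketch of Option 2's output: the probability values below are made up for illustration; the system simply returns the label with the highest probability.

    # Hypothetical probabilities for the role of the word to the left of the verb.
    role_probs = {"agent": 0.70, "object": 0.20, "instrument": 0.10}

    best_role = max(role_probs, key=role_probs.get)
    print(best_role, role_probs[best_role])  # -> agent 0.7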

  5. How is this different from classical NLP? • Classical NLP: a linguist writes the rules that the computer applies • Statistical NLP: the computer learns rules/probabilities from text data (a corpus)

  6. Classification appears as sequence labeling

  7. A set of Sequence Labeling Tasks: smaller to larger units • Words: • Part of Speech tagging • Named Entity tagging • Sense marking • Phrases: Chunking • Sentences: Parsing • Paragraphs: Co-reference annotating

  8. Example of word labeling: POS Tagging <s> Come September, and the UJF campus is abuzz with new and returning students. </s> <s> Come_VB September_NNP ,_, and_CC the_DT UJF_NNP campus_NN is_VBZ abuzz_JJ with_IN new_JJ and_CC returning_VBG students_NNS ._. </s>
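The tagging above can be reproduced approximately with an off-the-shelf tagger; below is a sketch using NLTK's pos_tag (this is not the tagger built in the course, and its output tags may differ slightly from the slide). It assumes the 'punkt' and 'averaged_perceptron_tagger' resources have already been downloaded.

    import nltk  # assumes nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

    sentence = "Come September, and the UJF campus is abuzz with new and returning students."
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # e.g. [('Come', 'VB'), ('September', 'NNP'), (',', ','), ('and', 'CC'), ('the', 'DT'), ...]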

  9. Example of word labeling: Named Entity Tagging <month_name> September </month_name> <org_name> UJF </org_name>

  10. Example of word labeling: Sense Marking
      Word   | Synset                     | WN-synset-no
      come   | {arrive, get, come}        | 01947900
      ...    | ...                        | ...
      abuzz  | {abuzz, buzzing, droning}  | 01859419
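The synsets on this slide come from WordNet; here is a sketch of looking them up through NLTK's WordNet interface (assumes nltk.download('wordnet'); offset values depend on the WordNet version, so they may not match the slide exactly).

    from nltk.corpus import wordnet as wn

    for syn in wn.synsets("come", pos=wn.VERB)[:3]:
        # offset() is the numeric synset id; lemma_names() lists the synonym words
        print(syn.name(), syn.offset(), syn.lemma_names())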

  11. Example of phrase labeling: Chunking Come July, and [the UJF campus] is abuzz with [new and returning students] .

  12. Example of Sentence labeling: Parsing [S1 [S [S [VP [VB Come] [NP [NNP July]]]] [, ,] [CC and] [S [NP [DT the] [JJ UJF] [NN campus]] [VP [AUX is] [ADJP [JJ abuzz] [PP [IN with] [NP [ADJP [JJ new] [CC and] [VBG returning]] [NNS students]]]]]] [. .]]]
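A bracketed parse like this can be held in a tree data structure; below is a sketch using nltk.Tree on a simplified fragment of the slide's tree, rewritten in the ()-bracket form that Tree.fromstring expects.

    from nltk import Tree

    t = Tree.fromstring("(S (VP (VB Come) (NP (NNP July))))")
    print(t.label())   # -> S
    print(t.leaves())  # -> ['Come', 'July']
    t.pretty_print()   # ASCII drawing of the tree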

  13. Handling labeling through the Noisy Channel Model: w = (wn, wn-1, …, w1) → Noisy Channel → t = (tm, tm-1, …, t1). Sequence w is transformed into sequence t.

  14. Bayesian Decision Theory and Noisy Channel Model are close to each other • Bayes Theorem: Given the random variables A and B, P(A|B) = P(A) P(B|A) / P(B), where P(A|B) is the posterior probability, P(A) the prior probability, and P(B|A) the likelihood.
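A small numeric check of the theorem, with made-up probabilities (P(B) is expanded by the law of total probability):

    p_a = 0.3              # prior P(A)
    p_b_given_a = 0.8      # likelihood P(B|A)
    p_b_given_not_a = 0.1  # P(B|not A)

    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    p_a_given_b = p_a * p_b_given_a / p_b  # posterior P(A|B)
    print(round(p_a_given_b, 3))           # -> 0.774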

  15. Corpus • A collection of text, called a corpus, is used for collecting various language data • With annotation: more information, but manual-labor intensive • Practice: label automatically, then correct manually • The famous Brown Corpus contains 1 million tagged words • Switchboard: a very famous corpus; 2,400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics
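A sketch of inspecting the tagged Brown corpus through NLTK (assumes the corpus has been fetched with nltk.download('brown')):

    from nltk.corpus import brown

    print(len(brown.words()))        # roughly 1.16 million words
    print(brown.tagged_words()[:5])  # first few (word, tag) pairs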

  16. Discriminative vs. Generative Model W* = argmax_W P(W|SS) • Discriminative Model: compute P(W|SS) directly • Generative Model: compute it from P(W) · P(SS|W)
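A toy sketch of the generative decision W* = argmax_W P(W) · P(SS|W); the priors and likelihoods for the two candidate W values are invented.

    p_w = {"W1": 0.6, "W2": 0.4}           # prior P(W)
    p_ss_given_w = {"W1": 0.2, "W2": 0.5}  # likelihood P(SS|W)

    w_star = max(p_w, key=lambda w: p_w[w] * p_ss_given_w[w])
    print(w_star)  # -> W2   (0.4 * 0.5 = 0.20 beats 0.6 * 0.2 = 0.12)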

  17. Notion of Language Models

  18. Language Models • N-grams: sequences of n consecutive words/characters • Probabilistic / Stochastic Context Free Grammars: • Simple probabilistic models capable of handling recursion • A CFG with probabilities attached to rules • Rule probabilities → how likely is it that a particular rewrite rule is used?
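A minimal bigram (n = 2) sketch: P(next word | word) is estimated by relative frequency from a toy corpus; no smoothing is applied, so this is illustration only.

    from collections import Counter

    corpus = "the child broke the toy and the window broke".split()

    bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent word pairs
    unigrams = Counter(corpus[:-1])             # counts of words that have a successor

    def p(next_word, word):
        return bigrams[(word, next_word)] / unigrams[word]

    print(p("child", "the"))  # -> 0.333...  ('the' is followed by child, toy, window)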

  19. PCFGs • Why PCFGs? • Intuitive probabilistic models for tree-structured languages • Algorithms are extensions of HMM algorithms • Better than the n-gram model for language modeling.

  20. Formal Definition of PCFG • A PCFG consists of • A set of terminals {wk}, k = 1, …, V, e.g., {wk} = {child, teddy, bear, played, …} • A set of non-terminals {Ni}, i = 1, …, n, e.g., {Ni} = {NP, VP, DT, …} • A designated start symbol N1 • A set of rules {Ni → ζj}, where ζj is a sequence of terminals and non-terminals, e.g., NP → DT NN • A corresponding set of rule probabilities

  21. Rule Probabilities • Rule probabilities are such that Σj P(Ni → ζj) = 1 for every non-terminal Ni, e.g., P(NP → DT NN) = 0.2, P(NP → NN) = 0.5, P(NP → NP PP) = 0.3 • P(NP → DT NN) = 0.2 means that 20% of the parses in the training data use the rule NP → DT NN
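The three NP rules on this slide can be written directly as an NLTK PCFG; the sketch below adds invented lexical rules (DT, NN, P) and a PP rule so the grammar is complete (probabilities for each left-hand side sum to 1), and checks it by parsing a toy phrase with the Viterbi parser.

    from nltk import PCFG
    from nltk.parse import ViterbiParser

    grammar = PCFG.fromstring("""
        NP -> DT NN [0.2]
        NP -> NN    [0.5]
        NP -> NP PP [0.3]
        PP -> P NP  [1.0]
        DT -> 'the'   [1.0]
        NN -> 'child' [0.5] | 'toy' [0.5]
        P  -> 'with'  [1.0]
    """)

    parser = ViterbiParser(grammar)
    for tree in parser.parse("the child with the toy".split()):
        print(tree)         # the most probable parse
        print(tree.prob())  # its probability under the grammar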
