
COMP 791A: Statistical Language Processing

This text explores the different levels of study in natural language processing, including lexical analysis, phonetics and phonology, morphology, syntax, semantics, pragmatics, discourse, and world knowledge.

aronr


Presentation Transcript


  1. COMP 791A: Statistical Language Processing • Linguistic Essentials • Chap. 3

  2. Levels of study of NLP • Lexical • Possible words in a given language • rose ?gellapou • Phonetics & phonology • How words are related to sounds • rose [roz] • Parts-of-speech & Morphology • How words are constructed from basic meaning units (morphemes) • friend + ly --> friendly • friend + s --> friends • rose + ly ≠ rosely • woman + s ≠ womans • Phrase Structure and Syntax • How words can be ordered to form correct sentences • ?Red the is rose / adj det verb noun • The rose is red / det noun verb adj

  3. Levels of study of NLP (cont’d) • Semantics • What words mean (lexical semantics, word sense disambiguation) • chair --> furniture / person • How word meanings are combined into the meaning of sentences. • The chair is broken. • The chair is sick. • Pragmatics • How language conventions affect the literal meaning (interpretation) • Do you have the time? • Do you have the children? • Discourse • How surrounding sentences affect interpretation • The chair’s leg is broken. He went skiing last week-end. • The chair’s leg is broken. Someone placed a 500kg package on it. • World-Knowledge • How general knowledge about the world affects interpretation • The prof sent the student to see the chair because he was fed up with his behavior. • The prof sent the student to see the chair because he wanted to see him. • The prof sent the student to see the chair because he was talking in class.

  4. Levels of study of NLP • Lexical • Phonetics & phonology • Parts-of-speech & Morphology • Phrase Structure and Syntax • Semantics • Pragmatics • Discourse • World-Knowledge

  5. Parts of Speech and Morphology • Parts of Speech (POS) • word/lexical/syntactic/grammatical categories/tag/class • Ex: noun, verb, adjectives, prepositions, … • Morphology • study and description of word formation in a language • modification of a root form (stem) by affixes • affix: prefixes, suffixes, infixes, circumfixes • and exceptions… thief --> thieves • chief --> chiefs • Word categories are systematically related by morphological processes
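The thief/chief exception above can be sketched as a regular suffix rule backed by an exception lexicon. This is only a toy illustration: the `IRREGULAR_PLURALS` table and the `pluralize` function are hypothetical names introduced here, and real English pluralization needs many more rules.

```python
# Toy inflection: regular plural "+s" with a small exception lexicon.
# IRREGULAR_PLURALS is a hypothetical table for illustration only.
IRREGULAR_PLURALS = {
    "thief": "thieves",
    "woman": "women",
}


def pluralize(noun: str) -> str:
    """Return the plural of `noun`, checking exceptions before the regular rule."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    return noun + "s"  # regular rule: stem + suffix "s"


for word in ["friend", "chief", "thief", "woman"]:
    print(word, "-->", pluralize(word))
```

Listing the exceptions explicitly is what keeps "chief" on the regular rule while "thief" is not.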

  6. Morphological processes • Inflection • to indicate case, gender, number, tense, person, mood, or voice • does not change the word’s grammatical class or meaning significantly • car --> cars • talk --> talking • Derivation • creation of a new word • may have different meaning and/or grammatical class • infect --> disinfect • grateful --> ungrateful • wide (adjective) --> widely (adverb) • teach (verb) --> teacher (noun) • Compounding • merging 2 or more words into a single one • written as separate words • but pronounced as a single word / denotes 1 single concept • so merits an entry in lexicon • tea kettle, disk drive, mad cow disease

  7. Classes of POS • Open (lexical) class • things, actions, events, … • ex. cat, John, eat • new words can be added easily • nouns, verbs, adjectives, adverbs • some languages do not have all these categories • Closed (functional) class • generally function/grammatical words • ex. the, in, and, for • relatively fixed membership • prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs

  8. Main POS • Open class • Noun – refers to entities like people, places, things or ideas. • Adjective – describes the properties of nouns or pronouns. • Verb – describes actions, activities and states. • Adverb – describes a verb, an adjective or another adverb. • Closed class • Pronoun – a word that takes the place of a noun or other constituent. • Determiner – describes the particular reference of a noun. • Preposition – expresses spatial or time relationships. • …

  9. Nouns (open) • Entities like people, places, things or ideas • ex: dog, tree, Mary, idea • Typical inflections: • number (singular, plural), • gender (masculine, feminine, neuter), • case (nominative, genitive, accusative, dative) • Sub-categories: • proper nouns (John) • adverbial nouns (today, home)

  10. Verbs (open) • Actions, activities, and states • The men work in the field. • The men are working in the field. • The men are in the field. • Typical inflections: • tenses: present, past, future • other inflection: number, person • aspect: progressive, perfective • voice: active, passive • Sub-category: • auxiliaries (considered closed-class words) • ex: be, do, will • modal verbs (considered closed-class words) • ex: can, should, could • main verbs

  11. Main verbs • Transitive • requires a direct object (found with questions: what? or whom?) • ?The child broke. • The child broke a glass. • Intransitive • does not require a direct object. • The train arrived. • Some verbs can be both transitive and intransitive • The ship sailed the seas. (transitive) • The ship sails at noon. (intransitive) • I met my friend at the airport. (transitive) • The delegates met yesterday. (intransitive)

  12. Adjectives (open) • Properties and attributes • long road • rainy day • attractive hat • Typical inflections: • number, gender, case • Sub-categories: • comparative (richer) • superlative (richest)

  13. Adverbs (open) • words added to a verb, adjective, adverb or other word to expand its meaning • You must set up the copy now. • Mary walks gracefully. • Sometimes I take a walk in the woods. • Jack usually leaves the house at seven. • I have always admired her. • sub-categories: • locative (here) • degree (very) • manner (slowly) • temporal (late, yesterday (noun?))

  14. Closed class categories • Determiners: • words that make specific the denotation of a noun phrase • articles: the hat, a hat • demonstrative: this hat, that hat • possessive: John’s hat, my hat, her book • wh-determiner: which hat, whose hat • quantifier: some hat, every hat • Prepositions: • words that show the relationship between certain words in a sentence • The accident occurred under the bridge. • by, to, at, … • Conjunctions: • words used to join other words or groups of words • or, when, but, and, … • Auxiliary & modal verbs: • be, do, can, may, should, …

  15. Closed class categories (cont’d) • Particles: • words that are added to main verbs to construct different verbs • check + out = check out, make + up = make up • Ex: • She made up a story • She made it up • particles vs. prepositions • she <ran up> a bill / she <ran> <up> a hill • Numerals: • one, third

  16. Closed class categories (cont’d) • Pronouns: • a word that replaces a noun or even another sentence • ex: she, ourselves, mine, that • subcategories: • Personal: • You are very nice. • Possessive: • Mine is nicer. • Interrogative: used to ask questions: who?, what?, which? • Who is that girl? • Demonstrative: point out definite persons, places or things: this, these, that • This is my book. • He said he was busy, but that was a lie. • Relative: joins the clause it introduces to its antecedent: who, which, that • She is the girl who won the race. • ...

  17. Other parts of speech • Interjections: • Ouch! • Negatives: • no, not • Politeness markers: • Hello, bye • Existential: • There are 3 students sleeping.

  18. Summary • Open class • nouns cat, spirit • verbs eat, cook • adjectives slow, large • adverbs slowly • Closed class • prepositions on, under, at • determiners a, the, some • pronouns she, who, I, other • conjunctions and, but, or • auxiliary verbs can, may, should • particles up, on, off • numerals one, two, first

  19. The substitution test • Basic test to determine if 2 words belong to the same POS class • The {sad | intelligent | green | fat | …} one is in the corner.

  20. POS Tagging • Automatically assign POS tags to words in a text. • Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN • The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN • The/ARTICLE news/NOUN has/AUXILIARY been/MAINVERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD
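The word/TAG notation on this slide is easy to read back into a program. The sketch below assumes tokens are separated by whitespace and that the tag follows the last slash in each token; `parse_tagged` is a hypothetical helper name, not part of any tagger.

```python
def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a word containing "/" survives.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN"
pairs = parse_tagged(tagged)
print(pairs)
# Keep only the nouns, e.g. as candidate index terms.
print([word for word, tag in pairs if tag == "NOUN"])
```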

  21. Why do POS Tagging? • 1st step towards NLU • easier than full NLU (results > 95% accuracy) • Useful for: • speech recognition / synthesis (better accuracy) • how to recognize/pronounce a word • CONtent/noun VS conTENT/adj • stemming in IR • which morphological affixes the word can take • adjective - ly = noun (friendly - ly = friend) • Indexing in IR • pick out nouns which may be more important than other words in indexing documents

  22. Tag Sets • A tag indicates the various conventional parts of speech. • Different Tag Sets have been used • Ex. Brown Tag Set, Penn Treebank Tag Set • Tag examples: • NP Proper noun • NN Singular noun • AT Article • DET Determiner • More on this later

  23. Penn Treebank tag Set

  24. Ambiguities in POS tagging • Children eat sweet candy/noun. • Too much boiling will candy/verb the molasses. • Fruit flies/? like/? a banana.
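To see how quickly such ambiguity multiplies, the sketch below enumerates every tag sequence a lexicon allows for "Fruit flies like a banana". The `LEXICON` here is a hypothetical toy (only the tags needed for the example), so the count it produces is illustrative rather than a claim about English.

```python
from itertools import product

# Hypothetical toy lexicon: each word lists the tags it could carry.
LEXICON = {
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREPOSITION"],
    "a": ["ARTICLE"],
    "banana": ["NOUN"],
}


def candidate_taggings(sentence):
    """Return every tag sequence the lexicon allows for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[w] for w in words]
    # The Cartesian product picks one tag per word in every possible way.
    return [list(zip(words, tags)) for tags in product(*options)]


taggings = candidate_taggings("Fruit flies like a banana")
for tagging in taggings:
    print(" ".join(f"{word}/{tag}" for word, tag in tagging))
print(len(taggings), "candidate taggings")  # 2 * 2 = 4 with this lexicon
```

A tagger's job is to choose among these candidates; two of the four correspond to the two sensible readings of the sentence.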

  25. Levels of study of NLP • Lexical • Phonetics & phonology • Parts-of-speech & Morphology • Phrase Structure and Syntax • Semantics • Pragmatics • Discourse • World-Knowledge

  26. Syntax or Phrase Structure • Syntax • study of the regularities and constraints of word order and phrase structure • the book is red vs red book is the • Grammar • expresses the relations among the constituents of a sentence

  27. Constituents • also called syntactic structures • Main Constituents: • S (sentence): The boy is happy. • NP (noun phrase): the little boy / Sam Smith / I / three boys from Montreal • VP (verb phrase): eat an apple / sing / leave Boston in the morning • PP (prepositional phrase): in the morning / about my ticket • AdjP (adjective phrase): really funny / rather clear / very large • AdvP (adverb phrase): slowly / really slowly

  28. Sentence Moods/Types • Declarative • Mary eats. • S --> NP VP • Imperative • Eat! • S --> VP • Yes-No Question • Did Mary eat? • S --> Aux NP VP • Wh-Question • When did Mary eat? • S --> WH-pro Aux NP VP

  29. Noun Phrases • NP --> pre-modifiers head post-modifiers • head: central noun in NP • the little boy, the boy from Montreal • pre-modifiers: • determiners, cardinal, ordinal, quantifier • the boy, two boys, first boy, several boys • AdjP • funny boy, really funny boy • post-modifiers: • PP • flights from Montreal • non-finite clause • gerundive (-ing) • flights arriving from Montreal • -ed • dinner served on board, jewels stolen from the queen • infinitive form • flight to arrive from Montreal • relative clause • flight that arrives from Montreal, girl who won the race

  30. Verb Phrases • VP --> head-verb complements adjuncts • Some VPs: • Verb: eat. • Verb NP: leave Montreal. • Verb NP PP: leave Montreal in the morning. • Verb PP: leave in the morning. • Verb S: think I would like the fish. • Verb VP: want to leave. / want to leave Montreal. / want to leave Montreal in the morning. / want to want to leave Montreal in the morning.

  31. Subcategorisation frames • Some verbs can take complements that others cannot • I want to fly. • *I find to fly. • Verbs are subcategorized according to the complements they can take --> subcategorisation frames • traditionally: transitive vs intransitive • nowadays: up to 100 subcategories / frames

  32. Prepositional phrases • PP --> Preposition NP • from Japan • inside my blue bag

  33. Adjective Phrases • AdjP --> Adj Modifiers • tall • very tall • taller than Mary

  34. Adverb Phrases • AdvP --> Adv Modifiers • affirmatively • very graciously • rather secretively

  35. Context Free Grammars • set of non-terminal symbols • constituents & parts-of-speech • S, NP, VP, PP, Det, N, V, ... • set of terminal symbols • lexicon of words & punctuation • cat, mouse, nurses, eat, ... • a non-terminal designated as the starting symbol • sentence S • a set of re-write rules • having a single non-terminal on the LHS and one or more terminals or non-terminals on the RHS • S --> NP VP • NP --> Pro | PN | Det Nominal

  36. A simple context-free grammar • The Grammar: • S --> NP VP • NP --> AT NNS • NP --> AT NN • NP --> NP PP • VP --> VP PP • VP --> VBD • VP --> VBD NP • PP --> IN NP • The Lexicon: • NNS --> children • NNS --> students • NNS --> mountains • VBD --> slept • VBD --> ate • VBD --> saw • AT --> the • IN --> in • IN --> of • NN --> cake

  37. A parse tree • a tree representation of the application of the grammar to a specific sentence. • [S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
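Applying the simple grammar of slide 36 to a sentence can be sketched with a small memoized top-down parser that returns bracketed trees. This is one possible approach, not necessarily the one used in the course; it assumes the prepositional-phrase rule is PP --> IN NP (the LHS needed by NP --> NP PP), and `RULES`, `LEXICON` and `parse` are names chosen here for illustration.

```python
from functools import lru_cache

# Phrase-structure rules of the slide's grammar (PP --> IN NP is an assumption).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],
}
# Lexical rules: part-of-speech tag --> words.
LEXICON = {
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "AT": {"the"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}


def parse(sentence, start="S"):
    """Return all bracketed parse trees of `sentence` (lowercased)."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        """All trees in which `symbol` derives words[i:j]."""
        found = []
        if symbol in LEXICON:  # part-of-speech tag: must cover exactly one word
            if j - i == 1 and words[i] in LEXICON[symbol]:
                found.append(f"[{symbol} {words[i]}]")
            return tuple(found)
        for rhs in RULES.get(symbol, []):
            if len(rhs) == 1:  # unary rule, e.g. VP --> VBD
                found += [f"[{symbol} {t}]" for t in trees(rhs[0], i, j)]
            else:  # binary rule: try every split point
                left, right = rhs
                for k in range(i + 1, j):
                    for lt in trees(left, i, k):
                        for rt in trees(right, k, j):
                            found.append(f"[{symbol} {lt} {rt}]")
        return tuple(found)

    return list(trees(start, 0, len(words)))


for tree in parse("The children ate the cake"):
    print(tree)
```

Working over spans (i, j) rather than expanding rules blindly is what keeps the left-recursive rule NP --> NP PP from looping: every recursive call covers a strictly shorter span.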

  38. Stochastic Grammars • Grammars obtained by adding probabilities to “algebraic” (i.e., non-probabilistic) grammars. • 1 S --> NP VP • 0.4 NP --> AT NNS • 0.4 NP --> AT NN • 0.2 NP --> NP PP • 0.1 VP --> VP PP • 0.1 VP --> VBD • 0.8 VP --> VBD NP • 1 PP --> IN NP
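With rule probabilities in hand, the probability of a parse tree is the product of the probabilities of the rules it uses. The sketch below works this out for "The children ate the cake"; it assumes the last rule is PP --> IN NP and ignores lexical rules, for which the slide gives no probabilities. `RULE_PROB` and `tree_probability` are illustrative names.

```python
from math import prod

# Rule probabilities from the slide; the rules for each LHS sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VP", "PP")): 0.1,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,  # assumed LHS, see lead-in
}

# Rules used in the parse of "The children ate the cake".
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("AT", "NNS")),   # the children
    ("VP", ("VBD", "NP")),   # ate + object
    ("NP", ("AT", "NN")),    # the cake
]


def tree_probability(rules_used):
    """Multiply the probabilities of all rules applied in a tree."""
    return prod(RULE_PROB[rule] for rule in rules_used)


# 1.0 * 0.4 * 0.8 * 0.4, i.e. about 0.128 (up to floating-point rounding)
print(tree_probability(derivation))
```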

  39. Syntactic Dependencies • Local dependency • dependency between two words expressed within the same syntactic rule. • The 3/plural books/plural. • n-gram models capture this very well. • Non-local dependency • two words can be syntactically dependent even though they occur far apart in a sentence • Ex: subject-verb agreement • The children who found a wallet on the street yesterday while walking their dog were given a reward. • a challenge for statistical NLP approaches that only model local dependencies (e.g. n-grams).
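The subject-verb example can be made concrete by asking how large n must be before a single n-gram contains both "children" and "were". The sketch assumes simple whitespace tokenization with the final period dropped; `ngrams` and `smallest_covering_n` are helper names introduced here.

```python
def ngrams(tokens, n):
    """All contiguous n-token windows of the sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def smallest_covering_n(tokens, a, b):
    """Smallest n such that some n-gram contains both words a and b."""
    for n in range(1, len(tokens) + 1):
        if any(a in gram and b in gram for gram in ngrams(tokens, n)):
            return n
    return None


sentence = ("the children who found a wallet on the street yesterday "
            "while walking their dog were given a reward")
tokens = sentence.split()

# A bigram model predicting "were" only sees the singular noun "dog".
print([gram for gram in ngrams(tokens, 2) if gram[1] == "were"])
# The plural subject only enters the window for a much larger n.
print(smallest_covering_n(tokens, "children", "were"))
```

Here the subject and its verb are 13 tokens apart, so a 14-gram is needed to see both, far beyond the bigram or trigram windows typically used.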

  40. Difficulties in parsing • Attachment ambiguity • The children ate the cake with a spoon. • The children ate (the cake with a spoon).?? • The children (ate with a spoon).??
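The attachment ambiguity shows up as two distinct parses under the simple grammar of slide 36. The sketch below counts parses with a memoized span-based recursion; it assumes PP --> IN NP and adds lexical entries for "with", "a" and "spoon", which are not in the slide's lexicon. `count_parses` is an illustrative name.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],  # assumed LHS, see lead-in
}
# "a", "with" and "spoon" are assumed additions to the slide's lexicon.
LEXICON = {
    "NNS": {"children"},
    "VBD": {"ate"},
    "AT": {"the", "a"},
    "IN": {"with"},
    "NN": {"cake", "spoon"},
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees the grammar assigns to the sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol in LEXICON:  # a tag covers exactly one matching word
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        total = 0
        for rhs in RULES.get(symbol, []):
            if len(rhs) == 1:
                total += count(rhs[0], i, j)
            else:
                for k in range(i + 1, j):  # every split point
                    total += count(rhs[0], i, k) * count(rhs[1], k, j)
        return total

    return count(start, 0, len(words))


# One parse attaches the PP to the verb (VP --> VP PP),
# the other attaches it to "the cake" (NP --> NP PP).
print(count_parses("The children ate the cake with a spoon"))  # 2
```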

  41. Other difficulties • NP bracketing • plastic cat food can cover --> ? (plastic cat) (food can) cover --> ? plastic (cat food can) cover --> ? (plastic cat food) (can cover) • Conjunctions and appositives • Maddy, my dog, and Samy --> ?(Maddy, my dog), and (Samy) --> ?(Maddy), (my dog), and (Samy)
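The number of possible bracketings grows quickly with the length of a noun compound. As a rough sketch, the function below enumerates only binary bracketings (a simplifying assumption; the groupings on the slide are not all binary), and `bracketings` is a name chosen here.

```python
def bracketings(words):
    """All binary bracketings of a sequence of words."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for k in range(1, len(words)):  # split into a left and a right part
        for left in bracketings(words[:k]):
            for right in bracketings(words[k:]):
                results.append(f"({left} {right})")
    return results


compound = "plastic cat food can cover".split()
options = bracketings(compound)
for option in options:
    print(option)
print(len(options))  # 14 binary bracketings for 5 words
```

The counts follow the Catalan numbers (1, 2, 5, 14, 42, ...), which is why longer compounds become hard to bracket without semantic knowledge.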

  42. Another Ambiguity: Garden-Path Sentences • well-studied class of syntactic ambiguity • sentence is re-analysed when the last word is encountered • humans have difficulty analysing such sentences • Example: The horse raced past the barn fell. • (the horse that was raced past the barn) fell.

  43. Garden Path: Wrong Parse • [S [NP The horse][VP raced past the barn]] fell • dt: determiner • n: noun • v: verb • p: preposition • S: sentence • NP: noun phrase • VP: verb phrase • PP: prepositional phrase

  44. Garden Path: Right Parse • [S [NP The horse [PAP raced past the barn]][VP fell]] • dt: determiner • n: noun • v: verb • p: preposition • S: sentence • NP: noun phrase • VP: verb phrase • PP: prepositional phrase • PAP: passive phrase

  45. Levels of study of NLP • Lexical • Phonetics & phonology • Parts-of-speech & Morphology • Phrase Structure and Syntax • Semantics • Pragmatics • Discourse • World-Knowledge

  46. Semantics • the study of the meaning of words, constructions, and utterances • can be divided into two parts: • lexical semantics • meaning of words • compositional semantics • Meaning of sentences and discourse • the meaning of the whole often differs from the meaning of the parts.

  47. Lexical Semantics • Meaning of individual words • I went to the bank of Montreal and deposited 50$. • I went to the bank of the river and dangled my feet. • Word Sense Disambiguation • Determining which sense of a word is used in a specific sentence • Semantic relations between words: • hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony.
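Word sense disambiguation for the two "bank" sentences can be sketched by counting how many context words overlap with cue words for each sense. This is a deliberately crude illustration: the `SENSE_CUES` sets and the sense labels are hypothetical, hand-picked for these two sentences, whereas a real system would derive such evidence from a dictionary or a sense-annotated corpus.

```python
# Hypothetical cue words for two senses of "bank" (illustration only).
SENSE_CUES = {
    "bank/FINANCIAL": {"deposited", "money", "account", "loan"},
    "bank/RIVER": {"river", "water", "feet", "shore"},
}


def disambiguate(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best, scores


print(disambiguate("I went to the bank of Montreal and deposited 50$."))
print(disambiguate("I went to the bank of the river and dangled my feet."))
```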

  48. Meaning of sentences • The cat eats the mouse = The mouse is eaten by the cat. • Goal: • build a representation of the meaning of the sentence • attach semantic roles to constituents • Some characteristics of a sentence that influence semantic interpretation: • Type: declarative, interrogative, imperative, exclamatory • Polarity: positive, negative • Tense: past, present, future • Voice: active, passive • Some semantic roles (different from syntactic roles): • Agent: the doer of a volitional act • Patient: the thing that is affected by an act • Recipient: the receiver of an object • Instrument: the instrument used to perform an act. • Time: the time the act is performed. • Location: the location of an act or object. • …

  49. Semantic Roles • Ex: • [John]AGENT hit [Peter]PATIENT with [a ball]INSTRUMENT. • Ex: • I ate spaghetti with [meatballs]INGREDIENT_OF_SPAGHETTI • I ate spaghetti with [salad]SIDE_DISH_OF_SPAGHETTI • I ate spaghetti with [a fork]INSTRUMENT • I ate spaghetti with [a friend]ACCOMPANIER_OF_EATING • Important for machine translation… • [I]AGENT: PERSON_LACKING_SOMEONE miss [you]PATIENT: PERSON_MISSED • ?[Je]PATIENT: PERSON_MISSED [te]AGENT: PERSON_LACKING_SOMEONE manque. • [Tu]PATIENT: PERSON_MISSED [me]AGENT: PERSON_LACKING_SOMEONE manques.

  50. Levels of study of NLP • Lexical • Phonetics & phonology • Parts-of-speech & Morphology • Phrase Structure and Syntax • Semantics • Pragmatics • Discourse • World-Knowledge
