This text explores the different levels of study in natural language processing, including lexical analysis, phonetics and phonology, morphology, syntax, semantics, pragmatics, discourse, and world knowledge.
COMP 791A: Statistical Language Processing Linguistic Essentials Chap. 3
Levels of study of NLP • Lexical • Possible words in a given language • rose ?gellapou • Phonetics & phonology • How words are related to sounds • rose [roz] • Parts-of-speech & Morphology • How words are constructed from basic meaning units (morphemes) • friend + ly --> friendly • friend + s --> friends • rose + ly ≠ rosely • woman + s ≠ womans • Phrase Structure and Syntax • How words can be ordered to form correct sentences • ?Red the is rose / adj det verb noun • The rose is red / det noun verb adj
Levels of study of NLP (cont’d) • Semantics • What words mean (lexical semantics, word sense disambiguation) • chair --> furniture / person • How word meanings are combined into the meaning of sentences. • The chair is broken. • The chair is sick. • Pragmatics • How language conventions affect the literal meaning (interpretation) • Do you have the time? • Do you have the children? • Discourse • How surrounding sentences affect interpretation • The chair’s leg is broken. He went skiing last weekend. • The chair’s leg is broken. Someone placed a 500kg package on it. • World-Knowledge • How general knowledge about the world affects interpretation • The prof sent the student to see the chair because he was fed up with his behavior. • The prof sent the student to see the chair because he wanted to see him. • The prof sent the student to see the chair because he was talking in class.
Levels of study of NLP • Lexical • Phonetics & phonology • Parts-of-speech & Morphology • Phrase Structure and Syntax • Semantics • Pragmatics • Discourse • World-Knowledge
Parts of Speech and Morphology • Parts of Speech (POS) • also called word classes, lexical categories, syntactic categories, grammatical categories, or tags • Ex: noun, verb, adjective, preposition, … • Morphology • study and description of word formation in a language • modification of a root form (stem) by affixes • affixes: prefixes, suffixes, infixes, circumfixes • and exceptions… thief --> thieves, but chief --> chiefs • Word categories are systematically related by morphological processes
Morphological processes • Inflection • to indicate case, gender, number, tense, person, mood, or voice • does not change the word’s grammatical class or meaning significantly • car --> cars • talk --> talking • Derivation • creation of a new word • may have different meaning and/or grammatical class • infect --> disinfect • grateful --> ungrateful • wide (adjective) --> widely (adverb) • teach (verb) --> teacher (noun) • Compounding • merging 2 or more words into a single one • written as separate words • but pronounced as a single word / denotes 1 single concept • so merits an entry in lexicon • tea kettle, disk drive, mad cow disease
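The inflection examples above (car --> cars) and the exceptions noted earlier (thief --> thieves but chief --> chiefs, woman + s ≠ womans) can be sketched as a rule plus an exception list. This is only an illustrative sketch: the function name `pluralize`, the `IRREGULAR` table and the spelling rules are assumptions made for this example, not part of the course material, and they cover only a small fraction of English.

```python
# Hypothetical exception list: irregular plurals must be stored in the lexicon.
IRREGULAR = {"woman": "women", "thief": "thieves", "child": "children"}


def pluralize(noun: str) -> str:
    """Inflect an English noun for plural number (toy rules only)."""
    if noun in IRREGULAR:                      # exceptions win over rules
        return IRREGULAR[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                     # box --> boxes
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"               # city --> cities
    return noun + "s"                          # regular case: car --> cars


if __name__ == "__main__":
    for word in ["car", "friend", "chief", "thief", "woman"]:
        print(word, "-->", pluralize(word))
```

The design point is that regular inflection can be handled by rules, while exceptions such as thief and woman need their own lexicon entries.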
Classes of POS • Open (lexical) class • things, actions, events, … • ex. cat, John, eat • new words can be added easily • nouns, verbs, adjectives, adverbs • some languages do not have all these categories • Closed (functional) class • generally function/grammatical words • ex. the, in, and, for • relatively fixed membership • prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs
Main POS • Open class • Noun – refers to entities like people, places, things or ideas. • Adjective – describes the properties of nouns or pronouns. • Verb – describes actions, activities and states. • Adverb – describes a verb, an adjective or another adverb. • Closed class • Pronoun – a word that takes the place of a noun or noun phrase. • Determiner – describes the particular reference of a noun. • Preposition – expresses spatial or time relationships. • …
Nouns (open) • Entities like people, places, things or ideas • ex: dog, tree, Mary, idea • Typical inflections: • number (singular, plural), • gender (masculine, feminine, neuter), • case (nominative, genitive, accusative, dative) • Sub-categories: • proper nouns (John) • adverbial nouns (today, home)
Verbs (open) • Actions, activities, and states • The men work in the field. • The men are working in the field. • The men are in the field. • Typical inflections: • tense: present, past, future • other inflections: number, person • aspect: progressive, perfective • voice: active, passive • Sub-categories: • auxiliaries (considered closed-class words) • ex: be, do, will • modal verbs (considered closed-class words) • ex: can, should, could • main verbs
Main verbs • Transitive • requires a direct object (found with questions: what? or whom?) • ?The child broke. • The child broke a glass. • Intransitive • does not require a direct object. • The train arrived. • Some verbs can be both transitive and intransitive • The ship sailed the seas. (transitive) • The ship sails at noon. (intransitive) • I met my friend at the airport. (transitive) • The delegates met yesterday. (intransitive)
Adjectives (open) • Properties and attributes • long road • rainy day • attractive hat • Typical inflections: • number, gender, case • Sub-categories: • comparative (richer) • superlative (richest)
Adverbs (open) • words added to a verb, an adjective, another adverb, or a whole clause to expand its meaning • You must set up the copy now. • Mary walks gracefully. • Sometimes I take a walk in the woods. • Jack usually leaves the house at seven. • I have always admired her. • Sub-categories: • locative (here) • degree (very) • manner (slowly) • temporal (late, yesterday (noun?))
Closed class categories • Determiners: • words that make specific the denotation of a noun phrase • articles: the hat, a hat • demonstratives: this hat, that hat • possessives: John’s hat, my hat, her book • wh-determiners: which hat, whose hat • quantifiers: some hat, every hat • Prepositions: • words that show the relationship between certain words in a sentence • The accident occurred under the bridge. • by, to, at, … • Conjunctions: • words used to join other words or groups of words • or, when, but, and, … • Auxiliary & modal verbs: • be, do, can, may, should, …
Closed class categories (cont’d) • Particles: • words that are added to main verbs to construct different verbs • check + out = check out, make + up = make up • Ex: • She made up a story. • She made it up. • particles vs. prepositions • she <ran up> a bill / she <ran> <up> a hill • Numerals: • one, third
Closed class categories (cont’d) • Pronouns: • a word that replaces a noun, a noun phrase, or even a whole clause • ex: she, ourselves, mine, that • Sub-categories: • Personal: • You are very nice. • Possessive: • Mine is nicer. • Interrogative: used to ask questions: who?, what?, which? • Who is that girl? • Demonstrative: points out definite persons, places or things: this, these, that • This is my book. • He said he was busy, but that was a lie. • Relative: links the clause it introduces to its antecedent: who, which, that • She is the girl who won the race. • ...
Other parts of speech • Interjections: • Ouch! • Negatives: • no, not • Politeness markers: • Hello, bye • Existential: • There are 3 students sleeping.
Summary • Open class • nouns: cat, spirit • verbs: eat, cook • adjectives: slow, large • adverbs: slowly • Closed class • prepositions: on, under, at • determiners: a, the, some • pronouns: she, who, I, others • conjunctions: and, but, or • auxiliary verbs: can, may, should • particles: up, on, off • numerals: one, two, first
The substitution test • Basic test to determine if 2 words belong to the same POS class • The {sad | intelligent | green | fat | …} one is in the corner.
POS Tagging • Automatically assign POS tags to words in a text. • Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN • The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN • The/ARTICLE news/NOUN has/AUXILIARY been/MAINVERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD
Why do POS Tagging? • 1st step towards NLU • easier than full NLU (results > 95% accuracy) • Useful for: • speech recognition / synthesis (better accuracy) • how to recognize/pronounce a word • CONtent/noun vs. conTENT/adj • stemming in IR • which morphological affixes the word can take • adjective - ly = noun (friendly - ly = friend) • indexing in IR • pick out nouns, which may be more important than other words in indexing documents
Tag Sets • A tag labels a word with one of the conventional parts of speech. • Different tag sets have been used • Ex. Brown Tag Set, Penn Treebank Tag Set • Tag examples: • NP Proper noun • NN Singular noun • AT Article • DET Determiner • More on this later
Ambiguities in POS tagging • Children eat sweet candy/noun. • Too much boiling will candy/verb the molasses. • Fruit flies/? like/? a banana.
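A rough way to see why these sentences are hard: if each word may carry several tags, the number of candidate tag sequences is the product of the per-word tag counts. The sketch below assumes a tiny hand-made lexicon (`LEXICON`, its tag names and the helper functions are hypothetical, chosen only to mirror the examples above); a real tagger would use a full tag set and pick among the candidates using context.

```python
from math import prod

# Hypothetical toy lexicon: each word maps to the POS tags it may take.
LEXICON = {
    "fruit": {"NOUN"},
    "flies": {"NOUN", "VERB"},
    "like": {"VERB", "PREPOSITION"},
    "a": {"ARTICLE"},
    "banana": {"NOUN"},
    "candy": {"NOUN", "VERB"},
}


def candidate_tags(sentence: str):
    """Return (word, sorted candidate tags) pairs; unknown words get UNKNOWN."""
    return [(w, sorted(LEXICON.get(w.lower(), {"UNKNOWN"})))
            for w in sentence.split()]


def count_taggings(sentence: str) -> int:
    """Number of distinct tag sequences the lexicon allows for the sentence."""
    return prod(len(tags) for _, tags in candidate_tags(sentence))


if __name__ == "__main__":
    sentence = "Fruit flies like a banana"
    for word, tags in candidate_tags(sentence):
        print(f"{word}: {tags}")
    print("candidate tag sequences:", count_taggings(sentence))
```

With this lexicon only "flies" and "like" are ambiguous, so the sentence has 2 x 2 = 4 candidate taggings, of which a tagger must select one.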
Syntax or Phrase Structure • Syntax • study of the regularities and constraints of word order and phrase structure • the book is red vs. ?red book is the • Grammar • expresses the relations among the constituents of a sentence
Constituents • also called syntactic structures • Main Constituents: • S: sentence • The boy is happy. • NP: noun phrase • the little boy / Sam Smith / I / three boys from Montreal • VP: verb phrase • eat an apple / sing / leave Boston in the morning • PP: prepositional phrase • in the morning / about my ticket • AdjP: adjective phrase • really funny / rather clear / very large • AdvP: adverb phrase • slowly / really slowly
Sentence Moods/Types • Declarative • Mary eats. • S --> NP VP • Imperative • Eat! • S --> VP • Yes-No Question • Did Mary eat? • S --> Aux NP VP • Wh-Question • When did Mary eat? • S --> WH-pro Aux NP VP
Noun Phrases • NP --> pre-modifiers head post-modifiers • head: central noun in NP • the little boy, the boy from Montreal • pre-modifiers: • determiners, cardinals, ordinals, quantifiers • the boy, two boys, first boy, several boys • AdjP • funny boy, really funny boy • post-modifiers: • PP • flights from Montreal • non-finite clause • gerundive (-ing) • flights arriving from Montreal • -ed • dinner served on board, jewels stolen from the queen • infinitive form • flight to arrive from Montreal • relative clause • flight that arrives from Montreal, girl who won the race
Verb Phrases • VP --> head-verb complements adjuncts • Some VPs: • Verb: eat. • Verb NP: leave Montreal. • Verb NP PP: leave Montreal in the morning. • Verb PP: leave in the morning. • Verb S: think I would like the fish. • Verb VP: want to leave. / want to leave Montreal. / want to leave Montreal in the morning. / want to want to leave Montreal in the morning.
Subcategorisation frames • Some verbs can take complements that others cannot I want to fly. * I find to fly. • Verbs are subcategorized according to the complements they can take --> subcategorisation frames • traditionally: transitive vs intransitive • nowadays: up to 100 subcategories / frames
Prepositional phrases • PP --> Preposition NP • from Japan • inside my blue bag
Adjective Phrases • AdjP --> Adj Modifiers • tall • very tall • taller than Mary
Adverb Phrases • AdvP --> Adv Modifiers • affirmatively • very graciously • rather secretively
Context Free Grammars • set of non-terminal symbols • constituents & parts-of-speech • S, NP, VP, PP, Det, N, V, ... • set of terminal symbols • lexicon of words & punctuation • cat, mouse, nurses, eat, ... • a non-terminal designated as the starting symbol • sentence S • a set of re-write rules • each with a single non-terminal on the LHS and one or more terminals or non-terminals on the RHS • S --> NP VP • NP --> Pro | PN | Det Nominal
A simple context-free grammar
The Grammar:
S --> NP VP
NP --> AT NNS
NP --> AT NN
NP --> NP PP
VP --> VP PP
VP --> VBD
VP --> VBD NP
PP --> IN NP
The Lexicon:
NNS --> children
NNS --> students
NNS --> mountains
VBD --> slept
VBD --> ate
VBD --> saw
AT --> the
IN --> in
IN --> of
NN --> cake
A parse tree • a tree representation of the application of the grammar to a specific sentence. • [S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
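A parse tree can be held in a program as a nested structure, from which both the sentence (the leaves) and the grammar rules that were applied can be read off. The sketch below is one possible encoding, assuming nested tuples of the form (label, child, ...); the names `TREE`, `leaves` and `productions` are made up for this illustration.

```python
# Parse tree for "The children ate the cake" as nested tuples:
# (label, child1, child2, ...), where a child is a subtree or a word.
TREE = ("S",
        ("NP", ("AT", "The"), ("NNS", "children")),
        ("VP", ("VBD", "ate"),
               ("NP", ("AT", "the"), ("NN", "cake"))))


def leaves(tree):
    """Return the words at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def productions(tree):
    """Return the rewrite rules used in the tree as (LHS, RHS) pairs."""
    if isinstance(tree, str):
        return []
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        rules.extend(productions(child))
    return rules


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
    for lhs, rhs in productions(TREE):
        print(lhs, "-->", " ".join(rhs))
```

Each (LHS, RHS) pair printed for an internal node corresponds to one rule of the grammar or one entry of the lexicon, which is what "application of the grammar to a specific sentence" means in practice.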
Stochastic Grammars • Grammars obtained by adding probabilities to “algebraic” (i.e., non-probabilistic) grammars. • 1 S --> NP VP • 0.4 NP --> AT NNS • 0.4 NP --> AT NN • 0.2 NP --> NP PP • 0.1 VP --> VP PP • 0.1 VP --> VBD • 0.8 VP --> VBD NP • 1 PP --> IN NP
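With these rule probabilities, the probability of a parse tree is the product of the probabilities of the rules it uses, and the probabilities of all rules sharing a left-hand side sum to 1. The sketch below works this out for "The children ate the cake". Two assumptions are made here and are not from the slides: the rule written "P --> IN NP" is read as "PP --> IN NP", and lexical rules (such as NN --> cake) are given probability 1 because the slide lists no lexical probabilities. The names `RULE_PROB` and `tree_prob` are illustrative.

```python
from collections import defaultdict
from math import isclose, prod

# Rule probabilities from the slide (PP assumed for the rule written "P").
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VP", "PP")): 0.1,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,
}

TREE = ("S",
        ("NP", ("AT", "The"), ("NNS", "children")),
        ("VP", ("VBD", "ate"),
               ("NP", ("AT", "the"), ("NN", "cake"))))


def tree_prob(tree) -> float:
    """Product of the probabilities of the phrase-structure rules in the tree."""
    label, children = tree[0], tree[1:]
    if all(isinstance(c, str) for c in children):
        return 1.0  # assumption: lexical rules are not weighted here
    rhs = tuple(c[0] for c in children)
    return RULE_PROB[(label, rhs)] * prod(tree_prob(c) for c in children)


def lhs_totals():
    """Sum of rule probabilities per left-hand side (each should be 1)."""
    totals = defaultdict(float)
    for (lhs, _), p in RULE_PROB.items():
        totals[lhs] += p
    return dict(totals)


if __name__ == "__main__":
    print(lhs_totals())
    # 1 (S) x 0.4 (NP --> AT NNS) x 0.8 (VP --> VBD NP) x 0.4 (NP --> AT NN)
    print(tree_prob(TREE))  # approximately 0.128
```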
Syntactic Dependencies • Local dependency • dependency between two words expressed within the same syntactic rule. • The 3/plural books/plural. • n-gram models capture this very well. • Non-local dependency • two words can be syntactically dependent even though they occur far apart in a sentence • Ex: subject-verb agreement • The children who found a wallet on the street yesterday while walking their dog were given a reward. • a challenge for statistical NLP approaches (ex. n-grams) that model only local dependencies.
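To make the challenge concrete: an n-gram model looks at a window of n consecutive words, so it can only relate two words that fit in the same window. The small sketch below computes the smallest n whose window would cover both words of a dependency; `min_ngram_order` is a hypothetical helper written for this illustration, and it assumes each of the two words occurs once (it uses the first occurrence).

```python
def min_ngram_order(tokens, first, second) -> int:
    """Smallest n such that a single n-gram window contains both words."""
    i = tokens.index(first)
    j = tokens.index(second)
    return abs(j - i) + 1


LOCAL = "The 3 books".split()
NON_LOCAL = ("The children who found a wallet on the street yesterday "
             "while walking their dog were given a reward").split()

if __name__ == "__main__":
    # Number agreement between adjacent words fits in a bigram.
    print(min_ngram_order(LOCAL, "3", "books"))
    # Subject-verb agreement here spans 13 intervening positions.
    print(min_ngram_order(NON_LOCAL, "children", "were"))
```

For the example sentence the subject "children" and the verb "were" only share a window when n is at least 14, far beyond the bigram or trigram models typically used.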
Difficulties in parsing • Attachment ambiguity • The children ate the cake with a spoon. • The children ate (the cake with a spoon).?? • The children (ate with a spoon).??
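Attachment ambiguity can be observed directly by counting how many parse trees a grammar assigns to a sentence. The sketch below reuses the simple context-free grammar shown earlier, under two assumptions: the rule written "P --> IN NP" is read as "PP --> IN NP", and, since that lexicon has no entries for "with", "a" or "spoon", the analogous sentence "the children ate the cake in the mountains" is used instead. The function `count_parses` is a hypothetical span-based counter written for this example, not an algorithm from the course.

```python
from functools import lru_cache

# Phrase-structure rules, split by right-hand-side length.
BINARY = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD", "NP")],
    "PP": [("IN", "NP")],          # assumption: "P" on the slide means PP
}
UNARY = {"VP": ["VBD"]}
LEXICON = {
    "the": {"AT"}, "children": {"NNS"}, "students": {"NNS"},
    "mountains": {"NNS"}, "cake": {"NN"}, "slept": {"VBD"},
    "ate": {"VBD"}, "saw": {"VBD"}, "in": {"IN"}, "of": {"IN"},
}


def count_parses(sentence: str, start: str = "S") -> int:
    """Count the distinct parse trees the grammar gives to the sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        total = 0
        # A part-of-speech symbol covering exactly one word.
        if j - i == 1 and sym in LEXICON.get(words[i], ()):
            total += 1
        # Unary rules cover the same span (here only VP --> VBD).
        for child in UNARY.get(sym, ()):
            total += count(child, i, j)
        # Binary rules: try every split point inside the span.
        for left, right in BINARY.get(sym, ()):
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))


if __name__ == "__main__":
    print(count_parses("the children ate the cake"))
    print(count_parses("the children ate the cake in the mountains"))
```

The second sentence receives two parses: one where the PP attaches to the verb phrase (VP --> VP PP) and one where it attaches to the noun phrase (NP --> NP PP). This is the same choice a parser faces with "with a spoon".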
Other difficulties • NP bracketing • plastic cat food can cover • --> ? (plastic cat) (food can) cover • --> ? plastic (cat food can) cover • --> ? (plastic cat food) (can cover) • Conjunctions and appositives • Maddy, my dog, and Samy • --> ? (Maddy, my dog), and (Samy) • --> ? (Maddy), (my dog), and (Samy)
Another Ambiguity: Garden-Path Sentences • well-studied class of syntactic ambiguity • the sentence is re-analysed when the last word is encountered • humans have difficulty analysing such sentences • Example: The horse raced past the barn fell. • (The horse that was raced past the barn) fell.
Garden Path: Wrong Parse • [S [NP The horse] [VP raced past the barn]] fell • Legend: dt: determiner, n: noun, v: verb, p: preposition, S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase
Garden Path: Right Parse • [S [NP The horse [PAP raced past the barn]] [VP fell]] • Legend: dt: determiner, n: noun, v: verb, p: preposition, S: sentence, NP: noun phrase, VP: verb phrase, PP: prepositional phrase, PAP: passive phrase
Semantics • the study of the meaning of words, constructions, and utterances • can be divided into two parts: • lexical semantics • meaning of words • compositional semantics • Meaning of sentences and discourse • the meaning of the whole often differs from the meaning of the parts.
Lexical Semantics • Meaning of individual words • I went to the bank of Montreal and deposited $50. • I went to the bank of the river and dangled my feet. • Word Sense Disambiguation • Determining which sense of a word is used in a specific sentence • Semantic relations between words: • hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony.
Meaning of sentences • The cat eats the mouse = The mouse is eaten by the cat. • Goal: • build a representation of the meaning of the sentence • attach semantic roles to constituents • Some characteristics of a sentence that influence semantic interpretation: • Type: declarative, interrogative, imperative, exclamatory • Polarity: positive, negative • Tense: past, present, future • Voice: active, passive • Some semantic roles (different from syntactic roles): • Agent: the doer of a volitional act • Patient: the thing that is affected by an act • Recipient: the receiver of an object • Instrument: the instrument used to perform an act • Time: the time the act is performed • Location: the location of an act or object • …
Semantic Roles • Ex: • John (AGENT) hit Peter (PATIENT) with a ball (INSTRUMENT). • Ex: • I ate spaghetti with meatballs (INGREDIENT_OF_SPAGHETTI) • I ate spaghetti with salad (SIDE_DISH_OF_SPAGHETTI) • I ate spaghetti with a fork (INSTRUMENT) • I ate spaghetti with a friend (ACCOMPANIER_OF_EATING) • Important for machine translation… • I (AGENT: PERSON_LACKING_SOMEONE) miss you (PATIENT: PERSON_MISSED) • ?Je (PATIENT: PERSON_MISSED) te (AGENT: PERSON_LACKING_SOMEONE) manque. • Tu (PATIENT: PERSON_MISSED) me (AGENT: PERSON_LACKING_SOMEONE) manques.