
74.406 Natural Language Processing

74.406 Natural Language Processing. Christel Kemke, Department of Computer Science, University of Manitoba. 1st term 2004/5. Evolution of Human Language: communication for "work"; social interaction; basis of cognition and thinking (Whorf & Sapir).


Presentation Transcript


  1. 74.406 Natural Language Processing. Christel Kemke, Department of Computer Science, University of Manitoba. 74.406 Natural Language Processing, 1st term 2004/5

  2. Evolution of Human Language • communication for "work" • social interaction • basis of cognition and thinking (Whorf & Sapir)

  3. Communication "Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p.651]

  4. Natural Language - General • Natural Language is characterized by • a common or shared set of signs: alphabet; lexicon • a systematic procedure to produce combinations of signs: syntax • a shared meaning of signs and combinations of signs: (constructive) semantics

  5. Natural Language Processing Overview • Speech Recognition • Natural Language Processing • Syntax • Semantics • Pragmatics • Spoken Language

  6. Natural Language and Speech • Speech Recognition • acoustic signal as input • conversion into phonemes and written words • Natural Language Processing • written text as input; sentences (or 'utterances') • syntactic analysis: parsing; grammar • semantic analysis: "meaning", semantic representation • pragmatics: dialogue; discourse; metaphors • Spoken Language Processing • transcribed utterances • Phenomena of spontaneous speech

  7. Words

  8. Morphology A morphological analyzer determines (at least) • the stem + ending of a word, and usually delivers related information, like • the word class, • the number and • the person of the word. The morphology can be part of the lexicon or implemented as a separate component, for example as a rule-based system. Examples: • eats → eat + s: verb, singular, 3rd person • dog → dog: noun, singular
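The two analyses on this slide (eats and dog) can be sketched as a tiny rule-based analyzer. This is only an illustration: the stem list, the single "-s" rule, and the function name `analyze` are assumptions made for the example, not part of the course material.

```python
# Toy rule-based morphological analyzer.
# STEMS and the "-s" ending rule are illustrative assumptions.
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Return stem, ending, word class and features, or None if the word is unknown."""
    word = word.lower()
    if word in STEMS:
        word_class = STEMS[word]
        # A bare noun stem is singular; a bare verb stem is left unmarked here.
        features = {"number": "singular"} if word_class == "noun" else {}
        return {"stem": word, "ending": "", "class": word_class, **features}
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        word_class = STEMS[stem]
        if word_class == "verb":
            features = {"number": "singular", "person": 3}
        else:
            features = {"number": "plural"}
        return {"stem": stem, "ending": "s", "class": word_class, **features}
    return None


print(analyze("eats"))
print(analyze("dog"))
```

A real analyzer would need many more rules (irregular forms such as goes, spelling changes, other endings); the point here is only the split into stem, ending, and features.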

  9. Lexicon The Lexicon contains information on words, as • inflected forms (e.g. goes, eats) or • word-stems (e.g. go, eat). The Lexicon usually assigns a syntactic category, • the word class or Part-of-Speech category. Sometimes also • further syntactic information (see Morphology); • semantic information (e.g. semantic classifications like ‘agent’); • syntactic-semantic information, e.g. on verb complements: ‘give’ requires a direct object.

  10. Lexicon Example contents: • eats → verb; singular, 3rd person; can have direct object • dog → dog, noun, singular; animal (semantic annotation)

  11. POS (Part-of-Speech) Tagging POS Tagging determines the word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems. • The → det (determiner) • dog → noun • eat, eats → verb (3rd singular) • the → det • bone → noun
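In its simplest form, the tagging shown on this slide is a lexicon lookup. The sketch below assumes a hand-written dictionary `LEXICON` and a hypothetical helper `pos_tag`; it ignores lexical ambiguity, which a practical tagger has to resolve from context.

```python
# Lookup-based POS tagger over a toy lexicon (all entries are assumptions
# taken from the slide's example words).
LEXICON = {
    "the": "det",
    "dog": "noun",
    "bone": "noun",
    "eat": "verb",
    "eats": "verb",
}


def pos_tag(sentence):
    """Tag each whitespace-separated word; unknown words get the tag 'unknown'."""
    return [(word, LEXICON.get(word.lower(), "unknown")) for word in sentence.split()]


print(pos_tag("The dog eats the bone"))
```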

  12. NLP - Syntactic Analysis [Diagram: Morphological Analyzer (eat + s; 3rd sing), Part-of-Speech (POS) Tagging with Lexicon (eat: verb), and Parser with Grammar Rules (VP → Verb Noun; VP recognized), yielding a parse tree with VP over Verb and Noun.]

  13. Syntax

  14. Language and Grammar • Natural Language described as Formal Language L using a Formal Grammar G: • start-symbol S ≡ sentence • non-terminals NT ≡ syntactic constituents • terminals T ≡ lexical entries / words • production rules P ≡ grammar rules • Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules from G. • Overgeneration / undergeneration: accept/generate sentences not in L / not all sentences from L.

  15. Grammar • Terminals can be words, part-of-speech categories, or more complex lexical items (including additional syntactic/semantic information related to the word). • dog • noun • dog: noun, singular; animal • Non-Terminals represent (higher level) ‘syntactic categories’. • noun • NP (noun phrase) • S (sentence)

  16. Grammar Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence). • det → the • noun → dog | bone • verb → eat | eats • NP → det noun (NP: noun phrase) • VP → verb (VP: verb phrase) • VP → verb NP • S → NP VP (S: sentence) Here, POS Tagging is included in the grammar.
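The grammar on this slide is small enough to write down as data and enumerate exhaustively. The sketch below is one possible encoding (the dictionary layout and the function name `generate` are assumptions); it also makes the overgeneration issue visible, because the grammar has no agreement features.

```python
import itertools

# The slide's context-free grammar: each symbol maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"], ["eats"]],
}


def generate(symbol):
    """Yield every word list derivable from symbol.

    Terminates only because this grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:  # a terminal, i.e. a word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        expansions = [list(generate(s)) for s in rhs]
        for parts in itertools.product(*expansions):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate("S")]
print(len(sentences))  # 12
print("the dog eat the bone" in sentences)  # True: the grammar overgenerates
```

The grammar derives 12 sentences, including "the dog eat the bone", which is not acceptable English. Ruling it out would require number and person agreement in the rules or in the lexical entries.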

  17. Parsing (here: LR, bottom-up) Determine the syntactic structure of the sentence: “the dog eats the bone” • the → det (POS Tagging) • dog → noun • det noun → NP (Rule application) • eats → verb • the → det • bone → noun • det noun → NP • verb NP → VP • NP VP → S
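The reduction sequence on this slide can be reproduced with a small bottom-up parser. Note the assumption: the sketch below is a shift-reduce parser with backtracking, not a table-driven LR parser, because with both VP → verb and VP → verb NP in the grammar a purely greedy reducer would commit to the wrong rule. The names `RULES`, `LEXICON`, and `parse` are hypothetical.

```python
# Bottom-up shift-reduce parsing with backtracking (a sketch, not a real LR parser).
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "eat": "verb", "eats": "verb"}


def parse(stack, words):
    """Return a parse tree rooted in S, or None.

    Trees are tuples (label, child, ...); a leaf is (pos_tag, word).
    """
    if not words and len(stack) == 1 and stack[0][0] == "S":
        return stack[0]
    # Reduce: try every rule whose right-hand side matches the top of the stack.
    for lhs, rhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(tree[0] for tree in stack[-n:]) == rhs:
            result = parse(stack[:-n] + [(lhs, *stack[-n:])], words)
            if result is not None:
                return result
    # Shift: tag the next word via the lexicon and push it onto the stack.
    if words and words[0] in LEXICON:
        return parse(stack + [(LEXICON[words[0]], words[0])], words[1:])
    return None


tree = parse([], "the dog eats the bone".split())
print(tree)
```

The returned tuple nests exactly as the rule applications listed on the slide: two NP reductions, then verb NP → VP, then NP VP → S.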

  18. Syntax Analysis / Parsing Syntactic Structure often represented as Parse Tree. Connect symbols according to applied grammar rules (like Rewrite Systems).

  19. Parse Tree [Figure: parse tree with the nodes S, NP, VP, det, noun, verb.]

  20. Lexical Ambiguity Several word senses or word categories e.g. chase – noun or verb e.g. plant - ????

  21. Syntactic Ambiguity Several parse trees: • “The dog eats the bone in the park.” • “The dog eats the bone in the package.” Who/what is in the park and who/what is in the package? Syntactically speaking: How do I bind the Prepositional Phrase "in the ..." ?
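The two readings can be counted mechanically. The sketch below assumes an extended toy grammar with prepositional phrases (the rules NP → NP PP, VP → VP PP, and PP → prep NP are additions for this example, as are the names `BINARY`, `LEXICON`, and `count_parses`) and uses a CKY-style chart to count the distinct parse trees.

```python
from collections import defaultdict

# Binary rules (lhs, left, right). The three PP rules are assumptions for this example.
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "det", "noun"),
    ("NP", "NP", "PP"),   # PP attached to the noun phrase
    ("VP", "verb", "NP"),
    ("VP", "VP", "PP"),   # PP attached to the verb phrase
    ("PP", "prep", "NP"),
]
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "park": "noun",
           "package": "noun", "eats": "verb", "in": "prep"}


def count_parses(sentence):
    """Count distinct parse trees rooted in S (raises KeyError on unknown words)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][X] = number of trees with root X spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    chart[(i, j)][lhs] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)]["S"]


print(count_parses("The dog eats the bone in the park."))  # 2
```

Both example sentences get two trees, one per attachment. Syntax alone cannot choose between them; picking the plausible reading (the eating happens in the park, the bone is in the package) needs semantic or world knowledge.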

  22. Semantics

  23. Semantic Representation Represent the meaning of a sentence. Generate, e.g. • a logic-based representation or • a frame-based representation (Fillmore’s case frames) based on the syntactic structure, lexical entries, and particularly the head-verb, which determines how to arrange parts of the sentence and relate them to each other in the semantic representation.

  24. Semantic Representation Verb-centered representation: Verb (action, head) is regarded as center of verbal expression and determines the case frame with possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots (cf. also Schank’s CD Theory). Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept, like “humans”).

  25. General Frame for eat • Agent: animate • Action: eat • Patient: food • Manner: {e.g. fast} • Location: {e.g. in the yard} • Time: {e.g. at noon}

  26. Frame with fillers for sample sentence • Agent: the dog • Action: eat • Patient: the bone / the bone in the package • Location: in the park

  27. General Frame for drive, with fillers (general slot → filler) • Agent: animate → she • Action: drive → drives • Patient: vehicle → the convertible • Manner: {the way it is done} → fast • Location: Location-spec → [in the] Rocky Mountains • Source: Location-spec → [from] home • Destination: Location-spec → [to the] ASIC conference • Time: Time-spec → [in the] summer holiday
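A case frame with typed slots maps naturally onto a dictionary plus a type check on the fillers. The sketch below is an assumption-laden illustration: the sort table `SORTS`, the slot typing in `EAT_SLOT_TYPES`, and the function `fill_frame` are invented for the example, and the slot for the affected object is called `patient`.

```python
# Hypothetical sort assignments for a few fillers from the example sentences.
SORTS = {
    "the dog": "animate",
    "she": "animate",
    "the bone": "food",
    "the convertible": "vehicle",
}

# Typed slots of the general frame for "eat"; slots not listed here are untyped.
EAT_SLOT_TYPES = {"agent": "animate", "patient": "food"}


def fill_frame(action, slot_types, fillers):
    """Build a frame, rejecting fillers whose sort violates a typed slot."""
    frame = {"action": action}
    for slot, filler in fillers.items():
        required = slot_types.get(slot)
        if required is not None and SORTS.get(filler) != required:
            raise ValueError(f"{filler!r} cannot fill slot {slot!r} (expected {required})")
        frame[slot] = filler
    return frame


frame = fill_frame(
    "eat",
    EAT_SLOT_TYPES,
    {"agent": "the dog", "patient": "the bone", "location": "in the park"},
)
print(frame)
```

With this typing, an attempt to use "the bone" as the agent of eat raises a `ValueError`, which is the kind of constraint that typed case roles are meant to express.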

  28. Pragmatics

  29. Pragmatics Pragmatics includes context-related aspects of NL expressions (utterances). These are in particular anaphoric references, elliptic expressions, deictic expressions, … • anaphoric references: refer to items mentioned before • deictic expressions: simulate pointing gestures • elliptic expressions: incomplete expressions; relate to an item mentioned before

  30. Pragmatics “I put the box on the top shelf.” “I know that. But I can’t find it there.” (callouts: deictic expression; anaphoric reference)

  31. Pragmatics “I put the box on the top shelf.” “I know that. But I can’t find it there.” (callout: anaphoric reference)

  32. Pragmatics “I put the box on the top shelf.” “I know that. But I can’t find it there.” (callout: deictic expression)

  33. Pragmatics “I put the box on the top shelf.” “I know that. But I can’t find it there.” “The candy-box?” (callouts: deictic expression; anaphoric reference; elliptic expression)

  34. Pragmatics “I know that. But I can’t find it there.” “The candy-box?” (callout: elliptic expression)

  35. Intentions One philosophical assumption is that natural language is used to achieve something: “Do things with words.” The meaning of an utterance is essentially determined by the intention of the speaker.

  36. Intentionality - Examples What was said → What was meant: • “There is a terrible draft here.” → “Can you please close the window.” • “How does it look here?” → “I am really mad; clean up your room.” • “Will this ever end?” → “I would prefer to be with my friends than to sit in class now.”

  37. Metaphors The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another area, for example, seeing time as a line (in space) or seeing friendship / life as a journey.

  38. Metaphors - Examples “This car eats a lot of gas.” “She devoured the book.” “He was tied up with his clients.” “Marriage is like a journey.” “Their marriage was a one-way road into hell.” (see also George Lakoff, e.g. Women, Fire and Dangerous Things)

  39. Dialogue and Discourse

  40. Discourse / Dialogue Structure Grammar for various sentence types (speech acts): dialogue, discourse, story grammar. Distinguish questions, commands, and statements: • Where is the remote-control? • Bring the remote-control! • The remote-control is on the brown table. Dialogue Grammars describe possible sequences of Speech Acts in communication, e.g. that a question is followed by an answer/statement. Similar for Discourse (like continuous texts).
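A dialogue grammar of this kind can be approximated by a table of permitted speech-act transitions. Everything in the sketch below is an assumption made for illustration: the punctuation-based classifier `speech_act` is deliberately crude, and the table `ALLOWED_NEXT` encodes only the rule mentioned on the slide (a question is followed by a statement) plus permissive defaults.

```python
def speech_act(utterance):
    """Classify an utterance by its final punctuation only (a crude assumption)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: which speech acts may follow which.
ALLOWED_NEXT = {
    "question": {"statement"},
    "command": {"statement", "question"},
    "statement": {"statement", "question", "command"},
}


def is_well_formed(dialogue):
    """Check that every adjacent pair of utterances is a permitted transition."""
    acts = [speech_act(utterance) for utterance in dialogue]
    return all(nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(acts, acts[1:]))


print(is_well_formed([
    "Where is the remote-control?",
    "The remote-control is on the brown table.",
]))
```

Real dialogue also contains clarification questions answering a question, so a practical dialogue grammar would allow more transitions than this table does.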

  41. Speech

  42. Speech Processing Systems: Types and Characteristics • Speech Recognition vs. Speaker Recognition (Voice Recognition; Speaker Identification) • speaker-dependent vs. speaker-independent • training? • unlimited vs. large vs. small vocabulary • single word vs. continuous speech

  43. Speech Recognition Phases • acoustic signal as input • signal analysis - spectrogram • feature extraction • phoneme recognition • word recognition • conversion into written words

  44. Speech Recognizer Architecture

  45. Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)

  46. Spoken Language

  47. Spoken Language • Output of Speech Recognition System as input "text". • Can be associated with probabilities for different word sequences. • Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.

  48. Spoken Language - Examples • no [s-] straight southwest • right to [my] my left • [that is] that is correct (Source: Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html)

  49. Spoken Language - Disfluency Reparandum and Repair: [come to] ... walk right to [the] ... the right-hand side of the page (callouts: Reparandum; Repair)

  50. Spoken Language - Example • we're going to [g-- ]... turn straight back around for testing. • [come to] ... walk right to the ... right-hand side of the page. • right [up ... past] ... up on the left of the ... white mountain walk ... right up past. • [i'm still] ... i've still gone halfway back round the lake again.
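Given transcripts annotated in this style, a cleaned-up word sequence can be obtained by deleting the bracketed material. The sketch below assumes that square brackets mark the reparandum and that "..." marks a pause; the function name `remove_reparanda` is hypothetical, and all pause markers are dropped, including those that do not follow a reparandum.

```python
import re


def remove_reparanda(utterance):
    """Delete bracketed reparanda and '...' pause markers, then normalize spaces."""
    text = re.sub(r"\[[^\]]*\]", " ", utterance)  # drop the bracketed spans
    text = text.replace("...", " ")                # drop the pause markers
    return " ".join(text.split())


print(remove_reparanda("[come to] ... walk right to [the] ... the right-hand side of the page"))
print(remove_reparanda("right to [my] my left"))
```

This only works on text that is already annotated. Detecting disfluencies automatically in recognizer output is the harder problem.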
