Introduction to Natural Language Processing (aka, Computational Linguistics) Slides by me, Martha Palmer, Eleni Miltsakaki, Dan Jurafsky, Tarkan Kacmaz, and others
Overview • NLP without linguistics (4-5 weeks) • Information Retrieval (search) • Text Classification • Pattern Matching and Information Extraction • NLP with sequence structure (~3 weeks) • HMMs, CRFs • Sequence labeling tasks • NLP with more structure (~3 weeks) • Grammars and parsing • Learning grammars • Semantic role labeling • Selected topics (~2 weeks) • Learning representations and domain adaptation • Knowledge-based language processing
Practical Matters • Prereqs: General understanding of probability and statistics • Grading: • 20% quizzes and in-class participation • 25% Midterm • 20% Project • 35% Final • I will supply some ideas for projects later • Projects to start after the midterm. • You’re welcome and encouraged to suggest your own project ideas.
When we study human language, we are approaching what some might call the “human essence”, the distinctive qualities of mind that are, so far as we know, unique to man. Noam Chomsky
WHAT IS LANGUAGE? Definition with respect to form: Language is a system of speech symbols. It is realized acoustically (sound waves), visually-spatially (sign language) and in written form. Definition with respect to function: Language is the most important means of human communication. It is used to convey and exchange information (informative function). Multiplicity of languages: We know of about 7000 languages, which is estimated to be about 1% of all the languages that ever existed.
THEORIES OF LANGUAGE Noam Chomsky claims that language is innate. B. F. Skinner claims that language is learned; it is basically a stimulus-response mechanism.
WHAT IS GRAMMAR? When we learn a language we also learn the rules that govern how language elements, such as words, are combined to produce meaningful language. These elements and rules constitute the Grammar of a language. The Grammar is “what we know”. Grammar represents our linguistic competence.
DESCRIPTIVE vs PRESCRIPTIVE GRAMMAR • Descriptive: how language is used (is) • Prescriptive: how language ought to be used (should be)
Areas of Linguistics • phonetics - the study of speech sounds • phonology - the study of sound systems • morphology - the rules of word formation • syntax - the rules of sentence formation • semantics - the study of word meanings • pragmatics - the study of discourse meanings • sociolinguistics - the study of language in society • applied linguistics - the application of the methods and results of linguistics to such areas as language teaching, national language policies, lexicography, translation, language in politics, etc.
What is phonetics? • Phonetics is the science of speech. • We all speak. • But how many of us know how we speak? • Or what speech is like? • Phonetics seeks to answer those questions.
Orthography and Sounds • The English language is not phonetic. • Words are not spelled as they are pronounced • There is no one-to-one correspondence between the letters and the sounds or phonemes.
Orthography and Sounds • The same vowel sound can be spelled in many different ways: • Did he believe that Caesar could see the people seize the seas? • The silly amoeba stole the key to the machine.
Articulatory Phonetics • The production of any speech sound involves the movement of an air stream. • Most speech sounds are produced by pushing the air out of the lungs through the mouth (oral) and sometimes through the nose (nasal).
Phonology • Phonology deals with the system and pattern of speech sounds in a language. • Phonology of a language is the system and pattern of speech sounds.
Phonology • Phonological knowledge permits us to: • produce sounds that form meaningful utterances • recognize a “foreign” accent • make up new words • know what is or is not a sound in one’s language • know what different sound strings may represent
Phonetics vs Phonology • Phonetics: the study of speech sounds. • Phonology: the study of the way speech sounds form patterns.
Sequences of Phonemes • Possible: /blɪk/, /klɪb/, /bɪlk/, /kɪlb/ • Impossible: /lbkɪ/, /ɪlbk/, /bkɪl/, /ɪblk/, /kblɪ/ • “I just bought a beautiful new blick” What is a blick? • “I just bought a beautiful new bkli” WHAT!!
Sequences of Phonemes • Your knowledge of English “tells” you that certain strings of phonemes are permissible and others are not. • That’s why /bkli/ does not sound like an English word. • It violates the restrictions on the sequencing of phonemes; i.e. it violates the phonological rules of English.
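A rough way to picture these sequencing restrictions is a check on the consonant cluster that starts a word. The sketch below is only an illustration: the vowel set and the list of permitted onsets are small, hand-picked assumptions, not a description of English, and the check ignores restrictions on the end of the word.

```python
# Toy phonotactic check on word-initial consonant clusters (onsets).
# VOWELS and ALLOWED_ONSETS are illustrative assumptions, not a full
# inventory of English.
VOWELS = {"ɪ", "i", "e", "æ", "a", "o", "u"}
ALLOWED_ONSETS = {"", "b", "k", "l", "bl", "kl", "br", "kr", "pl", "pr", "st", "str"}

def onset(phonemes: str) -> str:
    """Return the consonants that precede the first vowel."""
    for i, p in enumerate(phonemes):
        if p in VOWELS:
            return phonemes[:i]
    return phonemes  # no vowel found: the whole string is consonants

def has_legal_onset(phonemes: str) -> bool:
    """True if the initial cluster is in the (assumed) list of permitted onsets."""
    return onset(phonemes) in ALLOWED_ONSETS

for word in ["blɪk", "klɪb", "bkli", "lbkɪ"]:
    print(word, has_legal_onset(word))
```

Under these assumptions /blɪk/ passes because /bl/ is a listed onset, while /bkli/ fails because /bkl/ is not.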
Rules of Phonology • Delete a word-final /b/ when it occurs after an /m/, as in: • bomb, crumb, lamb, tomb • But not in: • bombard, crumble, limber, tumble
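The deletion rule can be written as a single rewrite over a phoneme string. This is a minimal sketch: the input forms below are simplified, made-up transcriptions chosen only to show where the rule does and does not apply, and the function name is hypothetical.

```python
import re

def delete_final_b(phonemes: str) -> str:
    """Delete /b/ when it is word-final and immediately preceded by /m/."""
    # (?<=m) requires a preceding /m/; $ requires the /b/ to be word-final.
    return re.sub(r"(?<=m)b$", "", phonemes)

# Simplified, assumed underlying forms (not careful IPA).
for form in ["bamb", "krumb", "bambard", "krumbl"]:
    print(form, "->", delete_final_b(form))
```

The forms standing in for "bomb" and "crumb" lose their final /b/, while those standing in for "bombard" and "crumble" are unchanged because the /b/ is not word-final.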
Morphology & Syntax • Morphology deals with the combination of morphemes into words. • Syntax deals with the combination of words into sentences.
What is the meaning of ‘meaning’? • Learning a language includes learning the “agreed upon” meanings of certain strings of sounds and, • Learning how to combine these meaningful units into larger units which also convey meaning.
Morphemes • A morpheme is the smallest linguistic unit that has meaning. • A morpheme is a grammatical unit in which there is an arbitrary union of a sound and a meaning, and • which cannot be further analysed (broken down into parts that have meaning).
Morphemes • A morpheme may be represented by a single sound: • e.g. the plural morpheme [s] in cat+s • A morpheme may be represented by a syllable (monosyllabic): • e.g. child+ish
Morphemes • A morpheme may be represented by more than one syllable (polysyllabic): • two syllables: e.g. lady, water • three syllables: e.g. crocodile • four syllables: e.g. salamander
Words • Two basic ways to form words • Inflectional (e.g. English verbs) • open + ed = opened • open + ing = opening • Derivational (e.g. adverbs from adjectives, nouns from adjectives) • happy → happily (adverb from adjective) • happy → happiness (noun from adjective)
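Both kinds of word formation on this slide can be sketched as attaching a suffix to a stem. The helper below is a hypothetical illustration with a single spelling rule (final consonant + y becomes i); it does not handle other English spelling changes such as consonant doubling or e-deletion.

```python
def attach_suffix(stem: str, suffix: str) -> str:
    """Join stem and suffix, changing a final consonant+y to i where needed."""
    ends_in_consonant_y = (
        len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou"
    )
    # Keep the y before suffixes that start with i (e.g. "-ing").
    if ends_in_consonant_y and not suffix.startswith("i"):
        stem = stem[:-1] + "i"
    return stem + suffix

# Inflectional: same word class, different grammatical form.
print(attach_suffix("open", "ed"))     # opened
print(attach_suffix("open", "ing"))    # opening
# Derivational: a new word, often of a different class.
print(attach_suffix("happy", "ly"))    # happily
print(attach_suffix("happy", "ness"))  # happiness
```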
Syntax The study of classes of words and the rules that govern how the words can combine to make phrases and sentences.
Basic classes of words • Classes of words, aka parts of speech (POS) • Nouns • Verbs • Adjectives • Adverbs • The classes above are open class words • We also have closed class words • Articles, pronouns, prepositions, particles, quantifiers, conjunctions
Basic phrases • A word from an open class can be used to form the basis of a phrase • The basis of a phrase is called the head
Examples of phrases • Noun phrases • The manager of the institute • Her worry to pass the exams • Several students from the English Department • Adjective phrases • easy to understand • mad as a dog • glad that he passed the exam
Examples of phrases • Adverb phrases • fast like the wind • outside the building • Verb phrases • ate her sandwich • went to the doctor • believed what I told him
“Complements” • Notice that, to be meaningful, the verb “go”, for example, requires a phrase for “location” • *John went • John went home • Such phrases “complete” the meaning of the verb (or other type of head) and are called complements
Inside the noun phrase • NPs are used to refer to things: objects, places, concepts, events, qualities, etc • NPs may consist of: • A single pronoun (he, she, etc) • A name or proper noun (John, Athens, etc) • A specifier and a noun • A qualifier and a noun • A specifier and a qualifier and a noun (e.g., the first three winners)
Specifiers • Specifiers indicate how many objects are described and also how these objects relate to the speaker • Basic types of specifiers • Ordinals (e.g., first, second) • Cardinals (e.g., one, two) • Determiners (see next slide)
Determiners • Basic types of determiners • Articles (the, a, an) • Demonstratives (this, that, these, those) • Possessives (‘s, her, my, whose, etc) • Wh-determiners (which, what –in questions) • Quantifying determiners (some, every, most, no, any, etc.)
Qualifiers • Basic types of qualifiers • Adjectives • Happy cat • Angry feelings • Noun modifiers • Cook book • University hospitals
Inside the verb phrase • A simple VP • Adverbial modifier + head verb + complements • Types of verbs • Auxiliary (be, do, have) • Modal (will, can, could) • Main (eat, work, think)
Types of verb complements • Intransitive verbs do not require complements • Transitive verbs require an object as a complement (e.g. find a key) • Transitive verbs allow passive forms (e.g. a key was found) • Ditransitive verbs require one direct and one indirect object (e.g. give Mary a book)
Other verb complements • Clausal complements • Some verbs require clausal complements • Mary knows that John left • Prepositional phrase complements • Some verbs require specific PP complements • Mary gave the book to John • Others require any PP complement • John put the book on the shelf/in the room/under the table
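One common way to record which complements a verb takes is a small lexicon of complement frames. The sketch below is an assumption-laden illustration: the frame labels (NP, PP, S), the table contents, and the function name are all hypothetical, and "sleep" is added only as an intransitive example.

```python
# Each verb maps to the complement sequences it allows (assumed toy lexicon).
FRAMES = {
    "sleep": [()],                         # intransitive: no complement
    "find": [("NP",)],                     # transitive: find a key
    "give": [("NP", "NP"), ("NP", "PP")],  # give Mary a book / give the book to John
    "know": [("S",)],                      # clausal: knows that John left
    "put": [("NP", "PP")],                 # put the book on the shelf
}

def licenses(verb: str, complements: list[str]) -> bool:
    """True if the verb allows exactly this sequence of complement types."""
    return tuple(complements) in FRAMES.get(verb, [])

print(licenses("give", ["NP", "NP"]))  # True
print(licenses("put", ["NP"]))         # False: "put" also needs a PP
```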
Adjective phrases • Simple • Angry, easy, etc • Complex • Pleased with the prize • Angry at the committee • Willing to read the book • Complex AdjPs normally do not precede nouns; they are used as complements of verbs such as be or seem
Adverbial phrases • Indicators of • Degree • Location • Manner • The time of something (now, yesterday, etc) • Frequency • Duration • Location in the sentence • Initial • Medial • Final
Grammars and parsing • What is syntactic parsing? • Determining the syntactic structure of a sentence • Basic steps • Identify sentence boundaries • Identify the part of speech of each word • Identify syntactic relations
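The first two steps can be sketched with a punctuation-based sentence splitter and a dictionary lookup for parts of speech. Both are deliberate simplifications: the lexicon and the tag names are assumptions made for this example, and real text needs more careful handling of abbreviations and ambiguous words.

```python
import re

# Assumed toy lexicon mapping lowercase words to part-of-speech tags.
LEXICON = {
    "the": "Det", "cat": "N", "mat": "N", "sat": "V",
    "on": "Prep", "it": "Pron", "slept": "V",
}

def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tag(sentence: str) -> list[tuple[str, str]]:
    """Pair each word with its tag from LEXICON, or 'UNK' if unknown."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in words]

for sentence in split_sentences("The cat sat on the mat. It slept."):
    print(tag(sentence))
```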
Context Free Grammar • S -> NP VP • NP -> Det (Adj) N • NP -> Proper N • NP -> N • VP -> V • VP -> V PP • VP -> V NP • VP -> V NP PP • VP -> V NP NP • PP -> Prep NP
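A grammar like this one can be tried out with a small top-down parser. The sketch below is one possible encoding, not a standard implementation: the optional adjective is expanded into two NP rules, "Proper N" is written as the single symbol ProperN, and the lexicon is an assumption covering only one example sentence. The approach works here because the grammar has no left-recursive rules.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj", "N"], ["Det", "N"], ["ProperN"], ["N"]],
    "VP": [["V", "NP", "PP"], ["V", "NP", "NP"], ["V", "NP"], ["V", "PP"], ["V"]],
    "PP": [["Prep", "NP"]],
}
# Assumed toy lexicon: word -> part of speech.
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "sat": "V", "on": "Prep"}

def parse(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` can start at words[i]."""
    if symbol not in GRAMMAR:  # a part-of-speech symbol: match one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, i, [])

def expand(symbol, rhs, words, i, children):
    """Match the remaining right-hand-side symbols one at a time."""
    if not rhs:
        yield (symbol, *children), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from expand(symbol, rhs[1:], words, j, children + [child])

def parse_sentence(sentence):
    """Return all trees rooted in S that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in parse_sentence("The cat sat on the mat"):
    print(tree)
```

Trees are returned as nested tuples of the form (label, child, ...), with (part of speech, word) pairs at the leaves.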
Parses • The cat sat on the mat • [S [NP [Det the] [N cat]] [VP [V sat] [PP [Prep on] [NP [Det the] [N mat]]]]]
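The parse of "The cat sat on the mat" can also be held as a nested data structure. The sketch below assumes a nested-tuple encoding, (label, child, ...), chosen for this example, and shows how to print it in bracket notation and how to read the words back off the leaves.

```python
# Assumed encoding: (label, child, ...) with plain strings as words.
TREE = (
    "S",
    ("NP", ("Det", "the"), ("N", "cat")),
    ("VP", ("V", "sat"),
           ("PP", ("Prep", "on"),
                  ("NP", ("Det", "the"), ("N", "mat")))),
)

def bracket(node) -> str:
    """Render a tree in labelled bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

def leaves(node) -> list[str]:
    """Collect the words at the bottom of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

print(bracket(TREE))
print(" ".join(leaves(TREE)))
```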