280 likes | 388 Views
עיבוד שפות טבעיות - שיעור שבע Partial Parsing. אורן גליקמן המחלקה למדעי המחשב אוניברסיטת בר אילן. Syntax. The study of grammatical relations between words and other units within the sentence. The Concise Oxford Dictionary of Linguistics
E N D
עיבוד שפות טבעיות - שיעור שבעPartial Parsing אורן גליקמן המחלקה למדעי המחשב אוניברסיטת בר אילן 88-680
Syntax • The study of grammatical relations between words and other units within the sentence.The Concise Oxford Dictionary of Linguistics • the way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)Merriam-Webster Dictionary 88-680
Brackets • “I prefer a morning flight” • [S [NP [pro I]][VP [V prefer][NP [Det a] [Nom [N morning] [ N flight]]]]]] 88-680
Parse Tree S VP NP NP Nom Det Noun Noun Pronoun Verb I prefer a morning flight 88-680
Parsing • The problem of mapping from a string of words to to its parse tree is called parsing. 88-680
Generative Grammar • A set of rules which indicate precisely what can be and cannot be a sentence in a language. • A grammar which precisely specifies the membership of the set of all the grammatical sentences in the language in question and therefore excludes all the ungrammatical sentences. 88-680
Formal Languages • The set of all grammatical sentences in a given natural language. • Are natural languages regular? 88-680
English is not a regular language! • anbn is not regular • Look at the following English sentences: • John and Mary like to eat and sleep, respectively. • John, Mary, and Sue like to eat, sleep, and dance, respectively. • John, Mary, Sue, and Bob like to eat, sleep, dance, and cook, respectively. 88-680
Constituents • Certain groupings of words behave as constituents. • Constituents are able to occur in various sentence positions: • ראיתי את הילד הרזה • ראיתי אותו מדבר עם הילד הרזה • הילד הרזה גר ממול 88-680
The Noun Phrase (NP) • Examples: • He • Ariel Sharon • The prime minister • The minister of defense during the war in Lebanon. • They can all appear in a similar context:___ was born in Kfar-Malal 88-680
Prepositional Phrases • Examples: • the man in the white suit • Come and look at my paintings • Are you fond of animals? • Put that thing on the floor 88-680
Verb Phrases • Examples: • Getting to school on timewas a struggle. • Hewas trying to keep his temper. • That womanquickly showed me the way to hide. 88-680
Chunking • Text chunking is dividing sentences into non-overlapping phrases. • Noun phrase chunking deals with extracting the noun phrases from a sentence. • While NP chunking is much simpler than parsing, it is still a challenging task to build a accurate and very efficient NP chunker. 88-680
What is it good for • The importance of chunking derives from the fact that it is used in many applications: • Information Retrieval & Question Answering • Machine Translation • Preprocessing before full syntactic analysis • Text to speech • Many other Applications 88-680
What kind of structures should a partial parser identify? • Different structures useful for different tasks: • Partial constituent structure[NPI] [VPsaw [NPa tall man in the park]]. • Prosodic segments[I saw] [a tall man] [in the park]. • Content word groups[I] [saw] [a tall man] [in the park]. 88-680
Chunk Parsing • Goal: divide a sentence into a sequence of chunks. • Chunks are non-overlapping regions of a text: • [I] saw [a tall man] in [the park]. • Chunks are non-recursive • a chunk can not contain other chunks • Chunks are non-exhaustive • not all words are included in chunks 88-680
Chunk Parsing Examples • Noun-phrase chunking: • [I] saw [a tall man] in [the park]. • Verb-phrase chunking: • The man who [was in the park] [saw me]. • Prosodic chunking: • [I saw] [a tall man] [in the park]. 88-680
Chunks and Constituency Constituents: [a tall man in [the park]]. Chunks: [a tall man] in [the park]. • Chunks are not constituents • Constituents are recursive • Chunks are typically subsequences of Constituents • Chunks do not cross constituent boundaries 88-680
Chunk Parsing: Accuracy • Chunk parsing achieves higher accuracy • Smaller solution space • Less word-order flexibility within chunks than between chunks • Better locality: • Fewer long-range dependencies • Less context dependence • No need to resolve ambiguity • Less error propagation 88-680
Chunk Parsing: Domain Specificity Chunk parsing is less domain specific: • Dependencies on lexical/semantic information tend to occur at levels "higher" than chunks: • Attachment • Argument selection • Movement • Fewer stylistic differences within chunks 88-680
Chunk Parsing: Efficiency • Chunk parsing is more efficient • Smaller solution space • Relevant context is small and local • Chunks are non-recursive • Chunk parsing can be implemented with a finite state machine 88-680
Psycholinguistic Motivations Chunk parsing is psycholinguistically motivated: • Chunks as processing units • Humans tend to read texts one chunk at a time • Eye-movement tracking studies • Chunks are phonologically marked • Pauses, Stress patterns • Chunking might be a first step in full parsing 88-680
Chunk Parsing Techniques • Chunk parsers usually ignore lexical content • Only need to look at part-of-speech tags • Techniques for implementing chunk parsing: • Regular expression matching / Finite State Machines • Transformation Based Learning • Memory Based Learning • Others 88-680
Regular Expression Matching • Define a regular expression that matches thesequences of tags in a chunk • A simple noun phrase chunk regexp:<DT>? <JJ>* <NN.?> • Chunk all matching subsequences:the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN] • If matching subsequences overlap, the first one gets priority 88-680
Chunking as Tagging • Map Part of Speech tag sequences to {I,O,B}* • I – tag is part of an NP chunk • O – tag is not part of • B – the first tag of an NP chunk which immediately follows another NP chunk • Example: • Input: The little cat sat on the mat • Output: B I I O O B I 88-680
Chunking State of the Art • Depending on task specification and test set: 90-95% 88-680
Homework 88-680
Context Free Grammars • Putting the constituents together • Next Week… 88-680