Sentence Grammar Human Language Technology HLT - Sentence Grammar
HLT - Sentence Grammar Introduction • This lecture has several themes: • Crash course in sentence-level grammar • Jurafsky and Martin 2nd ed. Chapter 12 • Internet Grammar of English http://www.ucl.ac.uk/internet-grammar/ • Show how different linguistic phenomena can be captured by grammar rules. • Dependency Parsing • Tagsets and Treebanks
Grammar of English Part 1 HLT - Sentence Grammar
HLT - Sentence Grammar Different Kinds of Rule • Morphological rules.. govern how words may be composed: re+invest+ing = reinvesting. • Syntactic rules .. govern how words and constituents combine to form grammatical sentences. • Semantic rules .. govern how meanings may be combined.
HLT - Sentence Grammar Syntax: Why? • You need knowledge of syntax in many applications: • Parsing • Grammar checkers • Question answering/database access • Information extraction • Generation • Translation • Full versus superficial analysis?
HLT - Sentence Grammar Levels of Grammar Organisation • Word Classes: different parts of speech (POS). • Phrase Classes: sequences of words inheriting the characteristics of certain word classes. • Clause Classes: sequences of phrases containing at least one verb phrase. On the basis of these one may define: • Grammatical Relations: role played by constituents e.g. subject; predicate; object • Syntax-Semantics interface: mapping between syntactic structures and meaning
HLT - Sentence Grammar Word Classes • Closed classes. • determiners: the, a, an, four. • pronouns: it, he etc. • prepositions: by, on, with. • conjunctions: and, or, but. • Open classes. • nouns refer to objects or concepts: cat, beauty, Coke. • adjectives describe or qualify nouns: fried chickens. • verbs describe what the noun does: John jumps. • adverbs describe how it is done: John runs quickly.
HLT - Sentence Grammar Word Class Characteristics • Different word classes have characteristic subclasses and properties
HLT - Sentence Grammar Phrases • Longer phrases may be used rather than a single word, but fulfilling the same role in a sentence. • Noun phrases refer to objects: four fried chickens. • Verb phrases state what the noun phrase does: kicks the dog. • Adjective phrases describe/qualify an object: sickly sweet. • Adverbial phrases describe how actions are done: very carefully. • Prepositional phrases add information to a verb phrase: on the table.
HLT - Sentence Grammar Phrases can be Complex, e.g. Noun Phrases • Proper Name or Pronoun: Monday; it • Specifier, noun: the day • Specifiers, premodifier, noun: the first wet day • Specifiers, premodifier, noun, postmodifier: the first wet day that I enjoyed in June
HLT - Sentence Grammar But they all fit the same context: ___ was sunny. • Monday • It • The day • The first wet day • The first wet day that I enjoyed in June
HLT - Sentence Grammar Clauses • A clause is a combination of noun phrases and verb phrases • Clauses can exist at the top level (main clause) or can be embedded (subordinate clause) • Top level clause is a sentence, e.g. The cat ate the mouse. • Embedded clause is subordinate, e.g. John said that Sandy is sick. • Unlike phrases, whole sentences can be used to say something complete, e.g. to state a fact or ask a question.
HLT - Sentence Grammar Different Kinds of Sentences • Assertion: John ate the cat. • Yes/No question: Did John eat the cat? • Wh- question: What did John eat? • Command: Eat the cat John! • NB. All these forms share the same underlying semantic proposition.
Context Free Grammar Rules Part II HLT - Sentence Grammar
HLT - Sentence Grammar Formal Grammar • A formal grammar consists of • Terminal Symbols (T) • Non Terminal Symbols (NT, disjoint from T) • Start Symbol (a distinguished NT) • Rewrite rules of the form α → β, where α and β are strings of symbols
Classes of Grammar HLT - Sentence Grammar
HLT - Sentence Grammar Classes of Grammar • Learnability • Different classes of grammar result from various restrictions on the form of rules
HLT - Sentence Grammar Restrictions on Rules • For all rules α → β • Type 0 (unrestricted): no restrictions • Type 1 (context sensitive): |α| ≤ |β| • Type 2 (context free): α is a single NT symbol • Type 3 (regular): every rule is of the form A → aB or A → a, where A, B ∈ NT and a ∈ T
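The restrictions on rule form can be checked mechanically. The following is a minimal sketch, assuming each rule is a pair of symbol tuples (left side, right side), that the set of nonterminals is supplied by the caller, and that right sides are non-empty. The function names `rule_type` and `grammar_type` are invented for this illustration.

```python
def rule_type(lhs, rhs, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) that one rule satisfies.

    Assumes rhs is non-empty; empty (epsilon) rules are not handled here.
    """
    def is_nt(symbol):
        return symbol in nonterminals

    if len(lhs) == 1 and is_nt(lhs[0]):
        # Type 3 (regular): A -> a  or  A -> a B
        if len(rhs) == 1 and not is_nt(rhs[0]):
            return 3
        if len(rhs) == 2 and not is_nt(rhs[0]) and is_nt(rhs[1]):
            return 3
        # Type 2 (context free): the left side is a single nonterminal
        return 2
    # Type 1 (context sensitive): the left side is no longer than the right side
    if len(lhs) <= len(rhs):
        return 1
    # Type 0 (unrestricted): anything else
    return 0


def grammar_type(rules, nonterminals):
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(lhs, rhs, nonterminals) for lhs, rhs in rules)
```

For example, `grammar_type([(("S",), ("NP", "VP"))], {"S", "NP", "VP"})` classifies a grammar with the single rule S → NP VP as Type 2.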
HLT - Sentence Grammar Which Class for NLP? • Type 3 (Regular). Good for morphology. Cannot handle central embedding of sentences: The man that John saw eating died. • Type 2 (Context Free). OK but problems handling certain phenomena e.g. agreement. • Type 1 (Context Sensitive). Computational properties not well understood. Too powerful. • Type 0 (Turing). Too powerful.
HLT - Sentence Grammar Weak versus Strong • Grammar class that is too restrictive • cannot characterise/discriminate exactly NL sentence structures. • Grammar class that is too general • has the power to characterise/discriminate structures that don't exist in human languages. • More general, higher complexity → less efficient computations.
HLT - Sentence Grammar Example Grammar • Cabinet discusses police chief’s case • French gunman kills four • s → np vp • np → n • np → adj n • np → n np • vp → v np
HLT - Sentence Grammar Classifying the Symbols • NT – symbols appearing on the left • Start – symbol appearing only on the left from which every other symbol can be derived. • T – symbols appearing only on the right • To include words we also need special rules such as n → [police], n → [gunman], n → [four] • Latter rules define the lexicon or “dictionary interface”.
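The classification can be read straight off the rule set. A minimal sketch, assuming the example grammar is stored as (left side, right side) pairs; `classify_symbols` is a name invented here. Before any lexical rules are added, n, adj and v come out as terminals because they appear only on the right.

```python
# The example grammar for "French gunman kills four", without lexical rules.
RULES = [
    ("s", ["np", "vp"]),
    ("np", ["n"]),
    ("np", ["adj", "n"]),
    ("np", ["n", "np"]),
    ("vp", ["v", "np"]),
]


def classify_symbols(rules):
    """Return (nonterminals, terminals, start candidates) for a rule list."""
    left = {lhs for lhs, _ in rules}
    right = {symbol for _, rhs in rules for symbol in rhs}
    nonterminals = left              # symbols appearing on the left
    terminals = right - left         # symbols appearing only on the right
    start_candidates = left - right  # symbols appearing only on the left
    return nonterminals, terminals, start_candidates


nonterminals, terminals, start_candidates = classify_symbols(RULES)
```

Adding a lexical rule such as `("n", ["gunman"])` moves n into the nonterminals and makes the word itself a terminal.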
HLT - Sentence Grammar Grammar Induces Phrase Structure • [s [np [adj French] [n gunman]] [vp [v kills] [np [n four]]]]
HLT - Sentence Grammar Phrase Structure • PS includes information about • precedence between constituents • dominance between constituents • PS constitutes a trace of the rule applications used to derive a sentence • PS does not tell you the order in which the rules were used
HLT - Sentence Grammar Procedural versus Declarative • A grammar induces a structure but does not tell you how to discover that structure • A grammar is declarative • A parser is a procedure that, given a suitable representation of a grammar and a sentence, actually discovers the structure(s). • A parser is procedural
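To make the contrast concrete, here is a minimal sketch of one possible parser: a naive top-down, backtracking procedure that takes the declarative example grammar as plain data and discovers the structures it licenses. The dictionary layout, the function names, and the lexicon entries for French and kills are assumptions made for this illustration. This simple strategy would loop forever on a left-recursive rule such as NP → NP PP, which the example grammar does not contain.

```python
# Declarative part: the grammar and lexicon are just data.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["n"], ["adj", "n"], ["n", "np"]],
    "vp": [["v", "np"]],
}
LEXICON = {"French": ["adj"], "gunman": ["n"], "kills": ["v"], "four": ["n"]}


# Procedural part: one of many ways to search for structures.
def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol in GRAMMAR:
        for rhs in GRAMMAR[symbol]:
            for children, end in parse_sequence(rhs, words, start):
                yield (symbol, *children), end
    elif start < len(words) and symbol in LEXICON.get(words[start], []):
        yield (symbol, words[start]), start + 1


def parse_sequence(symbols, words, start):
    """Yield (trees, end) for every way the symbol sequence matches from `start`."""
    if not symbols:
        yield [], start
        return
    for tree, middle in parse(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, middle):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all complete parse trees for a whitespace-separated sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("s", words, 0) if end == len(words)]
```

Calling `parse_sentence("French gunman kills four")` returns a single tree in which the np covers French gunman and the vp covers kills four.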
HLT - Sentence Grammar Handling Linguistic Phenomena • Different sentence-types • Nested structures • Agreement • Multiwords • Subcategories of verb
HLT - Sentence Grammar Different Sentence Types ... Different Grammar Rules • Declaratives: John left. S → NP VP • Imperatives: Leave! S → VP • Yes-No Questions: Did John leave? S → Aux NP VP • WH Questions: When did John leave? S → Wh-word Aux NP VP
HLT - Sentence Grammar Recursively Nested Structures handled by ... • Flights to Miami • Flights to Miami from Boston • Flights to Miami from Boston in April • Flights to Miami from Boston in April on Friday • Flights to Miami from Boston in April on Friday under $300. • Flights to Miami from Boston in April on Friday under $300 with lunch.
HLT - Sentence Grammar Recursive Rules • NP → N • NP → NP PP • PP → Preposition NP • Flights from Miami to Boston
HLT - Sentence Grammar Ambiguity • np → np pp • pp → prep np • (the man) (on the hill with a telescope by the sea) • (the man on the hill) (with a telescope by the sea) • (the man on the hill with a telescope) (by the sea) • etc.
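How quickly the number of analyses grows can be checked by counting parses. A minimal sketch, assuming each base noun phrase (the man, the hill, ...) is collapsed to a single token "NP" and each preposition to "P", and using only the two rules np → np pp and pp → prep np. The function names are invented for this illustration.

```python
from functools import lru_cache


def count_np_parses(tokens):
    """Count the parses of `tokens` as an np.

    `tokens` is a tuple of "NP" (a base noun phrase) and "P" (a preposition).
    """
    @lru_cache(maxsize=None)
    def np(i, j):
        # A single base noun phrase is an np on its own.
        total = 1 if (j - i == 1 and tokens[i] == "NP") else 0
        # np -> np pp: try every split point k.
        for k in range(i + 1, j):
            total += np(i, k) * pp(k, j)
        return total

    @lru_cache(maxsize=None)
    def pp(i, j):
        # pp -> prep np
        if j - i >= 2 and tokens[i] == "P":
            return np(i + 1, j)
        return 0

    return np(0, len(tokens))


def noun_phrase_with_pps(n_pps):
    """Token sequence for one base noun phrase followed by n_pps prepositional phrases."""
    return ("NP",) + ("P", "NP") * n_pps
```

With three prepositional phrases, as in the man on the hill with a telescope by the sea, `count_np_parses(noun_phrase_with_pps(3))` counts every bracketing the two rules allow.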
HLT - Sentence Grammar Handling Agreement • NP → Determiner N • Include these days, this day • Exclude this days, these day • NP → NPSing • NP → NPPlur • NPSing → DetSing NSing • NPPlur → DetPlur NPlur • Agreement can involve number (as here), gender and case. • Danger: proliferation of categories/rules.
HLT - Sentence Grammar Handling Multiwords • John ran up the stairs • John rang up the doctor • John ran the stairs up* • John rang the doctor up • John rang the doctor who lives in Paris up
HLT - Sentence Grammar Ordinary CF rules don’t work • John rang up the doctor • VP → V NP • here V is multiword • John rang the doctor up • VP → V NP particle_from_V • here, multiword has split into two parts • challenge is to express the relation between the parts
HLT - Sentence Grammar Subcategorisation • Intransitive verb: no object. John disappeared. John disappeared the cat* • Transitive verb: one object. John opened the window. John opened* • Ditransitive verb: two objects. John gave Mary the book. John gave Mary*
HLT - Sentence Grammar Subcategorisation Rules • Intransitive verb: no object. VP → V • Transitive verb: one object. VP → V NP • Ditransitive verb: two objects. VP → V NP NP • If you take account of the category of items following the verb, there are about 40 different patterns like this in English.
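One way to keep the three VP rules from applying to the wrong verbs is to record each verb's pattern in the lexicon. A minimal sketch, assuming a toy lexicon that stores only the number of object noun phrases each verb allows; `SUBCAT` and `vp_is_licensed` are names invented here, and real verbs typically allow several of the roughly 40 patterns.

```python
# Toy subcategorisation lexicon: verb -> set of allowed object counts.
SUBCAT = {
    "disappeared": {0},  # intransitive: VP -> V
    "opened": {1},       # transitive:   VP -> V NP
    "gave": {2},         # ditransitive: VP -> V NP NP
}


def vp_is_licensed(verb, objects):
    """True if `verb` allows exactly len(objects) object noun phrases."""
    return len(objects) in SUBCAT.get(verb, set())
```

For example, `vp_is_licensed("opened", ["the window"])` accepts John opened the window, while `vp_is_licensed("opened", [])` rejects John opened.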
HLT - Sentence Grammar Overgeneration • A grammar should generate only sentences in the language. • It should exclude sentences not in the language. • Example: s → n vp • vp → v • n → [John] • v → [snore] • v → [snores] (also generates John snore*)
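Because this small grammar has no recursion, its whole language can be enumerated to see the overgeneration directly. A minimal sketch, assuming the rules are stored as a dictionary from a symbol to its alternative right sides; `expand` is a name invented here and only terminates for non-recursive grammars.

```python
from itertools import product

# The overgenerating grammar, including its lexical rules.
RULES = {
    "s": [["n", "vp"]],
    "vp": [["v"]],
    "n": [["John"]],
    "v": [["snore"], ["snores"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in RULES:
        return [[symbol]]  # a word
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-side symbol, in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("s")]
```

The resulting list contains John snores but also the ungrammatical John snore, since nothing in the rules ties the verb form to the subject.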
HLT - Sentence Grammar Undergeneration • A grammar should generate all sentences in the language. • There should not be sentences in the language that are not generated by the grammar. • Example: s → n vp • vp → v • n → [John] • n → [gold] • v → [found] (cannot generate John found gold, because vp allows no object)
HLT - Sentence Grammar Appropriate Structures • A grammar should assign linguistically plausible structures. • [s [n John] [vp [v ate] [d a] [a juicy] [n hamburger]]] vs. [s [np [n John]] [vp [v ate] [np [d a] [a juicy] [n hamburger]]]]
HLT - Sentence Grammar Criteria for Evaluating Grammars • Does it undergenerate? • Does it overgenerate? • Does it assign appropriate structures to sentences it generates? • Is it simple to understand? How many rules are there? • Does it contain generalisations or is it just a collection of special cases? • How ambiguous is it?
HLT - Sentence Grammar Tagsets • The main parts of speech reflect how words are distributed in naturally occurring data. • Practical applications often make use of special tags which include additional information such as number and case. • One of the most commonly used tagsets is the 45-tag Penn Treebank tagset, which has been used to label corpora including the Brown corpus.
HLT - Sentence Grammar Penn Treebank Tagset
HLT - Sentence Grammar POS Tagging • Input: The grand jury commented on a number of other topics. • POS Tagger • Output: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
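Tagger output in this word/TAG notation is easy to read back into a program. A minimal sketch, assuming tokens are separated by whitespace and that the tag follows the last slash in each token; `parse_tagged` is a name invented here.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs, splitting on the last slash."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./."
pairs = parse_tagged(tagged)
nouns = [word for word, tag in pairs if tag.startswith("NN")]
```

Splitting on the last slash keeps the final token `./.` intact as the word "." with the tag ".".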
HLT - Sentence Grammar Treebanks • Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one). • These are generally created • By first parsing the collection with an automatic parser • And then having human annotators correct each parse as necessary. • This requires detailed annotation guidelines that provide a • POS tagset, • a grammar and • instructions for how to deal with particular grammatical constructions.
HLT - Sentence Grammar Penn Treebank • Penn Treebank is a widely used treebank maintained by the Linguistic Data Consortium. • The Penn Treebank Project annotates naturally-occurring text for linguistic structure. • Contains skeletal parses showing rough syntactic and semantic information -- a bank of linguistic trees. • Most well known is the Wall Street Journal section containing 1M words from the 1987-1989 Wall Street Journal.
HLT - Sentence Grammar Penn Treebank Example
HLT - Sentence Grammar Treebank Grammars • Treebanks implicitly define a grammar for the language covered in the treebank. • Simply take the local rules that make up the sub-trees in all the trees in the collection and you have a grammar. • Not complete, but if you have decent size corpus, you’ll have a grammar with decent coverage.
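Reading the local rules off a tree can be sketched in a few lines. This is a minimal illustration, assuming trees are written in a simple bracketed notation such as `(S (NP (NNP John)) (VP (VBD left)))`; the functions `read_tree` and `local_rules` are invented here and ignore details of real treebank files such as traces and function tags.

```python
from collections import Counter


def read_tree(text):
    """Parse a bracketed tree into nested (label, [children]) tuples; words stay strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                  # skip ")"
        return (label, children)

    return node()


def local_rules(tree, counts):
    """Count one rule per node: parent label -> sequence of child labels (or words)."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            local_rules(child, counts)
    return counts


tree = read_tree("(S (NP (NNP John)) (VP (VBD left)))")
rules = local_rules(tree, Counter())
```

Running `local_rules` over every tree in a collection with the same `Counter` yields the grammar together with rule frequencies.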
HLT - Sentence Grammar Treebank Grammars • Such grammars tend to be very flat due to the fact that they tend to avoid recursion. • For example, the Penn Treebank has 4500 different rules for VPs. Among them...
HLT - Sentence Grammar Heads in Trees • Finding heads in treebank trees is a task that arises frequently in many applications. • Particularly important in statistical parsing • We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node.
HLT - Sentence Grammar Lexically Decorated Tree
HLT - Sentence Grammar Head Finding • The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar.
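A minimal sketch of such traversal rules follows. The rule table `HEAD_RULES` is a small hypothetical example written for this illustration, not a published head table: for each non-terminal it gives a search direction and a priority list of child labels. Trees are assumed to be nested (label, [children]) tuples with words as strings.

```python
# Hypothetical head rules: parent -> (search direction, child labels by priority).
HEAD_RULES = {
    "S": ("right", ["VP", "NP"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def head_word(tree):
    """Return the lexical head of a tree by following the head child downwards."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]  # preterminal: the word is its own head
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return head_word(child)
    return head_word(ordered[0])  # fallback: first child in the search direction


tree = ("S", [
    ("NP", [("DT", ["the"]), ("NN", ["jury"])]),
    ("VP", [("VBD", ["commented"]),
            ("PP", [("IN", ["on"]), ("NP", [("NNS", ["topics"])])])]),
])
```

With these assumed rules, `head_word(tree)` follows S to its VP and then to the VBD, so the sentence is headed by its verb.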