Elicitation Corpus April 12, 2003
Agenda • Tagging with feature vectors or feature structures • Combinatorics • Extensions
Annotating the corpus • Feature Vectors: • Maria saw the girls. • snum-s, stype-prop, sanim-an, scount-na, sdef-def, vtype-perc, vtime-past, onum-pl, odef-def, etc. • Feature Structures: • ((SUBJ ((num sg) (type prop) (anim an) (count na) (def def))) (vtype perc) (vtime past) (OBJ ((etc. • These are easy: they come right out of the parser.
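The two tagging formats carry the same information, so one can be derived from the other. The following is a minimal sketch, assuming the feature structure is held as a nested Python dict; the function name `flatten`, the `PREFIXES` table, and the OBJ values (taken from the feature-vector example, since the slide's structure is cut off at OBJ) are illustrative assumptions, not part of the original tooling. Values keep the feature structure's spelling (`sg`), whereas the slide's vector abbreviates it to `s`.

```python
# Feature structure for "Maria saw the girls.", following the slide.
# The OBJ entries are an assumption filled in from the feature-vector
# example (onum-pl, odef-def); the slide truncates the structure there.
feature_structure = {
    "SUBJ": {"num": "sg", "type": "prop", "anim": "an", "count": "na", "def": "def"},
    "vtype": "perc",
    "vtime": "past",
    "OBJ": {"num": "pl", "def": "def"},
}

# Hypothetical naming convention: subject features are prefixed with "s",
# object features with "o", as in the slide's feature-vector tags.
PREFIXES = {"SUBJ": "s", "OBJ": "o"}


def flatten(fs):
    """Turn a nested feature structure into a flat list of feature-value tags."""
    tags = []
    for key, value in fs.items():
        if isinstance(value, dict):
            prefix = PREFIXES[key]
            tags.extend(f"{prefix}{feat}-{val}" for feat, val in value.items())
        else:
            tags.append(f"{key}-{value}")
    return tags


print(flatten(feature_structure))
```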
Adapting parser output • Do we need to filter out irrelevant features? • E.g., features about “have” and “be” to make the English auxiliary system work. • E.g., (AUX-TYPE) = have
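If filtering turns out to be needed, it could be a small post-processing step on the parser output. This is a sketch under assumptions: the parser output is modeled as a nested dict, and `PARSER_INTERNAL` is a hypothetical drop list in which only `AUX-TYPE` comes from the slide.

```python
# Hypothetical list of English-specific, parser-internal features to drop.
# Only AUX-TYPE is mentioned on the slide; a real list would be longer.
PARSER_INTERNAL = {"AUX-TYPE"}


def filter_features(fs, drop=PARSER_INTERNAL):
    """Return a copy of a nested feature structure without the dropped features."""
    cleaned = {}
    for key, value in fs.items():
        if key in drop:
            continue  # skip features that only serve the English grammar
        if isinstance(value, dict):
            cleaned[key] = filter_features(value, drop)
        else:
            cleaned[key] = value
    return cleaned


parsed = {"SUBJ": {"num": "sg"}, "AUX-TYPE": "have", "vtime": "past"}
print(filter_features(parsed))
```

Returning a copy rather than editing in place keeps the raw parser output available in case the drop list changes later.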
Not covered by the parser • Derived features: • does the subject outrank the object in animacy? • Constructional features: • Counterfactual conditional: If I had gone, I would have seen him. • Do we want to extend the parsing grammar to label these automatically? • Discourse/semantic/context features: • Context: Who saw John? • Elicitation sentence: Bill saw John. • Feature: subject is new information. • Elicitation sentence: He must see it. • Feature: evidential or deontic (obligation) • Features that aren’t used in English • Context: we=you and me (inclusive ‘we’) • Elicitation sentence: We are tall.
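Derived features of the first kind can be computed from features the parser already supplies. The sketch below assumes a two-level animacy ranking and the label `inan` for inanimate; both are illustrative assumptions, and a real hierarchy would likely have more levels.

```python
# Hypothetical animacy ranking; higher numbers outrank lower ones.
# "an" is the slide's label for animate; "inan" is an assumed label.
ANIMACY_RANK = {"inan": 0, "an": 1}


def subject_outranks_object(fs):
    """Derived feature: True if the subject is higher in animacy than the object."""
    subj_rank = ANIMACY_RANK[fs["SUBJ"]["anim"]]
    obj_rank = ANIMACY_RANK[fs["OBJ"]["anim"]]
    return subj_rank > obj_rank


# Animate subject and animate object, as in "Maria saw the girls."
same_rank = {"SUBJ": {"anim": "an"}, "OBJ": {"anim": "an"}}
# Animate subject and inanimate object.
subj_higher = {"SUBJ": {"anim": "an"}, "OBJ": {"anim": "inan"}}

print(subject_outranks_object(same_rank))
print(subject_outranks_object(subj_higher))
```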
Example of Combinatorics: subject-verb agreement • five numbers (singular, plural, dual, trial, paucal) • three genders (masculine, feminine, and neuter, and more for Bantu languages) • four persons (first, second, third, and fourth) • several levels of animacy (animate, inanimate, first and second person, third person) • two levels of definiteness (definite and indefinite) • a huge number of tenses and aspects (present, past, future, non-past, non-future, near past, remote past, near future, remote future, continuous, perfective, etc.). Two steps? (1) Which features are involved? (2) Which values are involved?
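To make the size of the problem concrete, the sketch below multiplies out only the features whose values are fully enumerated above (number, gender, person, definiteness); animacy and tense/aspect are left out because their value counts are open-ended, so the real figure is larger. The `count_combinations` helper and the example language in the second call are hypothetical, meant only to illustrate how the two-step approach shrinks the space.

```python
from itertools import product
from math import prod

# Only the features with a closed list of values on the slide.
FEATURES = {
    "number": ["singular", "plural", "dual", "trial", "paucal"],
    "gender": ["masculine", "feminine", "neuter"],
    "person": ["first", "second", "third", "fourth"],
    "definiteness": ["definite", "indefinite"],
}


def count_combinations(selected):
    """Number of value combinations for a mapping of feature -> list of values."""
    return prod(len(values) for values in selected.values())


total = count_combinations(FEATURES)
print(total)  # 5 * 3 * 4 * 2 = 120

# The same count obtained by enumerating the full cross product.
all_combinations = list(product(*FEATURES.values()))
assert len(all_combinations) == total

# Two-step idea: first find which features are involved, then which values.
# Hypothetical language whose verbs agree only in number (2 values) and person (3 values).
reduced = count_combinations({
    "number": ["singular", "plural"],
    "person": ["first", "second", "third"],
})
print(reduced)  # 2 * 3 = 6
```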
Example of combinatorics: determiners and possessive pronouns • See handout.
Current Coverage of the Elicitation Corpus • Basic word order: intransitive verb and subject; transitive verb with subject and object; noun phrase with determiners, adjectives, and possessors. • Definiteness and animacy: special treatment of indefinite subjects, inanimate subjects, definite direct objects, animate direct objects, and sentences where the object outranks the subject in definiteness or animacy. • Agreement (in number, gender, person, etc.): subject and verb; object and verb; determiner and noun; adjective and noun; possessor and noun; relative pronoun and noun. • Possessive NPs: with inalienable possession (body parts); kinship terms; alienable possession; pronominal possessors; full NP possessors. • Inflectional Features: gender, number, person, case, tense.
Not covered by the elicitation corpus • Subcategorization frames for major verb classes: stative, change of state, change of location, change of possession, creation, filling and covering, experience, cognition, perception, saying and telling, causatives, etc. • Voice: active, passive, and oblique voices. • Negation: sentences and noun phrases • Relative clauses: inflectional features of the relative pronoun; possible locations of the gap; headed or unheaded, etc. • Embedded clauses: argument clauses; adjunct clauses; nominalized clauses.
Not Covered • Coordination: sentences (switch reference and same subject), noun phrases, and other constituents. • Questions: yes-no questions (positive answer expected and negative answer expected); open questions (possible locations of gaps). • Other constructions: comparatives, conditionals, causatives, desideratives, imperatives, possessor ascension, quantifier float, noun incorporation (polysynthesis). • Each of these has a few parameters to check: e.g., does the causee come out in dative or accusative case; can the incorporated noun take an unincorporated modifier; which NPs can possessors ascend from/quantifiers float from, etc. • Further coverage of tense, aspect, and modality: present, past, and future time; ongoing and completed actions; punctual and non-punctual activities; habituality; iteration; realized and non-realized. • Cross product of these with lexical aspect: state, activity, accomplishment, punctual.
Not Covered • Information structure: treatment of topic (given information) and focus (new information), including clefted and topicalized sentences. • Other meanings that are typically grammaticalized: yet, still, only, distributive (each), etc. • Other noun phrase phenomena: quantification, deictic determiners, classifiers, etc.