PrepNet: a Framework for Describing Prepositions: Preliminary Investigation Results
Patrick Saint-Dizier
IRIT-CNRS, France
Long-term objectives
• Construct a repository of preposition syntactic and semantic behaviors,
• Develop a multi-level approach, from prototypical uses to unexpected ones, that accounts for the diversity of preposition uses and for their polysemic behavior,
• Develop a relatively shallow semantic characterization based on frames,
• Investigate the verb-preposition-NP relations: restrictions and compositionality,
• Develop a multilingual approach.
Applications: MT, knowledge extraction, QA, etc.
This paper: basic elements of a preliminary approach
• Introduce a general characterization of preposition senses viewed as abstract notions,
• Characterize these abstract notions by means of frames (viewed as linguistic or conceptual macros),
• Populate preposition frames from corpora and then validate them,
• Develop a multi-level characterization of preposition uses, to organize the diversity of their uses in language,
• Raise a few questions about multilinguality (prepositions can be realized by other categories or by morphology in some languages),
• Investigate evaluation methods, in abstracto and via applications.
Related work
• Very little in CL circles compared to verbs and nouns, in spite of their necessity in a number of applications (MT, IE, QA, …),
• Almost nothing in EWN, FrameNet or VerbNet,
• Some valuable work in AI, e.g. temporal and spatial reasoning,
• A few isolated works in linguistics, each on a given preposition,
• Quite a lot of work in psycholinguistics.
Other resources: B. Dorr’s large description for English, with MT in view (about 500 entries).
Why is that so?
• High polysemy (but maybe not more than adjectives?) over a smaller inventory: 95 prepositions in French plus compounds, 32 in Spanish; there is not always agreement on what a preposition is,
• Linguistic realizations are very difficult to predict: large number of idiosyncratic uses and cross-linguistic differences,
• Syntactic difficulties due to the chain V-Prep-N, e.g. PP-attachment problems, VPC,
• Deep level in the semantic-cognitive structure: prepositions are often used in metalanguages as primitives.
We study here only compositional uses of prepositions.
Global architecture of the proposal
[Diagram; recoverable labels:]
• Preposition senses: 3-level set of abstract notions
• Shallow semantic representation with strata
• Uses in language 1, uses in language 2, etc.
General architecture (1): categorizing preposition senses
Preposition categorization on 3 levels:
• Family (roughly thematic roles): localization, manner, quantity, etc.
• Facets, e.g. for localization: source, position, destination, etc.
• Modalities.
Facets are viewed as the abstract notions on which PrepNet is based.
• 12 families defined
Families / facets
• Quantity: numerical / frequency / proportion
• Accompaniment: adjunction / simultaneity / inclusion / exclusion
• Manner: means / manners and attitudes / imitation or analogy
• Localization: source / destination / via / fixed position
• Choice and exchange: exchange / choice or alternative / substitution
• Causality: cause / goal or consequence / intention
• Opposition
• Ordering: priority / subordination / hierarchy / ranking / degree of importance
• Minor elements: about, in spite of, comparison (see examples in paper)
Conceptual / ontological status of these distinctions?
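The family/facet listing above can be held in a small lookup table. The following is only a sketch in Python: the dictionary name `FAMILIES` and the helper `family_of` are illustrative assumptions, and only the families named on this slide are included (the slides mention 12 families but list fewer here).

```python
# Families and facets exactly as listed on the slide.
# Assumption: a plain dict is enough; PrepNet's actual storage format is not given.
FAMILIES = {
    "quantity": ["numerical", "frequency", "proportion"],
    "accompaniment": ["adjunction", "simultaneity", "inclusion", "exclusion"],
    "manner": ["means", "manners and attitudes", "imitation or analogy"],
    "localization": ["source", "destination", "via", "fixed position"],
    "choice and exchange": ["exchange", "choice or alternative", "substitution"],
    "causality": ["cause", "goal or consequence", "intention"],
    "opposition": [],  # no facets are listed for this family
    "ordering": ["priority", "subordination", "hierarchy", "ranking",
                 "degree of importance"],
    "minor elements": ["about", "in spite of", "comparison"],
}


def family_of(facet):
    """Return the family that lists the given facet, or None if it is unknown."""
    for family, facets in FAMILIES.items():
        if facet in facets:
            return family
    return None


print(family_of("via"))  # localization
```

Keeping facets as the lookup key mirrors the statement that facets, not families, are the abstract notions on which the description is built.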
• Families: ‘superframes’ stating general principles and restrictions,
• Facets: frames; strata: subframes, with some general forms of inheritance and property consistency,
• Whenever appropriate: modalities as subframes.
Frames are viewed as linguistic macros, to be interpreted. So far they are shallow, coarse-grained representations. Language realizations are a priori associated with the lower-level frame nodes.
General architecture (2): a conceptual, prelexical structure
[Diagram; recoverable labels:]
• Frame of abstract notion: name + gloss, shallow restrictions, simplified LCS representation
• Strata of abstract notion: subframes SF1, SF2, SF3
Structure of a frame
Structure:
• Number, name, gloss,
• Frame with shallow constraints: X <Action> Y [Number] Z
• Conceptual representation in simplified LCS (kind of LST)
• In the future: inferential patterns (within a frame or among frames)
• 195 senses/abstract notions described using 65 primitives
Shallow constraints:
• (1) generic semantic types
• (2) generic verb class types from WordNet
• (3) generic semantic fields from the LCS: temp, poss, loc, psy, epist, perc, amount, comm, prop, abs, etc.
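As a rough illustration of the frame structure just described, a frame can be written as a small record. This is a sketch under assumptions: the class name `Frame`, its attribute names, and the `unknown_fields` check are hypothetical, and the LCS representation is kept as a plain string rather than parsed.

```python
from dataclasses import dataclass, field

# Semantic fields named on the slide (its list ends with "etc.", so it is open).
SEMANTIC_FIELDS = {"temp", "poss", "loc", "psy", "epist", "perc",
                   "amount", "comm", "prop", "abs"}


@dataclass
class Frame:
    number: str                     # e.g. "1" or "1.2.1"
    name: str
    gloss: str
    pattern: str                    # e.g. "X <ACTION> [1] Y"
    constraints: dict = field(default_factory=dict)  # shallow constraint per slot
    representation: str = ""        # simplified LCS, uninterpreted here
    semantic_fields: tuple = ()     # fields used in the representation

    def unknown_fields(self):
        """Return the semantic fields that are not in the slide's list."""
        return [f for f in self.semantic_fields if f not in SEMANTIC_FIELDS]


# The generic VIA frame from the slides, written as a record.
via = Frame(
    number="1",
    name="VIA - generic",
    gloss="An entity X moving via a location Y",
    pattern="X <ACTION> [1] Y",
    constraints={"X": "concrete entity", "ACTION": "movement verb",
                 "Y": "location"},
    representation="X : via(loc, Y)",
    semantic_fields=("loc",),
)
print(via.unknown_fields())  # []
```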
Example 1: ‘via’
[1] : VIA - generic. 'An entity X moving via a location Y'
X <ACTION> [1] Y
X: concrete entity, ACTION: movement verb, Y: location
representation: X : via(loc, Y)
French synset: {par, via}
example: Jean rentre par la porte.
Stratification 1:
[1.1] : VIA - narrow passage. 'An entity X moving via / an action that uses a narrow passage in an object Y'
X <ACTION> [1.1] Y
X: concrete entity, ACTION: perception verb, Y: location with a narrow passage
representation: X : through(loc or temp, Y)
French synset: {à travers, au travers de, dans}
example: Jean regarde à travers la grille / dans les jumelles.
Example 1, cont’:
Stratification 2:
[1.2.1] : VIA UNDER - from generic. 'An entity X moving via under a location Y'
X <ACTION> [1.2.1] Y
X: concrete entity, ACTION: movement verb, Y: location with a form of passage under it
representation: X : via(loc, under(loc, Y))
French synset: {par dessous}
example: Jean passe par dessous le pont.
[1.2.2] : VIA ABOVE - from generic
etc.
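The strata of the ‘via’ example restate only part of the generic frame, which suggests a simple form of inheritance. A minimal sketch, assuming that a stratum overrides the slots it mentions and inherits the rest from [1] (the slides only speak of "some general forms of inheritance", so this override rule is an assumption), can use `collections.ChainMap`:

```python
from collections import ChainMap

# Shallow constraints of the generic frame [1].
generic = {"X": "concrete entity", "ACTION": "movement verb", "Y": "location"}

# Each stratum states only what differs; lookups fall back to the generic frame.
narrow_passage = ChainMap(  # [1.1]
    {"ACTION": "perception verb", "Y": "location with a narrow passage"},
    generic,
)
via_under = ChainMap(  # [1.2.1]
    {"Y": "location with a form of passage under it"},
    generic,
)

print(narrow_passage["X"])   # concrete entity (inherited)
print(via_under["ACTION"])   # movement verb (inherited)
print(via_under["Y"])        # location with a form of passage under it
```

A change to a constraint of the generic frame is then visible in every stratum that does not override it.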
Example 2: instruments
Stratification must take into account 2 relations, characterized by means of primitives (Mari and Saint-Dizier 03):
• Actor/instrument: undergo (no control), select (controls another prop.), control,
• Instrument/V+NP object: be (passive, but participates), react (other prop. than controlled by the agent), act (full participation).
Contrast: cut the bread with a knife / eat soup with a spoon / John burned himself with boiling oil.
• A generic entry for instruments and, potentially, 9 strata (combinations), depending on the language. 4 strata for French.
Example 2, cont’:
[5] : MANNER - MEANS - Instrument. 'Someone X doing an action Y using instrument Z.'
X <ACTION> Y [5] Z
X: human, ACTION: verb of change, Y: object, Z: instrument
representation: X : by-means-of(_, Z)
Followed by, a priori, 9 strata.
Example: application to French:
1. Be(X,Z) ∧ Undergo(Z, Action+Y) : synset: {grâce à}, restrictions…
2. Be(X,Z) ∧ Select(Z, Action+Y) : synset: {par}, restrictions…
3. Select(X,Z) ∧ React(Z, Action+Y) : synset: {avec}, restrictions…
4. Act(X,Z) ∧ Control(Z, Action+Y) : synset: {avec, au moyen de}, …
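The figure of 9 potential strata follows from pairing the three actor/instrument primitives with the three instrument/action primitives. The sketch below enumerates those pairs and records the four French strata. It is illustrative only: the slides write the two primitives in varying argument order, so keying each stratum by the pair (actor/instrument primitive, instrument/action primitive) is an assumption, as are the names `ALL_STRATA` and `FRENCH`.

```python
from itertools import product

ACTOR_INSTRUMENT = ("undergo", "select", "control")
INSTRUMENT_ACTION = ("be", "react", "act")

# All potential strata: 3 x 3 combinations of primitives.
ALL_STRATA = list(product(ACTOR_INSTRUMENT, INSTRUMENT_ACTION))

# The four strata given for French, keyed by (actor/instrument, instrument/action).
FRENCH = {
    ("undergo", "be"): ["grâce à"],
    ("select", "be"): ["par"],
    ("select", "react"): ["avec"],
    ("control", "act"): ["avec", "au moyen de"],
}

# Combinations for which the slides list no French realization.
not_listed = [s for s in ALL_STRATA if s not in FRENCH]

print(len(ALL_STRATA), len(FRENCH), len(not_listed))  # 9 4 5
```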
General architecture (3): the language realization level
[Diagram; recoverable labels:]
• SFi (= lower frame level)
• Multi-level partitioning of realizations from usage norms: direct uses, indirect uses, derived types, etc.
• Each level carries restrictions (restr1, restr2, restr3, …) plus frequency measures, and the associated synsets (synset1, synset3, …)
Populating preposition frames from corpora
• Conceptual frames are associated with shallow constraints.
Moving on to the language level, elements of a method:
• For a given language: associate each frame stratum with corpus and dictionary observations,
• Manual analysis: identify prototypical uses and promote usage norms, leading to a multi-level partitioning of realizations,
• Contrast, if possible, direct versus indirect (mainly metaphorical) realization levels,
• Elaborate the conceptual/ontological status of categorizations and related constraints (mainly semantic types).
A few notes
• Multi-level architecture: helps to account for the large variety of (compositional) behaviors. Partitioning strategies need to be investigated in more depth: is incremental depth, to get a finer-grained analysis, worth pursuing?
• For each synset: develop frequency measures and identify contexts of use (from syntactic context to type of text). Frequency rates are very diverse (some uses are only found in dictionaries!).
• Populate, but then validate on new corpora: develop several forms of corpus annotations (the frame; the relation with the head, with the NP, etc.).
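One simple frequency measure for a synset is the relative frequency of each member preposition among the corpus occurrences annotated with a given frame. The sketch below assumes annotations of the form (frame number, preposition); the observation list is hypothetical toy data, not real corpus counts, and `relative_frequencies` is an illustrative name.

```python
from collections import Counter

# Hypothetical annotations: (frame number, preposition observed). Toy data only.
observations = [
    ("1.1", "à travers"), ("1.1", "à travers"), ("1.1", "dans"),
    ("1", "par"), ("1", "par"), ("1", "par"), ("1", "via"),
]


def relative_frequencies(observations, frame):
    """Share of each preposition among the occurrences annotated with `frame`."""
    counts = Counter(prep for f, prep in observations if f == frame)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {prep: n / total for prep, n in counts.items()}


print(relative_frequencies(observations, "1"))  # {'par': 0.75, 'via': 0.25}
```

A use attested only in dictionaries would simply have no occurrences here, which is one way such uses could be flagged.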
Looking at other languages
• Hypothesis: given an abstract notion (interlingua), translations are constructed on the basis of the restrictions that hold on the corresponding synsets, BUT:
• Large realization variations are in general observed, even for closely related languages: up to what point are these just surface language contrasts? Or are they also conceptual? E.g. regarder dans le microscope / look through the microscope (durch; a través de),
• Some languages do not use pre-/post-positions so much, but rather other categories, incorporation in heads, or just case marks.
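The interlingua hypothesis can be pictured as a lookup from an abstract notion to per-language synsets. The sketch below is only an illustration: filing the microscope example under [1.1] VIA - narrow passage is an assumption, the names `REALIZATIONS` and `candidates` are hypothetical, and the restrictions that would select among candidates are left out.

```python
# Realizations of one abstract notion, keyed by language code.
# Assumption: the microscope example belongs to [1.1] VIA - narrow passage.
REALIZATIONS = {
    "1.1": {
        "fr": ["à travers", "au travers de", "dans"],
        "en": ["through"],
        "de": ["durch"],
        "es": ["a través de"],
    }
}


def candidates(notion, target_lang):
    """Candidate prepositions for a notion in the target language (may be empty)."""
    return REALIZATIONS.get(notion, {}).get(target_lang, [])


print(candidates("1.1", "en"))  # ['through']
```

French "dans" and English "through" meet at the same notion here, which is exactly the kind of surface contrast the slide asks about.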
Preliminary conclusions
• Preliminary investigation to identify difficulties and organize the research,
• The global architecture looks like an interesting approach,
• Abstract notion definitions seem to be quite stable; the status of strata needs further investigation,
• The multi-level approach to language realizations seems a good direction, but needs much larger testing on a number of languages and a clearer method to organize sets of realizations,
• Implement an open system on the Web.
Some obvious research directions
• Ontological/conceptual status of categorizations and restrictions,
• Investigate integration with other frameworks: VerbNet, FrameNet,
• Investigate preposition polysemy and derived uses in more depth, and ways to characterize them,
• Relations Head-preposition-NP, and compositionality (the head is often a verb, but can be any other kind of predicate): some PPs have wider scope over the proposition,
• Inferential patterns associated with prepositions (e.g. for approximation notions, spatial notions, etc.).