This document provides an overview of feature structure representation, focusing on its representational aspect. It explores the use of feature structures in linguistics and language resource management, and explains how feature structures capture partial information and make finer-grained distinctions among objects. The formal definition of feature structures and different notations used to represent them are also discussed.
Overview of Feature Structure Representation
Kiyong Lee
klee@korea.ac.kr
July 7, 2003
Sapporo, Japan
Preliminary Remark
• This work item on feature structures concerns only their representational aspect.
• The task of specifying their description or declaration mechanism shall be taken up as a separate but related new work item.
ISO/TC 37/SC 4/WG 1 Meeting Sapporo, Japan
Historical Background
• Feature structures were first used in theoretical linguistics in the 1930s or earlier, to treat the distinctive features of phonological segments and to represent their oppositions.
• They were used extensively in generative phonology in the 1960s, then in the development of grammar formalisms in the 1980s, and finally in other computational or theoretical work such as parsing, lexicology or formal semantics from the 1990s up to the present.
Use of feature structures
• Feature structures are an essential part of many linguistic formalisms.
• They are also an underlying mechanism for representing the information consumed or produced by, and for, language resource management or information technology.
• This international standard provides a format for representing, storing or exchanging feature structures in natural language applications, for both the annotation and the production of linguistic data.
General Characteristics of Feature Structures
Feature structure representation can:
• Capture the notion of partial information.
• Make finer-grained distinctions among the objects being described.
• Provide a systematic format for accommodating various types of constraints on organizing information.
1. Capturing partiality of information
• A feature structure represents partial information about the object being described.
• This information can be manipulated or augmented monotonically with simple operations such as unification, generalization or merging.
• Feature structures can also be compared explicitly with one another with respect to a formally defined relation such as subsumption.
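The operations named above can be sketched in a few lines. This is only an illustrative sketch, not part of the proposed standard: it assumes that a feature structure is modeled as a nested Python dict with strings as atomic values, it ignores reentrancy and typing, and the names `unify`, `subsumes` and `UnificationFailure` are hypothetical.

```python
import copy


class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting information."""


def unify(a, b):
    """Return the least informative structure containing the information of both a and b."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Two complex values: merge them feature by feature.
        result = copy.deepcopy(a)
        for feature, value in b.items():
            if feature in result:
                result[feature] = unify(result[feature], value)
            else:
                result[feature] = copy.deepcopy(value)
        return result
    if a == b:
        # Two identical atoms unify with themselves.
        return a
    raise UnificationFailure(f"{a!r} conflicts with {b!r}")


def subsumes(general, specific):
    """True if every piece of information in `general` is also present in `specific`."""
    if isinstance(general, dict):
        return isinstance(specific, dict) and all(
            feature in specific and subsumes(value, specific[feature])
            for feature, value in general.items()
        )
    return general == specific


# Assumed example data: two partial descriptions of the same noun phrase.
np_info = {"cat": "NP", "agr": {"num": "sg"}}
person_info = {"agr": {"per": "3"}}
unified = unify(np_info, person_info)
```

Unification here only ever adds information, which is the monotonic behavior described above: each input subsumes the result, and conflicting atomic values make the operation fail instead of overwriting anything.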
2. Finer-grained distinctions
• Feature structure representation can also make finer-grained distinctions among the objects being analyzed.
• The phonemes /p/ and /k/, for instance, are non-continuant consonants that share the property of being peripheral (non-coronal) in their place of articulation in the oral cavity, and so they sometimes behave alike under phonological processes such as assimilation.
3. Organizing information
• Feature structure representation makes it easier to organize information systematically, by building various constraints or inheritance mechanisms into the sorting of information from various sources.
Formal definition of feature structure
• A feature structure is formally defined either
• as a partial function from features (or attributes) to values, in set-theoretic terms, or
• as a directed acyclic graph (DAG), in graph-theoretic terms.
Typed feature structure
• Feature structures may be constrained by typing.
• A multiple-inheritance type hierarchy, for instance, constrains the construction of well-formed feature structures.
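How a type hierarchy might constrain well-formedness can be sketched as follows. Everything in this example is an assumption made for illustration: the type names, the declared features, and the rule that a type may carry exactly the features declared on itself or on any of its supertypes.

```python
# Hypothetical hierarchy: each type lists its immediate supertypes.
PARENTS = {
    "sign": [],
    "nominal": ["sign"],
    "verbal": ["sign"],
    "gerund": ["nominal", "verbal"],  # multiple inheritance
}

# Hypothetical features introduced by each type.
DECLARED = {
    "sign": {"phon"},
    "nominal": {"case"},
    "verbal": {"tense"},
    "gerund": set(),
}


def supertypes(type_name):
    """Return the type itself together with all of its direct and indirect supertypes."""
    seen = {type_name}
    stack = [type_name]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def appropriate_features(type_name):
    """Collect the features inherited along every path up the hierarchy."""
    return set().union(*(DECLARED[t] for t in supertypes(type_name)))


def is_well_formed(type_name, features):
    """A structure is accepted only if its type is known and every feature is appropriate."""
    return type_name in PARENTS and set(features) <= appropriate_features(type_name)
```

Under these assumptions, `is_well_formed("gerund", {"phon": "running", "case": "acc", "tense": "pres"})` holds because `gerund` inherits from both `nominal` and `verbal`, while a `nominal` structure carrying `tense` is rejected.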
Notations
• Feature structures can be represented in either matrix or graph format.
• Each matrix consists of pairs of a feature (or attribute) and its unique value, and is therefore called an attribute-value matrix (AVM).
• Each graph representing a feature structure, on the other hand, consists of a single root, labeled and directed branches, and terminal nodes.
• Each node, including the distinguished node called the root, represents a type, and each label on a branch represents a feature.
• A type can be either an atom or a complex object that is itself a feature structure.
Path
• In graph notation, a feature structure can easily be seen as consisting of many, possibly null, paths.
• Each path starts at the root and follows successive branches to a terminal node, and so consists of the sequence of labels on its branches.
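The notion of a path can be made concrete with a short sketch. It assumes, purely for illustration, that a feature structure is a nested Python dict whose atomic values are strings; the function name `paths` and the example structure are hypothetical.

```python
def paths(fs, prefix=()):
    """Yield (path, atom) pairs, where a path is the tuple of feature labels from the root."""
    if not isinstance(fs, dict):
        # A terminal node has been reached: report the labels collected so far.
        yield prefix, fs
        return
    for feature, value in fs.items():
        yield from paths(value, prefix + (feature,))


# Assumed example: a verb with a nested agreement value.
verb = {"cat": "V", "agr": {"per": "3", "num": "sg"}}
verb_paths = list(paths(verb))
# verb_paths == [(("cat",), "V"), (("agr", "per"), "3"), (("agr", "num"), "sg")]
```

When the root is itself an atom, the only path produced is the empty tuple, which corresponds to the null path.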
Shared values or reentrancy
• Some features may share a token-identical value. The verb runs, for example, has the agreement value 3rdSg, and so does the noun Mary.
• Such shared values are called reentrant because, in graph notation, two branches that share a value merge into one and the same node.
• In matrix format, reentrancy or sharing is represented by tags bearing the same index.
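The difference between token identity and mere equality of values can be illustrated with object identity in Python. This is a sketch under assumptions: feature structures are nested dicts, only complex (dict) nodes are checked for sharing, the graph is acyclic, and the names `reentrancies`, `sentence` and `lookalike` are hypothetical.

```python
from collections import defaultdict


def reentrancies(fs):
    """Return groups of paths that lead to one and the same complex node."""
    paths_by_node = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            # id() identifies the node object itself, not its content.
            paths_by_node[id(node)].append(path)
            for feature, value in node.items():
                walk(value, path + (feature,))

    walk(fs, ())
    return [sorted(group) for group in paths_by_node.values() if len(group) > 1]


# One agreement node shared by subject and head: a reentrant structure.
agr = {"per": "3", "num": "sg"}
sentence = {
    "subj": {"phon": "Mary", "agr": agr},
    "head": {"phon": "runs", "agr": agr},
}

# Two separate agreement nodes that merely happen to have equal content.
lookalike = {
    "subj": {"phon": "Mary", "agr": {"per": "3", "num": "sg"}},
    "head": {"phon": "runs", "agr": {"per": "3", "num": "sg"}},
}
```

In `sentence` the two `agr` values are the same object, so `reentrancies(sentence)` reports the paths `("head", "agr")` and `("subj", "agr")` as one group; in `lookalike` the values are equal but distinct, and nothing is reported. In matrix notation the first case is the one that would carry a shared tag.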
List as a value
• A feature may take a list of values.
• The argument structure of a predicate, for instance, may take as its value a list of arguments, say a Subject and one or two Objects, depending on the type of the predicate.
F: <v1, v2, v3>
Systematic representation of feature structures
• In typed feature structures, the value of each feature is a type, and
• typing is constrained by particular applications.
• Nevertheless, values need to be characterized in some systematic way.
One way is to build libraries for features, feature values or feature structures
• fLib
• fsLib
• fvLib
through some clustering and identification mechanism.
Another is to introduce structure into values, organizing them as a singleton, set, bag (or multiset) or list.
• Singleton: {a}
• Set: {a,a} = {a}, {a,b} = {b,a}
• Bag: m{a,a} ≠ m{a}, m{a,b} = m{b,a}
• List: <a,b> ≠ <b,a>
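The identities listed above can be checked against standard Python collections. This is only an analogy chosen for illustration (`frozenset` for sets, `collections.Counter` for bags, `tuple` for lists); the helper names are hypothetical and nothing here is prescribed by the proposal.

```python
from collections import Counter


def as_set(items):
    """Set: repetition and order are both ignored."""
    return frozenset(items)


def as_bag(items):
    """Bag (multiset): repetition counts, order is ignored."""
    return Counter(items)


def as_list(items):
    """List: repetition and order both count."""
    return tuple(items)


set_ignores_repeats = as_set(["a", "a"]) == as_set(["a"])      # {a,a} = {a}
set_ignores_order = as_set(["a", "b"]) == as_set(["b", "a"])   # {a,b} = {b,a}
bag_counts_repeats = as_bag(["a", "a"]) != as_bag(["a"])       # m{a,a} ≠ m{a}
bag_ignores_order = as_bag(["a", "b"]) == as_bag(["b", "a"])   # m{a,b} = m{b,a}
list_keeps_order = as_list(["a", "b"]) != as_list(["b", "a"])  # <a,b> ≠ <b,a>
```

All five comparisons evaluate to `True`, mirroring the equalities and inequalities on the slide.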
A third way may be to introduce special values such as
- <plus, minus> for binary values
- <any> for the Boolean truth value variable, cf. wild card?
- <none> for the Boolean falsity value variable,
- <dft> for default value, or
- <uncertain> for uncertainty value, cf. <unknown>
Finally, the <rel> attribute may be provided for some values.
• rel = ne (non-equality), for example, may be introduced to exclude a certain feature-value pair while allowing other alternatives.
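One possible reading of the rel attribute is sketched below. The semantics is an assumption, since the slide states only that rel = ne excludes a certain feature-value pair; the class `Constraint`, the function `satisfies`, the "eq" default and the case values are hypothetical and do not reflect the XML format of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """A value together with the relation a candidate must bear to it (assumed model)."""
    value: str
    rel: str = "eq"  # assumed default: plain equality


def satisfies(candidate, constraint):
    """Check a candidate atomic value against an eq or ne constraint."""
    if constraint.rel == "eq":
        return candidate == constraint.value
    if constraint.rel == "ne":
        # Non-equality excludes one value and admits every alternative.
        return candidate != constraint.value
    raise ValueError(f"unsupported rel: {constraint.rel!r}")


# Assumed example: any case value is acceptable except nominative.
not_nominative = Constraint("nom", rel="ne")
```

With this model, `satisfies("acc", not_nominative)` holds and `satisfies("nom", not_nominative)` does not.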
Comments from P-members
• See Annex 2 to ISO/TC 37/SC 4 N 053
• Bulgaria: extensive use of FSR in applications
• Japan: interoperability of XML with other description tools such as RDF and OWL, and accommodation of the dump format for LAF
• Korea: ensuing work on a description or declaration language for FSR, and B-R Ryu’s detailed proofreading
• UK: the SGML/XML implementation should not be so prominent, since FSR is a data-modeling tool.
• No comments from France because …
Proof Reading
• Part of Annex B, namely the part on subsumption, should be added to the Table of Contents because that notion is used in Chapter 5.
Suggestions for Improvements
• Any restructuring of the document?
• Addition of subsections on historical background, basic and Boolean operations, and the definition of type and multiple inheritance hierarchy
• Adding more terms to section 3
• Different uses of the term “tag”??
Table of Contents
• 4 General Characteristics of Feature Structure
• Overview (move to the beginning of the document)
• 4.0 Historical Background (addendum)
• 4.1 Use of Feature Structures
• 4.2 Basic Concepts
• 4.2.1 Typed Feature Structure (addendum)
• 4.2.2 Multiple Inheritance Hierarchy (addendum)
• 4.3 Notations
• 4.3.1 Graph Notation
• 4.3.2 Matrix Notation
• 4.4 Shared Feature Structure or Reentrancy
• 4.5 Basic Operations and Relations (addendum)
• 4.6 List, Set, etc. as Feature Values (addendum)
• 4.7 Boolean Operations and Relations (addendum)
• Discussion of singleton, set, bag, and list as possible forms of feature values in section 4
• Coordination of sections 4 and 5, for instance by converting the examples in section 5 into AVMs and listing them in section 4
• More illustrations, or restricting them to language-related ones, perhaps from MSA?
Main action to be taken by the joint working group
The following items have been identified as requiring revision in the current document:
- Preliminary formal description of feature structures
- Provision of a simplified representation (FS lite) describing the basic subset of FS representation without libraries
- Provision of a re-entrance mechanism
- Description of typed feature structure
- Simplification of feature value content by replacing some elements (<symb>, <num> etc.) by references to types (à la XML schemas)
- Provision of more NLP-related examples
[Note: reference to pointer/linking group for ID/IDREF mechanisms]
Questions?
• Is an XML representation a third type of notation for FS, viewed as being at the same descriptive level as AVM and DAG?
• List possible applications, say to lexicology (polysemy, dialectal variations), MorSA, and Description of Sem Rep (tripartite analysis of Quantification), with relevant illustrations?
• Degree of formality in definitions, perhaps in Annexes
Editor’s responsibility
• Contact each expert with specific questions for revising the document
• Koiti Hasida, Manfred Pinkal, and Eric de Clergerie agreed to write up some comments
• KH: use of XML for representing FS
• MP:
• EC: coordination of sections 4 and 5; specification of atomic values