
A Common Ontology for Linguistic Concepts

A Common Ontology for Linguistic Concepts. Scott Farrar, University of Arizona. Overview: a project for the preservation of endangered language data; endangered language data; overview of the EMELD system; special focus on the linguistic ontology (motivation, detailed examples).

Presentation Transcript


  1. A Common Ontology for Linguistic Concepts Scott Farrar University of Arizona

  2. Overview • Discuss a project for the preservation of endangered language data. • Endangered language data • Overview of the EMELD system • Special focus on the linguistic ontology: motivation, detailed examples, special problems

  3. Endangered Languages • As many as half of the world’s languages (3000 out of 6000) are in danger of disappearing (LaPolla 1998). • In particular, languages in Africa (Kiong), the Americas (Hopi), Australia (Mara), and Southeast Asia (Biao Min) are at risk. • Many are not written languages. • Language data is of interest because of its cultural and scientific significance.

  4. EMELD • EMELD (Electronic Metastructure for Endangered Languages Data) • An NSF-funded effort • Its purpose is to preserve endangered language data.

  5. Linguistic Data • Field linguists collect data. • Hopi: sivu-’ikwiw-ta-qa [vessel-carry: on: back-DUR-REL] ‘kachina’ [Figure labels: object language; markup] • Linguistic data includes grammars, dictionaries, text, sound recordings, and glossed corpora.
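
The Hopi example on slide 5 pairs an object-language word with a morpheme-by-morpheme gloss. A minimal Python sketch of that alignment follows. It assumes that hyphens separate morphemes in both lines and that the spaces inside "carry: on: back" are not meaningful. The function name `align_gloss` is hypothetical and is not part of EMELD.

```python
def align_gloss(word: str, gloss: str) -> list[tuple[str, str]]:
    """Pair each morpheme with its gloss, assuming hyphen-delimited segments."""
    morphemes = word.split("-")
    # Assumption: spaces inside the gloss line carry no meaning,
    # so "carry: on: back" is treated as the single label "carry:on:back".
    glosses = gloss.replace(" ", "").split("-")
    if len(morphemes) != len(glosses):
        raise ValueError("word and gloss have different numbers of segments")
    return list(zip(morphemes, glosses))


pairs = align_gloss("sivu-’ikwiw-ta-qa", "vessel-carry: on: back-DUR-REL")
for morpheme, label in pairs:
    print(f"{morpheme}\t{label}")
```

Labels such as DUR and REL in the second column are the kind of markup that later slides map to ontology concepts.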

  6. Linguistic Markup • Purpose: classifying languages, discovering universals, preserving knowledge. • Form: XML, HTML, etc. (stand-off vs. in-line) • Content: grammatical categories and relations, speech acts, cultural content

  7. Challenges to Preserving Linguistic Data • Authors or communities want exclusive rights. • There is no standardized method for linguistic encoding. • Language data should be searchable and comparable, with broad access. • Local control should be balanced with data interoperability.

  8. Attempts at Standardization • Text Encoding Initiative (TEI) (Sperberg-McQueen and Burnard 1994) • Corpus Encoding Standard (CES) (Ide and Romary 2000)

  9. Solution Strategy for Preserving Language Data • Integrate the data using metadata: a linguistic ontology. • Make the data available on the Semantic Web. • Build tools to access and process the data, such as query engines and expert systems.

  10. EMELD Architecture [Diagram labels: UI; Query: Australian, SOV, Text; EMELD Query Engine; Linguistic Ontology; Semantic Web; Hopi; Mocovi; Biao Min]

  11. Markup is Mapped to the Linguistic Ontology • Hopi: <DUR> <REL> <ANMT> • Biao Min: <ILL> <NOUN> <CAUS> • Mocovi: <SENT> <REL> <MOOD> [Diagram: each language’s tags are linked to the Linguistic Ontology]
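
The tag inventories on slide 11 can be compared directly. This is a minimal Python sketch, not EMELD code: it inverts the per-language tag lists to show which tag strings occur in more than one dataset. The names `dataset_tags` and `tags_to_datasets` are hypothetical.

```python
from collections import defaultdict

# Tag inventories as listed on the slide, with angle brackets dropped.
dataset_tags = {
    "Hopi": ["DUR", "REL", "ANMT"],
    "Biao Min": ["ILL", "NOUN", "CAUS"],
    "Mocovi": ["SENT", "REL", "MOOD"],
}


def tags_to_datasets(inventories):
    """Invert dataset -> tags into tag -> sorted list of datasets."""
    index = defaultdict(set)
    for dataset, tags in inventories.items():
        for tag in tags:
            index[tag].add(dataset)
    return {tag: sorted(names) for tag, names in index.items()}


index = tags_to_datasets(dataset_tags)
# Keep only the tag strings that appear in more than one dataset.
shared = {tag: names for tag, names in index.items() if len(names) > 1}
print(shared)  # {'REL': ['Hopi', 'Mocovi']}
```

A matching string does not by itself show that two datasets mean the same thing by a tag, which is presumably why the slides map tags to ontology concepts rather than comparing strings.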

  12. Interoperability [Diagram labels: ontology; markup (four instances); Data (five instances)]

  13. Ontology Formalizes Metadata • Nouns represent objects. • Tense relates an event to a point in time. • Case is a relation between a predicate and its argument, e.g., He knows him. • SOV, SVO, OVS are possible natural language word orders. [Diagram labels: general concepts; linguistic concepts]

  14. Linguistic Ontology • Conceptual model for the linguistics domain • Starting point is morpho-syntax (tense, case, aspect, causality, inflection) • Built on top of the Suggested Upper Merged Ontology (SUMO) (Niles and Pease 2001)

  15. Suggested Upper Merged Ontology (SUMO) • Extensible resource • Incorporates concepts from a number of top-level ontologies • Already includes a number of concepts related to semiotics and linguistics • Peer-reviewed (IEEE working group) and freely available (http://suo.ieee.org)

  16. The Upper Ontology contains nearly 700 terms and 2500 assertions [Class tree labels, in slide order: Abstract, Class, Proposition, Quantity, PhysicalQuantity, Number, …, Attribute, BiologicalProperty, …, Entity, Physical, Object, Organic, Inorganic, …, Collection, Process, NaturalProcess, Motion, …]

  17. Predicates in the SUMO • Structural predicates: instance, domain, domainSubclass, subclass, subrelation, valence, range, documentation, disjoint • Example: (subclass Virus Microorganism)
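
Assertions such as (subclass Virus Microorganism) can be stored as triples and followed transitively. The sketch below is a simplified Python illustration, not SUMO's own reasoning machinery. The second fact is an assumption added only to give the chain a second link, and the function name `superclasses` is hypothetical.

```python
# Facts as (predicate, arg1, arg2) triples, mirroring the slide's prefix notation.
facts = {
    ("subclass", "Virus", "Microorganism"),     # from the slide
    ("subclass", "Microorganism", "Organism"),  # assumed for illustration
}


def superclasses(cls, facts):
    """Return every class reachable from cls by following subclass links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for pred, child, parent in facts:
            if pred == "subclass" and child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(superclasses("Virus", facts)))  # ['Microorganism', 'Organism']
```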

  18. Predicates in the SUMO • Linguistic/semiotic predicates: refers, represents, containsInformation, realization, representsForAgent, representsInLanguage • Example: (containsInformation ?SENT ?PROP)

  19. Upper Linguistic Ontology [Class tree labels, in slide order: Entity, Physical, Object, ContentBearingObject, Icon, SymbolicString, LinguisticExpression, WrittenLinguisticExpression (Text, Sentence, Phrase, Word, Morpheme), SpokenLinguisticExpression (Dialogue, Sentence, Phrase, Word, Morpheme)] [Diagram label: SUMO Concepts]

  20. Upper Linguistic Ontology (cont.) [Class tree labels, in slide order: Abstract, Class, Relation, Predicate, GrammaticalRelation, Aspect, Tense, Case, Agreement, Attribute, GrammaticalAttribute, Gender, Person, Number] • Extended SUMO contains 300 concepts.

  21. Simple Query Example • Query: (same-concept ?Morpheme PT) • PT: EMELD markup for ‘past tense’ • Examples containing synonymous tags in other datasets: PAST, PST, …, TENSE

  22. Simple Query Example [Diagram labels: ContentBearingObject, SymbolicString, Tag, KiongTag ‘PT’, MaraTag ‘PAST’; Class, Relation, Predicate, GrammaticalRelation, Tense, PastTense; two “represents” links]
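
The same-concept query can be approximated with a lookup over "represents" links. This is a minimal Python sketch. It assumes that the two "represents" links in the diagram connect KiongTag ‘PT’ and MaraTag ‘PAST’ to PastTense. The names `represents` and `same_concept` are hypothetical, and the real EMELD query engine is not described in these slides.

```python
# (tag class, tag string) -> ontology concept.
# Assumption: both "represents" links in the diagram point to PastTense.
represents = {
    ("KiongTag", "PT"): "PastTense",
    ("MaraTag", "PAST"): "PastTense",
}


def same_concept(tag, represents):
    """Return all (tag class, tag) pairs that share a concept with `tag`."""
    concepts = {concept for (_, t), concept in represents.items() if t == tag}
    return sorted(key for key, concept in represents.items() if concept in concepts)


print(same_concept("PT", represents))
# [('KiongTag', 'PT'), ('MaraTag', 'PAST')]
```

Because the lookup goes through the concept and not the tag string, ‘PT’ and ‘PAST’ are matched even though the strings differ.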

  23. Query Example • Subdomain: Language and Space • Expressed by case, prepositions, and verbs

  24. Morphosyntactic Case [Class tree labels, in slide order: Case, InherentCase, Spatio-KineticCases, PositionalCase, InessiveCase, DirectionalCase, IllativeCase, ExistentialCase, AbessiveCase, PartitiveCase, InstrumentalCase, StructuralCase, GenitiveCase, AccusativeCase, NominativeCase] • Instances of Case express propositions, e.g., spatio-kinetic relations or predicate-argument relations (“who did what to whom”).

  25. Query Example (cont.) • Query: (same-concept ?Morpheme ‘into’) [Diagram labels: EMELD; Glossed text from Hopi; Word list from Biao Min]

  26. Query Example (cont.) [Diagram labels: LinguisticExpression, Word, Preposition ‘into’; Case, InherentCase, SpatioKineticCase, …, IllativeCase] • Axiom as shown on the slide: (=> (exists ?OBJ1 ?OBJ2 ?EVENT) (instance ?OBJ1 SelfConnectedObject) (instance ?OBJ2 SelfConnectedObject) (penetrable ?OBJ2) (instance ?EVENT Motion) (outside ?OBJ1 ?OBJ2) (inside ?OBJ1 ?OBJ2))

  27. Challenges to Conceptual Modeling • Ontological commitment: text and speech can express the same proposition, but are they the same? • Linguistic theory: what is a Sentence anyway? • Linguistics touches on many domains: causality, time and space, action.

  28. Future Directions • Extend the ontology into the domains of phonology and discourse analysis. • Recommend a markup approach (XML). • Explore applications of the ontology beyond the immediate EMELD project: • Topic Maps: consider the linguistic ontology as part of a Published Subject for linguistics. • Machine Translation: explore the need for such an ontology in MT.

  29. Contact Info • Scott Farrar farrar@u.arizona.edu • See our website: http://emeld.douglass.arizona.edu
