
Hypermedia Lexica and Lexicon Metadata

Hypermedia Lexica and Lexicon Metadata: the MetaLex model in the ModeLex project. Dafydd Gibbon, U Bielefeld, Europe. E-MELD Workshop, Detroit, August 2002.


Presentation Transcript


  1. Hypermedia Lexica and Lexicon Metadata: The MetaLex model in the ModeLex project. Dafydd Gibbon, U Bielefeld, Europe. E-MELD Workshop, Detroit, August 2002

  2. Overview • Metalex goals • Background: DATR, HyprLex, Speech, Language Documentation • Metalex design: theory and practice • Lexical documents & metadocuments • Lexical objects, properties, structures • Metalex implementation • Ivory Coast encyclopaedia project • Ega documentation model project • The Modelex (multimodal lexicon) project • Ivory Coast + Nigeria documentation curriculum project • Extending Metalex • Modalities & submodalities • Data-driven lexicography • Data structures & algorithms: trees, lattices; induction, inference

  3. Metalex goals: background • General objectives: • Versatile high-quality spoken language lexicography • Motivated balance of high-tech + low-tech • Good resources are data-driven and theory-informed • Specific project objectives: • DATR/ILEX: formal lexicon theory and implementation • VerbMobil: integrated HyprLex dissemination model • HyprLex encyclopaedia model for Ivory Coast languages • Ega endangered language documentation model • Modelex - theory and design of multimodal lexica • Ivory Coast and Nigeria curricula for language documentation

  4. Metalex design: data and theory • Data-driven data + metadata acquisition: • Systematic metatext derived from and supporting ... • Computational fieldwork • Induction of lexica • Theory-informed data + metadata acquisition: • Integrated Lexicon (ILEX) consisting of ... • Abstract Lexicon (ALEX) - "theory" in the mathematical sense • Object Lexicon (OLEX) - "model" in the mathematical sense

  5. Metalex design: data • Data-driven acquisition: • Computational fieldwork • Portable metadatabase with restricted vocabulary and general metatext, and • Definition of and support for transcription + annotation • Portable support for scenarios, scripts • Portable support for lexicon processing • Induction of lexica • Lexicon tools for • Extraction of macrostructural elements (lexeme elements) • Induction of microstructural information (media concordance, POS, ...) • Induction of mesostructural regularities and subregularities (grammar, ...)

  6. Metalex design: theory • Theory-informed formalisation: • Abstract Lexicon (ALEX) - "theory" in the mathematical sense • Decomposition (componential A-V description) • Generalisation (inheritance) • Composition (multilinear operations) • Object Lexicon (OLEX) - "model" in the mathematical sense • XML archiving and dissemination formats • object-relational database acquisition and processing formats • = Integrated Lexicon (ILEX)

  7. Metalex implementation: architecture • Data model ∩ Theory = shared lexicon architecture: • Macrostructure: declarative and procedural components • Lexicon architecture: relational, inheritance, text, ... • Lexical objects: entry types • Lexical access: fact query, semasiological / onomasiological indexing • Mesostructure: • Generalisations: grammar, phonetics, cultural background, ... • Composition of lexicon object types: idioms, words, morphemes, ... • Lexical access: inferential query • Microstructure: • Lexical entry (article, lemma structure - atom, string, tree, ...) • Types of lexical information - standardly: "lexicon model"
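The two access directions named on this slide can be illustrated with a pair of indices: semasiological access goes from form to meaning, onomasiological access from meaning to form. The following is only a minimal sketch. It borrows the three Anyi homophones from slide 20 and assumes, as their identifiers suggest, that they share the headword "Anouman"; the function name `build_indices` and the tuple layout are illustrative assumptions, not part of Metalex.

```python
from collections import defaultdict

# (entry id, headword form, English gloss); the data come from slide 20,
# the shared headword "Anouman" is an assumption based on the entry ids.
ENTRIES = [
    ("Anouman_1", "Anouman", "bird"),
    ("Anouman_2", "Anouman", "grandchild"),
    ("Anouman_3", "Anouman", "yesterday"),
]

def build_indices(entries):
    """Build a form-to-entries and a meaning-to-entries index."""
    by_form = defaultdict(list)      # semasiological: form -> entries
    by_meaning = defaultdict(list)   # onomasiological: meaning -> entries
    for entry_id, form, gloss in entries:
        by_form[form].append(entry_id)
        by_meaning[gloss].append(entry_id)
    return by_form, by_meaning

by_form, by_meaning = build_indices(ENTRIES)
print(by_form["Anouman"])    # all entries that share the form
print(by_meaning["bird"])    # all entries that carry the meaning
```

Both indices are plain fact queries over the macrostructure; the inferential queries listed under mesostructure would need the generalisations as well.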

  8. Metalex implementation: microstructure • Microstructure specification philosophy: • Anybody can specify any kind of unpredictable detail • Questionnaire / Experiment / Corpus / Archive dependence • Lexicon architecture: relational, inheritance, text, ... • Intelligent (semi-)automatic classification, not fixed attributes • Theory-informed coarse grouping is possible • Media attributes: visual, auditory, tactile, ... • Meaning attributes: definition, gloss, lexical relations, ... • Composition attributes: context/category, parts, operations • Use attributes: style, register, concordance, media illustrations, ... • Micrometadata attributes: lexicographer DB indices, source (e.g. fieldwork metadata) DB indices, modification, ...

  9. Metalex implementation: fieldwork metadata source (1) Situation dimensions • participant: fieldworker, partners, contacts • channel: modalities, media • locale: indoor/outdoor, spatial configuration • temporal: date, time, calendar event • functional: affiliation, role, occasion; observation (prompt, metadata management) Language dimension • affiliation • discourse level: discourse type, genre + prosody • phrase level: recursive phrasal categories/relations + prosody • word level: clitics, inflexion, word formation + prosody

  10. Metalex implementation: fieldwork metadata source (2) Technical dimension • physical characteristics of participants: age, sex, health • physical characteristics of locale: indoor/outdoor, spatial configuration, temporal sequence, date (season), time (of day) • audio: mike type, position, room; A/D; channels, fsample, resolution; formats • video: camera & microphone type, analogue/digital; filters, lenses; audio; formats • other sensors: laryngograph, airflow, data glove, ... Metalinguistic dimension • empirical method: introspection, experiment, corpus elicitation • materials: questionnaire, experiment layout, corpus scenario • metadata specification: index, metatext type, metacatalogue type
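A fieldwork metadata record along these dimensions could combine closed-class fields, checked against a restricted vocabulary, with free metatext. The sketch below is an assumption about how such a check might look, not the Metalex tool itself: the field names `locale` and `empirical_method`, the function `invalid_fields`, and the sample record are hypothetical, and only the vocabulary values are taken from the slides.

```python
# Restricted vocabulary for two closed-class fields; values from the slides,
# field names chosen for illustration only.
RESTRICTED_VOCABULARY = {
    "locale": {"indoor", "outdoor"},
    "empirical_method": {"introspection", "experiment", "corpus elicitation"},
}

def invalid_fields(record):
    """Return the restricted fields whose value is outside the vocabulary."""
    return sorted(
        name
        for name, allowed in RESTRICTED_VOCABULARY.items()
        if name in record and record[name] not in allowed
    )

# Hypothetical record: two restricted fields plus free metatext.
record = {
    "locale": "outdoor",
    "empirical_method": "experiment",
    "metatext": "free-text notes are not checked",
}

print(invalid_fields(record))                # no violations expected
print(invalid_fields({"locale": "market"}))  # 'market' is not in the vocabulary
```

Fields absent from the vocabulary table, such as the free metatext, pass through unchecked, which mirrors the split between restricted vocabulary and general metatext.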

  11. Metalex implementation: fieldwork metadata entry tool LREC 2002, Workshop on Portability Issues

  12. Metalex implementation: fieldwork metadata entry tool HanDBase DBMS for PalmOS

  13. Metalex objects, in conjunction with work in ISLE CLWG (Computational Lexicon Working Group) (see Gibbon in reading list) LEXICON: • { < Macrostructure > , < Mesostructure > } • Macrostructure: Ordering( {ENTRY, ...} ) • Mesostructure: < FrontmatterMetadata, Descriptions > ENTRY: • < Microstructure, HousekeepingMetadata >
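The LEXICON and ENTRY tuples can be read as record types. The following sketch is one possible rendering, not the ISLE CLWG or Metalex definition: the class and field names follow the slide, while the use of plain dictionaries for each component, the caller-supplied ordering key, and the sample data are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """ENTRY: < Microstructure, HousekeepingMetadata >"""
    microstructure: dict
    housekeeping_metadata: dict

@dataclass
class Lexicon:
    """LEXICON: { < Macrostructure > , < Mesostructure > }"""
    frontmatter_metadata: dict   # mesostructure, first component
    descriptions: dict           # mesostructure, second component
    entries: list = field(default_factory=list)

    def macrostructure(self, key):
        """Ordering({ENTRY, ...}): entries sorted by a caller-supplied key."""
        return sorted(self.entries, key=key)

# Hypothetical sample data, for illustration only.
lexicon = Lexicon(
    frontmatter_metadata={"title": "Demo lexicon"},
    descriptions={},
    entries=[
        Entry({"headword": "b"}, {"lexicographer": "X"}),
        Entry({"headword": "a"}, {"lexicographer": "Y"}),
    ],
)
ordered = lexicon.macrostructure(key=lambda e: e.microstructure["headword"])
print([e.microstructure["headword"] for e in ordered])
```

Leaving the ordering as a parameter keeps the entry set itself unordered, so the same entries can serve several macrostructures.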

  14. The LEXICON object Front Matter Metadata: • Bibliographical: creator, publisher, title, date, ... • Medium / format: paper, CD-ROM/DVD, web, ... Macrostructure type: • access: semasiological/onomasiological, • n-lingual/langue(s), • special: taxonomy (thesaurus), concordance • structure, e.g. tabular: f(type,attrib)=value

  15. The ENTRY object: metadata Entry Metadata: (see Gibbon et al. in reading list) • Entry type (wrt macrostructure specification): • encyclopaedic • multiword unit, word, ... • Microstructure data model specification: • entry structure: flat, tree, graph (net), ... • data categories specification (attribute, field, information type) • DC groups - structural skeleton • DCs • DC substructure - homography, homophony, polysemy ...

  16. The ENTRY object: DC groups Media ("surface"): • acoustic (phonetic, earcon, sonification, ...), visual (orthography, icon, gesture, ...) Composition (structure): • part (e.g. morphology for words), context (e.g. POS, subcat for words) Meaning (definition, illustration): • semantic (components, relations, senses, ontology) • pragmatic (speech act, dialogue, disfluency, ...) Use: typically media (e.g. audio) concordance, ... Metadata: lexicographer, ...

  17. The ENTRY object: DCs Countless Data Category models: (see reading list) • every existing dictionary • linguistic "types of lexical information" • several European projects (GENELEX, MULTILEX, ACQUILEX, ...) • ISO terminology norms (cf. MARTIF etc. ...)

  18. The ENTRY object: DC structures Computationally relevant properties of fields: • type (atomic, complex: tree, string, xyz-formatted text) • character encoding spec.: ASCII, Unicode, xyz • tree (or other graph/net): • finite depth • flat, disjunctive tree • recursive graph (net) • table, non-tree graph, anchor/link/index structure • generated text: • print, hypertext (compiled vs. dynamic, i.e. generated on the fly)

  19. Metalex microstructure application Media ("surface"): • phonemic & tonemic transcription (SAMPA ASCII - still waiting for Unicode...) Composition (structure): • morphemic substructure, category & subcategory Meaning (definition, illustration): • glosses (English, French, German) • definitions, senses, relations, components; audio-visual illustration Use: genres; examples (e.g. concordance link); free text notes Metadata: first record; last field

  20. Metalex field lexicon microstructure Anouman_1: • Media attributes: • Phonemic tier: `an'U~m`'a~ • Skeletal tier: VNVNV • Tonal tier: L H LH • Signal tier: Audio • Meaning attributes: • F-gloss: Oiseau • E-gloss: Bird • G-gloss: Vogel • Definition: avis • Homophone full: Anouman_2: grandchild • Homophone phonemic: Anouman_3: yesterday • Use: • < Concordance pointer > • Genre: narrative • Metadata: • Lexicographer: S. Adouakou • Source: Bielefeld-Anyi-Corpus, Adaou village, CI • Date: March 2002
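The Anouman_1 entry can be written as a nested record grouped by DC group and serialised to XML, the archiving format named for the Object Lexicon. This is a sketch under assumptions: the element names (`entry`, `media`, `e_gloss`, ...) and the function `entry_to_xml` are hypothetical and do not reproduce any Metalex schema; only the field values come from the slide, and the homophone and concordance fields are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Field values from slide 20; key names are illustrative assumptions.
entry = {
    "id": "Anouman_1",
    "media": {
        "phonemic": "`an'U~m`'a~",
        "skeletal": "VNVNV",
        "tonal": "L H LH",
        "signal": "Audio",
    },
    "meaning": {
        "f_gloss": "Oiseau",
        "e_gloss": "Bird",
        "g_gloss": "Vogel",
        "definition": "avis",
    },
    "use": {"genre": "narrative"},
    "metadata": {
        "lexicographer": "S. Adouakou",
        "source": "Bielefeld-Anyi-Corpus, Adaou village, CI",
        "date": "March 2002",
    },
}

def entry_to_xml(entry):
    """Map each DC group to an element and each data category to a child."""
    root = ET.Element("entry", id=entry["id"])
    for group, fields in entry.items():
        if group == "id":
            continue
        group_element = ET.SubElement(root, group)
        for name, value in fields.items():
            ET.SubElement(group_element, name).text = value
    return root

xml_text = ET.tostring(entry_to_xml(entry), encoding="unicode")
print(xml_text)
```

Parsing `xml_text` back with `ET.fromstring` recovers the same field values, including the SAMPA transcription with its backticks and apostrophes.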

  21. Metalex portable lexical database Relational database: • Metalex specs flattened • structure re-constitution via Metalex specs • HanDBase for PalmOS • Features: • standard full RelDBMS • XML, CSV, text export • export/import via GSM • inexpensive (wrt laptop) • stylus, keyboard, sync input • light weight • low power consumption • inconspicuous in use • interfaces to Scheme, C
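Flattening a structured entry into one relational row and rebuilding the structure from a specification might work roughly as follows. This is a hedged sketch, not the HanDBase or Metalex procedure: `SPEC`, `flatten` and `reconstitute` are hypothetical names, and the sketch assumes that attribute names are unique across DC groups so that the specification alone can restore the grouping.

```python
# Specification: which DC group each flat column belongs to (assumed layout).
SPEC = {
    "phonemic": "media",
    "tonal": "media",
    "e_gloss": "meaning",
    "genre": "use",
    "lexicographer": "metadata",
}

def flatten(entry):
    """Collapse a grouped entry into one flat row (column -> value)."""
    return {
        attribute: value
        for fields in entry.values()
        for attribute, value in fields.items()
    }

def reconstitute(row, spec):
    """Rebuild the grouped entry from a flat row using the specification."""
    entry = {}
    for attribute, value in row.items():
        entry.setdefault(spec[attribute], {})[attribute] = value
    return entry

# Sample values taken from the Anouman_1 entry on slide 20.
structured = {
    "media": {"phonemic": "`an'U~m`'a~", "tonal": "L H LH"},
    "meaning": {"e_gloss": "Bird"},
    "use": {"genre": "narrative"},
    "metadata": {"lexicographer": "S. Adouakou"},
}

row = flatten(structured)
print(row)
print(reconstitute(row, SPEC) == structured)
```

The flat row is what a simple table or a CSV export can hold; the grouping lives only in the specification, so it has to travel with the data.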

  22. Metalex extension: the Modelex project, "Theory and Design of Multimodal Lexica" Goals: • Data-driven, theory-informed lexicon models • Formal properties of abstract data models for multimodal lexica • Interpretation of abstract data models in XML • Integration of parallel annotation lattices for modalities and submodalities • Development of a prototype multimodal lexicon

  23. The Modelex domain: modalities and submodalities

  24. Modelex: data-driven lexicography

  25. Modelex: gesture annotation Time Aligned Signal Corpus System (Java, GPL) Jan-Torsten Milde, U Bielefeld TASX annotator: • Phonological tier • ToBI tiers • Gesture tier • Speech Act tier Anyi, Ega, German

  26. Model-theoretic compilation in ILEX: INTERPRETATION ( ALEX ) = OLEX

  27. Metalex in the Modelex project: Multimodal concordance as microstructure DC Prototype: http://www.spectrum.uni-bielefeld.de/langdoc/PAX/

  28. Metalex in the Modelex project: underspecified ALEX microstructure for gesture coordinates
      Hand:
          <parts> == "Palm" "Digit"
          <vector> == "<name>" <coord "<name>">
          <coord> == "<x1>" "<y1>" "<x2>" "<y2>"
          <> == .
      Palm:
          <parts> == <vector>
          <name> == palm
          <width> == pw
          <height> == ph
          <x1 fore> == <x1>
          <x1 middle> == ( <x1> + ( <x2> - <x1> ) / 3 )
          <x1 ring> == ( <x1> + ( <x2> - <x1> ) * 2 / 3 )
          <x1 pinky> == <x2>
          <x1> == px1
          <y1> == py1
          <x2> == ( <x1> + <width> )
          <y2> == ( <y1> + <height> )
          <> == Hand .
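The underspecification in this fragment rests on two defaults: a query path falls back to the longest defined prefix, and the empty path at Palm hands unresolved queries to Hand. The sketch below imitates only these two mechanisms. It is a deliberate simplification and not a DATR interpreter: right-hand sides are kept as unevaluated token lists, quoted (global) paths and arithmetic are not evaluated, only a subset of the equations is included, and the names `THEORY` and `lookup` are assumptions.

```python
# A subset of the slide's equations; tuples of path elements are keys.
# A value of ("inherit", node) stands for a bare node reference such as
# "<> == Hand"; everything else is an unevaluated list of tokens.
THEORY = {
    "Hand": {
        ("parts",): ["Palm", "Digit"],
        ("coord",): ["<x1>", "<y1>", "<x2>", "<y2>"],
        (): [],                      # <> == .  (empty default value)
    },
    "Palm": {
        ("name",): ["palm"],
        ("width",): ["pw"],
        ("height",): ["ph"],
        ("x1",): ["px1"],
        ("y1",): ["py1"],
        (): ("inherit", "Hand"),     # <> == Hand
    },
}

def lookup(theory, node, path):
    """Resolve a path by longest-prefix match, following node inheritance."""
    rules = theory[node]
    for cut in range(len(path), -1, -1):
        prefix = tuple(path[:cut])
        if prefix in rules:
            rhs = rules[prefix]
            if isinstance(rhs, tuple) and rhs[0] == "inherit":
                # A bare node reference is queried with the same full path.
                return lookup(theory, rhs[1], path)
            return rhs
    raise KeyError((node, path))

print(lookup(THEORY, "Palm", ("name",)))         # defined at Palm
print(lookup(THEORY, "Palm", ("x1", "thumb")))   # falls back to <x1>
print(lookup(THEORY, "Palm", ("coord",)))        # inherited from Hand
```

The second query shows why the fragment needs no separate equation for every finger: any extension of `<x1>` without its own equation receives the default value.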

  29. Metalex in the Modelex project: fully specified ALEX microstructure for gesture coordinates
      Hand:<parts> =
          palm px1 py1 ( px1 + pw ) ( py1 + ph )
          thumb px1 py1 ( px1 - lt ) py1
          fore px1 py1 px1 ( py1 - lf )
          middle ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) ( py1 - lm )
          ring ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) ( py1 - lr )
          pinky ( px1 + pw ) py1 ( px1 + pw ) ( py1 - lp )
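Once the symbolic parameters receive numbers, the fully specified expressions reduce to concrete line segments. The sketch below evaluates them directly. It assumes that each part is a vector (x1, y1, x2, y2) and that `lt`, `lf`, `lm`, `lr`, `lp` are the lengths of thumb, fore, middle, ring and pinky finger; that reading of the parameter names, the function name `hand_vectors`, and the sample numbers are assumptions, not taken from the slides.

```python
def hand_vectors(px1, py1, pw, ph, lt, lf, lm, lr, lp):
    """Evaluate the slide's expressions for each hand part as (x1, y1, x2, y2)."""
    x2 = px1 + pw                        # ( px1 + pw )
    y2 = py1 + ph                        # ( py1 + ph )
    x_middle = px1 + (x2 - px1) / 3      # one third across the palm
    x_ring = px1 + (x2 - px1) * 2 / 3    # two thirds across the palm
    return {
        "palm": (px1, py1, x2, y2),
        "thumb": (px1, py1, px1 - lt, py1),
        "fore": (px1, py1, px1, py1 - lf),
        "middle": (x_middle, py1, x_middle, py1 - lm),
        "ring": (x_ring, py1, x_ring, py1 - lr),
        "pinky": (x2, py1, x2, py1 - lp),
    }

# Hypothetical parameter values, chosen so that the thirds come out exact.
vectors = hand_vectors(px1=10, py1=50, pw=30, ph=40, lt=8, lf=20, lm=24, lr=22, lp=16)
for part, vector in vectors.items():
    print(part, vector)
```

In the terms of slide 26, the function plays the role of an interpretation: the abstract description stays fixed and only the parameter assignment changes from one hand model to the next.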

  30. Metalex: conclusion & prospects User complexity: • demands an open, data-driven approach Domain: • demands a theory-informed approach • with computational acquisition & inference Data-driven and theory-informed lexica • are possible (METALEX) • need integrated model-theoretic approach (ILEX): INTERPRETATION (ALEX) = OLEX • a formal problem remains: differing complexity of • trees (archive): simulation of other graphs via semantics only • annotation lattices (data), tables (lexica): regular relations if non-recursive, indexed grammars if recursive?
