
text-technology.de

Information Modelling of Language and Text: XML-based, multi-level, semantic-oriented. Some methods (only). www.text-technology.de. Research Group „Texttechnological Information Modelling“. University of Bielefeld: D. Gibbon (MODELEX), D. Metzing (SEKIMO)


Presentation Transcript


  1. Information Modelling of Language and Text: XML-based, multi-level, semantic-oriented. Some methods (only). www.text-technology.de

  2. Research Group „Texttechnological Information Modelling“
     • University of Bielefeld: D. Gibbon (MODELEX), D. Metzing (SEKIMO)
     • Associated: J.-T. Milde (Multimodal Corpora, TASX)
     • University of Dortmund: A. Storrer (HYTEX)
     • University of Giessen: H. Lobin (SEMDOC)
     • University of Tübingen: U. Mönnich (COMOD)
     The TASX-Annotator: http://tasxforce.lili.uni-bielefeld.de/

  3. Methodological issues: Multidimensionality of linguistic data requires:
     (1) multiple tiers of annotation (XML-based)
     (2) connections between multiple tiers (specific methods)
     (3) multi-annotation of identical raw data (multiple trees)
     (4) specific relations between multi-level annotations

  4. Methodological issues: Multidimensionality of linguistic data requires:
     (5) a distinction between one or more conceptual levels (semantic markup) and one or more annotation layers (syntactic markup), as well as mappings between the two
     (6) ways to use and to generate different annotation sets (annotation + data) from more uniform conceptual representations, improving the accessibility of corpora for search, hypothesis testing, and comparative or typological analysis

  5. Semdoc: Annotation

     structural:
     <sect1> <para> ... From the now infamous McDonald's coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet (see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996). <footnoteref linkend="i5">5</footnoteref> </para> </sect1>

     thematic:
     <segment id="s24" parent="g6" newtopic="illustration_bck" litref="s" footnoteref="s33a"> From the now infamous McDonald's coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet (see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996).5 </segment>

     rhetorical:
     <segment id="i17" parent="i56" relname="span"> From the now infamous McDonald&apos;s coffee spill case to litigation against Ford and Firestone for injuries caused by tire tread separation to tobacco litigation, high stakes civil cases have become familiar staples of our media diet</segment>
     <segment id="i18" parent="i17" relname="evidence">(see e.g., Are lawyers burning America, 1995; Budiansky, 1995; Church, 1986; Langley, 1986; Stossel, 1996).5</segment>
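
The three layers on this slide annotate the same raw text in different ways. The following is a minimal Python sketch of how one might check that two layers really share their primary data. The segment text is abbreviated, and the `<layer>` wrapper element and the helper name `primary_text` are assumptions made for illustration; they do not come from the Semdoc project.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated stand-ins for the thematic and rhetorical layers (text shortened).
thematic = (
    '<segment id="s24" parent="g6">'
    'High stakes civil cases have become familiar (see e.g., Church, 1986).'
    '</segment>'
)
# <layer> is a hypothetical wrapper so that both segments form one XML document.
rhetorical = (
    '<layer>'
    '<segment id="i17" parent="i56" relname="span">'
    'High stakes civil cases have become familiar</segment> '
    '<segment id="i18" parent="i17" relname="evidence">'
    '(see e.g., Church, 1986).</segment>'
    '</layer>'
)

def primary_text(xml_string):
    """Return the text of a layer with markup removed and whitespace normalised."""
    root = ET.fromstring(xml_string)
    text = "".join(root.itertext())
    return re.sub(r"\s+", " ", text).strip()

# Both layers should reduce to the same character sequence.
print(primary_text(thematic) == primary_text(rhetorical))  # prints True
```

Comparing the markup-free text is only a first sanity check; relating individual elements across layers would additionally need character offsets.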

  6. Sekimo: Multiple annotations of Japanese dialogue corpora
     • Annotation categories are based upon widely used tag-sets like IPADIC (ChaSen)
     • The results of corpus analysis can be used to:
       • compare the tag-sets empirically
       • augment tag-sets with conceptual information
       • reuse existing corpora which are based upon the same tag-sets

  7. Sekimo: Sample Annotation

  8. Example: Modeling of congruency in Japanese and German
     The conceptual difference in congruency is reflected in different configurations of annotations, related via secondary information structuring:
     • Lexical-pragmatic congruency: watashi ha murano to moushimasu ("I am called Murano")
     • Morpho-syntactic congruency: Ich heiße Meier ("My name is Meier")
     Rules:
     • General: two annotation units have marker
     • Ja-Germ-1: verb has marker
     • Ja-1: verb and utterance have marker
     • Ja-Germ-2: sentence has subject
     • Ja-Germ-3: subject has marker

  9. Visualisation as SVG graphic

  10. Sekimo: Example for mapping annotations <-> concepts
      Concepts: WORD, NOUN, KOPULA
      Mapping to the concept NOUN:
      • noun
      • word[@pos="noun"]
      • word[feature="pos" & value="noun"]
      Annotations:
      • <noun>watashi</noun>
      • <word pos="noun">watashi</word>
      • <word><feature>pos</feature> <value>noun</value> watashi</word>
      Transformation: <NOUN> watashi </NOUN>
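
The mapping from the three annotation variants to the single concept NOUN can be sketched in Python roughly as follows. This is an illustration only: the function names `is_noun`, `token_text` and `to_concept` are hypothetical, and the slide does not say how Sekimo actually implements the transformation.

```python
import xml.etree.ElementTree as ET

def is_noun(elem):
    """True if the element matches one of the three mapping rules for NOUN."""
    if elem.tag == "noun":                                # rule: noun
        return True
    if elem.tag == "word" and elem.get("pos") == "noun":  # rule: word[@pos="noun"]
        return True
    if elem.tag == "word":                                # rule: word[feature="pos" & value="noun"]
        return (elem.findtext("feature") == "pos"
                and elem.findtext("value") == "noun")
    return False

def token_text(elem):
    """Collect the token itself, skipping the content of <feature>/<value> children."""
    parts = [elem.text or ""]
    parts += [child.tail or "" for child in elem]
    return "".join(parts).strip()

def to_concept(xml_string):
    """Map one annotation to the concept-level element, or None if no rule applies."""
    elem = ET.fromstring(xml_string)
    if not is_noun(elem):
        return None
    concept = ET.Element("NOUN")
    concept.text = token_text(elem)
    return ET.tostring(concept, encoding="unicode")

annotations = [
    "<noun>watashi</noun>",
    '<word pos="noun">watashi</word>',
    "<word><feature>pos</feature> <value>noun</value> watashi</word>",
]
for annotation in annotations:
    print(to_concept(annotation))  # each line prints <NOUN>watashi</NOUN>
```

All three tag-set-specific encodings end up as the same concept-level element, which is what makes corpora with different annotation schemes comparable.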

  11. ModeLex: Temporal Calculus (Allen) for multimodal annotations
      • Relations between annotation layers
      • Can be applied to:
        • Text: Order is given by character sequence
        • Signal: Order is given by timestamps
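
Allen's calculus distinguishes thirteen relations between two intervals. A minimal Python sketch follows, assuming each annotation unit is a `(start, end)` pair with `start < end`, where the values are character offsets for text or timestamps for signal data. The function name `allen_relation` and the sample intervals are illustrative assumptions, not taken from ModeLex.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"

# Text: character offsets of two adjacent tokens on one layer.
print(allen_relation((0, 7), (7, 9)))          # prints meets
# Signal: a gesture (seconds) on one tier and a word on another tier.
print(allen_relation((1.2, 2.0), (1.5, 1.8)))  # prints contains
```

Because the function only compares endpoints, the same code serves both cases named on the slide: ordering by character sequence and ordering by timestamps.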

  12. Lexicon Model: Subclassification of annotation units, based upon temporal relations
      Classification hierarchy (Corpus), each level with its properties:
      • class: properties of class
      • subclass: properties of subclass
      • subsubclass: properties of subsubclass

  13. HyTex: Multi-level approach
      • Adaptive generation of hypertext views on coherence criteria
      • User model (static or dynamic)
      • TermNet: Representation of semantic relations between technical terms of the domain
      • Textgrammatical annotation: definitions and technical terms; topical and rhetorical structures
      • Linguistic annotation: POS tagging, lemmatization, chunk parsing

  14. HyTex: Research Cooperation and Contacts
      Text-grammatical foundations for the (semi)automated text-to-hypertext conversion (HyTex), within DFG-Forschergruppe 437: text technological modelling of information
      • WordNet Project, Princeton University, and GermaNet Project, University of Tübingen: exchange of entities and relations for the TermNet model
      • TEMIS: Text Mining Solutions, Heidelberg/Paris: annotation schema for anaphoric and co-reference relations in German texts; usage of the text mining tool Knowledge Extractor for the annotation of definitions
      • Intelligent Views: Knowledge Management, Darmstadt: usage of the tool „K-Infinity“, supporting the convenient construction and maintenance of the TermNet
      • DEREKO: Corpus Technology at the University of Tübingen: chunk parser for the syntactic annotation of the HyTex corpus

  15. Sekimo: Project Context
      Sekimo: secondary information structuring and comparative discourse analysis
      • DFG-Forschergruppe 437 Texttechnologische Informationsmodellierung (Texttechnological Information Modelling)
      • NITE: Natural Interactivity Tools Engineering: University of Southern Denmark, Universitat Autònoma de Barcelona, DFKI Saarbrücken, HCRC Edinburgh, IMS Stuttgart, ILC Pisa
      • SFB Mehrsprachigkeit (Collaborative Research Centre on Multilingualism), Hamburg: Jadex: Japanese and German expert discourse in mono- and multilingual constellations

  16. Research Group „Texttechnological Information Modelling“
      International Conference „Modeling Linguistic Information Resources“, January 2004, Center for Interdisciplinary Research, University of Bielefeld
      • Semantics of Generic Document Structures and Discourse Parsing
      • Modelling Textual, Lexical and World Knowledge as a Basis for Hypertext Linking
      • Multiple Annotation of Language Data
      • Multimodal Lexical Information for Language Documentation
