
Thomas Schmidt, SFB 538 ‚Mehrsprachigkeit‘ (Multilingualism), University of Hamburg

EXMARaLDA – A modeling and visualization framework for the computer-assisted transcription of spoken language. KONVENS, Vienna, 15 Sep 2004.


Presentation Transcript


  1. KONVENS, Vienna, 15 Sep 2004: EXMARaLDA – A modeling and visualization framework for the computer-assisted transcription of spoken language. Thomas Schmidt, SFB 538 ‚Mehrsprachigkeit‘, University of Hamburg

  2. Background • Multilingual Database, SFB 538 „Mehrsprachigkeit“, University of Hamburg • EXMARaLDA (Extensible Markup Language for Discourse Annotation) • Dissertation project „Computer-based transcription of spoken language as a modelling and visualisation process“ (Supervisor: Angelika Storrer)

  3. Background
     • Transcription of spoken language:
        • Interviewer / child interaction
        • Classroom interaction
        • Interpreted doctor-patient discourse
     • for discourse / conversation analysis
     • for (child) language acquisition studies

  4. Background
     • Problem: Diversity of Transcription Data
        • Theoretical diversity:
           • Entities of transcription (utterances, turns, non-verbal activities etc.)
           • Relations between entities (temporal, hierarchical, features, ...)
           • Presentation formats (partitur notation, column notation, ...)
        • Technological diversity:
           • Storage formats (text, binary, RDB)
           • Software (syncWriter, HIAT-DOS, DBM systems, word processors, ...)
           • Operating systems (Windows, Mac OS)

  5. Background

  6. Background

  7. Background
     • Problem: Diversity of Transcription Data
     • Aim: A common platform for computer-assisted transcription
        • Exchange, reuse and archive transcription data
        • Merge corpora
        • Use different software tools with one piece of data

  8. Background
     • Problem: Diversity of Transcription Data
     • Aim: A common platform for computer-assisted transcription
     • (Elements of a) Solution:
        • XML technology
        • Three-level architecture
        • Separate form from content
        • Separate logical from physical structure

  9. Topics of this talk
     1. Some methodological considerations:
        • Linguistic methods ↔ computer science methods
        • „Computing in the humanities“
        • Interdisciplinary communication
     2. Components of the developed system

  10. Methodological considerations [diagram comparing two views. Established view: transcription as theory and as „Verschriftlichung“ (putting speech into writing), with readability and adequacy as quality criteria. Modified view: computer transcription as modelling and visualisation, related to the model-theory view (symbolic vs. analogue model), the database view (E/R model vs. form/view), the application view (logical layer vs. form) and the text-technology view (document content vs. form)]

  11. Methodological considerations: Transcription as Modeling and Visualization of spoken language
     • Accordance with text-technological concepts
     • One model, different visualizations
        • No tradeoff between readability and adequacy
        • No tradeoff between human and computer processability
     • No “standardization” of models
        • A common modelling framework, not a common model
        • No ontological specifications
        • XML = standardization of physical representation
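The point that XML standardizes only the physical representation, not the model, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the element names (`transcription`, `timeline`, `tli`, `tier`, `event`) and the classes are hypothetical and are not EXMARaLDA's actual file format. Tiers carry a free-text `category`, so no fixed inventory of entity types is built in, and the same logical model is written to XML and read back without loss.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Event:
    start: str  # id of the timeline point where the event begins
    end: str    # id of the timeline point where the event ends
    text: str


@dataclass
class Tier:
    tier_id: str
    speaker: str
    category: str  # free label (hypothetical "v" = verbal); no fixed inventory imposed
    events: list = field(default_factory=list)


@dataclass
class Transcription:
    timeline: list  # ordered timeline point ids
    tiers: list


def to_xml(t: Transcription) -> str:
    """Serialize the logical model into one hypothetical XML layout."""
    root = ET.Element("transcription")
    tl = ET.SubElement(root, "timeline")
    for tli in t.timeline:
        ET.SubElement(tl, "tli", id=tli)
    for tier in t.tiers:
        te = ET.SubElement(root, "tier", id=tier.tier_id,
                           speaker=tier.speaker, category=tier.category)
        for ev in tier.events:
            ee = ET.SubElement(te, "event", start=ev.start, end=ev.end)
            ee.text = ev.text
    return ET.tostring(root, encoding="unicode")


def from_xml(data: str) -> Transcription:
    """Parse the XML layout back into the same logical model."""
    root = ET.fromstring(data)
    timeline = [tli.get("id") for tli in root.find("timeline")]
    tiers = []
    for te in root.findall("tier"):
        events = [Event(e.get("start"), e.get("end"), e.text or "")
                  for e in te.findall("event")]
        tiers.append(Tier(te.get("id"), te.get("speaker"),
                          te.get("category"), events))
    return Transcription(timeline, tiers)


# Hypothetical sample data: two speakers, B starts while A is still speaking.
sample = Transcription(
    timeline=["T0", "T1", "T2", "T3"],
    tiers=[
        Tier("TIE0", "A", "v", [Event("T0", "T1", "Siehst du"),
                                Event("T1", "T2", "das?")]),
        Tier("TIE1", "B", "v", [Event("T1", "T3", "Ja, ich sehe es.")]),
    ],
)

if __name__ == "__main__":
    print(to_xml(sample))
    assert from_xml(to_xml(sample)) == sample  # lossless round trip
```

Swapping in a different XML layout would only change `to_xml` and `from_xml`; the classes that hold the logical model stay untouched, which is the separation of logical and physical structure the slides argue for.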

  12. Visualization to Model

  13. Visualization to Model • Structural relations: • Temporal sequence

  14. Visualization to Model • Structural relations: • Temporal sequence • Simultaneity

  15. Visualization to Model • Structural relations: • Temporal sequence • Simultaneity • Equivalence (Entity ↔ Feature)

  16. Visualization to Model • Structural relations: • Temporal sequence • Simultaneity • Equivalence (Entity ↔ Feature) • Hierarchy (Containment)
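The four structural relations on slides 13 to 16 can all be read off a single ordered timeline. The sketch below assumes a simplified model in which every event is anchored to two points of a shared timeline; the tier names, sample utterances and helper functions are hypothetical. Equivalence is approximated as two events on different tiers that span exactly the same interval, and hierarchy as interval containment only (a real model would also need tier dependencies).

```python
from typing import NamedTuple

# Hypothetical shared timeline: the list order encodes temporal sequence.
TIMELINE = ["T0", "T1", "T2", "T3", "T4"]
POS = {tli: i for i, tli in enumerate(TIMELINE)}


class Event(NamedTuple):
    tier: str
    start: str  # timeline point where the event begins
    end: str    # timeline point where the event ends
    text: str


def precedes(a: Event, b: Event) -> bool:
    """Temporal sequence: a ends no later than b starts."""
    return POS[a.end] <= POS[b.start]


def simultaneous(a: Event, b: Event) -> bool:
    """Simultaneity: the two intervals overlap."""
    return POS[a.start] < POS[b.end] and POS[b.start] < POS[a.end]


def equivalent(a: Event, b: Event) -> bool:
    """Equivalence (entity <-> feature): same interval, different tiers."""
    return (a.start, a.end) == (b.start, b.end) and a.tier != b.tier


def contains(outer: Event, inner: Event) -> bool:
    """Hierarchy (containment): inner lies within outer's interval."""
    return (outer != inner
            and POS[outer.start] <= POS[inner.start]
            and POS[inner.end] <= POS[outer.end])


# Hypothetical sample events.
utterance = Event("A-v", "T0", "T2", "Siehst du das?")
translation = Event("A-tr", "T0", "T2", "Do you see that?")
word = Event("A-w", "T0", "T1", "Siehst")
answer1 = Event("B-v", "T1", "T3", "Ja, ich")
answer2 = Event("B-v", "T3", "T4", "sehe es.")
```

Here B's `answer1` overlaps A's utterance (simultaneity), the translation is equivalent to the utterance, the word is contained in it, and `answer1` precedes `answer2`.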

  17. Modeling framework • Relational? → Sequence? Simultaneity? • OHCO (Ordered Hierarchy of Content Objects)? → Simultaneity? • DAG: Annotation Graphs? → Complexity? • ⇒ Transcription Graphs
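Annotation graphs (Bird and Liberman) represent annotations as labelled arcs between time-anchored nodes, which is why they appear here as the DAG-based candidate. The slide does not spell out how transcription graphs restrict that model, so the sketch below covers only the shared core, under that assumption: a list of hypothetical labelled arcs, a check that they form a DAG, and a temporal order of nodes derived with Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical arcs: (start_node, end_node, tier, label).
ARCS = [
    ("n0", "n1", "A", "Siehst du"),
    ("n1", "n2", "A", "das?"),
    ("n0", "n2", "A-tr", "Do you see that?"),
    ("n1", "n3", "B", "Ja, ich sehe es."),
]


def node_order(arcs):
    """Return all nodes in an order consistent with every arc (start before end).

    Raises graphlib.CycleError if the arcs do not form a DAG.
    """
    sorter = TopologicalSorter()
    for start, end, _tier, _label in arcs:
        sorter.add(end, start)  # 'end' has 'start' as a predecessor
    return list(sorter.static_order())


def is_dag(arcs) -> bool:
    """True if the arcs contain no cycle, i.e. a temporal order exists."""
    try:
        node_order(arcs)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(node_order(ARCS))
```

Because arcs only constrain relative order, `n2` and `n3` may come out in either order; absolute times would have to come from time anchors on the nodes, which this sketch leaves out.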

  18. System architecture

  19. Application: Input tools EXMARaLDA Partitur-Editor

  20. Application: Input tools Simple EXMARaLDA Text file

  21. Application: Input tools TASX annotator

  22. Application: Input tools PRAAT

  23. Application: Input tools EUDICO Linguistic Annotator (ELAN)

  24. Application: Visualization ... as a wrapped partitur ... as a line transcript ... in column notation
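The idea of one model with several visualizations can be sketched by rendering the same data once in partitur (musical-score) notation and once as a line transcript. The data layout, tier names and both rendering functions are hypothetical simplifications, not EXMARaLDA's own algorithms: the partitur view only widens timeline columns until every event fits and assumes events within one tier never overlap, while the line view merges consecutive events of the same speaker and drops the overlap information.

```python
# Hypothetical sample data: tier name -> list of (start, end, text).
TIMELINE = ["T0", "T1", "T2", "T3"]
TIERS = {
    "A": [("T0", "T1", "Siehst du"), ("T1", "T2", "das?")],
    "B": [("T1", "T3", "Ja, ich sehe es.")],
}


def partitur(tiers, timeline):
    """Score-like view: one row per tier, columns aligned on timeline points."""
    pos = {t: i for i, t in enumerate(timeline)}
    # offsets[k] = character column where timeline point k starts
    offsets = [0] * len(timeline)
    for k in range(1, len(timeline)):
        needed = offsets[k - 1] + 1
        for events in tiers.values():
            for start, end, text in events:
                if pos[end] == k:  # event must fit before its end point
                    needed = max(needed, offsets[pos[start]] + len(text) + 1)
        offsets[k] = needed
    lines = []
    for name, events in tiers.items():
        row = [" "] * offsets[-1]
        for start, _end, text in events:
            col = offsets[pos[start]]
            row[col:col + len(text)] = text
        lines.append(f"{name} | {''.join(row).rstrip()}")
    return lines


def line_transcript(tiers, timeline):
    """Line view: events in temporal order, same-speaker neighbours merged."""
    pos = {t: i for i, t in enumerate(timeline)}
    order = list(tiers)
    events = sorted((pos[start], order.index(name), name, text)
                    for name, evs in tiers.items()
                    for start, _end, text in evs)
    merged = []
    for _p, _o, name, text in events:
        if merged and merged[-1][0] == name:
            merged[-1][1].append(text)
        else:
            merged.append((name, [text]))
    return [f"{name}: {' '.join(texts)}" for name, texts in merged]


if __name__ == "__main__":
    print("\n".join(partitur(TIERS, TIMELINE)))
    print("\n".join(line_transcript(TIERS, TIMELINE)))
```

In the partitur output, B's "Ja" starts in the same column as A's "das?", making the overlap visible; the line transcript shows the same data as two consecutive turns.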

  25. Application: Corpus management EXMARaLDA Corpus Manager (COMA)

  26. Application: Query/Analysis Search and Query Instrument for EXMARaLDA (SQUIRREL)
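Because every event is anchored to timeline points, a query hit can point back to the exact stretch of the transcription where it occurs. The following is a minimal, hypothetical sketch of such a search using Python's `re` module; the data layout and the `search` function are assumptions and do not reproduce SQUIRREL's actual query language or interface.

```python
import re

# Hypothetical sample data: (tier, start, end, text).
EVENTS = [
    ("A", "T0", "T1", "Siehst du"),
    ("A", "T1", "T2", "das?"),
    ("B", "T1", "T3", "Ja, ich sehe es."),
]


def search(events, pattern, tier=None):
    """Return (tier, start, end, match) for every regex match.

    If 'tier' is given, only events on that tier are searched.
    """
    rx = re.compile(pattern)
    hits = []
    for t, start, end, text in events:
        if tier is not None and t != tier:
            continue
        for m in rx.finditer(text):
            hits.append((t, start, end, m.group(0)))
    return hits


if __name__ == "__main__":
    for hit in search(EVENTS, r"seh\w*"):
        print(hit)
```

Each hit carries its start and end timeline points, so a tool could use them to show the surrounding partitur context or the corresponding audio segment.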

  27. Project status • Software past beta stage • Five projects at our own institution use EXMARaLDA for their corpus work • Around 800 users in research and teaching outside the SFB • Used at the IDS in Mannheim • Submitted a proposal for integrating the data model into P5 of the TEI Guidelines

  28. Summary • Transcription as theory and „Verschriftlichung“ (putting speech into writing) → computer-assisted transcription as modelling and visualisation • Interdisciplinary bridge / methodology of computational techniques in „classical“ linguistics • Concrete practical improvements for work with transcription data: EXMARaLDA and the database „Multilingualism“ • Data model, formats and tools building on the separation of model and visualisation

  29. Fin.
