ISO 16642 TMF - Terminological Markup Framework Laurent Romary - Laboratoire Loria
General principles
• Expressing constraints on the representation of computerized terminologies
  • What is the underlying structure of computerized terminologies?
  • Which data category is used, and under which conditions?
• Maintaining interoperability between representations
• Providing a conceptual tool to compare two given formats
Definitions
• TMF: Terminological Markup Framework
  • Definition of the underlying structures and mechanisms needed for the computer representation of terminological data
  • Independent of any specific format
• TML: Terminological Markup Language
  • One specific representation format generated within TMF
  • E.g. DXLT is a possible TML
A family of formats
[Figure: TMF at the top, with a family of TMLs below it (TML1, TML2, TML3, …); Geneter and DXLT are named as example TMLs.]
Meta-model Representing the underlying structure of terminological data
[Figure: UML class diagram of the meta-model. Classes: Terminological Data Collection, Global Information, Complementary Information, Terminological Entry, Terminology-related Information, Language Section, Term Section, Term Component Section. Multiplicity labels on the associations: 0:1, 1, *.]
The structural skeleton
Terminological Data Collection (TDC)
  Global Information (GI)
  Complementary Information (CI)
  * Terminological Entry (TE)
    * Language Section (LS)
      * Term Level (TL)
        * Term Component Level (TCL)
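The nesting of levels in the skeleton can be sketched as a small data structure. This is only an illustration: the `Level` class, the `ALLOWED_CHILDREN` table, and `add_child` are hypothetical names, and the nesting rules are read off the skeleton slide rather than taken from the text of ISO 16642.

```python
from dataclasses import dataclass, field

# Assumption: which level may sit directly under which, as read from the
# skeleton slide (TDC > GI, CI, TE; TE > LS; LS > TL; TL > TCL).
ALLOWED_CHILDREN = {
    "TDC": {"GI", "CI", "TE"},
    "TE": {"LS"},
    "LS": {"TL"},
    "TL": {"TCL"},
    "GI": set(),
    "CI": set(),
    "TCL": set(),
}


@dataclass
class Level:
    """One node of the structural skeleton (hypothetical helper class)."""
    kind: str                                      # level code, e.g. "TE"
    features: dict = field(default_factory=dict)   # data categories attached here
    children: list = field(default_factory=list)   # nested levels


def add_child(parent: Level, child: Level) -> Level:
    """Attach child under parent, rejecting nestings the skeleton does not allow."""
    if child.kind not in ALLOWED_CHILDREN[parent.kind]:
        raise ValueError(f"{child.kind} cannot be nested under {parent.kind}")
    parent.children.append(child)
    return child


# Usage: a collection holding one entry with one language section and one term.
tdc = Level("TDC")
te = add_child(tdc, Level("TE", {"id": "ID67"}))
ls = add_child(te, Level("LS", {"lang": "en"}))
tl = add_child(ls, Level("TL", {"term": "alpha smoothing factor"}))
```

Keeping the data categories in a plain dictionary at each level mirrors the idea that the skeleton is fixed while the choice of data categories varies from one TML to another.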
How does this work? Walking through an example…
DXLT example
<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig>
      <term>Alfa ...</term>
    </tig>
  </langSet>
</termEntry>
Identifying the structural skeleton
TE: id='ID67' [attribute], subjectField='manufacturing' [typedElement], definition='A value…' [typedElement]
  LS: lang='en' [attribute]
    TS: term='alpha smoothing factor' [element], termType='fullForm' [typedElement]
  LS: lang='hu' [attribute]
    TS: term='…' [element]
TE: Terminological Entry; LS: Language Section; TS: Term Section
TMF information model
TE: id='ID67', subjectField='manufacturing', definition='A value…'
  LS: lang='en'
    TS: term='alpha smoothing factor', termType='fullForm'
  LS: lang='hu'
    TS: term='…'
GMT representation
<struct type="TE">
  <feat type="id">ID67</feat>
  <feat type="subjectField">manufacturing</feat>
  <feat type="definition">A value between 0 and 1 used in ...</feat>
  <struct type="LS">
    <feat type="lang">en</feat>
    <struct type="TS">
      <feat type="term">alpha smoothing factor</feat>
      <feat type="termType">fullForm</feat>
    </struct>
  </struct>
  <struct type="LS">
    <feat type="lang">hu</feat>
    <struct type="TS">
      <feat type="term">Alfa ...</feat>
    </struct>
  </struct>
</struct>
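The walk from the DXLT entry to its GMT form can be sketched in a few lines of Python. This is a minimal illustration, not the SALT tooling: the `LEVELS` table and the `to_gmt` function are hypothetical, and the sketch only handles the three styles that occur in the example (attribute, element, typed element).

```python
import xml.etree.ElementTree as ET

DXLT = """<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig>
      <term>Alfa ...</term>
    </tig>
  </langSet>
</termEntry>"""

# Assumption: DXLT elements that mark a level of the structural skeleton.
LEVELS = {"termEntry": "TE", "langSet": "LS", "tig": "TS"}


def to_gmt(elem: ET.Element) -> ET.Element:
    """Convert one DXLT level element into a GMT <struct> (hypothetical helper)."""
    struct = ET.Element("struct", {"type": LEVELS[elem.tag]})
    # Attribute style: each XML attribute carries one data category.
    for name, value in elem.attrib.items():
        ET.SubElement(struct, "feat", {"type": name}).text = value
    for child in elem:
        if child.tag in LEVELS:
            # Nested level: recurse, since <struct> is a recursive element.
            struct.append(to_gmt(child))
        else:
            # TypedElement style names the category in @type;
            # Element style names it with the tag itself.
            category = child.get("type", child.tag)
            ET.SubElement(struct, "feat", {"type": category}).text = child.text
    return struct


gmt = to_gmt(ET.fromstring(DXLT))
print(ET.tostring(gmt, encoding="unicode"))
```

The printed result has the same struct and feat nesting as the GMT representation on the slide, without the indentation.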
TML à la mode ISO
• Ingredients
  • A structural skeleton (take the TMF meta-model)
  • A reference Data Category Registry (ISO 12620 is a good place to find one)
• Recipe
  • Choose some data categories from the registry (you can even constrain the values of your datcats)
  • Associate a style and vocabulary with each datcat (you can take inspiration from others, e.g. DXLT)
  • Serve it hot to your software guy with a piece of SALT software
GMT Generic Mapping Tool
Background
• Interoperability principle
  • If any two TMLs have exactly the same DCS, they are equivalent, even if they differ radically in style and vocabulary.
• Consequence
  • It is always possible to define a filter from one TML to another when they are interoperable
  • GMT is the intermediate representation used to do so
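The interoperability principle can be illustrated with a toy comparison. Everything here is an assumption made for the sketch: a TML is modelled as a dictionary from (level, data category) pairs to a (style, vocabulary) pair, the DCS is taken to be the set of those keys, and `TML_B` with its element names is invented for the example.

```python
# Hypothetical TML descriptions: (level, data category) -> (style, vocabulary).
TML_A = {
    ("TE", "definition"): ("TypedElement", ("descrip", "definition")),
    ("LS", "lang"): ("Attribute", ("lang",)),
    ("TS", "term"): ("Element", ("term",)),
}
TML_B = {
    ("TE", "definition"): ("Element", ("definition",)),
    ("LS", "lang"): ("Element", ("language",)),
    ("TS", "term"): ("Element", ("entryTerm",)),
}


def dcs(tml: dict) -> set:
    """Data categories of a TML together with the level they attach to."""
    return set(tml)


def interoperable(a: dict, b: dict) -> bool:
    """Two TMLs are treated as interoperable when their DCS coincide."""
    return dcs(a) == dcs(b)


def build_filter(a: dict, b: dict) -> dict:
    """Map each (style, vocabulary) of TML a to its counterpart in TML b."""
    if not interoperable(a, b):
        raise ValueError("the two TMLs do not share the same DCS")
    return {a[key]: b[key] for key in a}


mapping = build_filter(TML_A, TML_B)
```

In this sketch the filter is just a lookup table; in practice the same pairing would drive a transformation that goes through GMT.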
From one TML to another
• GMT (Generic Mapping Tool): an abstract XML representation
• Identification of levels: <struct type="LS">…</struct>, a recursive element
• Representation of data categories: <feat type="definition">…</feat>
GMT description (cont.)
• Bracketing features
<brack>
  <feat type="classificationCode">xxx</feat>
  <feat type="classificationSystem">Lenoc</feat>
</brack>
GMT description (cont.)
• Annotating information
<feat type="definition">
  pencil whose <annot type="characteristic">casing</annot> is fixed around a central graphite medium which is used for writing or making marks
</feat>
Data Categories A Formal Description
Data Category Registry
[Figure: RDF graph of the registry. Labels: DCRegistry, rdf:about, Description, VersionNumber, dcsd:DataCategory, Data Category.]
Data Category description
[Figure: RDF graph of a Data Category and its properties.]
• dcsd:DCIdentifier: DCIdentifier
• dcsd:DCParent: DCParent
• dcsd:DCName: DCName
• dcsd:DCDefinition: DCDefinition
• dcsd:DCType: DCType (S, C)
• dcsd:DCExample: DCExample
• dcsd:DCAdmin: DCAdmin
• dcsd:DCComment: DCComment
• dcsd:Content: Content
• dcsd:Level: Locus
Salt 2000-11-08/SEW
Levels and content
[Figure: RDF graph. Labels: Content, dcsd:DataType (DataType), dcsd:TargetType (TargetType), Level/Loci; rdf:Alt lists of references whose rdf:li items are references to other datcat(s).]
Actualizing a DatCat TMF specific properties
Styling properties
[Figure: RDF graph of the styling properties attached to a Data Category.]
• dcsd:Anchor: Anchor
• dcsd:Style: Style
• dcsd:StyleName: StyleName, one of Simple, Element, Attribute, TypedElement, ValuedElement, TVElement
• dcsd:ElementName: ElementName
• dcsd:AttributeName: AttributeName
• dcsd:TypeValue: TypeValue
• dcsd:Value: Value (for Simple)
Attribute style description
• dcsd:StyleName="Attribute"
• Conditions of use
  • Not valid for annotations
• Required properties
  • dcsd:AttributeName
• Example
  • dcsd:AttributeName="id"
  • <anchorElement id="xx54893">…</anchorElement>
Element style description
• dcsd:StyleName="Element"
• Required properties
  • dcsd:ElementName
• Example
  • dcsd:ElementName="definition"
  • <definition>…</definition>
TypedElement style description
• dcsd:StyleName="TypedElement"
• Required properties
  • dcsd:ElementName, dcsd:TypeValue
• Example
  • dcsd:ElementName="termNote"
  • dcsd:TypeValue="partOfSpeech"
  • <termNote type="partOfSpeech">N</termNote>
ValuedElement style description
• dcsd:StyleName="ValuedElement"
• Conditions of use
  • Not valid for annotations
• Required properties
  • dcsd:ElementName
• Example
  • dcsd:ElementName="pos"
  • <pos value="noun"/>
TVElement style description
• dcsd:StyleName="TVElement"
• Conditions of use
  • Not valid for annotations
• Required properties
  • dcsd:ElementName, dcsd:TypeValue
• Example
  • dcsd:ElementName="free"
  • dcsd:TypeValue="pos"
  • <free type="pos" value="noun"/>
Simple style description
• dcsd:StyleName="Simple"
• Conditions of use
  • Expresses the value of simple data categories
• Required properties
  • dcsd:Value
• Example
  • dcsd:Value="Nom"
  • <pos>Nom</pos>
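The style descriptions above can be read as rendering rules. The sketch below turns five of them into a hypothetical `render` function whose keyword arguments stand in for dcsd:ElementName, dcsd:AttributeName, and dcsd:TypeValue. The Simple style is left out because the slide does not say where its element name comes from, and for the Attribute style only the attribute itself is produced, since the anchor element is chosen elsewhere.

```python
from xml.sax.saxutils import escape, quoteattr


def render(style, value, *, element=None, attribute=None, type_value=None):
    """Serialize one data category value under a given style (illustrative only)."""
    if style == "Attribute":
        # Only the attribute fragment; it is placed on some anchor element.
        return f"{attribute}={quoteattr(value)}"
    if style == "Element":
        return f"<{element}>{escape(value)}</{element}>"
    if style == "TypedElement":
        return f"<{element} type={quoteattr(type_value)}>{escape(value)}</{element}>"
    if style == "ValuedElement":
        return f"<{element} value={quoteattr(value)}/>"
    if style == "TVElement":
        return f"<{element} type={quoteattr(type_value)} value={quoteattr(value)}/>"
    raise ValueError(f"style not covered by this sketch: {style}")


# The same part-of-speech information under three different styles.
typed = render("TypedElement", "N", element="termNote", type_value="partOfSpeech")
valued = render("ValuedElement", "noun", element="pos")
tv = render("TVElement", "noun", element="free", type_value="pos")
```

Seen this way, a TML is largely a table that assigns one such rule and a vocabulary to each data category.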
Two types of languages
• Working language
  • The language used at a given place in a document, along the XML hierarchy
  • Representation: xml:lang
• Object language
  • The language about which you speak at a given place in your terminological entry (e.g. describes the Language Section level)
  • Representation: as a data category "language", with a narrow scope
Example: DXLT
<langSet lang='en' xml:lang='fr'>
  <descrip type='definition'>Une valeur entre 0 et 1 utilisée…</descrip>
  <tig>
    <term xml:lang='en'>alpha smoothing factor</term>
    <termNote type='termType'>fullForm</termNote>
  </tig>
</langSet>
Example: GMT
<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée…</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
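The difference between the two kinds of language can be made concrete by resolving xml:lang down the hierarchy of the GMT example. This is a sketch under the usual XML rule that xml:lang is inherited by descendants unless overridden; `working_languages` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

GMT = """<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée…</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>"""


def working_languages(elem, inherited=None):
    """Yield (data category, working language) for every <feat> below elem."""
    # xml:lang on the element itself overrides the inherited value.
    lang = elem.get(XML_LANG, inherited)
    if elem.tag == "feat":
        yield elem.get("type"), lang
    for child in elem:
        yield from working_languages(child, lang)


root = ET.fromstring(GMT)
langs = dict(working_languages(root))
# The object language is ordinary data: the "language" data category of the LS.
object_language = root.findtext("feat[@type='language']")
```

Here the definition is written in French (working language) while the section as a whole describes English terms (object language), and only the term itself overrides the working language.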
Conclusion
• A general model for analysing and representing terminological data collections
• An underlying formalism expressed in XML and RDF
• Associated tools (SALT project)
  • DCSEditor
  • DCSBrowser
  • Automatic generation of XSLT filters and XML schemas from a given TML specification
Useful pointers
• SALT project
  • http://www.loria.fr/projets/SALT
  • http://www.ttt.org/
• The TMF site
  • http://www.loria.fr/projets/TMF