TMF - a tutorial
Part 3: Designing (schemas and) filters
TMF - Terminological Markup Framework
Laurent Romary - Laboratoire Loria
General principles
• Terminological information interchange
• Three components:
  • Source TDB1
  • Target TDB2
  • Terminological interchange format
    • A specific TML (DXLT, Geneter)
[Diagram labels: TDB1, TDB2, TML]
Important notice
• GMT is not a TML
  • Too abstract a format
  • Uncontrolled recursion (‘struct’ element)
  • Uncontrolled content (‘feat’ and ‘annot’)
• A schema is needed to check interchanged data
  • Precise list of data categories
  • Precise definition of the format
• GMT is here to provide conceptual simplicity
Designing filters: TML to GMT
General principles
• Just for your information
  • The creation of the filters can be automated
• Basic processes
  • Reduction of expansion trees
  • Mapping elements and attributes to the corresponding data categories
Reducing expansion trees
• Example
• DXLT (Martif) sub-tree
    <ntig>
      <!-- some general information associated with the term -->
      <termGrp>
        <!-- term related information -->
      </termGrp>
    </ntig>
• GMT
    <struct type="TS">
      <!-- some features -->
    </struct>
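The slide shows the input and output of the reduction but not a rule that performs it. As a minimal sketch (not taken from the slides), the stylesheet below folds the two-level ntig/termGrp sub-tree into a single GMT struct of type TS. The catch-all template is a placeholder added here for illustration; a real filter would map each child to its own data category.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- ntig and its termGrp child collapse into one term section (TS) -->
  <xsl:template match="ntig">
    <struct type="TS">
      <!-- general information attached directly to ntig -->
      <xsl:apply-templates select="*[not(self::termGrp)]"/>
      <!-- term-related information: the termGrp wrapper itself is dropped -->
      <xsl:apply-templates select="termGrp/*"/>
    </struct>
  </xsl:template>

  <!-- Placeholder (assumption): any other element becomes a feat
       typed by its element name -->
  <xsl:template match="*">
    <feat type="{name()}">
      <xsl:apply-templates/>
    </feat>
  </xsl:template>
</xsl:stylesheet>
```

Note that this sketch emits the direct children of ntig before those of termGrp, so the output order can differ from document order when ntig has children after termGrp.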
Element mapping
• Example
• DXLT (Martif)
    <definition>Bla, bla, bla etc.</definition>
• GMT
    <feat type="definition">Bla, bla, bla etc.</feat>
Structural elements
• Generating a GMT ‘struct’ element
    <xsl:template match="termEntry">
      <xsl:element name="struct">
        <xsl:attribute name="type">TE</xsl:attribute>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:element>
    </xsl:template>
Features
• Generating a GMT ‘feat’ element
• (style=Attribute)
    <xsl:template match="@id">
      <xsl:element name="feat">
        <xsl:attribute name="type">iso12620-identifier</xsl:attribute>
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:template>
Features
• Generating a GMT ‘feat’ element
• (style=Element)
    <xsl:template match="term">
      <xsl:element name="feat">
        <xsl:attribute name="type">iso12620-term</xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
Features
• Generating a GMT ‘feat’ element
• (style=TypedElement)
    <xsl:template match="descrip[@type='subjectField']">
      <xsl:element name="feat">
        <xsl:attribute name="type">SubjectField</xsl:attribute>
        <xsl:apply-templates/>
      </xsl:element>
    </xsl:template>
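A typed element carries its data category in an attribute, so one generic rule can stand in for a separate template per type. The sketch below is an assumption about how such a rule might look, not part of the original slides: it copies the value of the type attribute straight into the GMT feature type through an attribute value template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Typed element: the value of @type names the data category.
       Assumption: the source type value is reused unchanged. -->
  <xsl:template match="descrip[@type]">
    <feat type="{@type}">
      <!-- default apply-templates selects child nodes only,
           so the type attribute is not output a second time -->
      <xsl:apply-templates/>
    </feat>
  </xsl:template>
</xsl:stylesheet>
```

Because the type value is copied as-is, an input type of subjectField yields a feature of type subjectField. Renaming a category, as the slide does, still needs a dedicated template or a lookup table.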
XML Schemas for TMLs …work ahead…
Analysing existing TDBs: towards a generic methodology
General architecture
TDB -> (simple DB dumper) -> Flat XML -> (format-specific XSL stylesheet) -> GMT -> (automatic GMT2TML stylesheet) -> TML
A two-phase process
• List the various data categories used in the TDB
  • Relate them to existing registries (e.g. ISO 12620), cf. http://salt.loria.fr/public/salt/DCQuery.html
• Identify the underlying organization of the TDB
  • Relate it to the meta-model
  • Anchor the data categories where they actually occur
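The first phase, listing the data categories in use, can be partly mechanized once the TDB has been dumped to flat XML. The following is a sketch under that assumption, not a tool from the TMF project: it lists each distinct element name in the dump with its number of occurrences, using the standard XSLT 1.0 key-based grouping idiom. The key name by-name is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every element by its name -->
  <xsl:key name="by-name" match="*" use="name()"/>

  <xsl:template match="/">
    <!-- Keep only the first element of each name group -->
    <xsl:for-each
        select="//*[generate-id() = generate-id(key('by-name', name())[1])]">
      <xsl:sort select="name()"/>
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <!-- number of elements sharing this name -->
      <xsl:value-of select="count(key('by-name', name()))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The inventory covers element names only. Categories encoded in attributes would need a second key over attributes, and relating each name to a registry entry remains a manual step.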
Analysis of an existing TDB: going through an example
Eurodicautom sample
    <entry>
      <BE>BTB</BE>
      <TY>DAG77</TY>
      <NI>398</NI>
      <CF>3</CF>
      <CM>AG1</CM>
      <CM>JUA</CM>
      <EN>
        <VE>key money</VE>
        <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
      </EN>
      <FR>
        <VE>pas-de-porte</VE>
        <DF>prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</DF>
        <RF target="DF">TNC(1997)</RF>
        <RF>CILF,Dict.Agriculture,ACCT,1977</RF>
        <NT type="NTE">droit rural;pratique prohibée par la loi</NT>
      </FR>
    </entry>
Annotations on the sample (data category, with meta-model level):
• classificationCode-12620A.4.2 (TE)
• language-12620A.10.7 (LS)
• term-12620A.1 (TS)
• definition-12620A.5.1 (TS)
• note-12620A.8 (TS)
Result in GMT (1/2)
    <tmf>
      <struct type="TE">
        <feat type="entryIdentifier-12620A.10.15">BTB-TY-398</feat>
        <feat type="originatingInstitution-12620A.10.22.2">BTB</feat>
        <feat type="projectSubset">DAG77</feat>
        <feat type="NI">398</feat>
        <feat type="reliabilityCode">3</feat>
        <feat type="classificationCode-12620A.4.2">AG1</feat>
        <feat type="classificationCode-12620A.4.2">JUA</feat>
        <struct type="LS">
          <feat type="language-12620A.10.7">EN</feat>
          <struct type="TS">
            <feat type="term-12620A.1">key money</feat>
          </struct>
          <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
        </struct>
Result in GMT (2/2)
        <struct type="LS">
          <feat type="language-12620A.10.7">fr</feat>
          <struct type="TS">
            <feat type="term-12620A.1">pas-de-porte</feat>
          </struct>
          <brack>
            <feat type="definition-12620A.5.1">prix payé au précédent occupant pour le droit d'entrer dans une exploitation agricole</feat>
            <feat type="sourceIdentifier-12620A.10.20">TNC(1997)</feat>
          </brack>
          <feat type="sourceIdentifier-12620A.10.20">CILF,Dict.Agriculture,ACCT,1977</feat>
          <feat type="note-12620A.8">droit rural;pratique prohibée par la loi</feat>
        </struct>
      </struct>
    </tmf>
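The GMT result groups the definition with the source that documents it inside a brack element, while the other source reference stays at the level where it occurs. The slides do not show the rule that produces this grouping. The sketch below is one possible way to obtain it, assuming that an RF element with target="DF" always refers to the DF in the same language block. The mode name in-brack is invented for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- A definition and the source that documents it share one brack -->
  <xsl:template match="DF">
    <brack>
      <feat type="definition-12620A.5.1">
        <xsl:apply-templates/>
      </feat>
      <!-- pull in the sibling RF elements that point at the definition -->
      <xsl:apply-templates select="../RF[@target='DF']" mode="in-brack"/>
    </brack>
  </xsl:template>

  <!-- Inside a brack, an RF becomes a sourceIdentifier feature -->
  <xsl:template match="RF" mode="in-brack">
    <feat type="sourceIdentifier-12620A.10.20">
      <xsl:apply-templates/>
    </feat>
  </xsl:template>

  <!-- In the normal flow, an RF bound to a DF is skipped because it was
       already emitted above; this pattern outranks the plain RF pattern -->
  <xsl:template match="RF[@target='DF']"/>

  <!-- Any other RF stays where it occurs -->
  <xsl:template match="RF">
    <feat type="sourceIdentifier-12620A.10.20">
      <xsl:apply-templates/>
    </feat>
  </xsl:template>
</xsl:stylesheet>
```

These templates are meant to be merged into a larger filter. If a language block held several DF elements, each would pull in every RF that targets DF, so a finer link between the two would be needed.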
Simple rules
• Using XSL locality
    <xsl:template match="CM">
      <feat type="classificationCode-12620A.4.2">
        <xsl:apply-templates/>
      </feat>
    </xsl:template>
Introducing specific levels
• Necessity to combine structure and content
    <xsl:template match="VE">
      <struct type="TS">
        <feat type="term-12620A.1">
          <xsl:apply-templates/>
        </feat>
      </struct>
    </xsl:template>
Default rule
• Useful for keeping track of unmapped data categories
    <xsl:template match="*">
      <feat>
        <xsl:attribute name="type">
          <xsl:value-of select="name()"/>
        </xsl:attribute>
        <xsl:apply-templates/>
      </feat>
    </xsl:template>
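The three kinds of rule (simple, level-introducing, default) can be combined into one stylesheet. The sketch below assembles them for the Eurodicautom sample. The templates for the document root, entry, and EN | FR are not shown on the slides; they are assumptions inferred from the GMT result. The output is therefore only an approximation of that result: there is no entryIdentifier, no brack grouping, and language codes are emitted as the upper-case element names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: the whole result is wrapped in a tmf root -->
  <xsl:template match="/">
    <tmf>
      <xsl:apply-templates/>
    </tmf>
  </xsl:template>

  <!-- Assumption: each entry is a terminological entry (TE) -->
  <xsl:template match="entry">
    <struct type="TE">
      <xsl:apply-templates/>
    </struct>
  </xsl:template>

  <!-- Assumption: a language block becomes a language section (LS)
       whose language feature is the element name (EN, FR) -->
  <xsl:template match="EN | FR">
    <struct type="LS">
      <feat type="language-12620A.10.7">
        <xsl:value-of select="name()"/>
      </feat>
      <xsl:apply-templates/>
    </struct>
  </xsl:template>

  <!-- Simple rule (from the slides) -->
  <xsl:template match="CM">
    <feat type="classificationCode-12620A.4.2">
      <xsl:apply-templates/>
    </feat>
  </xsl:template>

  <!-- Specific level (from the slides): term inside its own TS -->
  <xsl:template match="VE">
    <struct type="TS">
      <feat type="term-12620A.1">
        <xsl:apply-templates/>
      </feat>
    </struct>
  </xsl:template>

  <!-- Default rule (from the slides): unmapped elements keep their name -->
  <xsl:template match="*">
    <feat type="{name()}">
      <xsl:apply-templates/>
    </feat>
  </xsl:template>
</xsl:stylesheet>
```

The default rule never shadows the named rules: in XSLT 1.0 the pattern * has a lower default priority (-0.5) than a pattern naming an element (0). Elements such as BE, TY or NT therefore appear as features typed by their source name, which makes the unmapped categories easy to spot in the output.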
Useful pointers
• TMF page: http://www.loria.fr/projets/TMF
• HLT/Salt project page: http://www.loria.fr/projets/SALT
• Data category query tool: http://salt.loria.fr/public/salt/DCQuery.html