SALT XLT Markup and Mapping in Termbases - Empirical Experience -
Klaus-Dirk Schmitz
University of Applied Sciences Cologne
Institute for Information Science
klaus.schmitz@fh-koeln.de
SALT - Work Package 4
• Analyse in detail the data categories and format structures of existing terminological data collections and formats, in order to develop conceptual mapping tables and procedures to and from DXLT.
• These will serve as technical specifications for the development and implementation of filters (converters) from specific databases and formats to DXLT and vice versa.
• Based on Deliverable 2 / 3.1, which describes existing terminological data formats and the structures of concrete sample data.
One of the formats: Eurodicautom
• Eurodicautom is the terminological databank of the EU Commission, developed and filled with data since the late 1960s.
• Eurodicautom is a mainframe application with more than 1 million records; for some years now, the data have also been available to the public via a web interface.
Eurodicautom: Sample Data
%%BE BTB
%%TY DAG77
%%NI 612
%%CF 3
%%CM AG4 CH6 GO6
%%DA
%%VE C/N kvotient[1];kulstof-kvælstofforhold[2]
%%RF A.Klougart[VE1,VE2]
%%EN
%%VE C-N ratio
%%RF CILF,Dict.Agriculture,ACCT,1977
Eurodicautom: Sample Data (continued)
%%NL
%%VE C/N-quotient[1];koolstof-stikstofverhouding[2]
%%DF (in bodem)verhouding vh totale koolstofgehalte tot het totale stikstofgehalte van organische stoffen...
%%RF Agr.WP[VE1,VE2];Huitenga,Landbouwwdbk N-E[VE1,VE2]
%%NT {NTE}(in plant)verhouding van koolstof en stikstof(koolhydraten en eiwitten)...[VE1,VE2]
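A minimal sketch of how a converter might read such a record, assuming every field starts with a `%%` marker followed by its tag and runs until the next `%%`. The function name `parse_fields` is hypothetical and not part of Eurodicautom or SALT.

```python
def parse_fields(record: str) -> list[tuple[str, str]]:
    """Split a raw record into (tag, value) pairs, in document order.

    Assumption: "%%" never occurs inside a field value.
    """
    fields = []
    # Everything before the first "%%" is ignored; each later chunk is one field.
    for chunk in record.split("%%")[1:]:
        parts = chunk.strip().split(None, 1)  # tag, then the rest of the field
        if not parts:
            continue  # stray "%%" with nothing after it
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        fields.append((tag, value))
    return fields
```

Fields such as `%%DA` or `%%EN` come back with an empty value; in the sample they only open a language block.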
Eurodicautom: Data Structure
• After a general block of entry-related (concept-related) information, a language block is repeated for each of the EU languages.
• Every data category can appear only once within a language block, i.e. one field of each category is shared by all terms in that language.
• The Note field can be "structured" by unique starting tags that can be seen as "virtual" data categories.
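The block structure described above can be sketched as follows. This is only an illustration under stated assumptions: the set of language tags is limited to the three that occur in the sample data, the trailing `[n]` on a term is taken to be the index that `VE1`, `VE2` refer to, and the names `group_fields` and `split_terms` are hypothetical.

```python
import re

# Assumption: only the language tags seen in the sample data.
LANGUAGE_TAGS = {"DA", "EN", "NL"}

def group_fields(fields):
    """Split (tag, value) pairs into the general block and per-language blocks."""
    header, lang_blocks = {}, {}
    current = header
    for tag, value in fields:
        if tag in LANGUAGE_TAGS:
            current = lang_blocks.setdefault(tag, {})  # open a language block
            continue
        if tag in current:
            # A data category may appear only once per block.
            raise ValueError(f"data category {tag} repeated within one block")
        current[tag] = value
    return header, lang_blocks

def split_terms(ve_value):
    """Split a shared VE field into individual terms, dropping the [n] index."""
    return [re.sub(r"\[\d+\]$", "", part.strip()) for part in ve_value.split(";")]
```

Because one `%%VE` field holds all terms of a language, a converter has to split it before it can attach term-specific information such as the `[VE1,VE2]` references.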
Eurodicautom: Data Categories (part)
%%BE    (EU) terminology service responsible for the entry (M)
%%TY    "collection" code (M)
%%NI    entry number (M)
%%NX    entry number for updating (R)
%%NZ    entry number for deleting (R)
%%CF    reliability code (1 lowest, 5 highest)
%%AU    author, originator
%%DATE  date (of last modification)
%%CM    subject field (Lenoch Code) (M)
.........
M = Mandatory / R = Rare or old
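As a rough illustration of how the mandatory markers could be checked before conversion, the sketch below tests the general block of an entry. It covers only the categories listed above (the list is partial), assumes the reliability code is a whole number from 1 to 5, and uses the hypothetical name `check_header`.

```python
# Categories marked (M) in the partial list above.
MANDATORY = ("BE", "TY", "NI", "CM")

def check_header(header: dict) -> list[str]:
    """Return a list of problems found in the general block of an entry."""
    problems = [
        f"missing mandatory field %%{tag}"
        for tag in MANDATORY
        if not header.get(tag)
    ]
    cf = header.get("CF")
    # Assumption: reliability codes are the integers 1 (lowest) to 5 (highest).
    if cf is not None and cf not in {"1", "2", "3", "4", "5"}:
        problems.append(f"reliability code out of range: {cf!r}")
    return problems
```

An empty list means that none of the checked constraints is violated; it does not mean the entry is complete.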
Eurodicautom: A first DTD (part)
<!-- DTD for EURODICAUTOM KDS 30.8.2000 -->
<!ELEMENT EURODICAUTOM (entry+ )>
<!ELEMENT entry (BE , TY , (NI | NX | NZ ) , CF , AU? , DATE? , CM* , langSet+)>
<!ELEMENT BE (#PCDATA )>
<!ELEMENT TY (#PCDATA )>
...
<!ELEMENT langSet (VE?, AB?, PH?, DF?, RF?, MC?, MC?, NT?)>
<!ATTLIST langSet lang CDATA #IMPLIED >
<!ELEMENT VE (term , termID? )*>
<!ELEMENT AB (term , termID? )*>
<!ELEMENT PH (term , termID? )*>
<!ELEMENT RF (text , refID* )*>
<!ELEMENT term (#PCDATA )>
...
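A minimal sketch of how a parsed entry might be serialised along the lines of this DTD, using Python's standard `xml.etree.ElementTree`. Several points are assumptions, because the corresponding declarations are elided in the excerpt: the entry number is emitted as an `NI` element; `NI`, `CF`, `CM` and `termID` hold plain text; one `CM` element is written per subject code; and the `[n]` index of a term is stored in `termID`. The function name `entry_to_xml` is hypothetical, and only the `VE` field of each language block is handled.

```python
import re
import xml.etree.ElementTree as ET

def entry_to_xml(header: dict, lang_blocks: dict) -> ET.Element:
    """Build an <entry> element from a general block and its language blocks."""
    entry = ET.Element("entry")
    # Element order follows the entry content model of the DTD excerpt.
    for tag in ("BE", "TY", "NI", "CF"):
        if tag in header:
            ET.SubElement(entry, tag).text = header[tag]
    # Assumption: one CM element per whitespace-separated subject code.
    for code in header.get("CM", "").split():
        ET.SubElement(entry, "CM").text = code
    for lang, block in lang_blocks.items():
        lang_set = ET.SubElement(entry, "langSet", lang=lang)
        if "VE" in block:
            ve = ET.SubElement(lang_set, "VE")
            for part in block["VE"].split(";"):
                # Separate the term text from an optional trailing [n] index.
                text, index = re.fullmatch(r"(.*?)(?:\[(\d+)\])?", part.strip()).groups()
                ET.SubElement(ve, "term").text = text
                if index is not None:
                    # Assumption: the index is what termID is meant to carry.
                    ET.SubElement(ve, "termID").text = index
    return entry
```

Calling `ET.tostring(entry_to_xml(header, lang_blocks), encoding="unicode")` yields the serialised entry as a string; validating it against the DTD would need a validating parser, which the standard library does not provide.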