Metadata Acquisition with XML Case studies from the Swiss Federal Archives 9. October 2002 / Stephan Heuscher
Overview • Problems acquiring metadata • Why XML? • Featured Projects • Lessons learned • Conclusions Urbino2002.ppt; Stephan Heuscher; Swiss Federal Archives
Problems acquiring metadata • Documentation • Data format • Data consistency • System borders • Money • Communication with stakeholders
Why XML? XML … • … is an open standard • … is self-explanatory • … is human-readable • … can be validated automatically • … has broad software support • Most products feature XML support
Featured Projects
SIARD
• Archiving of relational databases
• Manual generation of additional metadata
• Metadata and content are stored in XML files
AMDA
• Manages metadata for audio data from the Swiss Parliament
• Does not manage audio data
• Import of XML metadata
• Must provide a variety of export formats
SIARD (System Independent Archiving of Relational Databases)
[Diagram labels: Oracle, MS-SQL, ???-DB; Database regeneration; Data and low-level metadata extraction; Digital Archive (to be built); Additional high-level descriptive metadata]
XML use in SIARD
• SQL-99 (ISO/IEC 9075)
  • Low-level data description
    • Structure
    • Datatypes
    • Constraints
• XML
  • High-level metadata
  • Table content (thin wrapper)
Data Logic (SQL)
CREATE TABLE "FLUGLE"."CLASS" (
    "CLASS_ID" NATIONAL CHARACTER VARYING(20) NOT NULL,
    "SCHEDULE_ID" NATIONAL CHARACTER VARYING(20),
    "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25),
    "CLASS_ROOM" NATIONAL CHARACTER VARYING(25),
    "COURSE_ID" NATIONAL CHARACTER VARYING(5),
    "DEPARTMENT_ID" NATIONAL CHARACTER VARYING(20),
    "INSTRUCTOR_ID" NATIONAL CHARACTER VARYING(20),
    "SEMESTER" NATIONAL CHARACTER VARYING(6),
    "SCHOOL_YEAR" TIMESTAMP(0)
)
CREATE TABLE "FLUGLE"."CLASS_LOCATION" (
    "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25) NOT NULL,
    "CLASS_ROOM" NATIONAL CHARACTER VARYING(25) NOT NULL
    ...
SIARD Metadata XML
<?xml version="1.0" encoding="UTF-8"?>
<archive>
  <database product-name="Oracle"
            product-version="Personal Oracle9i Release 9.0.1.1.1 - Production. With the Partitioning option. JServer Release 9.0.1.1.1 - Production"
            table-number="22" view-number="4" archiv-size="175KB">
    <schemas>
      <schema tag-name="FLUGLE" table-number="22" view-number="4">
        <status sql3="true" integrity="true" archiv="true" reason="0" mandatory="true"/>
        <tables>
          <table tag-name="BACKUP_CLASS" column-number="9" row-number="10">
            <status sql3="true" integrity="false" archiv="true" reason="3" mandatory="true"/>
            <columns>
              <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)"
                      type="VARCHAR2" length="20" precision="" scale="" nullable="false" defaultvalue="">
                <status sql3="true" integrity="true" archiv="true" reason="0" mandatory="true"/>
              </column>
              ...
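The slides list "database regeneration" as one use of the archived metadata but do not show how it is done. The following is a minimal sketch, not SIARD's actual implementation, of how the `sql3type`, `sql3size` and `nullable` attributes of a `<column>` element could be turned back into a SQL-99 column definition. The function names `column_ddl` and `create_table` are hypothetical, and the sample is a trimmed fragment: the `SCHEDULE_ID` attributes are taken from the data XML slide, since the metadata slide only shows `CLASS_ID`.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative fragment; only the attributes used below are kept.
SAMPLE_COLUMNS = """<columns>
  <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING"
          sql3size="(20)" nullable="false" defaultvalue=""/>
  <column tag-name="SCHEDULE_ID" sql3type="NATIONAL CHARACTER VARYING"
          sql3size="(20)" nullable="true" defaultvalue=""/>
</columns>"""


def column_ddl(column: ET.Element) -> str:
    """Build one quoted column definition from a <column> element."""
    name = column.get("tag-name")
    sql_type = column.get("sql3type", "") + column.get("sql3size", "")
    # Assumption: nullable="false" corresponds to NOT NULL in the SQL slide.
    not_null = " NOT NULL" if column.get("nullable") == "false" else ""
    return f'"{name}" {sql_type}{not_null}'


def create_table(schema: str, table: str, columns_xml: str) -> str:
    """Assemble a CREATE TABLE statement from a <columns> fragment."""
    columns = ET.fromstring(columns_xml).findall("column")
    body = " , ".join(column_ddl(c) for c in columns)
    return f'CREATE TABLE "{schema}"."{table}" ( {body} )'


if __name__ == "__main__":
    print(create_table("FLUGLE", "CLASS", SAMPLE_COLUMNS))
```

Default values, constraints and the `<status>` flags are ignored here; a real regeneration step would have to handle them as well.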
SIARD Data XML
<?xml version="1.0" encoding="UTF-16"?>
<dmp-file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="../dmp.xsd">
  <schema tag-name="FLUGLE"/>
  <table tag-name="CLASS"/>
  <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)"
          defaultvalue="" nullable="false" constraints="PK:PK_CLASS"/>
  <column tag-name="SCHEDULE_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)"
          defaultvalue="" nullable="true" constraints="FK:FLUGLE.SCHEDULE_TYPE.SCHEDULE_ID"/>
  ...
  <data>
    <row>6,104200;4,S180;9,POCO HALL;3,150;3,198;5,PHILO;4,E491;6,SPRING;19,1997-03-01 00:00:00;</row>
    <row>6,104500;3,T15;11,NARROW HALL;3,200;3,184;4,HIST;4,D944;6,SPRING;19,1997-03-01 00:00:00;</row>
    ...
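In the sample rows, each field appears to be written as `<length>,<value>;`, where the number before the comma matches the character count of the value (for example `9,POCO HALL`). The slides do not document this encoding, so the decoder below is a sketch based on that inference alone; the name `decode_row` is hypothetical, the length is assumed to count characters rather than bytes, and the representation of NULL values is not covered because the sample does not show one.

```python
def decode_row(row: str) -> list[str]:
    """Split a <row> payload of '<length>,<value>;' fields into values."""
    fields = []
    pos = 0
    while pos < len(row):
        comma = row.index(",", pos)        # end of the length prefix
        length = int(row[pos:comma])
        start = comma + 1
        end = start + length
        # Each value must be followed by the ';' terminator.
        if row[end:end + 1] != ";":
            raise ValueError(f"expected ';' at offset {end}")
        fields.append(row[start:end])
        pos = end + 1
    return fields


if __name__ == "__main__":
    sample = ("6,104200;4,S180;9,POCO HALL;3,150;3,198;5,PHILO;"
              "4,E491;6,SPRING;19,1997-03-01 00:00:00;")
    print(decode_row(sample))
```

Because the length is read first, a value may itself contain `,` or `;` without any escaping, which would explain why the wrapper can stay this thin.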
AMDA (Audio MetaData Acquisition)
[Diagram labels: Audio data; Access DB; Online parliament session metadata (XML); Web interface; Unified XML import; AMDA Metadata; Digital Archive (to be built)]
XML use in AMDA
• Import
  • XSLT transformation to common format
  • Online metadata
  • Legacy data (Access database)
• Export
  • Raw XML output transformed using XSLT
AMDA Import XML (raw)
<?xml version="1.0" encoding="iso-8859-1"?>
<root>
  <session oid="34695" session_id="session_4609" text_update_time="1002882007656">
    <meeting date="20010917" local_time="1430" location="N" oid="34696" publish_status="final">
      <subject oid="34697" publish_status="draft" subject_type="gesch">
        <gesch_list oid="34698" publish_status="draft" transfer_gesch_list="01.9001;">
          01.9001;
          <gesch_info oid="000000000">
            <a99_gesch last_modified="2001/03/05 14:43:42 GMT+01:00">
              <gesch_id raw_id="20019001">2001.9001</gesch_id>
              <title language="d">
                <line>Mitteilungen</line>
                <line>des Präsidenten</line>
              </title>
            </a99_gesch>
          </gesch_info>
        </gesch_list>
        <speech_text audio_channel="N" audio_end="1000729995203" audio_start="1000729751250"
                     speaker_id="9005" turnus_nr="1000" turnus_oid="155989">
          <pd_text>
            <p>Der Beginn dieser Herbstsession ist schmerzlich getrübt von unseren Gedanken an das ...
AMDA Import XML (transformed)
<?xml version="1.0" encoding="iso8859-1"?>
<Session id="4609" start="20010917T1430+0200">
  <Geschaefte>
    <Geschaeft nummer="1998.0446" themaDeutsch="Parlamentarische Initiative
Hämmerle Andrea.
Post, SBB, Swisscom.
Arbeitsplätze
in der ganzen Schweiz" themaFranzoesisch="Initiative parlementaire
Hämmerle Andrea.
Poste, CFF, Swisscom.
Des emplois
dans toute la Suisse"/>
    <Geschaeft nummer="2001.9001" themaDeutsch="Mitteilungen
des Präsidenten" themaFranzoesisch="Communications
du président"/>
    ...
  </Geschaefte>
  <Verhandlungen>
    <Verhandlung geschaeftNummern="2001.9001" rat="V" start="1000729751" dauer="244" bulletin="" bulletinSeiten="825">
      <Votum start="1000729751" dauer="20" sprache="de">
        <Person id="9005" vorname="Peter" nachname="Hess" kanton="ZG" ort="Zug"/>
        <VotumText>Der Beginn dieser Herbstsession ist schmerzlich getrübt von unseren Gedanken ...
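Comparing the raw and transformed samples suggests a few of the mappings the import step performs: the `session_` prefix is dropped from the id, date and local time are joined into one timestamp, and the millisecond audio offsets become whole seconds. The project did this with XSLT; the sketch below uses Python's `xml.etree.ElementTree` instead, purely to illustrate those mappings. The function name `transform`, the trimmed sample and the `utc_offset` parameter are assumptions (the raw sample does not carry a UTC offset), and the slides do not show how the `dauer` of a `Votum` is derived, so only the `Verhandlung` attributes that the arithmetic reproduces are emitted.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative version of the raw import sample.
RAW = """<root>
  <session session_id="session_4609">
    <meeting date="20010917" local_time="1430">
      <subject subject_type="gesch">
        <speech_text audio_start="1000729751250" audio_end="1000729995203"
                     speaker_id="9005"/>
      </subject>
    </meeting>
  </session>
</root>"""


def transform(raw_xml: str, utc_offset: str = "+0200") -> ET.Element:
    """Map a raw session document to a <Session> element in the common format."""
    session = ET.fromstring(raw_xml).find("session")
    meeting = session.find("meeting")
    out = ET.Element("Session", {
        # "session_4609" -> "4609"
        "id": session.get("session_id").rsplit("_", 1)[-1],
        # date + "T" + local time + assumed UTC offset
        "start": f'{meeting.get("date")}T{meeting.get("local_time")}{utc_offset}',
    })
    verhandlungen = ET.SubElement(out, "Verhandlungen")
    for speech in meeting.iter("speech_text"):
        start_ms = int(speech.get("audio_start"))
        end_ms = int(speech.get("audio_end"))
        ET.SubElement(verhandlungen, "Verhandlung", {
            "start": str(start_ms // 1000),                    # ms -> whole seconds
            "dauer": str(round((end_ms - start_ms) / 1000)),   # duration in seconds
        })
    return out


if __name__ == "__main__":
    print(ET.tostring(transform(RAW), encoding="unicode"))
```

For the sample values this yields `start="1000729751"` and `dauer="244"`, which agree with the `Verhandlung` element in the transformed slide.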
Lessons learned • Transforming and reformatting XML data is easy • Documentation and data integrity are crucial • Agree on rules and standards for XML formats early • Stakeholders' uses of XML differ greatly
Conclusions
• XML
  • is not a preservation strategy
  • is only a technology
  • is too new for a common understanding
• XML provides tools and techniques for concise metadata management
• Working solutions need both XML and non-XML experience
• Most problems are still of human nature