The mapping process – some observations
Robina Clayphan, EDLF
Data Flow Local schemas > ESE
Management of the process
• Sheer complexity of managing the hundreds of files going through the steps in the process
• Keeping track of the status of the files
  • straightforward ones - in the right place for the next step
  • problem ones - refer back to provider or a developer
• Use of SharePoint document libraries and rapid establishment of procedures that all must adhere to
• The management of the process evolved during implementation - a very steep learning curve
• Maintenance of authority files
  • getting meta-metadata from the providers (types etc.)
  • collection IDs
(sort of) Policy issues
• Inclusion criterion: must have a link giving direct access to the digital object
  • check if URLs in data actually resolve to the object described
• Often:
  • resolve to metadata page with e.g. PDF icon
  • how many clicks are acceptable - need for policy decision
  • granularity mismatch - link at title level only
• Sometimes:
  • 404 page not found - refer to provider - persistence of URLs
  • need a plug-in (e.g. DjVu) - is that OK?
• Occasionally: a log-in required for restricted-access resources
• Need for providers to ensure they only provide links to resources that can be accessed
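The link checks listed above could be triaged roughly as follows. This is only a sketch under stated assumptions: the `LinkResponse` structure, the `LinkStatus` categories and the list of media types treated as "the object itself" are all hypothetical, and fetching the URL is left out. A log-in page that answers with status 200 and HTML would be classed as a metadata page here, so a manual check is still needed.

```python
from dataclasses import dataclass
from enum import Enum


class LinkStatus(Enum):
    DIRECT_OBJECT = "direct access to the digital object"
    METADATA_PAGE = "resolves to a metadata or landing page"
    NOT_FOUND = "404 page not found"
    LOGIN_REQUIRED = "log-in required"
    OTHER = "needs manual review"


@dataclass
class LinkResponse:
    """Facts about an already-fetched URL (hypothetical structure)."""
    status_code: int
    content_type: str


# Assumption: these media types count as the object itself.
# DjVu (image/vnd.djvu) matches "image/"; whether a plug-in format
# is acceptable remains a policy decision.
OBJECT_TYPES = ("image/", "audio/", "video/", "application/pdf")


def classify_link(resp: LinkResponse) -> LinkStatus:
    """Sort a response into one of the cases from the slide."""
    if resp.status_code == 404:
        return LinkStatus.NOT_FOUND
    if resp.status_code in (401, 403):
        return LinkStatus.LOGIN_REQUIRED
    if resp.status_code == 200:
        ctype = resp.content_type.lower()
        if ctype.startswith(OBJECT_TYPES):
            return LinkStatus.DIRECT_OBJECT
        if ctype.startswith("text/html"):
            return LinkStatus.METADATA_PAGE
    return LinkStatus.OTHER
```

Anything that lands in `NOT_FOUND` or `LOGIN_REQUIRED` would be referred back to the provider; `METADATA_PAGE` results are the ones that need the "how many clicks" policy decision.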
Data level problems 1
• Trying to understand the decision-making process of the original metadata creators
  • What they meant by e.g. dc:date, dc:source
• Trying to discern the (implicit) data model of the original metadata creators
  • What is the dc:relation referring to?
• Understanding data in a foreign language or foreign script
  • Is negyedévenként really Hungarian for terminally?
  • And, if so, why is it in dc:format?
Data level problems 2
• Questions to developers that arose from examining the data
  • All records have two instances of dc:identifier: the first a URL, the second (possibly) a shelfmark. Need to map each instance to a different ESE element - can it be done?
  • All records have two instances of dc:rights, the first appropriate, the second not - is it possible to just display the first and ignore the second?
  • Where values had been divided between multiple instances of the same element - could they be concatenated with punctuation for a better display? E.g. spatial1, spatial2, spatial3 used for a geographic hierarchy. Another had up to 14 instances of dc:subject.
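The three questions above might be handled along these lines. This is a minimal sketch, not the Europeana tooling: a record is assumed to be a list of (element, value) pairs, and the output keys (`url`, `shelfmark`, `rights`, `spatial`, `subject`) are hypothetical placeholders, not real ESE element names.

```python
from collections import defaultdict


def group_fields(record):
    """Group repeated elements, keeping their original order."""
    grouped = defaultdict(list)
    for name, value in record:
        grouped[name].append(value)
    return grouped


def map_record(record, separator="; "):
    """Apply the three per-element rules to one record."""
    fields = group_fields(record)
    out = {}

    # Two dc:identifier instances: send the URL and the (possible)
    # shelfmark to different targets. The URL test is a crude assumption.
    for ident in fields.get("dc:identifier", []):
        if ident.startswith(("http://", "https://")):
            out.setdefault("url", ident)
        else:
            out.setdefault("shelfmark", ident)

    # Two dc:rights instances: keep the first, ignore the rest.
    rights = fields.get("dc:rights", [])
    if rights:
        out["rights"] = rights[0]

    # spatial1..spatial3: concatenate into one geographic hierarchy.
    spatial = [v for name in ("spatial1", "spatial2", "spatial3")
               for v in fields.get(name, [])]
    if spatial:
        out["spatial"] = ", ".join(spatial)

    # Many dc:subject instances: join with punctuation for display.
    subjects = fields.get("dc:subject", [])
    if subjects:
        out["subject"] = separator.join(subjects)

    return out
```

For example, a record with `spatial1` = Europe, `spatial2` = Hungary and `spatial3` = Budapest would display as "Europe, Hungary, Budapest". Rules like "keep the first dc:rights" only hold if every record in the collection follows the same pattern, which is something the provider is best placed to confirm.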
Normalisation level
• At the normalisation stage you can see if your interpretation of the record actually makes sense when it has been processed against the source data.
• Apply the Quality Control Checklist
• Edit mapping and repeat!
(my) Conclusion
• All indicates:
  • that it is easier if the mapping and normalising is done as close to source as possible, ideally by the providers
  • they are the ones who understand what the data means and can make sensible mapping decisions
  • they understand the language and script
• Tools would be nice!
Data Flow
Local schemas > ESE
[Diagram. Labelled steps: #0 Transform data to populate local repository; #5 Export data to Europeana. Annotations: "Aggregator with provider?" (twice), "Aggregator?", "EuropeanaLocal".]
EuropeanaLocal Content Provider Model - to illustrate movement of metadata only
[Diagram. Labels, in order: Content provider local systems; Customised transformations to e.g. OAI-DC; Content provider repositories; Harvesting of e.g. OAI-DC; Aggregator (twice); Mapping and transformation to ESE, including <europeana> elements; Europeana; EuropeanaLocal Parallel Test Environment; No metadata transformations.]
Issues for EuropeanaLocal
• Currently a great deal of manual effort goes into metadata transformation.
  • at provider sites: local format to repository format
  • by the Europeana development team: harvested format to ESE
  • normalisation by the Europeana development team
• Where will this work happen in EuropeanaLocal?
  • feasibility of central Europeana staff handling hundreds more collections?
• Can we minimise the current manual overhead?
• What are the possibilities for automating all or some of the transformation work?