The STAR project aims to investigate the potential of semantic technologies for widening access to digital archaeology resources and associated grey literature. It focuses on aligning and enriching archaeological datasets using controlled vocabularies and ontologies.
Archaeology and Terminology Ceri Binding Hypermedia Research Unit, University of Glamorgan, Wales, UK http://hypermedia.research.glam.ac.uk/
STAR project - overview • AHRC funded project in collaboration with English Heritage Centre for Archaeology, Portsmouth • Aim: to investigate the potential of semantic technologies for widening access to digital archaeology resources, including disparate datasets and associated grey literature.
STAR - general architecture • Applications – Server Side, Rich Client, Browser • Data access layer - Web Services, SQL, SPARQL • RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS) • Data Mapping / Normalisation – Archaeological Datasets (STAN, RRAD, IADB, LEAP, RPRE) • Indexing – Grey Literature reports • Conversion (SKOS) – EH thesauri, glossaries
The Archaeological Archipelagos [Keith May, English Heritage]
English Heritage controlled vocabularies • 27 glossaries – from English Heritage recording manuals (2006) • 6 main thesauri used: • Monument Types thesaurus • Archaeological Sciences thesaurus • Evidence thesaurus • Main Building Materials thesaurus • MDA Object Types thesaurus • Timelines thesaurus • Converted to SKOS format for use within STAR
Expressive vs. controlled vocabulary “…how many of those writing [grey literature] reports would think to describe what they are recording/writing about using the same thesauri? […] it would have been a lot quicker and easier if standardised terminology had been used in the report text when describing types of monument, event and artefact, as well as dates/periods etc.” [G. Falkingham] “Grey Literature is very often the only place where field workers have any opportunity to engage in creating their own narrative of the site, both of the archaeological event and of the archaeological story of the site itself. I think it would be throwing the baby out with the bath water to concentrate solely on the data without continuing to offer highly skilled and experienced fieldworkers the opportunity to actually tell us what they think the data means...” [S. Jeffrey]
Worst of all worlds? Descriptive, semi-controlled vocabulary… “…another of my examples has something about some flint that is ‘snuff coloured’ & I don’t know if I’ve ever seen snuff, let alone know what colour it is, or might have been over 150 years ago, and I would think it would make sense to take some kind of integrated approach from the outset, rather than the usual ‘bricolage’ of having one route for the archivists, another for those interested in searching spreadsheets, another for people interested in googling graphics, etc.” [G. Carver]
Terminology control for time periods • Centuries • BC / AD years • 3 age system • Monarchs / Roman emperors • Cultural styles • Geological periods • Prefixes: pre, post, mid etc. • Any combinations of these
Time period alignment – data cleansing / semantic enrichment
Time period relationships • [Diagram: periods drawn along a time axis relative to a reference Period P1] • occurs before P1* • occurs after P1* • meets P1 • met by P1 • overlaps P1 • overlapped by P1 • starts P1* • started by P1* • finishes P1* • finished by P1* • includes P1* • occurs during P1* • equal to P1* • [*Transitive]
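The relationships on this slide can be derived from the start and end points of two periods. The following is a minimal sketch, not the STAR implementation: it assumes each period is a (start year, end year) pair, and the function name `relation` and the example years (commonly cited regnal dates, not values from the Timelines thesaurus) are illustrative only.

```python
from typing import Tuple

# Assumption: a period is a (start_year, end_year) pair with start < end.
Interval = Tuple[int, int]


def relation(p: Interval, q: Interval) -> str:
    """Name the relationship of period p to period q (hypothetical helper)."""
    ps, pe = p
    qs, qe = q
    if ps >= pe or qs >= qe:
        raise ValueError("each period must start before it ends")
    if (ps, pe) == (qs, qe):
        return "equal to"
    if pe < qs:
        return "occurs before"
    if qe < ps:
        return "occurs after"
    if pe == qs:
        return "meets"
    if qe == ps:
        return "met by"
    if ps == qs:
        return "starts" if pe < qe else "started by"
    if pe == qe:
        return "finishes" if ps > qs else "finished by"
    if qs < ps and pe < qe:
        return "occurs during"
    if ps < qs and qe < pe:
        return "includes"
    # Only the two partial-overlap cases remain at this point.
    return "overlaps" if ps < qs else "overlapped by"


# Illustrative years only.
stuart = (1603, 1714)
jacobean = (1603, 1625)
caroline = (1625, 1649)

print(relation(jacobean, stuart))    # starts
print(relation(jacobean, caroline))  # meets
```

Checking the endpoint equalities before the strict inequalities keeps each of the thirteen cases mutually exclusive.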
Time Period Comparison – Closeness Calculation • [Diagram: Period P1 compared with Period P2 and Period P3 along a time axis, annotated with the quantities IU, MP, NMP and D] • Match(P1, P2) = W1 (MP / IU) + W2 (IU / (NMP + IU)) + W3 (IU / (D + IU))
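As a worked illustration, the Match formula can be evaluated directly once its quantities are known. The sketch below takes MP, IU, NMP and D as given inputs, since the slide text does not define how they are measured. The function name `match_score`, the equal default weights and the sample numbers are assumptions made for the example.

```python
def match_score(mp: float, iu: float, nmp: float, d: float,
                w1: float = 1 / 3, w2: float = 1 / 3, w3: float = 1 / 3) -> float:
    """Evaluate W1 (MP / IU) + W2 (IU / (NMP + IU)) + W3 (IU / (D + IU)).

    mp, iu, nmp and d are the slide's quantities, passed in as-is.
    The default weights are an assumption; the slide gives no values.
    """
    if iu <= 0:
        raise ValueError("IU must be positive")
    if nmp < 0 or d < 0:
        raise ValueError("NMP and D must not be negative")
    return w1 * (mp / iu) + w2 * (iu / (nmp + iu)) + w3 * (iu / (d + iu))


# Hypothetical numbers: MP = 50, IU = 100, NMP = 50, D = 0.
score = match_score(50, 100, 50, 0, w1=0.5, w2=0.25, w3=0.25)
print(round(score, 3))  # 0.667
```

With weights that sum to 1, the score reaches 1 when MP equals IU and both NMP and D are zero, and it falls as NMP or D grows.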
SKOS Concepts + CRM Entities • Time period concepts also have implicit spatio-temporal context • [Diagram: <#stuart> shown as both a skos:Concept and a crm:E4.Period; classes crm:E2.TemporalEntity, crm:E52.Time-Span and crm:E53.Place; properties rdf:type, rdfs:subClassOf, crm:P4F.has_time-span and crm:P7F.took_place_at; the concepts <#jacobean>, <#caroline>, <#restoration>, <#williamandmary> and <#queenanne> each linked to <#stuart> by skos:broader, with temporal links crm:P116F.starts, crm:P115F.finishes, crm:P119F.meets and crm:P118F.overlaps]
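One way to read this slide is that a single resource carries both a thesaurus role and an ontology role. The sketch below stores a few statements as plain (subject, predicate, object) tuples rather than using an RDF library. The dual typing of <#stuart> is this sketch's reading of the flattened diagram, and the helper names `objects` and `narrower` are hypothetical.

```python
# Assumption: triples are kept as plain tuples of prefixed names.
TRIPLES = [
    ("<#stuart>", "rdf:type", "skos:Concept"),
    ("<#stuart>", "rdf:type", "crm:E4.Period"),
    ("<#jacobean>", "skos:broader", "<#stuart>"),
    ("<#caroline>", "skos:broader", "<#stuart>"),
    ("<#restoration>", "skos:broader", "<#stuart>"),
    ("<#williamandmary>", "skos:broader", "<#stuart>"),
    ("<#queenanne>", "skos:broader", "<#stuart>"),
]


def objects(subject: str, predicate: str) -> list:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def narrower(concept: str) -> list:
    """Return the concepts that name `concept` as their skos:broader."""
    return [s for s, p, o in TRIPLES if p == "skos:broader" and o == concept]


print(objects("<#stuart>", "rdf:type"))
print(narrower("<#stuart>"))
```

Because the same identifier is used in both roles, a thesaurus lookup and a CRM-based query can start from the same node.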
Time period alignment – data processing • Align data relative to closest period concepts from English Heritage ‘Timelines’ thesaurus
Time period alignment - results Data records relative to closest ‘known’ periods
Semantic enrichment • Borderline between data cleansing and data creation… “Possibly fragment of belt buckle or nail” • BELT • Belt Clasp -> use STRAP FITTING • BUCKLE • Buckle Plate -> use BUCKLE • NAIL • HOBNAIL • SHOEING NAIL “The single most useful thing you can do to ensure the long-term preservation of your data is to plan for it to be re-used” [Archaeology Data Service]
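The belt buckle example can be sketched as a lookup of free-text words against thesaurus labels, with "use" references resolved to the preferred term. This is a minimal illustration, not the STAR process: the term lists are only those shown on the slide, and the matching rule (case-insensitive whole-word search) and the name `candidate_terms` are assumptions.

```python
import re

# Preferred terms and "use" references as listed on the slide.
PREFERRED = {"BELT", "BUCKLE", "NAIL", "HOBNAIL", "SHOEING NAIL", "STRAP FITTING"}
USE = {"BELT CLASP": "STRAP FITTING", "BUCKLE PLATE": "BUCKLE"}


def candidate_terms(text: str) -> list:
    """Suggest controlled terms whose labels occur as whole words in the text."""
    found = []
    for label in sorted(PREFERRED | set(USE)):
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            term = USE.get(label, label)  # follow a "use" reference if present
            if term not in found:
                found.append(term)
    return found


print(candidate_terms("Possibly fragment of belt buckle or nail"))
# ['BELT', 'BUCKLE', 'NAIL']
```

The result is a list of candidates rather than a decision. Choosing between BELT, BUCKLE and NAIL for an uncertain find is where cleansing shades into data creation.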
Aligning controlled vocabularies • Different scope notes, same concepts? • Different thesauri, same concepts? • Archaeological Objects • SARCOPHAGUS • SUNDIAL • WALL PAINTING • WHIPPING POST • RCHME Monument Types • SARCOPHAGUS • SUNDIAL • WALL PAINTING • WHIPPING POST • RCHMS Monument Types • RCHMW Monument Types
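A first pass at alignment can simply flag labels that occur in more than one thesaurus. The sketch below does this for the two term lists on the slide. It is illustrative only: identical labels mark candidates for a mapping such as skos:exactMatch or skos:closeMatch, and comparing scope notes is still needed before asserting either. The function name `shared_labels` is hypothetical.

```python
ARCHAEOLOGICAL_OBJECTS = ["SARCOPHAGUS", "SUNDIAL", "WALL PAINTING", "WHIPPING POST"]
RCHME_MONUMENT_TYPES = ["SARCOPHAGUS", "SUNDIAL", "WALL PAINTING", "WHIPPING POST"]


def normalise(label: str) -> str:
    """Upper-case a label and collapse runs of whitespace."""
    return " ".join(label.upper().split())


def shared_labels(first: list, second: list) -> list:
    """Return the labels present in both thesauri, in sorted order."""
    return sorted({normalise(a) for a in first} & {normalise(b) for b in second})


for label in shared_labels(ARCHAEOLOGICAL_OBJECTS, RCHME_MONUMENT_TYPES):
    print(label, "- candidate mapping, check scope notes")
```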
STAR general architecture • STAR datasets: English Heritage thesauri (SKOS), Grey literature indexing, Archaeological Datasets (CRM) • STAR web services: Full text search, Browse concept space, Navigate via expansion, Cross search archaeological datasets • STAR client applications: Windows applications, Browser components
Windows Client Applications • Browse available thesauri • Search across multiple thesauri • Navigate via semantic expansion
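Semantic expansion can be pictured as widening a starting concept to its neighbours in the thesaurus. The sketch below follows narrower-term links breadth-first up to a chosen depth. It is not the STAR algorithm: the hierarchy is hypothetical (FASTENING is an invented top term, and treating HOBNAIL and SHOEING NAIL as narrower than NAIL is an assumption), as is the function name `expand`.

```python
from collections import deque

# Hypothetical narrower-term hierarchy for illustration only.
NARROWER = {
    "FASTENING": ["NAIL", "BUCKLE"],
    "NAIL": ["HOBNAIL", "SHOEING NAIL"],
}


def expand(concept: str, max_depth: int) -> list:
    """Return the concept plus narrower concepts up to max_depth links away."""
    seen = [concept]
    queue = deque([(concept, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append((child, depth + 1))
    return seen


print(expand("FASTENING", 1))  # ['FASTENING', 'NAIL', 'BUCKLE']
print(expand("FASTENING", 2))
```

A depth limit keeps the expanded query close to the original concept. A search for NAIL at depth 1 would also retrieve records indexed as HOBNAIL or SHOEING NAIL.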
Controlled types used in main search interface • Interactive selection from glossary/thesaurus concepts • Filtered to concepts actually used in indexing • Group / context types – from (enhanced) cuts and deposits glossary • Context find materials – from building materials thesaurus • Context find types – from MDA Object types thesaurus • Context sample types – from existing data values...
Summary • Tension between expressive and controlled vocabulary • Semantic enrichment process and terminology control (e.g. for time periods) • Alignment of controlled vocabularies • Web services and interactive tools to aid data entry and search • Issues encountered are not about particular technologies but are more fundamental knowledge organisation (KO) issues