Enrichment and Structuring of Archival Description Metadata • Kalliopi Zervanou*, Ioannis Korkontzelos**, Antal van den Bosch* & Sophia Ananiadou** • * Tilburg Centre for Cognition & Communication, The University of Tilburg, NL: K.Zervanou@uvt.nl, Antal.vdnBosch@uvt.nl • ** National Centre for Text Mining, The University of Manchester, UK: Ioannis.Korkontzelos@manchester.ac.uk, Sophia.Ananiadou@manchester.ac.uk
Research on Metadata • Developing standards: • collection specific (e.g. EAD, MARC21) • cross-collection (e.g. Dublin Core) • Providing mappings: • across schemas • to ontologies (ad hoc or standard, e.g. CIDOC-CRM) • Discarding metadata for IR (Koolen et al., 2007) • Exploiting metadata for IR (Zhang & Kamps, 2009)
The IISH EAD dataset • EAD: XML standard for encoding archival descriptions • Challenges: • Variety of languages used • Varying type and amount of information • Style: enumerations, lists, incomplete sentences
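As a rough illustration of what pulling free text out of an EAD file can look like, here is a minimal sketch using Python's standard library. The sample record is invented, and the choice of `unittitle`, `bioghist` and `scopecontent` as the free-text elements is an assumption (these are standard EAD elements, but the slides do not list the selection actually used). Real EAD files may also declare XML namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (not taken from the IISH data).
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Papers of an example trade union</unittitle>
      <unitid>ARCH00000</unitid>
    </did>
    <bioghist><p>Founded in 1901; merged with another union in 1950.</p></bioghist>
    <scopecontent><p>Minutes, correspondence, pamphlets.</p></scopecontent>
  </archdesc>
</ead>"""

# Assumed selection of free-text elements; the slides do not name the real one.
FREE_TEXT_ELEMENTS = ("unittitle", "bioghist", "scopecontent")


def extract_free_text(xml_string, wanted=FREE_TEXT_ELEMENTS):
    """Return (element name, normalised text) pairs in document order."""
    root = ET.fromstring(xml_string)
    snippets = []
    for element in root.iter():
        if element.tag in wanted:
            # Join the text of all descendants and collapse whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                snippets.append((element.tag, text))
    return snippets


snippets = extract_free_text(SAMPLE_EAD)
for name, text in snippets:
    print(f"{name}: {text}")
```

Identifier-like elements such as `unitid` are skipped here because they carry no free text. The resulting snippets are short and stylistically uneven, which matches the challenges listed on this slide.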
Motivation & Objectives • Improved search and retrieval • content-based metadata document clustering • content-based/semantic search • support exploratory search • link across collections, metadata formats & institutions • create unified metadata knowledge resources
Pre-processing • EAD/XML element selection & extraction • EAD elements containing free text & archive content information • Language identification (n-gram method) • Identifier trained on the Europarl corpus • Text snippet length: ~20 tokens
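The slide only names an "n-gram method", so the following is one possible reading and not the identifier used in this work: a minimal sketch that builds character-trigram profiles per language and assigns a snippet to the language whose profile is closest by cosine similarity. The two training sentences are invented stand-ins for a real training corpus such as Europarl, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lower-cased, whitespace-normalised text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


# Toy training data (assumption); a real identifier needs far more text.
TRAINING = {
    "en": "the archive contains the minutes and the correspondence "
          "of the union and letters from the members",
    "nl": "het archief bevat de notulen en de correspondentie "
          "van de vakbond en brieven van de leden",
}
PROFILES = {lang: char_ngrams(text) for lang, text in TRAINING.items()}


def identify_language(snippet):
    """Return the language whose trigram profile is most similar."""
    query = char_ngrams(snippet)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(identify_language("letters and minutes of the union"))
print(identify_language("notulen en brieven van de vakbond"))
```

With snippets of about 20 tokens, character n-grams give the classifier more evidence per snippet than whole words would, which is one reason such methods are commonly used on short texts.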
Enrichment & Structuring • Topic detection: Automatic term recognition using C-value method • Agglomerative hierarchical term clustering: • complete, single & average linkage criteria • document co-occurrence & lexical similarity measures
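To make the C-value scoring concrete, here is a minimal sketch of the commonly cited formula: a candidate that never appears inside a longer candidate scores log2(length) times its frequency, while a nested candidate has the mean frequency of its containing candidates subtracted first. The candidate terms and frequencies are invented, the linguistic filtering that normally produces the candidates is omitted, and the simplified formula (without the frequency bookkeeping of the full algorithm) is an assumption about how the method would be applied here.

```python
import math


def is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_values(freqs):
    """Score multi-word candidates given {token tuple: frequency}."""
    scores = {}
    for term, freq in freqs.items():
        # Frequencies of all longer candidates that contain this term.
        containers = [f for other, f in freqs.items() if is_nested(term, other)]
        if containers:
            freq = freq - sum(containers) / len(containers)
        # log2(length) is 0 for single words, so this targets multi-word terms.
        scores[term] = math.log2(len(term)) * freq
    return scores


# Hypothetical candidate terms with made-up corpus frequencies.
candidates = {
    ("trade", "union"): 10,
    ("international", "trade", "union"): 4,
    ("trade", "union", "congress"): 2,
    ("labour", "movement"): 5,
}

scores = c_values(candidates)
ranking = sorted(scores, key=scores.get, reverse=True)
for term in ranking:
    print(" ".join(term), round(scores[term], 2))
```

In this toy input, "trade union" is penalised for its nested occurrences (10 minus the mean of 4 and 2) but still ranks first, because it also occurs often on its own.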
Results • C-value best performance: candidates that occur as non-nested at least once • Average linkage criterion & document co-occurrence: provide broader and richer hierarchies
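The combination of average linkage and document co-occurrence can be sketched as follows. This is an illustration only: the terms and document IDs are invented, and Jaccard overlap of document sets is an assumed stand-in for the co-occurrence measure, which the slides do not define. At each step the two clusters with the highest mean pairwise similarity are merged.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of documents in which two terms co-occur."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(term_docs):
    """Agglomerative clustering; returns the list of merges in order."""
    clusters = [(term,) for term in sorted(term_docs)]
    merges = []

    def similarity(c1, c2):
        # Average linkage: mean similarity over all cross-cluster term pairs.
        pairs = [(t1, t2) for t1 in c1 for t2 in c2]
        total = sum(jaccard(term_docs[t1], term_docs[t2]) for t1, t2 in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2),
                          key=lambda pair: similarity(*pair))
        merges.append((left, right, similarity(left, right)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Hypothetical terms mapped to the IDs of the documents they occur in.
term_docs = {
    "trade union": {1, 2, 3},
    "strike": {1, 2, 3},
    "anarchism": {4, 5},
    "pamphlet": {4, 5, 6},
}

merges = average_linkage(term_docs)
for left, right, score in merges:
    print(left, "+", right, round(score, 2))
```

Replacing the mean in `similarity` with `max` or `min` over the pairwise values would give single or complete linkage respectively, the other two criteria compared in this work.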
Questions? Check out our poster!