Controlled Vocabularies in TELPlus. Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop 22-23 November 2007. Agenda. TELPlus Context Improving subject access 3 sub-tasks Services for TEL. TELPlus Context. Started October 2007 Running 27 months Content WPs
Controlled Vocabularies in TELPlus Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop 22-23 November 2007
Agenda • TELPlus Context • Improving subject access • 3 sub-tasks • Services for TEL
TELPlus Context • Started October 2007 • Running 27 months • Content WPs • OCRing previously digitised material • Improving the usability of TEL through OAI-PMH compliance • Improving Access • Integrating services with TEL portal • User personalisation services • Extending TEL to Bulgaria & Romania
WP3 – Improving Access • Task 1: Indexing for usability • Review/test state-of-the-art semantic search engines • On content of documents • Task 2: Improving subject access • Task 3: FRBR aggregation, search and browsing • Create/exploit FRBR metadata repositories • Task 4: Focus on users • Focus groups on prototypes
WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Search through collections • Using metadata • In a controlled setting • Paving the way for enhanced usages • Advanced processing mentioned in TELplus needs conceptual structures and links between these structures • E.g. clustering
WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Reference: MACS project • Manually-built semantic equivalences between Rameau, SWD & LCSH headings
WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Reference: MACS project • Manual equivalences between Rameau, SWD, LCSH headings • Here: an experiment on deploying automatic alignment techniques • Determining possible strategies • Assessing feasibility and usefulness • MACS context
WP3.2 Sub-tasks • 3.2.1. Converting the subjects to standard representation language • Semantic web format (SKOS) • 3.2.2. Aligning the vocabularies • Semantic correspondences between subjects • 3.2.3. Deploying the alignment knowledge obtained into TEL framework • E.g. using links to reformulate queries from one subject list to the other
Converting subjects to standard representation language Goal: solving syntactic heterogeneity between vocabularies • Enabling the use of standard tools • E.g. for query (re)formulation • Paving the way for dealing with semantic heterogeneity • Definitions of concepts expressed according to a common model
Converting subjects to standard representation language Approach: Semantic Web and SKOS • Semantic Web • Knowledge objects as web resources (URIs) • Description by linking resources (RDF) • Description using shared formal vocabularies (ontologies) • SKOS • A standard Semantic Web model (ontology) • For knowledge organization systems (thesauri, subject heading lists…)
SKOS: Example [Figure: RDF graph in which http://www.iconclass.nl/s_11F is a skos:Concept, skos:inScheme the skos:ConceptScheme http://www.iconclass.nl/, with skos:prefLabel “the Virgin Mary”@en and “la Vierge Marie”@fr, and skos:broader http://www.iconclass.nl/s_11]
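The graph on this slide can be written down as plain subject-predicate-object triples. The following is a minimal Python sketch using only the standard library. The assignment of labels and links to resources is a reading of the slide's diagram, and the helper names `format_term` and `to_turtle_lines` are hypothetical.

```python
# Triples read off the slide's diagram (an interpretation of the figure).
SCHEME = "http://www.iconclass.nl/"
CONCEPT = "http://www.iconclass.nl/s_11F"
BROADER = "http://www.iconclass.nl/s_11"

triples = [
    (SCHEME, "rdf:type", "skos:ConceptScheme"),
    (CONCEPT, "rdf:type", "skos:Concept"),
    (CONCEPT, "skos:inScheme", SCHEME),
    (CONCEPT, "skos:prefLabel", ("the Virgin Mary", "en")),
    (CONCEPT, "skos:prefLabel", ("la Vierge Marie", "fr")),
    (CONCEPT, "skos:broader", BROADER),
]


def format_term(term):
    """Render a URI, a prefixed name, or a (text, lang) literal Turtle-style."""
    if isinstance(term, tuple):
        text, lang = term
        return f'"{text}"@{lang}'
    if term.startswith("http://"):
        return f"<{term}>"
    return term  # prefixed name such as skos:Concept


def to_turtle_lines(triples):
    """Return one Turtle-like statement per triple."""
    return [" ".join(format_term(t) for t in triple) + " ." for triple in triples]


for line in to_turtle_lines(triples):
    print(line)
```

A real conversion would more likely rely on an RDF library and declare the `rdf:` and `skos:` prefixes; plain tuples are used here only to keep the sketch self-contained.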
Converting subjects to standard representation language - Process • Getting processable versions from owners • E.g. XML • Analyzing the models • Converting to SKOS
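The three steps above could look roughly like this in practice. This is only a sketch: the XML layout (`heading`, `label`, `variant`, `broader`), the sample record, and the `http://example.org/shl/` namespace are all invented for illustration, since the actual formats delivered by vocabulary owners are not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record; real vocabulary exports will differ.
SAMPLE_XML = """
<headings>
  <heading id="h1">
    <label lang="en">Dutch literature</label>
    <variant lang="en">Literature, Dutch</variant>
    <broader ref="h0"/>
  </heading>
</headings>
"""

BASE_URI = "http://example.org/shl/"  # placeholder namespace, not a real one


def convert(xml_text, base_uri=BASE_URI):
    """Map each <heading> element to SKOS-style (subject, predicate, object) triples."""
    triples = []
    root = ET.fromstring(xml_text)
    for heading in root.iter("heading"):
        uri = base_uri + heading.get("id")
        triples.append((uri, "rdf:type", "skos:Concept"))
        # Preferred form of the heading
        for label in heading.findall("label"):
            triples.append((uri, "skos:prefLabel", (label.text, label.get("lang"))))
        # Non-preferred (variant) forms
        for variant in heading.findall("variant"):
            triples.append((uri, "skos:altLabel", (variant.text, variant.get("lang"))))
        # Hierarchical link to a broader heading
        for broader in heading.findall("broader"):
            triples.append((uri, "skos:broader", base_uri + broader.get("ref")))
    return triples


for triple in convert(SAMPLE_XML):
    print(triple)
```

The "analyzing the models" step corresponds to deciding which source element maps to which SKOS property; the mapping chosen above is one plausible choice, not a prescribed one.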
Vocabulary Alignment • Specifying required alignment format (links) • Type of mapping links: equivalence, broader • Cardinality: one-to-one, one-to-many • Taking application context (TEL) into account
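One way to picture such an alignment format is a small record per mapping link, plus a check of the link type and a cardinality summary. The sketch below is an assumption about how this could be modelled; the class `Mapping`, the function `cardinality`, and the concept identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Link types named on the slide.
ALLOWED_RELATIONS = {"equivalence", "broader"}


@dataclass(frozen=True)
class Mapping:
    source: str    # concept in the first subject heading list
    target: str    # concept in the second subject heading list
    relation: str  # "equivalence" or "broader"


def cardinality(mappings):
    """Classify each source concept, seen from the source side only."""
    targets = defaultdict(set)
    for m in mappings:
        if m.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation: {m.relation}")
        targets[m.source].add(m.target)
    return {
        source: "one-to-one" if len(found) == 1 else "one-to-many"
        for source, found in targets.items()
    }


links = [
    Mapping("shl1:a", "shl2:x", "equivalence"),
    Mapping("shl1:b", "shl2:y", "broader"),
    Mapping("shl1:b", "shl2:z", "broader"),
]
print(cardinality(links))
```

Which cardinalities and link types are acceptable depends on the application, which is why the slide asks for the TEL context to be taken into account.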
Vocabulary Alignment • Specifying required alignment format (links) • Selecting (& running) alignment techniques/tools • Inspired by semantic web approaches
Vocabulary Alignment Techniques • Similar to the ontology alignment problem • Existing approaches for (semi-)automatic ontology alignment • Using techniques from linguistics, computer science, statistics • Problem: performance does not allow 100% automatic alignment • Problem: multilingual case • Some techniques cannot be used
Potential Technique: Using Background Knowledge • Using a shared conceptual reference to find links [Figure: the headings “Publication” and “Calendar”, from SHL 1 and SHL 2, linked through a shared background-knowledge resource]
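A minimal sketch of the idea, assuming both headings can be anchored in a shared reference hierarchy by their labels. The reference data below, including the assumption that "calendar" sits under "publication", is invented for illustration, as are the function names.

```python
# Hypothetical shared reference: each entry points to its broader entry.
REFERENCE_BROADER = {"calendar": "publication", "publication": None}


def ancestors(node):
    """Return all entries broader than `node` in the shared reference."""
    found = []
    parent = REFERENCE_BROADER.get(node)
    while parent is not None:
        found.append(parent)
        parent = REFERENCE_BROADER.get(parent)
    return found


def relate(label_1, label_2):
    """Derive a link between two headings from their anchors in the reference."""
    a, b = label_1.lower(), label_2.lower()
    if a not in REFERENCE_BROADER or b not in REFERENCE_BROADER:
        return None  # at least one heading cannot be anchored
    if a == b:
        return "equivalence"
    if b in ancestors(a):
        return "broader"   # label_2 is broader than label_1
    if a in ancestors(b):
        return "narrower"  # label_2 is narrower than label_1
    return None


print(relate("Calendar", "Publication"))
```

Anchoring by exact lowercase label match is the weakest part of this sketch; in a multilingual setting the anchoring step itself would need translation or a multilingual reference.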
Potential Technique: Statistical Alignment • Object information (book indexing) [Figure: “Dutch Literature” and “Dutch”, headings from SHL 1 and SHL 2, connected through dually-indexed books]
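A rough sketch of how dually-indexed books could suggest mappings: two headings are candidates when the sets of books they index overlap strongly. The slide does not name a measure; the Jaccard coefficient and the 0.5 threshold used here are assumptions, and the book identifiers are made up.

```python
def jaccard(books_a, books_b):
    """Share of books indexed by both headings among books indexed by either."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def candidate_mappings(index_1, index_2, threshold=0.5):
    """Return (heading_1, heading_2, score) for pairs scoring at or above the threshold."""
    pairs = []
    for heading_1, books_1 in index_1.items():
        for heading_2, books_2 in index_2.items():
            score = jaccard(books_1, books_2)
            if score >= threshold:
                pairs.append((heading_1, heading_2, score))
    # Strongest candidates first
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical indexing data: heading -> set of book identifiers.
shl1_index = {"Dutch Literature": {"b1", "b2", "b3"}}
shl2_index = {"Dutch": {"b1", "b2", "b3", "b4"}, "Calendar": {"b9"}}

print(candidate_mappings(shl1_index, shl2_index))
```

Because it relies only on shared books, this kind of technique does not depend on the language of the headings, but it can only be applied where a dually-indexed collection exists.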
Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Considering application
Evaluation of Alignments • MACS has produced mappings! • Possible gold standard • But: has MACS produced all mappings? • Which proportion of the SHLs is covered? • Taking into account all indexing strings? • Are MACS mappings the only interesting ones? • “Serendipity” mappings • Concepts that are not equivalent but could bring useful results when added to queries • Compensating for indexing variability
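If MACS mappings are used as a gold standard, automatically found mappings can be scored with precision and recall. The sketch below assumes mappings are simple (source, target) pairs; the sample pairs are invented. Since the gold standard may be incomplete and "serendipity" mappings are not in it, precision computed this way can underestimate how useful the found mappings are.

```python
def evaluate(found, gold):
    """Return (precision, recall) of found mappings against a gold standard."""
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (source, target) pairs.
found_pairs = {("a", "x"), ("b", "y"), ("c", "z")}
gold_pairs = {("a", "x"), ("b", "y"), ("d", "w")}

precision, recall = evaluate(found_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```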
Evaluation of Alignments • Several scenarios for using and evaluating alignments • Concept-based search • Retrieving books indexed by SHL1 using SHL2 concepts • Re-indexing • Integration of one SHL into the other • SHL Merging • Free-text search • Matching user search terms to SHL1 or SHL2 concepts • Navigation • Browsing several collections using one SHL structure
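The concept-based search scenario can be sketched as a query reformulation step: SHL2 concepts in the query are replaced by their mapped SHL1 concepts, which are then looked up in the SHL1-indexed collection. All identifiers and data below are hypothetical, and the function name `search` is an assumption.

```python
def search(query_concepts, mappings, shl1_index):
    """Retrieve SHL1-indexed books for a query expressed with SHL2 concepts."""
    hits = set()
    for concept in query_concepts:
        # Reformulate: follow mapping links from the SHL2 concept to SHL1 concepts.
        for target in mappings.get(concept, []):
            hits |= shl1_index.get(target, set())
    return hits


# Hypothetical alignment: SHL2 concept -> mapped SHL1 concepts.
mappings = {"shl2:dutch": ["shl1:dutch_literature", "shl1:dutch_language"]}

# Hypothetical collection index: SHL1 concept -> book identifiers.
shl1_index = {
    "shl1:dutch_literature": {"b1", "b2"},
    "shl1:dutch_language": {"b3"},
    "shl1:calendars": {"b9"},
}

print(sorted(search(["shl2:dutch"], mappings, shl1_index)))
```

This is the fully automatic variant. In an assisted setting, the mapped SHL1 concepts would be shown to the user as candidates instead of being followed directly.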
Evaluation of Alignments • Several settings for a single scenario • Fully automatic reformulation vs assisted reformulation (candidates) • Different evaluation measures • Good mappings vs acceptable ones • Number of candidates for reformulation • Semantic closeness to original query
Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Assessment of the approach • Efforts required, quality, extendibility
Deploying the alignment knowledge obtained into TEL framework • Observing integration of MACS data into TEL • Conceptual input for alignment requirements • Integration of the obtained alignment in TEL • Assessment of the alignment integration • Technical aspects, usage aspects
Reminder • Alignment is a difficult problem • Application-specific alignment is largely unexplored in Semantic Web research • This is more a feasibility study than a complete solution to the problem • Practical goal: investigate how automatic techniques could help MACS-like initiatives • Manual mapping is labour-intensive
WP4 – Integrating services with the European Library portal Theo van Veen (KB) Tasks: • Identifying services that are going to give the user the greatest return • Creating new services • Integrating services within TEL …
WP4 – Some Services Mentioned Preliminary inventory: no official commitment! Services based on controlled vocabularies: • Thesaurus and name authority service • Providing terms linked to query terms • Semantic enrichment service • Users can annotate search results with terms • Distance between terms and related terms
WP4 – Some Services Mentioned Preliminary inventory: no official commitment! Services based on controlled vocabularies: • Thesaurus and name authority service • Semantic enrichment service • Distance between terms and related terms Adding more value from controlled vocabularies and alignments between them