SKOS. Ecoterm 2006. Alistair Miles, CCLRC Rutherford Appleton Laboratory. Semantic Web Best Practices and Deployment. Reminder: what is it? Simple Knowledge Organisation System. Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)
SKOS • Ecoterm 2006 • Alistair Miles • CCLRC Rutherford Appleton Laboratory • Semantic Web Best Practices and Deployment
Reminder: what is it? • Simple Knowledge Organisation System • Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?) • Subject metadata & information retrieval … • ‘this document is about romantic love’. • ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’. • Application of RDF
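To make "application of RDF" concrete, here is a minimal sketch that holds a tiny vocabulary as plain (subject, predicate, object) triples. The `skos:` property names (`prefLabel`, `altLabel`, `broader`) come from SKOS Core and `dc:subject` from Dublin Core; the `ex:` identifiers, the labels and the helper `objects` are assumptions made up for this example, and a real deployment would more likely use an RDF library.

```python
# Minimal sketch: a SKOS-style vocabulary held as RDF-like triples.
# The ex: identifiers and the labels are hypothetical, for illustration only.
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"

triples = {
    ("ex:love", SKOS + "prefLabel", "love"),
    ("ex:romanticLove", SKOS + "prefLabel", "romantic love"),
    ("ex:romanticLove", SKOS + "altLabel", "romance"),
    ("ex:romanticLove", SKOS + "broader", "ex:love"),
    # Subject metadata: 'this document is about romantic love'.
    ("ex:doc1", DC + "subject", "ex:romanticLove"),
}


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects("ex:doc1", DC + "subject"))
print(objects("ex:romanticLove", SKOS + "broader"))
```

Because everything is a triple, the vocabulary and the subject metadata can be stored, distributed and merged with the same machinery.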
Since Ecoterm 2005 … • SKOS Core Guide & SKOS Core Vocabulary Specification … • First Working Draft May 2005 • Second Working Draft October 2005 • Minor changes • Quick Guide to Publishing a Thesaurus on the Semantic Web … • First Working Draft May 2005
What comes next … ? • Life after SWBPD-WG … ? • Plans for next phase of W3C Semantic Web Activity … • New WG? • SKOS W3C Recommendation by end 2007? • N.B. Not yet approved!
If Rec then … • What is the scope? What is the fundamental design goal? • First part of SKOS Rec would be requirements specification. • Between now and Sept/Oct 2006 … define scope and requirements.
What I’d like to do here … • Talk about some of the assumptions behind SKOS. • Sketch some ideas on how to define scope and requirements for SKOS. • Get your feedback: public-esw-thes@w3.org • “SKOS: Requirements for Standardization”: isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf
Brief history of scope … • 2003-04: SWAD-Europe • ISO 2788 thesauri • “Non-standard” thesauri via extensibility e.g. GEMET • Classification scheme (PACS) • Multilingual thesauri • Semantic mapping • 2004: W3C Glossaries • 2005: Discussion re “terminologies” • Subject headings? Gazetteers? Folksonomies? Taxonomies?
Assumptions: purpose … • Formal representation of controlled structured vocabularies intended for use in information retrieval applications.
Assumptions: workflow … • Build a vocabulary • Build an index • Retrieve
Assumptions: components … • Vocabulary Development Application • Something to help build a vocabulary • Indexing Application • Something to help build an index • Retrieval Application • Something to help retrieve things • SKOS ultimately designed to support interoperation of these three “key components”.
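The workflow and the three components can be sketched as three functions that share one vocabulary. This is only an illustration of the division of labour: the function names, concept identifiers and document assignments are all hypothetical, and nothing here is prescribed by SKOS.

```python
# Sketch of the three "key components" working on one shared vocabulary.
# All names and data are hypothetical.

def build_vocabulary():
    """Vocabulary development: concept id -> preferred label."""
    return {"c1": "tuberculosis", "c2": "radiotherapy", "c3": "India"}


def build_index(vocabulary, assignments):
    """Indexing: map each concept to the documents indexed with it."""
    index = {concept: set() for concept in vocabulary}
    for doc, concepts in assignments.items():
        for concept in concepts:
            if concept not in vocabulary:
                # The indexer may only use concepts from the vocabulary.
                raise KeyError(f"{concept} is not in the vocabulary")
            index[concept].add(doc)
    return index


def retrieve(vocabulary, index, label):
    """Retrieval: find documents via a concept's preferred label."""
    for concept, pref_label in vocabulary.items():
        if pref_label == label:
            return sorted(index[concept])
    return []


vocabulary = build_vocabulary()
index = build_index(vocabulary, {"doc1": ["c1", "c2", "c3"], "doc2": ["c1"]})
print(retrieve(vocabulary, index, "tuberculosis"))
```

The point of interoperation is that each function could live in a different application, as long as all three agree on how the vocabulary is represented.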
Proposed scope … • SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications. • SKOS is required to support the interoperation of these three key components. • I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.
Other components … • Vocabulary mapping … ? • Metadata registries … ? • … ?
Component specs … • … first discuss social and technological context, then return to component specs …
Context … • What is the social and technological context in which controlled structured vocabs are used? • Assume two basic needs… • Locate something I already know about. • Discover something new. • N.B. a good location service is not necessarily a good discovery service. • Cf. Google and del.icio.us
Strategies … • Basic strategies for implementing retrieval services … • Statistical text analysis • Analysis of user behaviour • Index with controlled vocab • Other strategies … • … kos-assisted text analysis?
Cost problem … • Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment… • Given that other strategies are cheaper… • Huge pressure to drive down cost and increase utility. • Requirement for seamless integration. • I.e. controlled vocab is seldom used in isolation, most applications will combine strategies.
Use case … • Search portal … • Use combined strategies.
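One way to picture "combined strategies" is a search portal that blends a free-text score with a controlled-vocabulary match. The sketch below is an assumption-laden toy: the scoring functions, the equal weights and the sample documents are invented for illustration and are not taken from any particular system.

```python
# Sketch: combine a free-text score with a controlled-vocabulary match.
# Scoring functions, weights and documents are illustrative assumptions.

def text_score(query, text):
    """Fraction of query words that occur in the document text."""
    words = query.lower().split()
    if not words:
        return 0.0
    doc_words = set(text.lower().split())
    return sum(w in doc_words for w in words) / len(words)


def concept_score(query_concepts, doc_concepts):
    """Fraction of query concepts the document is indexed with."""
    if not query_concepts:
        return 0.0
    return len(set(query_concepts) & set(doc_concepts)) / len(query_concepts)


def combined_score(query, query_concepts, doc, w_text=0.5, w_concept=0.5):
    """Weighted sum of the two strategies."""
    return (w_text * text_score(query, doc["text"])
            + w_concept * concept_score(query_concepts, doc["concepts"]))


docs = {
    "doc1": {"text": "X-ray treatment of tuberculosis",
             "concepts": ["ex:tuberculosis"]},
    "doc2": {"text": "A history of tuberculosis sanatoria",
             "concepts": []},
}

ranked = sorted(
    docs,
    key=lambda d: combined_score("tuberculosis", ["ex:tuberculosis"], docs[d]),
    reverse=True,
)
print(ranked)
```

Both documents match the text query, but only the indexed one also matches the concept, so it ranks first.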
Component specs … • Important factors … • Minimise cost. • Decentralisation. • Assistance. • Maximise “utility”. • Query expansion. • Smart ranking. • Maximise lifetime. • Use the Semantic Web! • Situation A. search across many collections, where indexers use the same controlled vocab. • Situation B. search across many collections, where indexers use different controlled vocabs.
Focus areas … • Decentralisation requires different models of collaboration and change. • Representing change a key factor to keeping a vocab applicable. • Ranking and scoring well understood for text, less so for controlled index. • Theory of query expansion? Field trials of query expansion? • Strategies for providing assistance?
Change and collaboration • Continuum of collaboration models: centralized <-> decentralised • Continuum of change management models: continuous <-> discrete • Decentralization can reduce cost of development and maintenance • Change management can ensure continued utility – maximize ROI • Support for declarative representation of change a requirement for SKOS.
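As a rough idea of what "declarative representation of change" could mean, the sketch below records each change as a data record instead of silently editing the vocabulary. The record fields, the three change kinds and the example concepts are hypothetical; they are not part of SKOS.

```python
# Sketch: vocabulary changes recorded as declarative statements.
# Record fields and change kinds are hypothetical, not part of SKOS.
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    date: str              # ISO date on which the change takes effect
    kind: str              # "add", "deprecate" or "replace"
    concept: str
    replaced_by: str = ""  # only used when kind == "replace"


def apply_changes(concepts, changes):
    """Return the current concept set after applying changes in date order."""
    current = set(concepts)
    for change in sorted(changes, key=lambda c: c.date):
        if change.kind == "add":
            current.add(change.concept)
        elif change.kind == "deprecate":
            current.discard(change.concept)
        elif change.kind == "replace":
            current.discard(change.concept)
            current.add(change.replaced_by)
        else:
            raise ValueError(f"unknown change kind: {change.kind}")
    return current


def current_form(concept, changes):
    """Follow 'replace' records so that old index entries stay usable."""
    replacements = {c.concept: c.replaced_by
                    for c in changes if c.kind == "replace"}
    seen = set()
    while concept in replacements and concept not in seen:
        seen.add(concept)  # guards against circular replacements
        concept = replacements[concept]
    return concept


log = [
    Change("2006-03-01", "replace", "ex:consumption", "ex:tuberculosis"),
    Change("2006-01-15", "add", "ex:radiotherapy"),
]
print(sorted(apply_changes({"ex:consumption"}, log)))
print(current_form("ex:consumption", log))
```

Because the log is data, an index built against an older version of the vocabulary can be brought up to date by replaying it.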
Semantic Web architecture… • Exploit Semantic Web facility to distribute and merge data. • However, best practices for publishing data on the Semantic Web still need work. • See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).
Information retrieval… • Indexing and query evaluation well understood for text content. • Less well understood for controlled metadata. • Query types? • Query evaluation strategies, e.g. query expansion? • Ranking?
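Query expansion over a controlled index can be sketched as follows: a query for a concept is widened to include everything below it in the hierarchy. The hierarchy, the index and the function names are hypothetical, and this is only one possible expansion strategy (following narrower links transitively).

```python
# Sketch: expand a query concept to all of its narrower concepts.
# The hierarchy and the index are hypothetical.
narrower = {
    "ex:disease": ["ex:infectiousDisease"],
    "ex:infectiousDisease": ["ex:tuberculosis", "ex:malaria"],
}
index = {
    "ex:disease": {"doc3"},
    "ex:tuberculosis": {"doc1"},
    "ex:malaria": {"doc2"},
}


def expand(concept):
    """Return the concept plus everything reachable via 'narrower' links."""
    expanded, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in expanded:  # also guards against cycles
            expanded.add(current)
            stack.extend(narrower.get(current, []))
    return expanded


def search(concept, expand_query=True):
    """Return documents indexed with the concept, optionally expanded."""
    concepts = expand(concept) if expand_query else {concept}
    hits = set()
    for c in concepts:
        hits |= index.get(c, set())
    return sorted(hits)


print(search("ex:disease", expand_query=False))
print(search("ex:disease"))
```

Without expansion only the document indexed directly with the broad concept is found; with expansion, documents indexed with any narrower concept are returned as well. How such hits should be ranked is exactly the open question raised above.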
Assistance for indexers … • Provide suggestions • Comparison of labels and annotations • Machine learning • Exploit lexical resources • … ?
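A very simple form of suggestion for indexers is to look for vocabulary labels in the document text. The sketch below does plain token matching; the concept identifiers and labels are hypothetical, and a realistic assistant would need at least stemming and multi-word label handling, or one of the other techniques listed above.

```python
# Sketch: suggest index concepts whose labels appear in a document's text.
# The vocabulary is hypothetical; matching is naive single-token lookup.
import re

labels = {
    "ex:tuberculosis": ["tuberculosis", "TB"],
    "ex:xray": ["x-ray", "radiography"],
    "ex:india": ["India"],
}


def suggest_concepts(text):
    """Return concepts with at least one label found among the text tokens."""
    tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return sorted(
        concept
        for concept, concept_labels in labels.items()
        if any(label.lower() in tokens for label in concept_labels)
    )


print(suggest_concepts("The cure of tuberculosis by x-ray in India in the 1950s"))
```

The output is only a list of candidates; the indexer still decides which concepts actually describe what the document is about.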
Assistance for mappers … • Provide suggestions … • Analysis of labels and annotations • Exploit lexical resources • … ?
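For mappers, "analysis of labels" could start with string similarity between the labels of two vocabularies. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the two vocabularies, the threshold of 0.8 and the function names are assumptions chosen for illustration.

```python
# Sketch: propose candidate mappings by comparing labels across vocabularies.
# Vocabularies and the threshold are hypothetical.
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mappings(vocab_a, vocab_b, threshold=0.8):
    """Return (concept_a, concept_b, score) candidates, best first."""
    candidates = []
    for concept_a, label_a in vocab_a.items():
        for concept_b, label_b in vocab_b.items():
            score = similarity(label_a, label_b)
            if score >= threshold:
                candidates.append((concept_a, concept_b, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])


vocab_a = {"a:1": "Tuberculosis", "a:2": "Radiography"}
vocab_b = {"b:9": "tuberculosis", "b:7": "Malaria"}
print(suggest_mappings(vocab_a, vocab_b))
```

Label similarity alone will miss synonyms and propose false friends, which is where lexical resources and human review come in.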
Summary • SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies. • Define requirements by describing information retrieval functionalities. • Divide functionalities into: • Presentation styles • Query types e.g. compound queries, coordination … • Query evaluation strategies • Assumptions: • Key components • Semantic Web interaction • Context – pressure to make vocabularies “profitable” • Issues: change, assistance, theory …