
SKOS Ecoterm 2006 Alistair Miles CCLRC Rutherford Appleton Laboratory

Presentation Transcript


  1. SKOS • Ecoterm 2006 • Alistair Miles, CCLRC Rutherford Appleton Laboratory • Semantic Web Best Practices and Deployment

  2. Reminder: what is it? • Simple Knowledge Organisation System • Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?) • Subject metadata & information retrieval … • ‘this document is about romantic love’. • ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’. • Application of RDF
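
As a rough illustration of what "representing a controlled structured vocabulary" involves, the sketch below models two concepts and one piece of subject metadata in plain Python. The field names mirror the SKOS Core properties `prefLabel`, `altLabel` and `broader`; the `example.org` identifiers, the "love" and "romance" labels, and the `subject_labels` helper are assumptions made for illustration. Real SKOS data would be expressed in RDF rather than as Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A vocabulary concept; field names mirror SKOS Core properties."""
    uri: str                                        # hypothetical identifier
    pref_label: str                                 # skos:prefLabel
    alt_labels: list = field(default_factory=list)  # skos:altLabel
    broader: list = field(default_factory=list)     # URIs of broader concepts


# Hypothetical two-concept vocabulary built around the slide's example subject.
love = Concept("http://example.org/concepts/love", "love")
romantic_love = Concept(
    "http://example.org/concepts/romantic-love",
    "romantic love",
    alt_labels=["romance"],
    broader=[love.uri],
)
vocabulary = {c.uri: c for c in (love, romantic_love)}

# Subject metadata: 'this document is about romantic love'.
document_subjects = {"http://example.org/docs/1": [romantic_love.uri]}


def subject_labels(doc_uri):
    """Return the preferred labels of the concepts a document is about."""
    return [vocabulary[uri].pref_label for uri in document_subjects[doc_uri]]


print(subject_labels("http://example.org/docs/1"))
```

The document points at a concept identifier rather than at a label string, so labels can be corrected or translated without touching the metadata.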

  3. Since Ecoterm 2005 … • SKOS Core Guide & SKOS Core Vocabulary Specification … • First Working Draft May 2005 • Second Working Draft October 2005 • Minor changes • Quick Guide to Publishing a Thesaurus on the Semantic Web … • First Working Draft May 2005

  4. What comes next … ? • Life after SWBPD-WG … ? • Plans for next phase of W3C Semantic Web Activity … • New WG? • SKOS W3C Recommendation by end 2007? • N.B. Not yet approved!

  5. If Rec then … • What is the scope? What is the fundamental design goal? • First part of SKOS Rec would be requirements specification. • Between now and Sept/Oct 2006 … define scope and requirements.

  6. What I’d like to do here … • Talk about some of the assumptions behind SKOS. • Sketch some ideas on how to define scope and requirements for SKOS. • Get your feedback. • Mailing list: public-esw-thes@w3.org • Paper: “SKOS: Requirements for Standardization”, isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf

  7. Brief history of scope … • 2003-04: SWAD-Europe • ISO 2788 thesauri • “Non-standard” thesauri via extensibility e.g. GEMET • Classification scheme (PACS) • Multilingual thesauri • Semantic mapping • 2004: W3C Glossaries • 2005: Discussion re “terminologies” • Subject headings? Gazetteers? Folksonomies? Taxonomies?

  8. Assumptions: purpose … • Formal representation of controlled structured vocabularies intended for use in information retrieval applications.

  9. Assumptions: workflow … • Build a vocabulary • Build an index • Retrieve

  10. Assumptions: components … • Vocabulary Development Application • Something to help build a vocabulary • Indexing Application • Something to help build an index • Retrieval Application • Something to help retrieve things • SKOS ultimately designed to support interoperation of these three “key components”.
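
The build / index / retrieve workflow and its three components can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the concept ids (`c1` … `c3`), the document ids and the `retrieve` function are hypothetical, and the two dictionaries merely stand in for the output of a vocabulary development application and an indexing application.

```python
# Step 1: build a vocabulary (concept id -> preferred label); ids are hypothetical.
vocabulary = {
    "c1": "tuberculosis",
    "c2": "x-ray therapy",
    "c3": "India",
}

# Step 2: build an index (document id -> concept ids assigned by an indexer).
index = {
    "doc-a": {"c1", "c2", "c3"},
    "doc-b": {"c1"},
    "doc-c": {"c3"},
}


def retrieve(*labels):
    """Step 3: return ids of documents indexed with every requested label."""
    label_to_id = {label: cid for cid, label in vocabulary.items()}
    wanted = {label_to_id[label] for label in labels}
    return sorted(doc for doc, concepts in index.items() if wanted <= concepts)


print(retrieve("tuberculosis"))
print(retrieve("tuberculosis", "India"))
```

The only thing the three steps share is the set of concept identifiers, which is the point at which a common representation lets separately built components interoperate.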

  11. Proposed scope … • SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications. • SKOS is required to support the interoperation of these three key components. • I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.

  12. Other components … • Vocabulary mapping … ? • Metadata registries … ? • … ?

  13. Component specs … • … first discuss social and technological context, then return to component specs …

  14. Context … • What is the social and technological context in which controlled structured vocabs are used? • Assume two basic needs… • Locate something I already know about. • Discover something new. • N.B. a good location service is not necessarily a good discovery service. • Cf. Google and del.icio.us

  15. Strategies … • Basic strategies for implementing retrieval services … • Statistical text analysis • Analysis of user behaviour • Index with controlled vocab • Other strategies … • … kos-assisted text analysis?

  16. Cost problem … • Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment… • Given that other strategies are cheaper… • Huge pressure to drive down cost and increase utility. • Requirement for seamless integration. • I.e. controlled vocab is seldom used in isolation, most applications will combine strategies.

  17. Use case … • Search portal … • Use combined strategies.

  18. Component specs … • Important factors … • Minimise cost. • Decentralisation. • Assistance. • Maximise “utility”. • Query expansion. • Smart ranking. • Maximise lifetime. • Use the Semantic Web! • Situation A. search across many collections, where indexers use same controlled vocab. • Situation B. search across many collections, where indexers use different controlled vocabs.

  19. Focus areas … • Decentralisation requires different models of collaboration and change. • Representing change a key factor to keeping a vocab applicable. • Ranking and scoring well understood for text, less so for controlled index. • Theory of query expansion? Field trials of query expansion? • Strategies for providing assistance?

  20. Change and collaboration • Continuum of collaboration models: centralised <-> decentralised • Continuum of change management models: continuous <-> discrete • Decentralisation can reduce cost of development and maintenance • Change management can ensure continued utility – maximise ROI • Support for declarative representation of change is a requirement for SKOS.

  21. Semantic Web architecture… • Exploit Semantic Web facility to distribute and merge data. • However, best practices for publishing data on the Semantic Web still need work. • See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).

  22. Semantic Web architecture

  23. Direct interaction …

  24. Information retrieval… • Indexing and query evaluation well understood for text content. • Less well understood for controlled metadata. • Query types? • Query evaluation strategies, e.g. query expansion? • Ranking?
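
One possible query evaluation strategy is to expand a query concept with its narrower concepts before matching it against the index. The sketch below illustrates that idea only; the slides do not prescribe it. The vocabulary fragment, the index and the `expand` and `search` functions are all hypothetical.

```python
# Hypothetical vocabulary fragment: concept -> directly narrower concepts.
narrower = {
    "love": ["romantic love", "parental love"],
    "romantic love": ["courtly love"],
}

# Hypothetical index: document id -> concepts assigned by an indexer.
index = {
    "doc-1": {"love"},
    "doc-2": {"courtly love"},
    "doc-3": {"friendship"},
}


def expand(concept):
    """Return the concept plus all transitively narrower concepts."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, []))
    return seen


def search(concept, expanded=True):
    """Return documents indexed with the concept or, if expanded, its expansion."""
    targets = expand(concept) if expanded else {concept}
    return sorted(doc for doc, concepts in index.items() if concepts & targets)


print(search("love", expanded=False))
print(search("love"))
```

Expansion trades precision for recall: the expanded query also finds the document indexed only with "courtly love". How such matches should be ranked against direct matches is left open here, as it is on the slide.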

  25. Assistance for indexers … • Provide suggestions • Comparison of labels and annotations • Machine learning • Exploit lexical resources • … ?
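
The "comparison of labels" idea can be sketched as fuzzy matching between concept labels and the words of a document title, using Python's standard `difflib` module. The labels, the concept ids, the `suggest` function and the 0.8 threshold are assumptions made for illustration; a real indexing application would likely combine this with the other techniques listed.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary: concept id -> preferred and alternative labels.
labels = {
    "c1": ["tuberculosis", "TB"],
    "c2": ["x-ray therapy", "radiotherapy"],
    "c3": ["romantic love"],
}


def suggest(text, threshold=0.8):
    """Suggest concept ids whose labels resemble a word or phrase in the text."""
    words = text.lower().split()
    suggestions = {}
    for cid, concept_labels in labels.items():
        for label in concept_labels:
            n = len(label.split())
            # Compare the label with every run of n consecutive words.
            for i in range(len(words) - n + 1):
                candidate = " ".join(words[i:i + n])
                score = SequenceMatcher(None, label.lower(), candidate).ratio()
                if score >= threshold:
                    suggestions[cid] = max(score, suggestions.get(cid, 0.0))
    # Best-scoring concepts first.
    return sorted(suggestions, key=suggestions.get, reverse=True)


print(suggest("Cure of tuberculosis by x-ray therapy"))
```

Because the comparison is approximate, a misspelling such as "tuberculosys" still produces a suggestion, which the indexer can accept or reject.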

  26. Assistance for mappers … • Provide suggestions … • Analysis of labels and annotations • Exploit lexical resources • … ?

  27. Summary • SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies. • Define requirements by describing information retrieval functionalities. • Divide functionalities into: • Presentation styles • Query types e.g. compound queries, coordination … • Query evaluation strategies • Assumptions: • Key components • Semantic Web interaction • Context – pressure to make vocabularies “profitable” • … Issues: change, assistance, theory …
