Implementation of Topic Centered Portals

Implementation of Topic Centered Portals. David Norheim, Computas AS, Norway; Robert Engels, ESIS AS, Norway. Outline: motivation; the system; challenges and lessons learned; future work. Computas: 23 years of experience in knowledge management, expert systems, and process modeling.

Presentation Transcript


  1. Implementation of Topic Centered Portals • David Norheim, Computas AS, Norway • Robert Engels, ESIS AS, Norway

  2. Motivation • The system • Challenges and lessons learned • Future work

  3. Computas • 23 years of experience in knowledge management, expert systems, and process modeling • Special focus on government and the oil and gas sector • The major semantic web company in Norway

  4. Computas’ semantic Web activities • Sectors • Oil and gas industry • Government • Types of applications • Knowledge management • Semantic search support • Research and commercial projects

  5. Background • A clear shift towards open source and open standards • Linux for Schools, OpenDocument formats in the public sector • National semantic registry, large governmental information portals based on semantic standards • The government, through the Norwegian Archive, Library and Museum Authority (ABM-utvikling): development of open-standards-based, open-source software for creating and maintaining topic-driven portals. "there is a need for a targeted effort to create a framework based on Semantic Web to enable professional users to organize information and to make libraries build and maintain metadata-driven search solutions." A digital culture and knowledge policy? EFN.no

  6. A topic-driven portal • For a library it is as natural to evaluate, describe and enable retrieval of any resource on the web as of printed material • A quality-evaluated collection of information resources, organized according to some topic structure and published online • Retrieval through search and navigation in topics • Source: Ellen Aarbakken, Oslo Public Library (Deichmanske Bibliotek)

  7. Why a topic-centric portal tool and not search? • Yahoo! provided the first subject-driven portal, but focused on the most popular aspects, an approach later replaced by search (e.g. Google) • However, the words in the long tail are context dependent, and generic web search frequently pollutes results due to ambiguity • Examples of long-tail portals: medical information for laymen; primary school educational resources; public information for immigrants; juridical information for laymen; Norwegian architecture portal

  8. Why not Web 2.0? • Folksonomies • Collaborative “categorization” • Freely chosen keywords • Manual “tagging”, practically no existing metadata • Mostly acting as a popularity measure • Topic tools • Conceptual level with navigation • Quality evaluated with metadata • Manual “tagging”, but support for more automation

  9. SUBject oriented tool for LIbraries, Museums and Archives • Several roads to the same destination • Key requirements in developing the tool • Handle metadata from various sources and vocabularies (e.g. Dublin Core) • Interoperability among portals based on the same tool and the same protocols (SPARQL, SRU) • Open source and open (semantic web) standards • Combining free-text search and navigation through models • Handling both informal and formal models (e.g. SKOS and OWL DL), the latter as future work

  10. Two initial portals • Scandinavian Medical Information for Laymen (SMIL) is an international Scandinavian cooperation offering quality-controlled metadata with references to pages on health, illnesses and treatments. Contributing partners to the portal are librarians and nurses from the Nordic countries. The current SMIL base consists of 8,500 records, creating around 250,000 triples. • Detektor targets public schools. Its resources are annotated by public libraries, and it consists of about 1,850 topics and 4,600 resources, resulting in about 100,000 triples.

  11. Portal technical characteristics (grounding technologies) • Evaluation criteria inspired by the Esperonto project

  12. Architecture [Figure: architecture diagram. Components shown: web client, external clients, search and navigation, SRU client, SPARQL client, portal configuration, ontology maintenance, SPARQL queries, SPARQL update, SPARQL dispatcher, local endpoint, external servers, Open search, SRU server, SPARQL endpoint, topic ontology, indexing, metadata store, crawler, web resources] • The client consists of a search interface that lets users search using free text and metadata. • The search string is translated into a structured SPARQL query. • The system accepts queries in both SPARQL and SRU/CQL, giving interoperability at the query layer. • The backend consists of an RDF store with SPARQL interfaces; free-text indexing uses Lucene/LARQ. • The system can also query external SPARQL and SRU/CQL services.
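The translation step on this slide (free-text search string in, structured SPARQL query out) might look roughly like the Python sketch below. It assumes LARQ's `pf:textMatch` property function, as in the query on slide 15, and the standard Dublin Core namespace; the function names, the default `dc:title` field and the rule of marking every term as required with Lucene's `+` prefix are illustrative assumptions, not Sublima's actual code.

```python
# Hypothetical sketch: translate a free-text search string into a
# SPARQL DESCRIBE query that uses LARQ's pf:textMatch property function.
# Function names and the "+term" (all terms required) rule are assumptions.

PREFIXES = (
    "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>\n"
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_query(search_string: str, prop: str = "dc:title") -> str:
    """Build a DESCRIBE query matching the search terms against one property."""
    terms = search_string.split()
    if not terms:
        raise ValueError("empty search string")
    # Lucene query syntax: a leading '+' marks a term as required.
    lucene_query = " ".join(f"+{t}" for t in terms)
    return (
        PREFIXES
        + "DESCRIBE ?x WHERE {\n"
        + f'  ?lit pf:textMatch "{escape_literal(lucene_query)}" .\n'
        + f"  ?x {prop} ?lit .\n"
        + "}\n"
    )


if __name__ == "__main__":
    print(build_search_query("breast cancer"))
```

Only SPARQL string escaping is handled here; Lucene's own special characters in user input would need separate escaping in a real system.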

  13. Sublima • Ontologies generally provide the structure for navigating results, and support browsing and classification. • Ontologies allow for term disambiguation, query rewriting and semantic distance measures • In Sublima we use informal SKOS for: • Navigating through subjects, showing the subject relations (“fish eye”) • Search expansion: synonyms, common misspellings • Faceted filtering on topics as well as other metadata • Future versions will also support OWL DL
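Search expansion with SKOS labels can be sketched as follows. In SKOS, `skos:altLabel` carries synonyms and `skos:hiddenLabel` is intended for variants, such as misspellings, that should be searchable but not displayed. The `Concept` class, the `expand` function and the sample concept are hypothetical; a real portal would read the labels from the topic ontology in its RDF store.

```python
# Hypothetical sketch of SKOS-based search expansion: a query term that
# matches any label of a concept (prefLabel, altLabel or hiddenLabel) is
# expanded to all labels of that concept.
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)     # synonyms
    hidden_labels: list[str] = field(default_factory=list)  # e.g. misspellings

    def labels(self) -> list[str]:
        return [self.pref_label, *self.alt_labels, *self.hidden_labels]


def expand(term: str, concepts: list[Concept]) -> set[str]:
    """Return the term plus every label of each concept whose labels match it."""
    needle = term.casefold()
    expanded = {term}
    for concept in concepts:
        if any(label.casefold() == needle for label in concept.labels()):
            expanded.update(concept.labels())
    return expanded


if __name__ == "__main__":
    flu = Concept("http://example.org/topic/flu", "influenza", ["flu"], ["infuenza"])
    print(sorted(expand("flu", [flu])))  # ['flu', 'influenza', 'infuenza']
```

The expanded set can then be OR-ed into the free-text query, so a search for "flu" also finds resources indexed under the preferred term.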

  14. Good and bad choices: lessons learned the hard way • Keeping the semantics • Living with free-text indexing and structured queries • Tool maturity • Scalability • Keep in mind this is NOT a research project, but one with real and demanding customers expecting everything to work

  15. Preserving the semantics • We needed flexibility for users to add any metadata without touching code • SPARQL SELECT loses the meaning by returning only variable bindings, so clients become static. We therefore used SPARQL DESCRIBE extensively:

      PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      DESCRIBE ?x WHERE {
        ?lit pf:textMatch "cancer*"@en .
        ?x dc:title ?lit .
      }
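Why DESCRIBE keeps the client generic can be illustrated with a small Python sketch: the client groups whatever triples come back by subject and renders every property it finds, so a newly added metadata property appears without code changes. The functions and the `ex:` sample data are hypothetical. The SPARQL specification leaves the exact content of a DESCRIBE result to the query service, so a real client should not assume a particular set of triples.

```python
# Hypothetical sketch of a generic client for DESCRIBE results: it renders
# every property of every described resource, whatever the properties are.
from collections import defaultdict


def group_by_subject(triples):
    """Group (subject, predicate, object) triples into {subject: {predicate: [objects]}}."""
    resources = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        resources[s][p].append(o)
    return resources


def render(triples) -> str:
    """Render each subject followed by its properties, sorted by predicate name."""
    lines = []
    for subject, props in group_by_subject(triples).items():
        lines.append(subject)
        for predicate in sorted(props):
            lines.append(f"  {predicate}: {', '.join(props[predicate])}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render([
        ("ex:r1", "dc:title", "Cancer in children"),
        ("ex:r1", "dc:language", "en"),
    ]))
```

A SELECT-based client, by contrast, would only ever show the variables its query happened to name.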

  16. Living with free-text indexing and structured queries • Indexing with respect to structure • Our breastfeeding twin-problem • Indexing each literal on its own is not sufficient, as users expect hits on the combination of dc:title and dc:description • And even worse: the combination of dc:title and dc:subject/skos:prefLabel • Scoring/ranking • Easy with SELECT, but not with DESCRIBE, whose result is an unordered RDF graph • How do you rank results from a structured query? There is no universal way to handle structured and unstructured information
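One way to index "with respect to structure" is to build one combined text per resource from `dc:title`, `dc:description` and the `skos:prefLabel` of its subjects, so a multi-term query still matches when its terms are spread across fields. The Python sketch uses a plain in-memory index and a naive term-count score in place of Lucene/LARQ; the records, field names and scoring rule are illustrative assumptions, not the portals' data or ranking.

```python
# Hypothetical sketch: one combined document per resource instead of one per
# literal, plus a naive score (total occurrences of the query terms).
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.casefold())


def build_index(records: dict[str, dict[str, str]]) -> dict[str, Counter]:
    """Map each resource URI to term counts over all of its fields combined."""
    return {uri: Counter(tokenize(" ".join(fields.values())))
            for uri, fields in records.items()}


def search(query: str, index: dict[str, Counter]) -> list[tuple[str, int]]:
    """Return resources containing every query term, best score first."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = []
    for uri, counts in index.items():
        if all(counts[t] > 0 for t in terms):
            hits.append((uri, sum(counts[t] for t in terms)))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    records = {
        "ex:r1": {"dc:title": "Breastfeeding",
                  "dc:description": "Advice for parents of twins",
                  "skos:prefLabel": "Twins"},
        "ex:r2": {"dc:title": "Twins", "dc:description": "Growing up together"},
    }
    print(search("breastfeeding twins", build_index(records)))  # [('ex:r1', 3)]
```

Here "breastfeeding" occurs only in the title and "twins" only in the description and subject label of `ex:r1`, yet the combined index still returns it; an index requiring all terms within a single literal would not.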

  17. Consistent tool maturity and missing links • Some "small" issues • Support for Turtle in Protégé -> needed to convert to RDF/XML • Resources identified with URLs in Protégé • Tools mostly geared towards one dialect of RDF/OWL • Non-deterministic RDF/XML serialization, a problem for XSLT processing • Lacking a binding from OWL classes to OO languages • The simple things sometimes turn out to be the hardest…

  18. Scalability • Response time varies with store size and query complexity • Too much complexity in queries • Moving from 500k triples to tens of millions • Need to refactor into smaller, faster queries • Federation of queries
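Federation of queries, where a dispatcher sends smaller queries to several endpoints in parallel and merges the answers, could be sketched as below. Endpoints are modelled as plain Python callables returning sets of result URIs; in a real deployment they would wrap HTTP requests to SPARQL or SRU services. The names `federate` and `merge` are hypothetical, and skipping a failing endpoint is one possible policy among several.

```python
# Hypothetical sketch of a federating query dispatcher. Each endpoint is a
# callable (query -> set of result URIs); all are queried concurrently.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Endpoint = Callable[[str], set[str]]


def federate(query: str, endpoints: dict[str, Endpoint]) -> dict[str, set[str]]:
    """Run the query on every endpoint in parallel; skip endpoints that raise."""
    results: dict[str, set[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {name: pool.submit(ep, query) for name, ep in endpoints.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing endpoint should not sink the whole search.
                continue
    return results


def merge(results: dict[str, set[str]]) -> set[str]:
    """Union of all endpoints' results (duplicates collapse)."""
    return set().union(*results.values())
```

For example, with a local endpoint returning `{"ex:a"}` and a remote one returning `{"ex:a", "ex:b"}`, `merge(federate(q, endpoints))` gives `{"ex:a", "ex:b"}` even if a third endpoint is down.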

  19. Some good lessons • New standards (e.g. SPARQL), proposals for standardization (e.g. SPARUL), new tools (e.g. Jena), open source (e.g. Tomcat, Apache) and lack of good documentation all say high risk!!!! • However, the support and maintenance from the W3C community and open source developers (e.g. the Jena team) has been impressive. Support through IRC channels, mailing lists, etc. has been invaluable for the project.

  20. Some good lessons • Good experiences with reusing metadata schemas • FOAF, Dublin Core, POWDER, SKOS, SIOC, Lingvoj • Extensive dereferencing of URIs: any topic or resource URI pasted into the browser results in a DESCRIBE query for that URI.
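The URI dereferencing described on this slide amounts to mapping a request path to a DESCRIBE query. A minimal sketch, assuming a hypothetical base URI `http://portal.example.org/` (not the portals' real namespace); it rejects characters that SPARQL does not allow inside an IRI reference before building the query.

```python
# Hypothetical sketch: dereference a portal URI by turning the requested
# path into a SPARQL DESCRIBE query for the full URI.
from urllib.parse import urljoin

BASE_URI = "http://portal.example.org/"  # assumption, not the real namespace

# Characters SPARQL forbids inside <...> IRI references (plus space).
FORBIDDEN = set('<>"{}|\\^` ')


def describe_query_for(path: str) -> str:
    """Map a request path such as '/topic/cancer' to a DESCRIBE query."""
    uri = urljoin(BASE_URI, path)
    if any(ch in FORBIDDEN or ord(ch) < 0x20 for ch in uri):
        raise ValueError(f"not a valid IRI for SPARQL: {uri!r}")
    return f"DESCRIBE <{uri}>"


if __name__ == "__main__":
    print(describe_query_for("/topic/cancer"))
    # DESCRIBE <http://portal.example.org/topic/cancer>
```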

  21. Living with informal and formal ontologies • Current ontologies are modeled informally with the W3C Simple Knowledge Organization System (SKOS) • No distinction between part-of, contains, and is-a • No reasoning support • Possible with small datasets • Sublima will also support models using formal ontologies • Formal IS-A • DL reasoning • Required for large datasets [Figure: expressivity and reasoning versus data set size, from smaller to large data sets]
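With informal SKOS models, moving up the topic hierarchy is plain graph traversal rather than reasoning; SKOS does not declare `skos:broader` transitive (its transitive super-property is `skos:broaderTransitive`). The breadth-first sketch below collects all broader topics of a concept; the `broader` mapping and the `ex:` identifiers are hypothetical. Since the links carry no is-a/part-of distinction, a part-of link is followed exactly like a subclass link, which is the limitation this slide points out.

```python
# Hypothetical sketch: all broader topics of a concept via breadth-first
# traversal of skos:broader links. Graph traversal only, no DL reasoning.
from collections import deque


def broader_closure(concept: str, broader: dict[str, list[str]]) -> list[str]:
    """Return every concept reachable via broader links, nearest first."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(broader.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen or current == concept:
            continue  # guard against cycles in hand-built vocabularies
        seen.add(current)
        order.append(current)
        queue.extend(broader.get(current, []))
    return order


if __name__ == "__main__":
    broader = {
        "ex:lung-cancer": ["ex:cancer", "ex:lungs"],  # is-a and part-of, mixed
        "ex:cancer": ["ex:disease"],
        "ex:lungs": ["ex:body"],
    }
    print(broader_closure("ex:lung-cancer", broader))
    # ['ex:cancer', 'ex:lungs', 'ex:disease', 'ex:body']
```

The result includes `ex:lungs` and `ex:body` alongside `ex:disease`, although lung cancer is not a kind of lung; a formal IS-A hierarchy with DL reasoning would keep these apart.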

  22. Future work • The code base is now in use in more projects • Integration with other SPARQL-based portals • Interoperability with ISO Topic Maps models • Graphical visualization with touch screens, clever UIs • High-quality multimedia resources

  23. Conclusion • We clearly found that the currently available technology is starting to reach a certain level of maturity when it comes to functionality. BUT THERE ARE STILL RISKS! • Careful evaluation of tools and scalability is needed as content grows. Do not eat the whole menu at once! [Figure: query interoperability; recording companies, broadcasters, high-quality metadata, open metadata (e.g. Wikipedia)]

  24. Thank you for your attention david.norheim@computas.com We welcome sharing our experiences with yours! Welcome to upcoming conferences in Norway next year • Mid February in Oslo - hands-on tutorials • May in Stavanger - Semantic Days focusing on the oil- and gas industry • September 2008 - initiating Scandinavian Semantic Web Conference
