Keith G Jeffery Director, IT & International Strategy, STFC keith.jeffery@stfc.ac.uk


Presentation Transcript


  1. Keith G Jeffery, Director, IT & International Strategy, STFC (keith.jeffery@stfc.ac.uk); Anne G S Asserson, Research Department, University of Bergen (anne.asserson@fa.uib.no). INTEREST: INTERoperation for Exploitation, Science and Technology

  2. Authors: Keith G Jeffery (STFC-RAL), Anne Asserson (UiB)

  3. Structure: Background; The Hypothesis; Remote Wrapper; Local Wrapper; Catalog; Catalog Plus Pull (ERGO2++); Full CERIF; Harvesting; Conclusion

  4. Background: GL
     • Grey literature (GL) is important, but it is only a small component of the total research information environment and must be seen in the context of the overall research process
     • Grey literature is a product
     • To understand the product, one needs information on the sources and the process, i.e. the research context
     • Do not try to obtain that information by working backwards through a ‘fog’ from GL metadata
     • Capture it moving forwards through the research process; much GL metadata is then derived directly and consistently

  5. Background: Access
     • Interoperation: homogeneous access to distributed heterogeneous information
     • The query is posed against the schema of the user
     • It is translated to the other schemas (of the sources)
     • The answer is reconciled to the original schema (of the user)
     • With a common interoperation format, n sources need n interfaces
     • Without one, they need n(n-1) interfaces
     • So utilise one common interoperation format [character set, language, syntax, semantics]
     • The alternative is ‘google-like’, where the end-user has to do the translations and reconciliations
     • This does not scale
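The interface-count argument on the Access slide can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes one converter per direction between every ordered pair of schemas when no common format exists.

```python
def interfaces_with_common_format(n: int) -> int:
    """Each of the n sources needs one interface to the common format."""
    return n


def interfaces_pairwise(n: int) -> int:
    """Without a common format, each source needs an interface to each of the other n-1."""
    return n * (n - 1)


# Compare the two approaches as the number of sources grows.
for n in (3, 10, 50):
    print(n, interfaces_with_common_format(n), interfaces_pairwise(n))
```

The pairwise count grows quadratically while the common-format count grows linearly, which is the scaling argument the slide makes.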

  6. Background: Metadata
     • Grey literature repositories can be interoperated without a CERIF-CRIS, using OAI-PMH and Dublin Core (DC), as in OAIster
     • Grey literature repositories provide better recall and relevance when interlinked via a CERIF-CRIS, which supplies the research context with formal syntax and declared semantics
     • Metadata: schema, navigational, associative {descriptive, restrictive, supportive}
     • The key to everything is quality metadata: input validation, query/retrieval, relationship linking, INTEROPERATION
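As a rough illustration of the OAI-PMH and DC route mentioned on the Metadata slide, the sketch below builds a `ListRecords` request asking for Dublin Core (`oai_dc`) records. The `verb`, `metadataPrefix` and `set` parameters come from the OAI-PMH protocol; the base URL and the function name are assumptions made for this example, and no request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical repository endpoint; replace with a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

A harvester built this way receives flat descriptive records; it does not carry the project, person and organisation links that the slide attributes to a CERIF-CRIS.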

  7. Background [Diagram: the CERIF model. Core entities: Project, OrgUnit, Person. Linked entities: Contact, Event, Prize/Award, Skills, CV, Funding Programme, Classification; results (Publication, Patent, Product); General Facility, Particular Equipment, Service.] CERIF: EU Recommendation to Member States

  8. Result Publication Instance Diagram [Diagram: Person A, OrgUnits M, N and O, Project P and Publication X, linked by the roles ‘part of’, ‘member’, ‘employee’, ‘project leader’, ‘author’ and ‘owns IPR’.] Metadata in a CERIF-CRIS is much richer than in the usual repository

  9. CERIF-CRIS + Repositories at 1 institution [Diagram: the end-user accesses the CRIS, which holds the research context (projects, persons, organisational units, funding, products, patents, publications, facilities, equipment, events). CERIF, OAI-PMH and various protocols link the CRIS to the OA repository (hypermedia documents) and to the e-Research repository (datasets and software).]

  10. ...and multiple institutions [Diagram: Institutions A, B and C, each with its own end-user, CRIS, OA repository and e-Research repository.]

  11. Hypothesis
     • Comparison of possible architectures for interoperation of grey repositories (of publications, or of data and software) leads inexorably to the conclusion that CERIF should be used in one of three ways:
     • as the native storage format, or
     • as the storage format of a derived data warehouse (a transformed copy of the CRIS), or
     • as the export format, converted from the CRIS native format using a wrapper

  12. Remote Wrapper [Diagram: on the user's LAN, a query form and presentation form with a presentation converter, schemas, integration, addresses, dispatcher and receiver; across the network, each non-CERIF CRIS host has a receiver, dispatcher, addresses, query converter, answer converter and schema.]

  13. Remote Wrapper
     • the user needs only a web browser and a simple query form
     • the host has to write a query converter
     • the host has to write an answer (XML?) converter (to a specific XML DTD?)
     • the query expressivity is very limited
     • the user client has to write an integrator for the answers
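A minimal sketch of the remote-wrapper division of labour described above, assuming two hypothetical hosts with invented field names: each host supplies its own query converter and answer converter, and the client supplies only the integrator. Plain dictionaries stand in for whatever wire format (XML or otherwise) a real deployment would use.

```python
# Two hypothetical non-CERIF hosts, each with its own local field names.
HOST_A_ROWS = [{"tittel": "Grey report 1", "forfatter": "Asserson"}]
HOST_B_ROWS = [{"name": "Grey report 2", "creator": "Jeffery"}]


def make_remote_wrapper(rows, title_field, author_field):
    """Return the host-side handler: query converter plus answer converter."""
    def handle(query):
        # Query converter: map the common query field to the host's own field.
        local_field = {"title": title_field, "author": author_field}[query["field"]]
        hits = [r for r in rows if query["term"].lower() in r[local_field].lower()]
        # Answer converter: map host rows back to the common answer format.
        return [{"title": r[title_field], "author": r[author_field]} for r in hits]
    return handle


HOSTS = {
    "A": make_remote_wrapper(HOST_A_ROWS, "tittel", "forfatter"),
    "B": make_remote_wrapper(HOST_B_ROWS, "name", "creator"),
}


def integrate(query):
    """Client-side integrator: send the query to every host and merge the answers."""
    merged = []
    for host_id, handle in HOSTS.items():
        for answer in handle(query):
            merged.append({"host": host_id, **answer})
    return merged


print(integrate({"field": "title", "term": "grey"}))
```

The query here is a single field and term, which mirrors the slide's point that query expressivity is very limited when every host must implement its own converter.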

  14. Local Wrapper [Diagram: on the user's LAN, a query form and presentation form with a query converter, presentation converter, schemas, integration, addresses, dispatcher and receiver; across the network, each non-CERIF CRIS host has only a receiver, dispatcher and addresses.]

  15. Local Wrapper
     • each host has only to supply and update its schema to the client (to all clients if there is no central query server)
     • each host has no software to provide except a receiver and dispatcher
     • the client (if it is a central service) has a very large workload
     • if there is no central service, then each client has to have all schemas supplied and updated
     • the client software has to include a complex query refiner
     • the client software has to include multiple complex query converters
     • the client software has to include a complex answer integrator
     • the client software has to include a presentation converter (its complexity depends on the specification of the presentation required and the complexity of the answer structure)
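For contrast with the remote wrapper, here is a hedged sketch of the local-wrapper arrangement listed above: the hosts only publish their schemas and execute queries in their own terms, while the client holds every schema and does all conversion and integration. Host names, field names and data are hypothetical.

```python
# Schemas that each host supplies to the client: common field -> local field.
SCHEMAS = {
    "A": {"title": "tittel", "author": "forfatter"},
    "B": {"title": "name", "author": "creator"},
}

# Stand-in for the hosts' own databases.
HOST_DATA = {
    "A": [{"tittel": "Grey report 1", "forfatter": "Asserson"}],
    "B": [{"name": "Grey report 2", "creator": "Jeffery"}],
}


def host_execute(host_id, local_field, term):
    """Host side: receiver and dispatcher only; the query is already in local terms."""
    return [r for r in HOST_DATA[host_id] if term.lower() in r[local_field].lower()]


def client_search(field, term):
    """Client side: one query converter per host, then answer integration."""
    results = []
    for host_id, schema in SCHEMAS.items():
        local_field = schema[field]  # query converter for this host
        to_common = {local: common for common, local in schema.items()}
        for row in host_execute(host_id, local_field, term):
            # Answer integration: rename local fields back to the common schema.
            results.append({to_common[k]: v for k, v in row.items()})
    return results


print(client_search("author", "jeffery"))
```

Every new or changed host schema means a change on the client, which is the maintenance burden the slide attributes to this architecture.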

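  16. Catalog [Diagram: two phases. Construction phase, from each host: CRIS, schema, converter and dispatcher send CERIF metadata over the network to a receiver and loader, which populate the CERIF metadata catalog. Retrieve phase, by the user (user phase 1): a query form on the LAN sends a standard query to the CERIF metadata catalog and receives a hit list.]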

  17. Catalog
     • simple query on a union catalog (which may be centralised or replicated)
     • possibly not all required entities and attributes are in the catalog
     • effort to populate the catalog; requires a converter at each host to supply CERIF metadata
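The two phases of the catalog architecture can be sketched as follows. This is an illustration under stated assumptions: the record shape is a simplified stand-in and not the real CERIF attribute set, and the host data and converter functions are invented.

```python
catalog = []  # union catalog of records in one common (CERIF-like) shape


def load_from_host(host_id, rows, convert):
    """Construction phase: each host converts its rows and loads them into the catalog."""
    for row in rows:
        record = convert(row)
        record["host"] = host_id
        catalog.append(record)


def query_catalog(field, term):
    """Retrieve phase: one simple query against the union catalog."""
    return [r for r in catalog if term.lower() in str(r.get(field, "")).lower()]


# Each host needs its own converter from local fields to the common shape.
load_from_host("A", [{"tittel": "Grey report 1", "forfatter": "Asserson"}],
               lambda r: {"title": r["tittel"], "author": r["forfatter"]})
load_from_host("B", [{"name": "Grey report 2", "creator": "Jeffery"}],
               lambda r: {"title": r["name"], "author": r["creator"]})

print(query_catalog("author", "jeffery"))
print(query_catalog("funding", "EU"))  # attribute absent from the catalog: no hits
```

The second query returns nothing because the catalog holds only the attributes the hosts chose to load, which is the limitation noted in the second bullet.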

  18. Catalog Plus Pull (ERGO2++) [Diagram: user phase 1: a query form on the LAN sends a query to the CERIF metadata catalog and receives a hit list. User phase 2: hit list processing sends unique-id queries via addresses, dispatcher and receiver across the network to the non-CERIF CRIS hosts (each with receiver, dispatcher and addresses), and the results go to the presentation form.]

  19. Catalog Plus Pull (ERGO2++)
     • advantage of simplicity, as for the catalog-only architecture
     • advantage of additional information provision
     • disadvantage that the additional information is heterogeneous (unless converted to the CERIF export data model)
     • disadvantage that hosts have to maintain entries representing their database content in the CERIF metadata catalog
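A small sketch of the two user phases, assuming a hypothetical catalog whose entries carry a host identifier and a unique id: phase 1 searches the catalog for a hit list, and phase 2 pulls the full record from the owning host by that id. All identifiers, field names and records are invented for illustration.

```python
# Phase 1 source: the common metadata catalog (simplified record shape).
CATALOG = [
    {"id": "A:1", "host": "A", "title": "Grey report 1"},
    {"id": "B:7", "host": "B", "title": "Grey report 2"},
]

# Phase 2 source: each host's own records, kept in the host's native shape.
HOST_RECORDS = {
    "A": {"A:1": {"tittel": "Grey report 1", "sider": 12}},
    "B": {"B:7": {"name": "Grey report 2", "funder": "EU"}},
}


def phase1_hit_list(term):
    """User phase 1: query the catalog and return a homogeneous hit list."""
    return [e for e in CATALOG if term.lower() in e["title"].lower()]


def phase2_pull(hit):
    """User phase 2: fetch the full record from the owning host by unique id."""
    return HOST_RECORDS[hit["host"]][hit["id"]]


hits = phase1_hit_list("grey")
for hit in hits:
    print(hit["id"], phase2_pull(hit))
```

The pulled records come back with different field names per host, illustrating the slide's caveat that the additional information is heterogeneous unless it is converted to a common export model.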

  20. Full CERIF [Diagram: the user's query form and presentation form on the LAN, with addresses, dispatcher and receiver; queries go across the network to the CERIF CRIS hosts, each with a receiver, dispatcher and addresses. No converters appear.]

  21. Full CERIF
     • very simple and easy to use for the end-user
     • each host has to either run a full CERIF model database or provide a full CERIF model version of the host database
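When every host exposes the same model, the client can send one query unchanged to all hosts and merge the answers without any converters. The sketch below assumes a drastically simplified `Publication` record in place of the full CERIF model; the class, hosts and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Simplified stand-in for a record in a shared model (not the real CERIF schema)."""
    title: str
    author: str


# Every host stores, or exports, records in the same shape.
HOSTS = {
    "A": [Publication("Grey report 1", "Asserson")],
    "B": [Publication("Grey report 2", "Jeffery")],
}


def search_all(field, term):
    """Dispatch the same query to every host; no per-host conversion is needed."""
    return [
        (host_id, pub)
        for host_id, pubs in HOSTS.items()
        for pub in pubs
        if term.lower() in getattr(pub, field).lower()
    ]


print(search_all("author", "asserson"))
```

Compared with the wrapper sketches, all the mapping code has disappeared from the query path; the cost has moved to each host, which must maintain its data in the shared model.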

  22. Harvesting (construction phase) [Diagram: each non-CERIF CRIS has a converter that publishes its content as HTML pages; a crawling robot reads the pages over the network and builds a catalog of documents with associative descriptive metadata.]

  23. Harvesting (search phase) [Diagram: user phase 1: a query form on the LAN queries the harvester's associative descriptive metadata catalog over the network and receives a hit list. User phase 2: hit list processing sends URL queries via addresses, dispatcher and receiver to the hosts, which return HTML pages from the CRIS to the presentation form.]

  24. Harvesting
     • The host has to provide a copy of the database as web pages, available both to the search robot and to subsequent accesses made by clicking on URLs in the metadata.
     • The query is based on the existence of term(s); constraining by entity or attribute is not possible (without sophisticated XML form processing).
     • The results are unstructured and arrive one page at a time (click on a URL in the metadata catalog to see the page); this inhibits statistical processing or report generation.
     • It is easy to implement and maintain (although the copy of the database may be ~2 weeks out of date) and has a familiar interface for many WWW users.
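The term-existence limitation can be seen in a toy version of the harvesting approach. The sketch below assumes a few invented HTML pages and URLs, strips tags with a simple regular expression (adequate only for this toy input), and builds an inverted index from terms to URLs.

```python
import re
from collections import defaultdict

# Hypothetical pages that two CRISs publish for the crawling robot.
PAGES = {
    "https://cris-a.example.org/pub/1":
        "<html><body><h1>Grey report 1</h1><p>Author: Asserson</p></body></html>",
    "https://cris-b.example.org/person/7":
        "<html><body><h1>Jeffery</h1><p>Wrote a grey report</p></body></html>",
}


def harvest(pages):
    """Construction phase: build an inverted index from term to the URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index


def search(index, *terms):
    """Search phase: return URLs of pages on which every term occurs."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = harvest(PAGES)
print(search(index, "grey", "report"))
print(search(index, "jeffery"))
```

The search for "jeffery" finds the page whether the name is an author, a page title or incidental text, and the result is only a list of URLs to open one at a time, matching the second and third bullets.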

  25. Conclusion
     • To interoperate grey repositories, link them to a CRIS
     • Best: the Full CERIF architecture
     • Else: wrap the CRIS so that it interoperates using CERIF
