A Web-Based Resource Model for eScience: Object Reuse & Exchange 2008 Microsoft eScience Conference Indianapolis, December 8, 2008. OAI-ORE Editors. Carl Lagoze Cornell University Herbert Van de Sompel Los Alamos National Laboratory Pete Johnston Eduserv Foundation Michael Nelson
OAI-ORE Editors • Carl Lagoze (Cornell University) • Herbert Van de Sompel (Los Alamos National Laboratory) • Pete Johnston (Eduserv Foundation) • Michael Nelson (Old Dominion University) • Rob Sanderson (University of Liverpool) • Simeon Warner (Cornell University)
OAI Object Reuse and Exchange: Support • The Andrew W. Mellon Foundation • The Coalition for Networked Information • Joint Information Systems Committee • Microsoft Corporation • The National Science Foundation
OAI Object Reuse and Exchange Subject: Aggregations of Web resources Approach: Publish Resource Maps to the Web that Instantiate, Describe, and Identify Aggregations
Aggregations Instantiate, Describe, and Identify Aggregations
Aggregations At one time it was possible to convey all scientific information about a topic in a single “convenient” medium. Babylonian Astronomical Catalogue
Aggregations But quickly the limitations of that medium became obvious. 1857 Astrophysics paper text data
Aggregations Those limitations seem to live on.
Aggregations “Solving” the problem with ad hoc methods. 1890 Astrophysics paper Photo plate kept separate from text (digitized version of original plate shown) text
Aggregations Objects of interest in eScience are by nature compound. 2006 Astrophysics paper: text • XMM-Newton X-ray observation (Vilspa, Spain) • Chandra X-ray observation (Cambridge, MA) • A1795 basic object information (Strasbourg, France) • Hubble optical observation (Baltimore, MD)
Formats • Relationships • Identifiers • Splash page • Versions • Aggregations! • http://arxiv.org/abs/astro-ph/0611775
Object Reuse and Exchange: A Web-Centric Approach • The Web Architecture as the platform for interoperability • De facto integration with existing Web applications • Potential of adoption by other communities • Potential of tools created by other communities • Incorporating the “social web” (Web 2.0) in eScience
Foundations of OAI-ORE • Web Architecture • <http://www.w3.org/TR/webarch/> • Semantic Web, RDF • <http://www.w3.org/TR/rdf-primer/> • Linked Data • <http://linkeddata.org/> • <http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/> • Cool URIs for the Semantic Web • <http://www.w3.org/TR/cooluris>
W3C Web Architecture • The tools we have to solve the interoperability problem are: • Resource • URI • Representation • [Diagram: a URI identifies a Resource; Representation 1 and Representation 2 each represent the Resource, selected by content negotiation]
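The content negotiation step in the diagram can be sketched as follows. This is a minimal illustration, not a full HTTP implementation: the `REPRESENTATIONS` table and the `negotiate` function are hypothetical names, and the sketch deliberately ignores q-values and wildcards, treating the order of the Accept header as the client's preference order.

```python
# Hypothetical representations of one resource, keyed by media type.
REPRESENTATIONS = {
    "text/html": "human-readable splash page",
    "application/rdf+xml": "machine-readable RDF/XML description",
}


def negotiate(accept_header, available):
    """Return the first media type from an Accept header that is available.

    Simplifying assumption: q-values and wildcards are ignored, so the
    order of the header is taken as the order of preference.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in available:
            return media_type
    return None  # no acceptable representation


if __name__ == "__main__":
    chosen = negotiate("application/rdf+xml, text/html", REPRESENTATIONS)
    print(chosen, "->", REPRESENTATIONS[chosen])
```

One URI, one Resource, several Representations: the client's Accept header decides which Representation comes back.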
Semantic Web • The tools we have to solve the interoperability problem are: • URI • RDF • Vocabularies
Linked Data • Linked Data principles: • Use URIs as names for things. • Use HTTP URIs so that people can look up those names. • When someone looks up a URI, provide useful information. • Include links to other URIs so that they can discover more things.
OAI Object Reuse and Exchange: The Approach Subject: Aggregations of Web resources Approach: Instantiate Aggregations as Resources with unique URIs on the Web Approach: Publish Resource Maps to the Web that Instantiate, Describe, and establish identity of Aggregations
An Aggregation and the Web • Resources of an Aggregation are distinct URI-identified Web resources • Missing are: • The boundary that delineates the Aggregation in the Web • An identity (URI) for the Aggregation
ORE Data Model • We want to have our cake and eat it too (don't we all?): • ORE should be simple and easy to use without deep understanding • Use simple tools and rules to create Atom Resource Maps • ORE should have a well-crafted data model that enables interoperability through well-defined semantics • Separate design from implementation • Future-proof ORE: today's technologies will be replaced (even HTTP?) • You don't need to understand the Data Model fully to do ORE
Aggregation: Resource that is a set of resources • [Diagram labels: this resource is an Aggregation; this resource is an Aggregated Resource; the link between them is a relationship defined in the ORE vocabulary]
Resource Map: Describes an Aggregation • [Diagram labels: this resource is a Resource Map; the resource has a representation, the Resource Map serialization, retrieved by HTTP GET; ore:isDescribedBy is implied as the inverse of “describes”]
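The two slides above can be sketched as plain RDF triples. This is an illustrative sketch only: triples are modelled as Python tuples rather than with an RDF library, the `example.org` URIs and both function names are hypothetical, and the ORE terms are written against the ORE namespace `http://www.openarchives.org/ore/terms/`.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_resource_map(rem_uri, agg_uri, aggregated_uris):
    """Return the core triples a Resource Map asserts about an Aggregation."""
    triples = {
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
    }
    for uri in aggregated_uris:
        triples.add((agg_uri, ORE + "aggregates", uri))
    return triples


def add_implied_is_described_by(triples):
    """Add ore:isDescribedBy as the inverse of every ore:describes triple."""
    implied = {
        (obj, ORE + "isDescribedBy", subj)
        for subj, pred, obj in triples
        if pred == ORE + "describes"
    }
    return triples | implied


if __name__ == "__main__":
    # Hypothetical URIs for illustration.
    rem = build_resource_map(
        "http://example.org/rem",
        "http://example.org/aggregation",
        ["http://example.org/paper.pdf", "http://example.org/data.csv"],
    )
    for triple in sorted(add_implied_is_described_by(rem)):
        print(triple)
```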
Recommend use of HTTP URIs • HTTP is the technology of today's web • Want to be able to cite or refer to the Aggregation but get the Resource Map describing it • Follow Linked Data strategies to link: access URI-A, get redirected to URI-R (HTTP 303), or use a simple # URI • Provides notion of Authority
Multiple Resource Maps • An Aggregation MAY be asserted and described by multiple Resource Maps • The purpose of multiple Resource Maps is to provide descriptions of the Aggregation in multiple serializations (e.g., Atom, RDF/XML, RDFa, etc.) • Each Resource Map MUST have only one representation
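The two Linked Data strategies for getting from an Aggregation URI (URI-A) to its Resource Map URI (URI-R) can be sketched as below. This is a sketch under stated assumptions: `lookup` is a hypothetical callable standing in for an HTTP GET that does not follow redirects and returns `(status, location)`, and `fake_lookup` with its `REDIRECTS` table simulates a server so the example runs without network access.

```python
from urllib.parse import urldefrag

# Hypothetical redirect table simulating a server's 303 responses.
REDIRECTS = {
    "http://example.org/aggregation/1": "http://example.org/rem/1.rdf",
}


def fake_lookup(uri):
    """Stand-in for an HTTP GET without redirect following."""
    if uri in REDIRECTS:
        return 303, REDIRECTS[uri]
    return 404, None


def resource_map_uri(aggregation_uri, lookup):
    """Resolve URI-A to URI-R using the hash-URI or the 303 strategy."""
    base, fragment = urldefrag(aggregation_uri)
    if fragment:
        # Hash URI: the Resource Map is the document before the '#'.
        return base
    status, location = lookup(aggregation_uri)
    if status == 303:
        # 303 See Other: the server points at the describing Resource Map.
        return location
    return None


if __name__ == "__main__":
    print(resource_map_uri("http://example.org/rem.atom#aggregation", fake_lookup))
    print(resource_map_uri("http://example.org/aggregation/1", fake_lookup))
```

In both cases the Resource Map is reached through the Aggregation's own URI, which is what gives it authority.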
Authority • Authoritative Resource Maps • Get to the Resource Map via the Aggregation, usually created by the same authority • Multiple: MUST be minimally equivalent (same Aggregated Resources and Proxies), SHOULD assert mutual existence • Non-authoritative Resource Maps • Best practice is to not create them • Assert your own Aggregation instead • Use rdfs:seeAlso to assert the relationship between two Aggregations
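The "minimally equivalent" requirement can be checked mechanically. The following is a minimal sketch, assuming each Resource Map has already been parsed into a set of `(subject, predicate, object)` tuples; the function names and the `example.org` URIs are hypothetical, and the test covers only the two conditions named on the slide (same Aggregated Resources, same Proxies).

```python
ORE = "http://www.openarchives.org/ore/terms/"


def aggregated_resources(triples, agg_uri):
    """URIs the Aggregation aggregates according to this Resource Map."""
    return {o for s, p, o in triples if s == agg_uri and p == ORE + "aggregates"}


def proxies(triples, agg_uri):
    """(proxy, resource) pairs for Proxies asserted in this Aggregation."""
    in_agg = {s for s, p, o in triples if p == ORE + "proxyIn" and o == agg_uri}
    return {(s, o) for s, p, o in triples if p == ORE + "proxyFor" and s in in_agg}


def minimally_equivalent(rem_a, rem_b, agg_uri):
    """True if both Resource Maps list the same Aggregated Resources and Proxies."""
    return (
        aggregated_resources(rem_a, agg_uri) == aggregated_resources(rem_b, agg_uri)
        and proxies(rem_a, agg_uri) == proxies(rem_b, agg_uri)
    )


if __name__ == "__main__":
    agg = "http://example.org/aggregation"  # hypothetical URI
    atom_rem = {(agg, ORE + "aggregates", "http://example.org/paper.pdf")}
    rdfxml_rem = {
        (agg, ORE + "aggregates", "http://example.org/paper.pdf"),
        # Extra metadata does not affect minimal equivalence.
        (agg, "http://purl.org/dc/terms/title", "A paper"),
    }
    print(minimally_equivalent(atom_rem, rdfxml_rem, agg))
```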
Multiple Resource Maps • [Diagram: Resource Maps in RDF/XML, Atom, and RDFa, each linked to an Aggregation by ore:describes; one group is labelled “These are authoritative Resource Maps” and the other “These are non-authoritative Resource Maps”]
Adding other properties to the core • [Diagram labels: Required (the core); the ReM makes the assertions; metadata about the ReM; metadata about the Aggregation]
Asserting other Relationships • The ReM makes the assertions • Assertions about the Aggregation: Aggregation is a journal; Aggregation has another version “A” • Assertions about Aggregated Resources: Aggregated Resources are articles; “AR-3” is by Stephen Hawking
Limits of Assertions thus Far • The meaning of an RDF triple is independent of the context in which it is stated • Think of the difference: • Carl is a man (true in any context) • Carl is visiting Indianapolis (true only in a particular context) • All the triples described thus far are context independent • Therefore they can have the URI of an aggregated resource as subject or object • But remember that this is just the URI of the Resource; it does not identify the Resource specifically in its role as an Aggregated Resource • Introduce proxy URI
Proxy: Stands for a resource in the context of another resource • hasNext might have meaning only in context
lineage: “this came from” • Reuse of data set AR-1 in Aggregation A-2 • The ore:lineage predicate expresses the origin or provenance of data • Needs proxies because the statement depends on context
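The proxy and lineage slides can be sketched with plain triples. This is an illustration under assumptions: triples are Python tuples, all `example.org` URIs and function names are hypothetical, and the direction of `ore:lineage` (from the Proxy in the reusing Aggregation to the Proxy in the source Aggregation) follows my reading of the ORE vocabulary and should be checked against the specification.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def make_proxy(proxy_uri, resource_uri, agg_uri):
    """Triples for a Proxy standing for a resource in one Aggregation."""
    return {
        (proxy_uri, RDF_TYPE, ORE + "Proxy"),
        (proxy_uri, ORE + "proxyFor", resource_uri),
        (proxy_uri, ORE + "proxyIn", agg_uri),
    }


def lineage_sources(triples, proxy_uri):
    """Aggregations the proxied resource came from, found via ore:lineage."""
    source_proxies = {
        o for s, p, o in triples if s == proxy_uri and p == ORE + "lineage"
    }
    return {
        o for s, p, o in triples if s in source_proxies and p == ORE + "proxyIn"
    }


if __name__ == "__main__":
    # Hypothetical URIs: data set AR-1 appears in A-1 and is reused in A-2.
    ar1 = "http://example.org/AR-1"
    graph = make_proxy("http://example.org/proxy-1", ar1, "http://example.org/A-1")
    graph |= make_proxy("http://example.org/proxy-2", ar1, "http://example.org/A-2")
    # Context-dependent statement: made about the Proxies, not about AR-1 itself.
    graph.add(
        ("http://example.org/proxy-2", ORE + "lineage", "http://example.org/proxy-1")
    )
    print(lineage_sources(graph, "http://example.org/proxy-2"))
```

The lineage statement is attached to the Proxy URIs rather than to the URI of AR-1, so it holds only for AR-1 as it appears in A-2.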
arXiv.org: ORE possibilities • arXiv is an e-print archive of 500k scholarly articles • Express: • Structure of arXiv: archives, sub-categories, articles • Versioning: “article” (concept) and specific versions and formats • Articles by Joe Smith, somewhat like a result set • Constituents of an article (metadata, PDF, source, video, data, extracted references) • Describe internal and external components (e.g. external video associated with article but on Perimeter Institute server) • Use as part of workflow for ingest: assembly of components, possible combination with SWORD