150 likes | 275 Views
OCLC Open Source Linked Data Framework. Ralph LeVan Sr. Research Scientist OCLC Research. Goal: Expose Text Database Content as Linked Data.
E N D
OCLC Open Source Linked Data Framework Ralph LeVan Sr. Research Scientist OCLC Research
Goal: Expose Text Database Content as Linked Data • Technique: Using a combination of the urlrewritefilter from tuckey.org, the content negotiation component from the Freie Universität Berlin’s Pubby server and our Open Source SRW/U server you can expose the records in your database as Linked Data
Roadmap • SRW/U Server • URIs for records • Real World Objects for records • Multiple record formats • RDF needs to be returned • Content Negotiation
SRW/U Server • SRW/U server sits in front of text databases • We have interfaces for DSpace, Lucene and Pears • Easy to write your own interface • Convert CQL query to native query language • Do search and return a resultset object • Return records from the resultset • (The Lucene interface is a good simple example of how to build your own database interface) • I expose my SRW/U service as <context>/search, but you can put it wherever you want.
URIs for records • urlrewritefilter implements apache mod_rewrite patterns for java servlets • It sees the URI and converts it to an SRU search. • <from>^/([0-9][0-9]+)/$</from> <to>/search?query=local.viafID+exact+%22$1%22</to> • E.g. viaf/123 becomes viaf/search?query=viafID+exact+%22123%22
Aside: What to Return? • An SRU query returns a searchRetrieveResponse. A smart client can pick its record out of that response, but that seems wrong • A bad URI will result in “no records found”, but a 404 (record not found) is more appropriate • Solution: add a new parameter (service=APP) to signal that this was a request for a single record • E.g. viaf/123 becomes viaf/search?query=viafID+exact+%22123%22&service=APP
Real World Objects for records • urlrewritefilter can generate 303 (see other) redirects based on URI patterns • <from>^/([0-9][0-9]+)$</from> <to type="seeother-redirect">/viaf/$1/</to> • E.g. viaf/123 redirects to viaf/123/ • The target, viaf/123/, is called the Generic Record • (note: now we use viaf/123/ as the URI that gets turned into the SRU search)
Multiple record formats • urlrewritefilter plus the new httpAccept parameter in SRU • E.g., viaf/123/marc21.xml becomes viaf/search?query=viafID+exact+123&httpAccept=application/marc21+xml • SRU is configured with a list of supported media types and the XSL stylesheets that render them
MimeType Configuration • XML.mimeTypes=application/sru+xml;q=0.85, application/xml, text/xml • HTML.mimeTypes=text/html;q=0.9, application/xhtml+xml • RSS.mimeTypes=application/rss+xml;q=0.8 • RSS.styleSheet=viaf2rss.xsl • M21.mimeTypes=application/marc21+xml;q=0.7 • M21.styleSheet=viaf2marc21.xsl • marc21HTML.mimeTypes=application/marc21+html;q=0.7 • marc21HTML.styleSheet=viaf2marc21.xsl
Aside: 123/marc21.xml NOT 123.m21 • The Generic Record being at viaf/123/ seems to imply that it is a collection of records • How do I ask for the HTML version of the MARC21 version of viaf/123 if suffix mangling is all I have?
RDF needs to be returned • viaf/123/rdf.xml • Making good RDF is tricky and beyond the scope of this presentation (but I think we’re getting close to agreements of sensible basics)
Content Negotiation on Generic Record • Pubby has a really nice Content Negotiation module • It is configured with the list of supported media types with optional quality measures • It takes an HTTP Accept header and returns the supported media type that matches best • SRU is configured with a list of supported media types and their quality measures and the XSL stylesheets that render them (see previous slide)
Result • viaf/123 redirected to viaf/123/ • viaf/123/ turned into viaf/search?query=viafID+exact+123 • VIAF record returned by SRW/U server • Content Negotiation causes viaf/123/viaf.html to be returned to googlebot and browsers, viaf/123/rdf.xml to applications that ask for application/rdf+xml and viaf/123/viaf.xml returned when no preference is provided