UKOLN is supported by:

What's Wrong With Today's Web? Emma Tonkin, Rosemary Russell Interoperability Focus, UKOLN

Contents • About UKOLN • What's Wrong with the Web? • Reusability and standards • Some handy technologies • Solutions and best practices • Questions and discussion

About UKOLN • Core funding from MLA and JISC (Joint Information Systems Committee) • Cross-sectoral remit • Our audiences include: • HE/FE (L&T, research, admin) • Cultural heritage (museums, public libraries, archives) • National libraries (British Library) • e-government • NHS/Health sector • International digital library research community • About 30 staff • Mix of technical support and development, advisory and research capabilities • Based at University of Bath

About Interoperability Focus • Funded by MLA & JISC • Since 1999 • Cross-sectoral remit • 4 staff • Activities include • Standards work • Brokerage • Support & advice • Dissemination

What's wrong with the Web? • Depends on your perspective, but... • IP issues: licensing • Lost resources, broken links • Support • Reusability • Learning objects, data, articles, images, programs • Interoperability using common standards and protocols • Finding what you want • Overload – Google • Relevance • Provenance • Quality-Assurance • Accessing resources • Accessibility • Usability • Legislation

Reusability • Change in attitudes • Focus on creating static product • Maintenance problems: • Broken links, changing technologies • High costs • New focus: reusable products • 'Replaceable' technologies • Interoperability becomes vital

Masterpiece...

Reuse... Misguided Masterpieces: Mona Lisa Design by George Castaldo “Smile” and “teeth” by Guido Poggi

Difficulties of reuse, maintenance • Required: standards for • Storing information • Accessing information • Accessing functionality... • ...maintaining order! • Good standards often deceptively simple

XML and Web services • Based on HTTP (Web) and XML, the eXtensible Markup Language – a flexible basis for standard information formats • Web services: “A standardized way of integrating Web-based applications using open standards over an Internet protocol backbone”

XML, data, information • Humans are good at interpretation, inference • eg. +44 (0) 1225 386580 • Computers need more help • Data with XML is self-describing • Data plus surrounding structure • You may also hear of XSD • XML Schema Definition: describes XML structures (so you don't have to guess).
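A minimal sketch of the point above, in Python: the same phone number as a bare string and as marked-up data. The element names (`phone`, `country`, `area`, `number`) are invented for illustration and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# A human reads this easily; a program has to guess which digits mean what.
RAW = "+44 (0) 1225 386580"

# Hypothetical markup: the element names are illustrative only.
MARKED_UP = """
<phone>
  <country>44</country>
  <area>1225</area>
  <number>386580</number>
</phone>
"""

def read_phone(xml_text):
    """Return the parts of the phone number keyed by element name."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}

parts = read_phone(MARKED_UP)
print(parts["area"])
```

With the structure in place, the program can pick out the area code by name instead of inferring it from spacing and brackets.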

XML example: RSS • Uses XML to define a list of items:
<title>University of Bath Noticeboard</title>
<link>http://www.bath.ac.uk/noticeboard/</link>
<description>A Web notice board for staff and students at the University.</description>
<item rdf:about="http://www.bath.ac.uk/noticeboard/18840.html">
  <title>Book(s) for sale: 'Intermediate Microeconomics</title>
  <link>http://www.bath.ac.uk/noticeboard/18840.html</link>
</item>
<item rdf:about="http://www.bath.ac.uk/noticeboard/1915.html">
  <title>For sale: R reg 1.9 TD Volvo V40 estate</title>
  <link>http://www.bath.ac.uk/noticeboard/1098447598-1915.html</link>
</item>
...
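As a rough illustration of how a program might read such a feed, the Python sketch below lists the title and link of each item. The slide shows only a fragment, so the `rdf:RDF` wrapper, the `channel` element and the namespace declarations in `SAMPLE` are assumptions based on RSS 1.0, and the first item title is shortened.

```python
import xml.etree.ElementTree as ET

# Assumed RSS 1.0 wrapper around the items shown on the slide.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.bath.ac.uk/noticeboard/">
    <title>University of Bath Noticeboard</title>
    <link>http://www.bath.ac.uk/noticeboard/</link>
  </channel>
  <item rdf:about="http://www.bath.ac.uk/noticeboard/18840.html">
    <title>Book(s) for sale</title>
    <link>http://www.bath.ac.uk/noticeboard/18840.html</link>
  </item>
  <item rdf:about="http://www.bath.ac.uk/noticeboard/1915.html">
    <title>For sale: R reg 1.9 TD Volvo V40 estate</title>
    <link>http://www.bath.ac.uk/noticeboard/1098447598-1915.html</link>
  </item>
</rdf:RDF>"""

NS = {"rss": "http://purl.org/rss/1.0/"}

def list_items(feed_xml):
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("rss:title", namespaces=NS),
         item.findtext("rss:link", namespaces=NS))
        for item in root.findall("rss:item", NS)
    ]

for title, link in list_items(SAMPLE):
    print(title, "->", link)
```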

Web page displaying RSS

Creative uses of RSS feeds • News feeds from web sites, blogs, content providers • Publish your bookmark list • Alternative interfaces: Bookmarks

Data and presentation • XML/XSL vs (X)HTML • Example: OSS Watch • Page content in XML, rendered with XSLT stylesheets to: • Standard HTML layout • Single page HTML for printing • Simple text • PDF
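The idea of one content source with several renderings can be sketched in Python. This is not XSLT and not how OSS Watch builds its pages: two plain functions stand in for two stylesheets, and the `page`, `title` and `para` element names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical page content; element names are illustrative only.
PAGE = "<page><title>About</title><para>Open source advice.</para></page>"

def to_html(xml_text):
    """Render the page as a fragment of HTML."""
    root = ET.fromstring(xml_text)
    body = "".join(f"<p>{escape(p.text or '')}</p>" for p in root.findall("para"))
    return f"<h1>{escape(root.findtext('title', ''))}</h1>{body}"

def to_text(xml_text):
    """Render the same page as simple text."""
    root = ET.fromstring(xml_text)
    lines = [root.findtext("title", "").upper()]
    lines += [p.text or "" for p in root.findall("para")]
    return "\n".join(lines)

print(to_html(PAGE))
print(to_text(PAGE))
```

The content is written once; adding a new output format means adding a renderer, not editing every page.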

www.oss-watch.ac.uk

Web Services • Suite of standards + best practices • Machine (m2m/b2b) interfaces between functional components on the Web • Can be used for informational and transactional services • Communicate using XML (and a protocol such as SOAP) • Interface/functionality described (WSDL) and published
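To give a feel for what "communicate using XML and SOAP" looks like on the wire, here is a small Python sketch that builds a SOAP 1.1 request and reads it back. The service namespace `urn:example:spellcheck` and the `checkSpelling` operation are hypothetical; they are not the interface of any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:spellcheck"                  # hypothetical service namespace

def build_request(phrase):
    """Wrap a call to the hypothetical checkSpelling operation in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}checkSpelling")
    ET.SubElement(call, "phrase").text = phrase
    return ET.tostring(envelope, encoding="unicode")

def read_phrase(message):
    """Extract the phrase argument, as the receiving service would."""
    root = ET.fromstring(message)
    return root.findtext(f"{{{SOAP_NS}}}Body/{{{SERVICE_NS}}}checkSpelling/phrase")

message = build_request("interoperabilty")
print(message)
print(read_phrase(message))
```

In practice the message would be sent over HTTP to the address given in the service's WSDL description, and toolkits usually generate this plumbing from the WSDL.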

Example: The Google API • Able to: • Perform searches and return results in XML • Get cached copy of page • Spell-check (“did you mean?”)

RDN/Google spellchecker

How does it work?

Not just for Web pages • Integration with • desktop software • Research pane in MS Word • Client-side Web applications • Flash, Java/.Net applets • Internet applications: • Weather forecast, phonebook, TV guide...

Office 2003 Research Library service (MSDN) • Query Google web API from Research pane • Search from within Office • Written in C# or VB.NET http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnofftalk/Html/office03062003.asp

Searching Google in Flash • eg. www.flash-db.com/google

Aside: metadata, Dublin Core • Metadata: data about data; information for cataloguing (for the Web…) • Simple Dublin Core • 15 elements • e.g. Title, publisher, creator, description, identifier
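A short Python sketch of a Simple Dublin Core description, using three of the fifteen elements to describe this presentation. The Dublin Core element namespace is the real one; the `record` wrapper element is only an illustrative container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dc_record(fields):
    """Serialise a dict of Dublin Core element names and values as XML."""
    record = ET.Element("record")  # wrapper name is illustrative only
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")

print(dc_record({
    "title": "What's Wrong With Today's Web?",
    "creator": "Emma Tonkin",
    "publisher": "UKOLN",
}))
```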

Lost files, broken links • Using a library... • Physical addresses • 2nd book, 5th shelf, 1st floor, Bath • Representative addresses ('in section') • Redirects ('look up ISBN in catalogue') • Books available from several locations • ...as with the Web • URLs, redirects, mirrors

Traditional linking • Link source → hardwired link → Link destination

OpenURL style linking • Link source sends metadata and identifiers to OpenURL resolver • Resolver chooses document delivery service (based on contextual information) • One of many possible destinations

PURLs • Persistent Uniform Resource Locator • Traditional URLs point to location: • http://www.site.com/resource.html (protocol, server, filename) • PURLs point to an intermediate resolution service
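The indirection can be sketched in a few lines of Python. The lookup table and the `/docs/report` path are hypothetical; a real resolution service answers over HTTP with a redirect, which is modelled here as a status code and a location.

```python
# Hypothetical table held by the resolution service: persistent path -> current location.
PURL_TABLE = {"/docs/report": "http://www.site.com/resource.html"}

def resolve(purl_path, table=PURL_TABLE):
    """Return (status, location) much as a redirecting resolver might."""
    target = table.get(purl_path)
    if target is None:
        return 404, None
    return 302, target

print(resolve("/docs/report"))

# When the resource moves, only the table changes; published links stay the same.
PURL_TABLE["/docs/report"] = "http://www.site.com/archive/resource.html"
print(resolve("/docs/report"))
```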

OpenURL • “Providing access to an appropriate copy of a full-text article” • Defines article by means of metadata: • issn=, date=, volume=, issue=, spage= • URL sent to link resolver • Resolver uses metadata to identify article • Chooses appropriate target
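A hedged Python sketch of the first two steps: describing an article by metadata and packing it into a URL for a link resolver. The resolver address and the metadata values are invented; only the key names (`issn`, `date`, `volume`, `issue`, `spage`) come from the slide.

```python
from urllib.parse import urlencode, urlparse, parse_qs

RESOLVER = "http://resolver.example.org/openurl"  # hypothetical resolver address

def make_openurl(**metadata):
    """Build the URL a link source would send to the resolver."""
    return RESOLVER + "?" + urlencode(metadata)

def read_openurl(url):
    """Recover the article metadata, as the resolver would on receipt."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}

# Illustrative values only; they do not identify a real article.
url = make_openurl(issn="1234-5678", date="2004", volume="12", issue="3", spage="45")
print(url)
print(read_openurl(url))
```

The resolver, not the link source, then decides which copy of the article to send the reader to.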

DOI: Digital Object Identifier • “A system for identifying and exchanging intellectual property in the digital environment” • A DOI is a unique ID controlled by a DOI registration agency, referring to a digital object • Each DOI may be resolved to data associated with that digital object, eg. summary, article text...

Improved resource discovery • Two approaches to searching multiple databases: • Cross-searching (Z39.50, SRW) • Inefficient on a large scale • Harvesting metadata centrally • Efficient but requires agreed transport protocols, metadata formats, quality control and intellectual property/usage rights • Result: OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)

OAI-PMH in a nutshell • Data providers' records harvested by central service provider • Records may be articles, images, learning objects, descriptions of museum artifacts... • OAI is based around standards: • HTTP, XML, Dublin Core metadata • Once metadata centrally available, search efficiency improves
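The harvesting step can be sketched in Python as follows. The repository address and the sample response are made up for illustration; the `ListRecords` verb, the `oai_dc` metadata prefix and the namespaces are those defined by OAI-PMH 2.0 and Dublin Core. No network request is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a service provider sends to a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

# Invented response, trimmed to the parts the sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvest_titles(response_xml):
    """Map each record identifier to its Dublin Core title."""
    root = ET.fromstring(response_xml)
    index = {}
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        index[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return index

print(list_records_url("http://repository.example.org/oai"))
print(harvest_titles(SAMPLE_RESPONSE))
```

Once records from many data providers sit in one local index like this, a search runs against that index and does not have to query each provider in turn.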

Z39.50 made easy: SRW! • Z39.50 – still not adopted on large scale • SRW - ‘low-barrier-to-entry solution’ • more easily implemented • retains many important aspects of Z39.50 (eg rich semantics developed over many years) • XML based • carried via SOAP or in a URL (SRU)
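For the "in a URL" variant, a search is just a set of query parameters. The Python sketch below assembles one; the server address is hypothetical, the CQL query is only an example, and the parameter names are those of SRU 1.1 (`operation`, `version`, `query`, `maximumRecords`).

```python
from urllib.parse import urlencode, urlparse, parse_qs

def sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve request as a plain URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # the search itself, written in CQL
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical server; the query asks for records whose title mentions "web".
url = sru_url("http://sru.example.org/catalogue", 'dc.title = "web"')
print(url)
print(parse_qs(urlparse(url).query)["query"][0])
```

The response comes back as XML, so the same parsing approach used for other XML formats applies.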

Exposing information • Data providers can use both OAI and Z39.50/SRW • OAI is used for metasearch engines to harvest records (to avoid slow cross-searching) • SRW provides Web-based search capability for individual or metasearch hubs

Exposing information: portals • Study at Loughborough University: use of library databases rose by 609% after implementation of cross-search tool • But don’t want to cross-search everything • Subject portals • ‘Portlets’ – functional modules (reuse within other portals)

Best practices – what you can do tomorrow • Publish content as RSS/XML eg. news • Make data accessible via OAI-PMH, Z39.50/SRW • Encourage community participation, publicise resources • Provide and use web services • Consider robust linking solutions • Stick to standards!

Questions... • Any questions?
