‘aggregation as a tactic’ - to support discovery

Peter Burnhill & Stuart Macdonald
EDINA national data centre, University of Edinburgh
CERN workshop on Innovations in Scholarly Communication (OAI7), University of Geneva, 23 June 2011

Context: RDTF Vision
The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision:
“UK researchers and students will have easy, flexible, and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable”
Making content more discoverable both by people and machine via a mixed economy of technological solutions.
The Discovery Initiative aims to:
• Engage stakeholders across libraries, archives and museums
• Build critical mass of open content to inspire others to participate
• Encourage development of ‘purposeful aggregations and compelling applications’ - mashing at the macro-level
• Exemplify what can be done across domains to free data and explore how to make that data work harder
No one-size-fits-all solution!

‘aggregation as a tactic’ - a phrase coined to end an impasse during a meeting to discuss technical aspects of the RDTF Vision statement to identify stakeholder groups
Key concept in RDTF Vision is aggregation, directly or represented through metadata – to unlock the online & digital riches held in our organisations
‘Regard aggregation as intervention to exploit the telematic opportunity for things [that] are “remote, digital & published”’ - a phrase derived from an IASSIST conference in 1990 exploring what it meant with the Internet if we regarded all [content] as ‘remote and published’.
The Web in the mid-1990s simplified and thus improved
Unfortunately, even now, much which is online and on the Web is badly or inadequately published …
We have to improve, re-interpreting what it means to be ‘well-published’

The term aggregation is used a lot in computer science for:
• “objects … assembled or configured together to create a more complex object” UML, IBM
• “aggregating resources based on … properties. … they are owl:sameAs and their other properties can be intermixed.”
For purposes of RDTF aggregation means:
• an assembly of data sources
• more than a collection of objects (image banks, data services, catalogues, activity data) – related or otherwise
• for machine-as-user – independent of presentation layer
However aggregation is not a goal nor an end in itself - it is an intervention to be used for a twofold strategic purpose:
• ‘improvement’ - merge & match, customisation and consumption, multiple output formats, reduce duplication of effort
• ‘discoverability’ – via ‘promiscuous’ or ‘well-dressed’ metadata through e.g. Google or tailored services
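The owl:sameAs sense of aggregation quoted above can be sketched in a few lines. This is a minimal illustration only, not a method described in the talk: it assumes triples are held as plain (subject, predicate, object) tuples, and the identifiers (`ex:cat/123`, `ex:reg/abc`) and the helper name `merge_same_as` are hypothetical.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their other properties."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller identifier as representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    # First pass: record which identifiers denote the same thing.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    # Second pass: intermix the remaining properties under one representative.
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)].add((p, o))
    return dict(merged)


# Hypothetical records for the same serial held by two different sources.
triples = [
    ("ex:cat/123", "dc:title", "Example Journal"),
    ("ex:reg/abc", "dc:identifier", "issn:0000-0000"),
    ("ex:cat/123", OWL_SAME_AS, "ex:reg/abc"),
]
merged = merge_same_as(triples)
```

Here both descriptions end up under a single key, which is the ‘merge & match’ improvement in miniature; a production aggregation would use an RDF store or reasoner rather than tuples in memory.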

Language & Perspectives
Digital Library has mixed parentage - a ‘re-mix’ of the document tradition & the computation tradition
• “approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like …” [Content Provider speak]
• “approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures).” [Developer speak]
Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998) http://people.ischool.berkeley.edu/~buckland/asis62.html

Perspectives … as provider
• EDINA - develops and delivers JISC-sponsored national online services
• adding value to data and content
• Digimap Collections (OS mapping; SeaZone; BGS)
• NewsfilmOnline (various; digitised with JISC £)
• UK Access Management Federation (institutions; authentication)
• Data Library – move from support to middle folk
• Research data support for Edinburgh researchers
• Research data management guidelines, training, OER materials
• Edinburgh DataShare – open data repository
• RADAR – Researching A Data Asset Registry
• Maybe as ‘middle folk’ - c.f. those who deal in middleware
• sometimes having the role of creator and supplier of some service
• sometimes being the user of what others supply
• ‘inter-operator’

Perspective … as aggregator: developing and delivering JISC-sponsored aggregation services
• JISC MediaHub - links to collections & hosted content (c. 1m resources): CultureGrid; First World War Poetry; Films of Scotland; Getty images (all content searchable and viewable within JISC MediaHub)
• GoGeo! - metadata registry for spatially-referenced data: Geodoc metadata creation tool, ShareGeo Open
• SUNCAT – serials union catalogue: 80 libraries, metadata/links to full text, download MARC records (& XML & SUTRS - Simple Unstructured Text Record Syntax - data exchange format widely used in Z39.50)
• PEPRS - e-journal preservation registry jointly led by EDINA with the ISSN International Centre: metadata registry of available back copy e-journals - aggregated from preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS)

Some RDTF-related projects @ EDINA
• GOgeo Linked Data (GOLD) – triplify INSPIRE-compliant metadata to improve discoverability of metadata records via search engines
• SUNCAT: Exploring Open [bibliographic] Metadata (working with OKF to open up data sent by contributing libraries – convert to RDF)
• Sharing OpenURL Activity Data - monthly usage data: date & time; anonymised IP address/inst. ID; title; author; ISSN, DOI. Uses – article/journal recommendations, publishers reviewing what content is of interest to specific communities, innovative services to meet users’ needs
• CHALICE – use data mining to extract placenames from the English Place Name Survey to create a UK historic gazetteer published as Linked Data & link it to the Geonames ontology on the semantic web
• AddressingHistory – geo-parsing of Scottish Post Office Directories, API onto digitised content, output in XML, CSV, JSON
• 3 further case studies on other EDINA services illustrating how other collections can benefit from the same techniques
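To make ‘triplify’ concrete: the sketch below turns one flat metadata record into N-Triples lines. It is an illustration under stated assumptions, not the GOLD or SUNCAT pipeline: the record fields, the subject URI under example.org and the function name `to_ntriples` are hypothetical, and the only real vocabulary used is Dublin Core terms.

```python
DCTERMS = "http://purl.org/dc/terms/"


def to_ntriples(subject_uri, record, predicate_map):
    """Serialise mapped fields of a flat record as N-Triples literals."""
    lines = []
    for field, value in record.items():
        predicate = predicate_map.get(field)
        if predicate is None or value is None:
            continue  # unmapped or empty fields are simply not published
        # Escape the characters that N-Triples string literals require.
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject_uri}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical metadata record and field-to-predicate mapping.
record = {
    "title": 'Example "coastal" dataset',
    "abstract": "Illustrative abstract text.",
    "internal_id": "42",
}
predicate_map = {
    "title": DCTERMS + "title",
    "abstract": DCTERMS + "abstract",
}
ntriples = to_ntriples("http://example.org/metadata/record/1", record, predicate_map)
```

Fields without a mapping (here `internal_id`) are left out, so what gets exposed is an explicit choice; a real conversion would also need typed literals, language tags and links to other resources.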

The end is the start of a new beginning …
• In earlier ‘web time’ we had the MODELS ‘user-verbs’: Discover -> Locate -> Request -> Access (Deliver)
  Dempsey, Russell & Murray (1999) http://www.ukoln.ac.uk/dlis/models/publications/utopia/
  where Access was the end game for us ‘middle folk’ even if the beginning & part of a deeper process for researchers, students …
• Now there is call for more than bilateral & negotiated interoperability, where Access is the beginning for developers and for other services
• RDF/Linked Data enables information to be shared in a more Web-friendly way
• RDF/Linked Data enables structure and content of those data sources to be explicit - vocabularies, ontologies, relationships
Exposing the complexity and relationship in the underlying data, hanging the insides on the outside!

The treasures are on show inside, but … Centre Pompidou

… and so to summarise …
• Early web approaches focused on making content accessible for humans
  • hiding the complexity and relationship in the underlying data
  • paying attention to the user interface: HCI & GUI; Usability and Accessibility
• However, to ensure content gets noticed it must be made easier for machines to understand by:
  • exposing the complexity and relationship in the underlying data
  • having in mind the machine-as-user: API as well as HCI
• Aggregation should be seen as intervention, with strategic purpose:
  • to engage in value-added improvement of content
  • to enhance the discoverability of that which is ‘aggregated’
  • to be a focus of attention (thro’ promiscuous metadata!)
• If it is with RDF, then that’s good; don’t make a fuss if not
• Publish RDBMS schemas, catalogue records, codebooks, and ancillary or related content in multiple, machine-readable formats
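As a small illustration of publishing the same content in multiple machine-readable formats, the sketch below emits one list of records as JSON, CSV and XML using only the Python standard library. The records, field names and function names are hypothetical, and it assumes field names are valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialise records as a JSON array."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records, fieldnames):
    """Serialise records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records, root_tag="records", item_tag="record"):
    """Serialise records as XML, one child element per field."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, item_tag)
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical catalogue records; every value is a string for simplicity.
records = [
    {"title": "Example Directory A", "year": "1911"},
    {"title": "Example Directory B", "year": "1912"},
]
as_json = to_json(records)
as_csv = to_csv(records, fieldnames=["title", "year"])
as_xml = to_xml(records)
```

Keeping one internal representation and generating each format from it means a machine-as-user can pick whichever suits it, without the provider maintaining parallel copies by hand.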

The Many Minds principle
“the coolest thing to do with your data will be thought of by someone else”
Using data as the building platform
Jo Walsh & Rufus Pollock (2007-05-17). Open Data and Componentization. XTech 2007 (slide 14)
“Benefits of freeing data are many, arguably being the most relevant one the ‘Many Minds principle’: there’ll always be someone that will find out a way to reuse data that you wouldn’t have even figured.”
José Manuel Alonso, Notes from the 5th Internet, Law and Politics Conference: The Pros and Cons of Social Networking Sites, organized by the Open University of Catalonia, School of Law and Political Science, and held in Barcelona, Spain, on July 6th and 7th, 2009.

THANK YOU Stuart.Macdonald@ed.ac.uk Peter.Burnhill@ed.ac.uk http://edina.ac.uk/ Repository Fringe 2011 – call for participants: http://www.repositoryfringe.org/ CC BY-NC-ND 2.0 - image by enggul courtesy of Flickr – http://www.flickr.com/photos/enggul/2361808668/
