Data discovery from a digital library perspective

Data discovery from a digital library perspective • Greg Janée, Darren Hardy • UC Santa Barbara

Outline • Questions • grappling with granularity • struggling with search • dithering over distribution • pondering process • Integrating search with access

Granularity • institution (NASA) • data center (GSFC) • program (MODIS) • product (sea surface temperature) • resolution (1km) • space • time • granule • datum • (diagram axis labels: type, organization)

Approaches I • ADL • uniform object (metadata) representation • flat list of collections (=containers) • possible extensions: • collections as first-order objects • nested containers • THREDDS • hierarchical “collection” datasets • “coherent” datasets (=aggregation server?) • “direct” datasets

Approaches II • Granularity on the Web... • webpage • multi-page document • website • ...and sidestepping it • uniform representation (webpage) • page linking • visible, decomposable identifiers (URLs)

Flattening granularity • Use heuristics to return “best” match • (diagram labels: inherit descriptive metadata; dataset; aggregate intrinsic metadata)
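One way to read the flattening idea is sketched below in Python: descriptive metadata is inherited downward from containers, intrinsic metadata (here only a time range) is aggregated upward from contents, and a heuristic picks a single "best" object. The Node class, the "keywords" field, and the "deepest matching node wins" rule are all assumptions made for illustration; the slides do not say which heuristic is used.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the granularity hierarchy (hypothetical structure)."""
    name: str
    descriptive: dict = field(default_factory=dict)  # added metadata, e.g. keywords
    time_range: Optional[tuple] = None               # intrinsic: (first_year, last_year)
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def inherited_descriptive(node):
    """Merge descriptive metadata from the root down; nearer levels override."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    merged = {}
    for n in reversed(chain):
        merged.update(n.descriptive)
    return merged


def aggregated_time_range(node):
    """Smallest range covering a node's own range and all descendants' ranges."""
    ranges = [node.time_range] if node.time_range else []
    for child in node.children:
        r = aggregated_time_range(child)
        if r:
            ranges.append(r)
    if not ranges:
        return None
    return (min(r[0] for r in ranges), max(r[1] for r in ranges))


def walk(node, depth=0):
    """Yield (node, depth) pairs for the whole subtree."""
    yield node, depth
    for child in node.children:
        yield from walk(child, depth + 1)


def best_match(root, keyword, year):
    """Assumed heuristic: return the deepest node matching keyword and year."""
    candidates = []
    for node, depth in walk(root):
        rng = aggregated_time_range(node)
        if rng is None or not (rng[0] <= year <= rng[1]):
            continue
        if keyword in inherited_descriptive(node).get("keywords", ()):
            candidates.append((depth, node))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

With this arrangement every object, whatever its level, answers the same two questions (what it is about, what it covers), so a search can treat a program, a product, and a granule uniformly. A different heuristic, such as preferring the coarsest object that fully satisfies the query, would only change best_match.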

Search • Type • text, numeric, space, time, ... • Source • data itself • intrinsic metadata • added (usually descriptive) metadata • 3rd party
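The search types listed above can be combined in a single query. The following Python sketch filters metadata records by text, space (bounding box), time, and a numeric constraint. The Record fields and the search() parameters are hypothetical, chosen only to show one constraint per type; they are not taken from ADL or any other system named in these slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """A metadata record (hypothetical fields, one per search type)."""
    title: str             # text
    bbox: tuple            # space: (west, south, east, north) in degrees
    start: date            # time: first date covered
    end: date              # time: last date covered
    resolution_km: float   # numeric


def bbox_overlaps(a, b):
    """Rectangle intersection test; ignores boxes that cross the antimeridian."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, text=None, bbox=None, period=None, max_resolution_km=None):
    """Return the records satisfying every constraint that was supplied."""
    hits = []
    for rec in records:
        if text is not None and text.lower() not in rec.title.lower():
            continue
        if bbox is not None and not bbox_overlaps(rec.bbox, bbox):
            continue
        # period is a (start, end) pair; keep records whose coverage overlaps it
        if period is not None and not (rec.start <= period[1] and period[0] <= rec.end):
            continue
        if max_resolution_km is not None and rec.resolution_km > max_resolution_km:
            continue
        hits.append(rec)
    return hits
```

The sketch says nothing about the source of each value. In practice the bounding box and dates might be intrinsic metadata read from the data, while the title might be added descriptive metadata, which is the distinction the second half of the slide draws.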

Distribution • Centralized system • e.g., Google, ECHO • SPOF; requires resources • Peer-to-peer • e.g., BRICKS, built on P-GRID • MPOF; requires commitment • ADL: incomplete peer-to-peer

A “textbook” search process • Classic process (Lancaster 1979) • Information need • Stated request • Selection of database • Search strategy • Search in database • Screening of output • Web search - about the same 25 years later

What’s the real process? • Irrational search (Pharo & Järvelin 2006) • Textbook search processes insufficient • Disjointed incrementalism theory • Many smaller steps • Learning during a search • Subjective & dynamic information needs over time • What’s the ideal for earth science data users? • How do you inform choices during search? • How do you formulate a search, and what’s the context? • When is enough enough?

Integrating search with access • File menu • Open... • Search library... • Close • Quit • Query results returned as a THREDDS catalog?
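To make the closing question concrete, here is a minimal Python sketch of wrapping search hits in a THREDDS-style catalog document, so that a client able to open a catalog could open a result set. The element and attribute names follow the general shape of a THREDDS InvCatalog 1.0 document, but the output is not validated against the schema; the service base path, the results_to_catalog() function, and the (name, urlPath) hit format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def results_to_catalog(query, hits):
    """Build a catalog-like XML string from (name, url_path) search hits."""
    ET.register_namespace("", THREDDS_NS)  # serialize with a default namespace

    def tag(local_name):
        return "{%s}%s" % (THREDDS_NS, local_name)

    catalog = ET.Element(tag("catalog"), {"name": "Search results"})
    # Hypothetical access service; the base path is a placeholder.
    ET.SubElement(catalog, tag("service"), {
        "name": "dap",
        "serviceType": "OPENDAP",
        "base": "/thredds/dodsC/",
    })
    # One container dataset for the query, one child dataset per hit.
    container = ET.SubElement(catalog, tag("dataset"), {"name": "Results for: " + query})
    for name, url_path in hits:
        ET.SubElement(container, tag("dataset"), {
            "name": name,
            "urlPath": url_path,
            "serviceName": "dap",
        })
    return ET.tostring(catalog, encoding="unicode")
```

Under this reading, "Search library..." in the File menu would return a catalog like this one in place of a file path, and the application would open a selected dataset the same way it opens any other catalog entry.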

We’re funded to do this!
