Bridging the Gap between Libraries and Data Archives: Progress Report. Roger Revelle, Gulf of California Expedition, 1939. JISC/NSF Digital Libraries Initiative All Projects Meeting 24-25 June 2002, Edinburgh. Two new NSF Projects …. “Bridging the Gap between Libraries and Data Archives”
Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 • JISC/NSF Digital Libraries Initiative All Projects Meeting • 24-25 June 2002, Edinburgh
Two new NSF Projects … • “Bridging the Gap between Libraries and Data Archives” • NSDL Collections Track • “SIOExplorer: Web Exploration of Seagoing Archives” • Information Technology Research (ITR) • Started October 2001
Collaborative effort • UCSD Libraries • Scripps Institution of Oceanography • San Diego Supercomputer Center • Advisory Board • NOAA • US Naval Oceanographic Office • Private Industry • Other oceanographic institutions
Combine … • Data • 50 years of digital data • Growing 200 GB per year • Images • 99 years of SIO Archives • Documents • Reports, publications, books … into one digital library
Approx. 3000 cruise legs online at SIO • Bathymetry, magnetics, gravity • Gathered from worldwide sources • 795 SIO cruise legs • Swath bathymetry since 1981
Multibeam sonar revolutionizes seafloor understanding • Map a wide swath • Not just a single profile • SeaBeam Classic, 1981-1992 • 16 beams • SeaBeam 2000, 1992- • 121 beams • SeaBeam 2100, 1996-2000 • 151 beams • Simrad EM120, 2001- • 191 beams • 150 degree swath width • Also backscatter • Determine bottom type • Sediment • Lava flow Realtime swath 20 km across-track
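As a rough check on the swath figures above, the across-track coverage of a multibeam sonar over a flat seafloor follows from simple geometry: width = 2 × depth × tan(swath angle / 2). The sketch below assumes a flat bottom and ignores refraction and range limits, so it illustrates the geometry only and is not how any SeaBeam or Simrad system reports coverage. The function names are made up for this example.

```python
import math

def swath_width_m(depth_m, swath_angle_deg):
    """Across-track width covered on a flat seafloor (assumed geometry)."""
    half_angle = math.radians(swath_angle_deg / 2.0)
    return 2.0 * depth_m * math.tan(half_angle)

def depth_for_width_m(width_m, swath_angle_deg):
    """Water depth at which a given swath angle covers the given width."""
    half_angle = math.radians(swath_angle_deg / 2.0)
    return width_m / (2.0 * math.tan(half_angle))

width_at_1km = swath_width_m(1000.0, 150.0)
depth_for_20km = depth_for_width_m(20000.0, 150.0)
```

Under these assumptions a 150 degree swath covers about 7.5 times the water depth, so a 20 km swath corresponds to a depth of roughly 2.7 km.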
SIO Swath Mapping Expeditions • 244 swath mapping cruises on vessels, since 1981 • Thomas Washington, Melville, Revelle • 600 GB multibeam holdings • Adding 200 GB/year
Deliver sampling information • Sample index, 1968- • 100,000 entries • 500 types • Dredged rocks, cores • Biological trawls • Water samples • CTD • Build on www.EarthRef.org • Seamount catalog Roger Revelle, MidPac, 1950 (Amelia Earhart)
Access Voyages of Discovery • Encourage inquiry • “What’s this?” links from image • Data (“What”) • Instruments (“How”) • Other voyages • Dual use • Research and education Naga Expedition, 1959-61 (artist’s illustrations from logbook)
R/V Albatross departed SIO 1904 Sigsbee sounding machine
Voyages of Discovery in the Pacific • La Perouse 1780’s • R/V Revelle • “La Perouse Expedition” • Departed June 8 • R/V Melville • “Cook Expedition” • Returns July 17 James Cook By Nathaniel Dance, 1776 Special Collections, UCSD Library
Voyages of Discovery in the Pacific • 1950’s Ed Hamilton, MidPac, 1950 Samoa, Capricorn, 1952
Query for ideas and careers • Not just data • Track a scientist’s expeditions and publications R/V Spencer F. Baird, Capricorn Expedition, 1952-53. Back row, left to right: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt
Full text of publications • The Challenger Expedition • 30,000 scanned pages • Anatomy of an Expedition • Bill Menard, 1967 Nova Expedition • Link to 1998 Avon Expedition • Exploring the Deep Pacific • Helen Raitt, 1952 Capricorn Expedition
Cruise reports • 50 years available • Scan older versions • Currently generate .pdf automatically Page with swath bathymetry every 6 hours
Bridging the Gap: Progress Report
The Problem • Archives are search-impaired • Content not a problem • Material exists in great abundance • Data archives • Historical archives • But it is hard to get • Litany of woes …
Litany of archive woes • Magnetic media at risk • Need to migrate to new storage • Local access only • Some online, but sprawling directories • Tapes and CDs in drawers • Inconsistent naming over 30 years • Home-grown software • Pre-database technology • Minimal documentation • Formal metadata non-existent • Creators now retired • What to do? Shipboard archives for one recent cruise
Steps toward a Solution • Seek professional help • Computer scientists • Advisory Board • (Similar problems faced in many fields) • Review the problem • Seven issues from national workshop • Analyze the dataflow • Build a prototype • Test the prototype • New Zealand – Samoa Expedition
Review archive problems • NSF/ONR Marine Geology and Geophysics Workshop
Dataflow • First, create a conceptual data model • Spend time to review with all participants • Design a robust model • Define common categories • 9 basic directories • Specific subdirectories • Controlled design document • Map existing digital objects to categories • Both documents and data • Accommodate variations • Data types and names over 50 years • Valid for future developments • Result “CCDS” – Canonical Cruise Data Structure
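The mapping of existing digital objects to common categories can be sketched as a small rule table. This is a minimal illustration only: the slide does not name the 9 basic directories of the CCDS, so the category names, file patterns, and the `classify` helper below are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, category) rules; the real CCDS categories are not
# listed in the slides, so these names are placeholders.
RULES = [
    ("*.mb*", "data/multibeam"),
    ("*.pdf", "documents/reports"),
    ("*.jpg", "images"),
]

def classify(filename, rules=RULES, default="unsorted"):
    """Return the first category whose pattern matches the file name."""
    name = filename.lower()  # tolerate inconsistent naming across decades
    for pattern, category in rules:
        if fnmatchcase(name, pattern):
            return category
    return default

categories = [classify(n) for n in ["LEG1.mb41", "Report.PDF", "notes.xyz"]]
```

Keeping the rules in one table mirrors the idea of a controlled design document: variations in naming are absorbed by adding patterns, not by changing the categories.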
Second, organize domain-specific content • Work inside a “Staging Area” • Deal with complexity • Extract from 3 archive levels • Shipboard (tape, CD) • Post-processing lab (tape) • Current online content • (not always “best”) • Opportunity for data cleanup • Apply corrections • Weed out intermediate and duplicate versions • Gather information for metadata
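One part of the cleanup step, spotting exact duplicate copies across the three archive levels, can be sketched with content hashes. This is an assumption about how such a check might be done, not the project's actual procedure. It only finds byte-identical files, and deciding which intermediate versions to weed out still needs a domain specialist. The paths and the `find_duplicates` helper are hypothetical.

```python
import hashlib

def find_duplicates(files):
    """files maps path -> bytes; returns (duplicate_path, first_seen_path) pairs."""
    seen = {}
    duplicates = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates

# Hypothetical staging-area contents from shipboard CD, lab tape, and online copy
staging = {
    "cd/a.dat": b"same bytes",
    "tape/a.dat": b"same bytes",
    "online/b.dat": b"different bytes",
}
dupes = find_duplicates(staging)
```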
Third, load the “CCDS” • Clear transition in activities • Domain specialists give final approval • IT team takes over • Early mistake • “Pushed” content from legacy data directories • Legacy directories are complex and vary over the years • Revised to “pull” into Canonical Structure • IT lesson learned • Dataflow needs to be “template-driven” • Template can incorporate • Rules for automatic loading • Adaptive choice among multiple alternatives • Maintain flexibility as project evolves • Team members negotiate content of template
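A "pull" driven by a template might look like the sketch below: each canonical slot lists alternative source patterns in order of preference, and the loader takes the first alternative that matches anything in the legacy directories. The slot names, patterns, and the `pull` function are assumptions made for illustration; the slides do not show the real template format.

```python
from fnmatch import fnmatchcase

# Hypothetical template: canonical slot -> source patterns, best choice first
TEMPLATE = {
    "bathymetry": ["processed/*.mb*", "raw/*.mb*"],
    "cruise_report": ["docs/report*.pdf", "docs/*.txt"],
}

def pull(template, legacy_paths):
    """For each slot, load files from the first alternative that matches."""
    loaded = {}
    for slot, alternatives in template.items():
        for pattern in alternatives:
            matches = sorted(p for p in legacy_paths if fnmatchcase(p, pattern))
            if matches:
                loaded[slot] = matches
                break
        else:
            loaded[slot] = []  # nothing found; flag for the domain specialists
    return loaded

old_cruise = pull(TEMPLATE, ["raw/leg1.mb41", "docs/readme.txt"])
new_cruise = pull(TEMPLATE, ["raw/leg2.mb57", "processed/leg2.mb57"])
```

Because the alternatives live in the template and not in the loader, team members can renegotiate the template as the project evolves without changing code.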
Fourth, load the data • Persistent data archive management • Use the “Storage Resource Broker” • San Diego Supercomputer Center product • Fifth, load the metadata • Harvest metadata from data files, automatically • Provide tools for metadata editing • Load into Oracle
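The harvest-then-load step can be sketched as follows. The project loads into Oracle; this example uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere. The table layout, the `harvest` and `load` functions, the cruise ID, and the navigation fixes are all hypothetical.

```python
import sqlite3

def harvest(cruise_id, fixes):
    """Summarize navigation fixes (ISO time, lat, lon) as one metadata row."""
    times = sorted(t for t, _, _ in fixes)
    lats = [lat for _, lat, _ in fixes]
    lons = [lon for _, _, lon in fixes]
    return {
        "cruise_id": cruise_id,
        "start_time": times[0],
        "end_time": times[-1],
        "south": min(lats),
        "north": max(lats),
        # Naive min/max: not valid for tracks crossing the 180 degree meridian.
        "west": min(lons),
        "east": max(lons),
    }

def load(conn, record):
    """Create the (hypothetical) table if needed and store one record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cruise_metadata ("
        "cruise_id TEXT PRIMARY KEY, start_time TEXT, end_time TEXT, "
        "south REAL, north REAL, west REAL, east REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO cruise_metadata VALUES "
        "(:cruise_id, :start_time, :end_time, :south, :north, :west, :east)",
        record,
    )
    conn.commit()

# Made-up fixes for a made-up cruise
fixes = [
    ("2000-01-01T00:00:00", -10.0, 150.0),
    ("2000-01-02T00:00:00", -12.5, 152.0),
    ("2000-01-03T00:00:00", -15.0, 155.5),
]
conn = sqlite3.connect(":memory:")
load(conn, harvest("DEMO01", fixes))
row = conn.execute(
    "SELECT south, north, west, east FROM cruise_metadata WHERE cruise_id = 'DEMO01'"
).fetchone()
```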
Collection Developer’s Toolkit • Make it easy to build, and maintain • Not just for IT experts • Portable and scalable for other projects • Integrate • Metadata tools • Data tools • Interactive search and display console
Make use of existing resources • Alexandria Digital Library • Geospatial content • OAI-compliant server • Environmental data archive and delivery tools • John Helly, http://ceed.sdsc.edu/ • Storage Resource Broker • http://www.npaci.edu/DICE/SRB/index.html/ • Domain-specific toolkits • GMT, MB-System, ARC/IMS
Build metadata tools • Automate • Bulk harvesting from data files • Bulk loading into Oracle database • Use NSDL community standards • Dublin Core + “ADN” metadata • Alexandria Digital Library (UCSB) • DLESE (Digital Library for Earth System Education) • NASA • Controlled vocabularies • Science themes • Geographic names • Embed domain-specific metadata into standards • Multibeam, cruise, sampling
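Embedding domain-specific metadata alongside a community standard can be illustrated with a small XML record: Dublin Core elements sit in the standard Dublin Core namespace, and cruise-specific fields sit in a separate extension namespace. The extension namespace URI, the field names, and the sample values are hypothetical, and the ADN schema is not modeled here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
EXT = "http://example.org/cruise-ext/"   # hypothetical extension namespace

ET.register_namespace("dc", DC)
ET.register_namespace("ext", EXT)

def build_record(dc_fields, ext_fields):
    """Build one record with standard and domain-specific elements."""
    root = ET.Element("record")
    for name, value in dc_fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    for name, value in ext_fields.items():
        ET.SubElement(root, f"{{{EXT}}}{name}").text = value
    return root

record = build_record(
    {"title": "Example multibeam survey", "type": "Dataset", "coverage": "Samoa"},
    {"vessel": "Melville", "beams": "191"},
)
xml_text = ET.tostring(record, encoding="unicode")
```

A harvester that only understands Dublin Core can read the `dc:` elements and skip the rest, while domain tools can still find the multibeam, cruise, and sampling fields.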
MOBE • Metadata Object Browser and Editor • Inherit metadata from • Dublin Core • Cruise • Flexible • Expand for projects as needed • Generic ASCII metadata interchange format “MIF” • Export to XML • Java
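One possible reading of "inherit metadata" is that an object's record fills in missing fields from cruise-level values, which in turn fall back to collection-wide Dublin Core defaults. The sketch below shows that layering only. It is not MOBE's actual design, the MIF format is not reproduced, and all field names and values are hypothetical.

```python
def resolve(*levels):
    """Merge metadata dicts; earlier (more specific) levels override later ones."""
    merged = {}
    for level in reversed(levels):
        merged.update(level)
    return merged

# Hypothetical layers, from most general to most specific
dublin_core_defaults = {"publisher": "Example Archive", "language": "en"}
cruise_level = {"vessel": "Melville", "publisher": "Example Cruise Office"}
object_level = {"title": "Dredge sample log"}

record = resolve(object_level, cruise_level, dublin_core_defaults)
```

The object only stores what is unique to it, so correcting a cruise-level value once updates every record that inherits it.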
Search interface • Design for alternative approaches • Geospatial • Lat, lon • Temporal • “1995-2000” • Keyword • Region “Samoa” • Vessel “Melville” • Cruise “AVON02MV” • Data type “dredge” • Scientist “Staudigel” • Expert-level • Research, teacher, student, public Prototype search interface
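The alternative search approaches listed above (geospatial, temporal, keyword) can be combined as optional filters over the same metadata records. This is a minimal in-memory sketch, not the prototype's query engine. The record fields, the sample records, and the `search` function are hypothetical, and the bounding-box test does not handle boxes that cross the 180 degree meridian.

```python
def search(records, bbox=None, years=None, **keywords):
    """Filter records by optional bounding box, year range, and exact keywords."""
    hits = []
    for rec in records:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= rec["lat"] <= north and west <= rec["lon"] <= east):
                continue
        if years is not None:
            first, last = years
            if not (first <= rec["year"] <= last):
                continue
        # Keyword match is case-insensitive and must hold for every keyword given
        if any(str(rec.get(k, "")).lower() != str(v).lower()
               for k, v in keywords.items()):
            continue
        hits.append(rec)
    return hits

# Made-up records for illustration
records = [
    {"id": "r1", "lat": -14.0, "lon": -170.0, "year": 1998,
     "vessel": "Melville", "data_type": "dredge"},
    {"id": "r2", "lat": -14.5, "lon": -171.0, "year": 1985,
     "vessel": "Thomas Washington", "data_type": "multibeam"},
    {"id": "r3", "lat": 32.0, "lon": -118.0, "year": 1999,
     "vessel": "Melville", "data_type": "dredge"},
]
near_samoa = search(records, bbox=(-20, -175, -10, -165),
                    years=(1995, 2000), vessel="melville")
```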
CruiseViewer • Interactive browser and query interface • Display tracks and samples • Download library objects • Java
Manage interfaces for multiple projects • Both data and metadata
Make it easier to collaborate • Interactions between groups • Not just a technology project • Diverse goals, vocabularies and audiences • Interoperate • Each domain has own sphere of responsibility • Don’t engineer someone else’s domain • Work through interfaces • Re-negotiate as needed • Avoid long-term maintenance headaches between domains
Build tools for collaborative projects • 3 “cultures” in this project • Oceanographers • Computer scientists • Librarians • Example: bridge vocabularies between separate domains • Use metadata “triples,” not “pairs” • Reduce phone calls by including narrative label
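The difference between metadata "pairs" and "triples" can be shown in a few lines: a pair carries only a name and a value, while a triple adds a narrative label that a librarian or computer scientist can read without asking the oceanographer what the field means. The class, the field names, and the label wording below are hypothetical.

```python
from typing import NamedTuple

class MetaTriple(NamedTuple):
    name: str    # machine-readable field name
    value: str   # field value
    label: str   # narrative label for people outside the domain

def describe(triples):
    """Render triples as human-readable lines using the narrative label."""
    return [f"{t.label}: {t.value}" for t in triples]

triples = [
    MetaTriple("sonar_beams", "191", "Number of sonar beams across the swath"),
    MetaTriple("vessel", "Melville", "Research vessel that collected the data"),
]
lines = describe(triples)
```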
Adding new projects to SIOExplorer • Make use of • Collection Developer’s Toolkit • NSDL server • Metadata interchange • Query processing • SDSC • Managed storage • Web service
Test the prototype Melville departs Lyttelton harbor
Floating Digital Library Workshop • R/V Melville • March 7-21, New Zealand to Samoa • Realtime acquisition of library objects? • Load metadata into swath files • At acquisition time • Specify cruise metadata • Sensor documentation database • Load the CCDS • Learn from a common experience
A good day at 51° S Renewed appreciation for the collection of field data
Common experience • Librarians • Computer scientists • Oceanographers • Royal New Zealand Navy Collaboration between SIO and RNZN Melville in Lyttelton
Floating Digital Library Workshop • Librarian at sea • Computer scientist in galley • Oceanographer holding onto computer
Bollons Gap survey • New Zealand Law of the Sea Claim Librarian at sea Visualization of swath bathymetry, looking north
Heading for Samoa • Crossing the Louisville Ridge • Tonga Trench • Osbourn Trough (ancient spreading center) Visualization of Global Topography, looking north
Relate cruise to SIO holdings • Display search results • Red • SIO multibeam • Black • Other cruises • Yellow • SIO dredged rock samples • Also • Volcanoes • Earthquakes • Plate boundaries • Typical research support product • Make it available on web • Select cruises for further study • Export for ArcView • Related NSF/ITR project
Next steps • Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer • NSF Division of Biological Infrastructure • Collaboration with Smithsonian Institution • Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL • NSF NSDL Collections Track Track of the Albatross, 1884-1921
SIOExplorer: Expedition Planner • Open research data for student discovery • Leverage Digital Library efforts • Students design a virtual expedition • Explore relationships • Depth, Sediment thickness, Crustal age • More … • Earthquakes, volcanoes, trenches • Wind, waves, currents • Climate • Students publish expedition report • On the web • Teacher workshops • At the Birch Aquarium Global Topography Sediment thickness Crustal Age
SIO 100th Anniversary • September 26, 2003 R/V Alexander Agassiz, 1907 SIO, 1909 • http://SIOExplorer.ucsd.edu