SIOExplorer: Advances across Disciplinary and Institutional Boundaries Cyberinfrastructure for Earth Systems Science AGU Fall Meeting San Francisco Dec 16, 2004 Stephen Miller, Dru Clark Geological Data Center, SIO
SIOExplorer: Advances across Disciplinary and Institutional Boundaries • Cyberinfrastructure for Earth Systems Science • AGU Fall Meeting, San Francisco • Dec 16, 2004 • Stephen Miller, Dru Clark • Geological Data Center, SIO • John Helly, Don Sutton, Tiffany Houghton • San Diego Supercomputer Center • http://SIOExplorer.ucsd.edu
Community Goals • Barriers to advances • Cyber-capabilities • Case Study
Community Goals • Broad support for marine sciences • Across disciplines • Across institutions • Research and education
Persistent data storage not enough • Need metadata • Enable interdisciplinary searches • Need infrastructure • Access diverse, distributed resources
Why re-use data? • New ship time $22K/day, hard to schedule • Archive data increasingly valuable for: • Researchers unable to go on cruise • Regional synthesis projects • Monitoring environmental changes through time • Interdisciplinary studies • Beyond scope of original project • Issue for future use: • Access to complete cruise collections
Accomplish re-use of data • Current practice hit-or-miss • Individual projects archive only selected data • Cyberinfrastructure allows comprehensive solution • Systematic collection building • Auto-harvesting • Data and metadata • Claim: • May cost little more to archive all sensors
Data from a firehose • Can we keep up? • Shipboard data rates – yes • Satellite links – maybe • depends on heading • Metadata – yes, but • not widely implemented • Preservation – maybe • Community usage – help needed from Cyberinfrastructure Tiffany Houghton, SDSC, on R/V Sproul
We can archive from paper documents • Track plots • Cruise reports • Handwritten and printed data
But digital preservation is risky business • Endangered Species • 9-track tapes • Exabyte tapes fail • Even CDs fail • RAIDs fail • “Shoe-box” archiving • not to be trusted
Solution: Active Archiving • Don’t trust any media, person or process • Actively monitor status • Migrate to new storage • Mirror on multiple systems • Backup to independent sites • Technology makes this possible, just need to do it
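The "actively monitor status" and "mirror on multiple systems" steps can be sketched as a periodic fixity check. This is a minimal illustration under stated assumptions, not the procedure SIOExplorer actually uses: the function names `sha256_of` and `verify_mirror` and the choice of SHA-256 checksums are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(primary: Path, mirror: Path) -> dict:
    """Compare every file under `primary` with its copy under `mirror`.

    Returns a dict mapping relative path -> "ok", "missing" or "corrupt".
    (Hypothetical helper; the status labels are an assumption.)
    """
    report = {}
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        copy = mirror / relative
        if not copy.is_file():
            report[relative.as_posix()] = "missing"
        elif sha256_of(copy) != sha256_of(source):
            report[relative.as_posix()] = "corrupt"
        else:
            report[relative.as_posix()] = "ok"
    return report
```

Run on a schedule, a report like this flags which objects need to be re-copied before a migration or backup, so that no single medium has to be trusted.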
Quality barriers • Shipboard quality • Correct issue at sea • Before it becomes a problem • Satellite link • Support with shore lab expertise • Propagate changes into archives • Dynamic resources, not static • “Better versions” become available
Barrier: “Island-Style” Development • Not-invented-here syndrome • Each institution develops own • Archive procedures • Storage • Metadata • Controlled vocabulary • Relational database • User interface • Likewise for each discipline and laboratory • All islands eventually sink • Can we find a Cyber-solution for interoperability?
Risks due to NSF paradigm • Each Chief Scientist responsible for archiving own cruise • No guarantee of • Access to data from all sensors • Reliable long term storage • Persistent contact • Volatile career paths • Expertise • Not everyone a multibeam expert • Systematic corrections needed after-the-fact • Transit cruises have no chief scientist • Claim: need for systematic archiving
NSF Funding Barrier • Difficult to obtain funding for infrastructure • Inherent conflict of interest • Competition for program dollars • Reviewers choose proposals for new field programs • rather than archiving • Need for independent funding stream
Cultural Barrier – they don’t love metadata • Why bother? • It’s not really data • Everyone here already knows all about that • Standards are not helpful • Which standard to choose? • Standards are too bulky • Metadata creation takes too long • Which parameters to use? • Which values to use? • Which controlled vocabulary? • How can we create incentive?
Cultural Incentives? • Career credit for data publications • AGU recognition for good electronic citizenship • Require submission to an archive • Hold up new proposals • RIDGE, MARGINS • Other ideas?
Emerging Cyber-capabilities • Crossing discipline boundaries • Case study • SIOExplorer digital library • Design for scalability • Automate harvesting • Collection Builder’s Toolkit for other projects • Crossing institutional boundaries • Interoperability • Memorandum of Agreement – SDSC/SIO/LDEO/WHOI • DIGARCH proposal
Case Study: Building SIOExplorer • Community access • Data • Images • Documents • 150,000 objects • 500 GB • Implications for multi-discipline collections • Allow alternative base maps: crustal age, topography, sediment thickness, other
Design to Overcome Island-Style Barriers • Build scalable digital library • Federate independent authorities • 4 Operational collections • 3 Work-in-progress
4 Acronyms • MTF – Metadata Template File • Records digital library structure version • MIF – Metadata Interchange File • Contains metadata values • Domain-specific blocks • Export to xml • ADO – Arbitrary Digital Object • Data files • CCDS – Canonical Cruise Data Structure • Used in auto-harvesting • Organize data, generate metadata John Helly, SDSC
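To make "domain-specific blocks" and "export to xml" concrete, here is a minimal sketch of serializing MIF-style metadata to XML. The real MIF layout is not shown in these slides, so everything here is an assumption: the function name `mif_to_xml`, the element names `mif`, `block` and `field`, and the `mtf_version` attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET


def mif_to_xml(blocks: dict, mtf_version: str) -> str:
    """Serialize metadata blocks to an XML string.

    `blocks` maps a block name to a dict of parameter -> value.
    Element and attribute names are illustrative only, not the
    actual MIF export schema.
    """
    root = ET.Element("mif", {"mtf_version": mtf_version})
    for block_name, values in blocks.items():
        block = ET.SubElement(root, "block", {"name": block_name})
        for key, value in values.items():
            field = ET.SubElement(block, "field", {"name": key})
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not real cruise metadata.
print(mif_to_xml({"cruise": {"ship": "EXAMPLE_SHIP"}}, mtf_version="1"))
```

Keeping each discipline's parameters in its own block means a new domain can be added without touching the blocks other tools already read.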
MTF: key to scalable collection building • “Metadata Template File” controlled document • Maintain hierarchical relationships • Synchronize metadata and data • Automatic template-driven approach • Library continues to work • While project structures evolve • All code traverses mtf • Metadata file preserves which mtf version was used • Ensures forward and backward compatibility • Allows life-cycle maintenance • Technology for long-term archive persistence
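The template-driven idea can be sketched as follows: code walks a hierarchical template, and each metadata record names the template version it was built with, so older records keep validating after the structure evolves. This is a rough sketch only; the nested-dict template, the `mtf_version` and `values` keys, and both function names are assumptions, not the actual MTF format.

```python
def walk_template(template: dict, prefix: str = "") -> list:
    """Return every leaf path in a nested template, depth-first."""
    paths = []
    for name, child in template.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict) and child:
            paths.extend(walk_template(child, path))
        else:
            paths.append(path)
    return paths


def missing_fields(record: dict, templates: dict) -> list:
    """List template paths absent from a record, checked against the
    template version that the record itself says it was built with."""
    template = templates[record["mtf_version"]]
    return [p for p in walk_template(template) if p not in record["values"]]


# Two hypothetical template versions; version "2" adds one field.
TEMPLATES = {
    "1": {"cruise": {"ship": None, "dates": None}},
    "2": {"cruise": {"ship": None, "dates": None, "chief_scientist": None}},
}

# A record written under version "1" (placeholder values).
old_record = {
    "mtf_version": "1",
    "values": {"cruise/ship": "EXAMPLE_SHIP", "cruise/dates": "EXAMPLE_DATES"},
}
print(missing_fields(old_record, TEMPLATES))
```

Because the record carries its own template version, it is judged complete under version "1" even though version "2" has since added a field.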
Create MTF with COBE • Design structure of collection • Can also define metadata mapping • Part of Collection Builder’s Toolkit
View metadata (mif) with MOBE • Browse through domain-specific metadata blocks • Part of Collection Builder’s Toolkit
Example: building the data-intensive “SIO Cruises” collection • Publishing Authority • Geological Data Center • Plone-managed workgroup website http://gdc.ucsd.edu/beta http://plone.org/ • Dru Clark, GDC
Current Status: 615 cruises loaded • 100 cruises first year • Development • 189 cruises last month • Production • Almost completely automated operation
CCDS: key to auto harvesting • “Canonical Cruise Data Structure” • Work from staging area • Map to 8 broad categories • Multibeam, underway, documentation, etc. • Adequate for 736 cruises over 40 years • Automatic template-driven approach • Migrate files to location in CCDS • Generate metadata • By location in CCDS • And by extracting directly from files • Lat, lon, date, etc. • Adaptable for other disciplines
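The "migrate files to location in CCDS" and "generate metadata by location" steps can be sketched with a small rules table. This is an illustration under stated assumptions: the slides name only multibeam, underway and documentation among the 8 categories, and the file suffixes, the `unclassified` fallback, the directory layout and the function name `harvest` are all invented here. Extracting lat, lon and date from file contents is not covered.

```python
from pathlib import PurePosixPath

# Hypothetical rules: filename suffix -> CCDS category. Only three of
# the 8 categories are named in the source; the suffixes are invented.
RULES = {
    ".mb": "multibeam",
    ".nav": "underway",
    ".pdf": "documentation",
}


def harvest(cruise_id: str, staged_files: list) -> list:
    """Assign each staged file a CCDS location and location-derived metadata."""
    records = []
    for name in staged_files:
        source = PurePosixPath(name)
        # Files that match no rule are set aside rather than guessed at.
        category = RULES.get(source.suffix.lower(), "unclassified")
        target = PurePosixPath(cruise_id) / category / source.name
        records.append({
            "source": name,
            "target": str(target),
            "metadata": {"cruise": cruise_id, "category": category},
        })
    return records


for record in harvest("DANA01RR", ["raw/line1.MB", "docs/report.pdf"]):
    print(record["source"], "->", record["target"])
```

Because the category is derived from the rules rather than from each cruise's original directory layout, the same harvester can process cruises whose staging areas were organized differently.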
Managing Data Flow into a Digital Library • Rules-based template allows auto harvesting despite decades of cruise data “evolution” • Individual cruise status • Data flow status board, live on web, proprietary hold, auto-updated
Automated QC Status Board for Multibeam • Automatic alerts for common problems • Shipboard files can be monitored from shore via satellite link • Additional web tools for bulk metadata QC
Community access with CruiseViewer • Graphical Java interface • All SIOExplorer Collections • Metadata • Oracle or PostgreSQL • Data • Storage Resource Broker • User • Graphical search • Keyword search • Discover content • Browse metadata • View or download objects • Don Sutton, SDSC • Search results for visualization objects
CruiseViewer Session • Search for multibeam swath sonar seafloor mapping files • ~300 multibeam cruises since 1982 • Right-click on track line for cruise overview
Launch visualization experiences • Visualization file created for every multibeam swath file • Research • Sonar QC • Education • Download free iView3D viewer http://www.ivs.unb.ca/products/iview3d/
Who uses SIOExplorer? • 50,000 hits/month • 6 GB downloads • Monthly web hits • Results from SDSC webalizer http://nsdl.sdsc.edu/stats
Broader Impact with ERESE National Teachers Workshops • Enduring Resources for Earth Science Education • Two-week summer workshops • 2004 and 2005 • Build inquiry-driven learning experiences • 33 participants on R/V Sproul cruise, July 14
CUAHSI Hydrologic Information System • based on mtf-mif-ado technology • 95 institutional members • http://cuahsi.sdsc.edu/HIS/ • HydroViewer, derived from CruiseViewer • http://www.cuahsi.org/
Other institutions using or evaluating mtf technology • WHOI – DIGARCH proposal pending • Bob Detrick • CCOM – UNH cruise and multibeam archives • Larry Mayer, Jim Case • MBARI – collection building to start Feb 2005 • Dave Caress, Andrew Chase • HAWAII – evaluating for R/V Kilo Moana and archives • Bruce Applegate • NIWA – Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand • John Helly, Don Robertson
New Projects Marine Metadata Interoperability Clearinghouse Metadata technology MBARI lead institution http://marinemetadata.org
Shipboard “Digital Library in a Box” • Portable realtime digital library acquisition system • Metadata and data • Laptop with public domain tools • PostgreSQL • Storage Resource Broker • MB-System • GMT • CruiseViewer • Testing San Diego - Costa Rica, DANA01RR Oct 2003
Pending DIGARCH Proposal • NSF CISE/Library of Congress • Digital archiving and long term preservation • Multi-institution Testbed for Scalable Digital Archiving • SIO/SDSC/WHOI • Extend SIOExplorer digital library approach to WHOI • Integrate SIO, SDSC and WHOI tools and data • WHOIexplorer • 30 years of cruise data • Alvin dives • Jason ROV surveys (200 DVDs per cruise) • Interoperability across institutions
Memorandum of Agreement • Cyberinfrastructure for Marine Research and Education • The purpose of this Memorandum of Agreement is to recognize common goals between our institutions in advancing a community cyberinfrastructure for the future of Marine Research and Education and make a continuing commitment to collaborate in this effort… • Interoperability in data and metadata for data centers and ocean-going vehicle operations including ships and submersibles • Data acquisition protocols in the field and laboratory • Publication of data • Education of graduate students and researchers in these methods • Jointly pursue funding opportunities • Cyber MOA Workshop, November 4, 2004 • Signed by Directors: SIO Charles Kennel • LDEO G. Michael Purdy • WHOI Robert Gagosian • SDSC Francine Berman • CIESIN Roberta Balstad