SIOExplorer: Advances across Disciplinary and Institutional Boundaries

Cyberinfrastructure for Earth Systems Science. AGU Fall Meeting, San Francisco, Dec 16, 2004. Stephen Miller, Dru Clark, Geological Data Center, SIO.

Presentation Transcript

  1. SIOExplorer: Advances across Disciplinary and Institutional Boundaries • Cyberinfrastructure for Earth Systems Science • AGU Fall Meeting, San Francisco • Dec 16, 2004 • Stephen Miller, Dru Clark, Geological Data Center, SIO • John Helly, Don Sutton, Tiffany Houghton, San Diego Supercomputer Center • http://SIOExplorer.ucsd.edu

  2. Community Goals • Barriers to advances • Cyber-capabilities • Case Study

  3. Community Goals • Broad support for marine sciences • Across disciplines • Across institutions • Research and education

  4. Commit to long-term preservation

  5. Persistent data storage not enough • Need metadata to enable interdisciplinary searches • Need infrastructure to access diverse, distributed resources

  6. Why re-use data? • New ship time costs $22K/day and is hard to schedule • Archived data increasingly valuable for: researchers unable to go on the cruise; regional synthesis projects; monitoring environmental changes through time; interdisciplinary studies beyond the scope of the original project • Issue for future use: access to complete cruise collections

  7. Accomplishing re-use of data • Current practice is hit-or-miss: individual projects archive only selected data • Cyberinfrastructure allows a comprehensive solution: systematic collection building, auto-harvesting of data and metadata • Claim: it may cost little more to archive all sensors

  8. Barriers to advances

  9. Data from a firehose: can we keep up? • Shipboard data rates – yes • Satellite links – maybe, depends on heading • Metadata – yes, but not widely implemented • Preservation – maybe • Community usage – help needed from cyberinfrastructure • Tiffany Houghton, SDSC, on R/V Sproul

  10. We can archive from paper documents • Track plots • Cruise reports • Handwritten and printed data

  11. But digital preservation is risky business • Endangered species: 9-track tapes • Exabytes fail • Even CDs fail • RAIDs fail • “Shoe-box” archiving not to be trusted

  12. Solution: Active Archiving • Don’t trust any media, person or process • Actively monitor status • Migrate to new storage • Mirror on multiple systems • Back up to independent sites • Technology makes this possible; we just need to do it
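The active-archiving steps on slide 12 (monitor status, mirror on multiple systems, repair from an independent copy) can be sketched as a fixity audit. This is a minimal sketch, not SIOExplorer code: it assumes each mirror is a directory holding the same relative paths and that a SHA-256 digest was recorded when the file was ingested; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(relpath: str, expected: str, mirrors: list[Path]) -> dict[str, str]:
    """Check one archived file on every mirror against its recorded digest.

    Missing or corrupted copies are overwritten from the first mirror whose
    copy still matches. Returns {mirror path: 'ok' | 'repaired' | 'lost'};
    'lost' means no mirror holds a good copy, so nothing can be repaired.
    """
    good = [m for m in mirrors
            if (m / relpath).is_file() and sha256_of(m / relpath) == expected]
    status: dict[str, str] = {}
    for m in mirrors:
        if m in good:
            status[str(m)] = "ok"
        elif good:
            target = m / relpath
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(good[0] / relpath, target)
            status[str(m)] = "repaired"
        else:
            status[str(m)] = "lost"
    return status
```

Run on a schedule, an audit like this turns "don't trust any media" into a routine check: a failing disk shows up as a digest mismatch while good copies still exist elsewhere.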

  13. Quality barriers • Shipboard quality: correct issues at sea, before they become problems • Satellite link: support with shore lab expertise • Propagate changes into archives • Dynamic resources, not static: “better versions” become available
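Slide 13's point that archives are dynamic resources, with corrections propagated in as “better versions”, can be illustrated with an append-only version history. A minimal sketch; the `ArchivedObject` class and its fields are hypothetical, chosen only to show that adding a correction never overwrites the shipboard original.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedObject:
    """Archive entry that appends corrected versions instead of overwriting."""
    object_id: str
    # Each entry is (provenance note, payload); index 0 holds version 1.
    versions: list[tuple[str, bytes]] = field(default_factory=list)

    def add_version(self, payload: bytes, note: str) -> int:
        """Append a new (e.g. shore-corrected) version; return its number."""
        self.versions.append((note, payload))
        return len(self.versions)

    def current(self) -> bytes:
        """Return the latest, presumably best, version."""
        if not self.versions:
            raise LookupError(f"{self.object_id} has no versions")
        return self.versions[-1][1]

    def get(self, version: int) -> bytes:
        """Return an earlier version by its 1-based number."""
        if not 1 <= version <= len(self.versions):
            raise IndexError(f"{self.object_id} has no version {version}")
        return self.versions[version - 1][1]
```

Keeping the history means a result computed from an earlier version can still be reproduced after a better version is published.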

  14. Barrier: “Island-Style” Development • Not-invented-here syndrome • Each institution develops its own archive procedures, storage, metadata, controlled vocabulary, relational database and user interface • Likewise for each discipline and laboratory • All islands eventually sink • Can we find a cyber-solution for interoperability?

  15. Risks due to NSF paradigm • Each chief scientist responsible for archiving own cruise • No guarantee of: access to data from all sensors; reliable long-term storage; persistent contact (volatile career paths); expertise (not everyone is a multibeam expert) • Systematic corrections needed after the fact • Transit cruises have no chief scientist • Claim: need for systematic archiving

  16. NSF Funding Barrier • Difficult to obtain funding for infrastructure • Inherent conflict of interest: competition for program dollars • Reviewers choose proposals for new field programs rather than archiving • Need for independent funding stream

  17. Cultural Barrier – they don’t love metadata • Why bother? • It’s not really data • Everyone here already knows all about that • Standards are not helpful • Which standard to choose? • Standards are too bulky • Metadata creation takes too long • Which parameters to use? • Which values to use? • Which controlled vocabulary? • How can we create incentive?

  18. Cultural Incentives? • Career credit for data publications • AGU recognition for good electronic citizenship • Require submission to an archive • Hold up new proposals (RIDGE, MARGINS) • Other ideas?

  19. Emerging Cyber-capabilities • Crossing discipline boundaries • Case study: SIOExplorer digital library • Design for scalability • Automate harvesting • Collection Builder’s Toolkit for other projects • Crossing institutional boundaries • Interoperability • Memorandum of Agreement – SDSC/SIO/LDEO/WHOI • DIGARCH proposal

  20. Case Study: Building SIOExplorer • Community access to data, images and documents • 150,000 objects, 500 GB • Implications for multi-discipline collections • Allow alternative base maps: crustal age, topography, sediment thickness, other

  21. Design to Overcome Island-Style Barriers • Build scalable digital library • Federate independent authorities • 4 operational collections • 3 work-in-progress

  22. 4 Acronyms • MTF – Metadata Template File: records digital library structure version • MIF – Metadata Interchange File: contains metadata values in domain-specific blocks; export to XML • ADO – Arbitrary Digital Object: data files • CCDS – Canonical Cruise Data Structure: used in auto-harvesting to organize data and generate metadata • John Helly, SDSC
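To make the MIF description concrete (metadata values grouped into domain-specific blocks, exportable to XML), here is a small serializer built on Python's standard `xml.etree.ElementTree`. The element and attribute names (`mif`, `block`, `field`, `mtf_version`) are hypothetical; the actual MIF layout is not given in the slides.

```python
import xml.etree.ElementTree as ET


def mif_to_xml(object_id: str, mtf_version: str,
               blocks: dict[str, dict[str, str]]) -> str:
    """Serialize MIF-style metadata to an XML string.

    `blocks` maps a domain name (e.g. "cruise", "multibeam") to that
    domain's name/value pairs; each domain becomes one <block> element.
    The MTF version is stored on the root so readers know which
    template the values were written against.
    """
    root = ET.Element("mif", {"object": object_id, "mtf_version": mtf_version})
    for domain, fields in blocks.items():
        block = ET.SubElement(root, "block", {"domain": domain})
        for name, value in fields.items():
            ET.SubElement(block, "field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")
```

Grouping by domain lets a multibeam tool read only its own block while a general catalog reads the shared cruise block.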

  23. MTF: key to scalable collection building • “Metadata Template File” controlled document • Maintains hierarchical relationships • Synchronizes metadata and data • Automatic template-driven approach: library continues to work while project structures evolve • All code traverses the MTF • Metadata file preserves which MTF version was used • Ensures forward and backward compatibility • Allows life-cycle maintenance • Technology for long-term archive persistence
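The MTF idea on slide 23 (code traverses a versioned template, and each metadata record remembers which template version produced it) can be sketched as follows. The template contents, level names and record fields are hypothetical; the point is that an older record is still validated against the template it was written with, which is what keeps it usable after the structure evolves.

```python
# Hypothetical MTF versions: nested dicts give the hierarchy of levels.
# Version 2 adds a "category" level and a sibling "document" level.
TEMPLATES = {
    1: {"collection": {"cruise": {"file": {}}}},
    2: {"collection": {"cruise": {"category": {"file": {}}, "document": {}}}},
}


def walk(template: dict, prefix: tuple = ()) -> list[tuple]:
    """Depth-first traversal returning every root-to-leaf path of level names."""
    paths = []
    for name, children in template.items():
        path = prefix + (name,)
        if children:
            paths.extend(walk(children, path))
        else:
            paths.append(path)
    return paths


def validate(record: dict) -> bool:
    """Check a record's level path against the MTF version it was written with."""
    template = TEMPLATES[record["mtf_version"]]
    return tuple(record["levels"]) in walk(template)
```

Because no code hard-wires the hierarchy, adding version 3 means adding a template entry rather than rewriting the traversal.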

  24. Create MTF with COBE • Design structure of collection • Can also define metadata mapping • Part of Collection Builder’s Toolkit

  25. View metadata (MIF) with MOBE • Browse through domain-specific metadata blocks • Part of Collection Builder’s Toolkit

  26. Example: building the data-intensive “SIO Cruises” collection • Publishing authority: Geological Data Center • Plone-managed workgroup website http://gdc.ucsd.edu/beta • http://plone.org/ • Dru Clark, GDC

  27. Current Status: 615 cruises loaded • 100 cruises first year (development) • 189 cruises last month (production) • Almost completely automated operation

  28. CCDS: key to auto-harvesting • “Canonical Cruise Data Structure” • Work from staging area • Map to 8 broad categories: multibeam, underway, documentation, etc. • Adequate for 736 cruises over 40 years • Automatic template-driven approach • Migrate files to location in CCDS • Generate metadata by location in CCDS and by extracting directly from files (lat, lon, date, etc.) • Adaptable for other disciplines
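The CCDS harvesting rules on slide 28 can be sketched as a first-match rule table that assigns each staged file to a category, builds its CCDS location, and derives metadata both from that location and from the file contents. Everything specific here is an assumption: the glob patterns, the plain-text navigation format (`date lat lon` per line) and the field names are hypothetical, and only three of the 8 categories are named on the slide.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical first-match rules: (file-name glob, CCDS category).
RULES = [
    ("*.mb*", "multibeam"),
    ("*.nav", "underway"),
    ("*.pdf", "documentation"),
]


def categorize(filename: str) -> str:
    """Return the category of the first rule whose pattern matches."""
    name = filename.lower()
    for pattern, category in RULES:
        if fnmatch.fnmatchcase(name, pattern):
            return category
    return "unclassified"


def harvest(cruise: str, filename: str, text: str = "") -> dict:
    """Build metadata from the file's CCDS location and, for navigation
    files, from its contents (assumed 'YYYY-MM-DD lat lon' per line)."""
    category = categorize(filename)
    meta = {
        "cruise": cruise,
        "category": category,
        "ccds_path": str(PurePosixPath(cruise, category, filename)),
    }
    if category == "underway" and text.strip():
        rows = [line.split() for line in text.splitlines() if line.strip()]
        lats = [float(r[1]) for r in rows]
        lons = [float(r[2]) for r in rows]
        # ISO dates sort correctly as strings; this simple min/max box
        # does not handle tracks that cross the 180° meridian.
        meta.update(start=min(r[0] for r in rows), end=max(r[0] for r in rows),
                    south=min(lats), north=max(lats),
                    west=min(lons), east=max(lons))
    return meta
```

Keeping the rules in a table rather than in code is what lets one harvester absorb decades of changing file conventions: a new convention becomes a new row.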

  29. Managing Data Flow into a Digital Library • Rules-based template allows auto-harvesting despite decades of cruise data “evolution” • Individual cruise status • Data flow status board: live on web, proprietary hold, auto-updated
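A row of the data-flow status board on slide 29 combines how far a cruise has progressed with whether its data are still under proprietary hold. A minimal sketch; the stage names and the hold date passed in are hypothetical, since the slides do not list the actual stages or hold period.

```python
from datetime import date
from typing import Optional

# Hypothetical processing stages, in the order data flow through them.
STAGES = ("received", "staged", "harvested", "published")


def board_row(cruise: str, done: set, hold_until: Optional[date], today: date) -> str:
    """Format one status-board line: furthest completed stage plus access state."""
    reached = [s for s in STAGES if s in done]
    stage = reached[-1] if reached else "pending"
    on_hold = hold_until is not None and today < hold_until
    access = "proprietary hold" if on_hold else "public"
    return f"{cruise}: {stage} ({access})"
```

Regenerating every row after each harvest run is one simple way to keep such a board auto-updated.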

  30. Automated QC Status Board for Multibeam • Automatic alerts for common problems • Shipboard files can be monitored from shore via satellite link • Additional web tools for bulk metadata QC
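The slide does not list which “common problems” trigger alerts, so the checks below are assumptions chosen for illustration: time gaps between pings, non-increasing timestamps, and positions outside valid latitude/longitude ranges. The input format (a list of `(time_s, lat, lon)` tuples) and the 60-second gap threshold are hypothetical.

```python
def qc_alerts(pings: list, max_gap_s: float = 60.0) -> list:
    """Return alert strings for a ping list of (time_s, lat, lon) tuples."""
    if not pings:
        return ["no pings in file"]
    alerts = []
    for i, (t, lat, lon) in enumerate(pings):
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            alerts.append(f"ping {i}: position out of range ({lat}, {lon})")
        if i > 0:
            dt = t - pings[i - 1][0]
            if dt <= 0:
                alerts.append(f"ping {i}: time not increasing")
            elif dt > max_gap_s:
                alerts.append(f"ping {i}: {dt:.0f} s gap before this ping")
    return alerts
```

Because the checks only need a compact summary of each file, a shore-side board could run them on data sent over the satellite link rather than on full swath files.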

  31. Community access with CruiseViewer • Graphical Java interface to all SIOExplorer collections • Metadata: Oracle or PostgreSQL • Data: Storage Resource Broker • User: graphical search, keyword search • Discover content • Browse metadata • View or download objects • Don Sutton, SDSC • Search results for visualization objects
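CruiseViewer's combination of graphical (map) search and keyword search can be approximated as two filters over metadata records: a bounding-box overlap test and a case-insensitive keyword match. This sketch is not the CruiseViewer implementation, which is a Java client over Oracle or PostgreSQL; the `Record` fields and `search` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record: object ID, keywords, lat/lon box."""
    object_id: str
    keywords: frozenset[str]
    south: float
    north: float
    west: float
    east: float


def search(records, box=None, keyword=None):
    """Return IDs of records whose box overlaps `box` (south, north, west,
    east) and whose keywords include `keyword`, ignoring case. A None
    argument skips that filter. Boxes must not cross the 180° meridian."""
    hits = []
    for r in records:
        if box is not None:
            s, n, w, e = box
            if r.north < s or r.south > n or r.east < w or r.west > e:
                continue  # no overlap with the search box
        if keyword is not None and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        hits.append(r.object_id)
    return hits
```

In a database-backed system the same two predicates would typically become a spatial condition and a keyword join in the query, so filtering happens on the server rather than in the client.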

  32. CruiseViewer Session • Search for multibeam swath sonar seafloor mapping files • ~300 multibeam cruises since 1982 • Right-click on track line for cruise overview

  33. Launch visualization experiences • Visualization file created for every multibeam swath file • Research • Sonar QC • Education • Download free iView3D viewer http://www.ivs.unb.ca/products/iview3d/

  34. Who uses SIOExplorer? • 50,000 hits/month • 6 GB downloads • Monthly web hits • Results from SDSC webalizer http://nsdl.sdsc.edu/stats

  35. Broader Impact with ERESE National Teachers Workshops • Enduring Resources for Earth Science Education • Two-week summer workshops, 2004 and 2005 • Build inquiry-driven learning experiences • 33 participants on R/V Sproul cruise, July 14

  36. CUAHSI Hydrologic Information System based on MTF-MIF-ADO technology • 95 institutional members • http://cuahsi.sdsc.edu/HIS/ • HydroViewer, derived from CruiseViewer • http://www.cuahsi.org/

  37. Other institutions using or evaluating MTF technology • WHOI – DIGARCH proposal pending (Bob Detrick) • CCOM – UNH cruise and multibeam archives (Larry Mayer, Jim Case) • MBARI – collection building to start Feb 2005 (Dave Caress, Andrew Chase) • HAWAII – evaluating for R/V Kilo Moana and archives (Bruce Applegate) • NIWA – Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand (John Helly, Don Robertson)

  38. New Projects • Marine Metadata Interoperability • Clearinghouse • Metadata technology • MBARI lead institution • http://marinemetadata.org

  39. Shipboard “Digital Library in a Box” • Portable real-time digital library acquisition system for metadata and data • Laptop with public domain tools: PostgreSQL, Storage Resource Broker, MB-System, GMT, CruiseViewer • Testing San Diego – Costa Rica, DANA01RR, Oct 2003

  40. Pending DIGARCH Proposal • NSF CISE/Library of Congress • Digital archiving and long-term preservation • Multi-institution Testbed for Scalable Digital Archiving: SIO/SDSC/WHOI • Extend SIOExplorer digital library approach to WHOI • Integrate SIO, SDSC and WHOI tools and data • WHOIexplorer: 30 years of cruise data, Alvin dives, Jason ROV surveys (200 DVDs per cruise) • Interoperability across institutions

  41. Memorandum of Agreement • Cyberinfrastructure for Marine Research and Education • The purpose of this Memorandum of Agreement is to recognize common goals between our institutions in advancing a community cyberinfrastructure for the future of Marine Research and Education and make a continuing commitment to collaborate in this effort… • Interoperability in data and metadata for data centers and ocean-going vehicle operations including ships and submersibles • Data acquisition protocols in the field and laboratory • Publication of data • Education of graduate students and researchers in these methods • Jointly pursue funding opportunities • Cyber MOA Workshop, November 4, 2004 • Signed by Directors: SIO, Charles Kennel; LDEO, G. Michael Purdy; WHOI, Robert Gagosian; SDSC, Francine Berman; CIESIN, Roberta Balstad
