
Information Management in a Non-Bibliographic Environment: Scientific Data

Information Management in a Non-Bibliographic Environment: Scientific Data. Joseph A. Hourclé, 2007-Nov-20, FLICC Learning@Lunch.


Presentation Transcript


  1. Information Management in a Non-Bibliographic Environment: Scientific Data • Joseph A. Hourclé • 2007-Nov-20 • FLICC Learning@Lunch

  2. About Me

  3. STEREO : Solar TErrestrial RElations Observatory

  4. The Virtual Solar Observatory

  5. The Virtual Solar Observatory • Federated Search of Solar Physics Data • 14 organizations (currently) • 4 more organizations being integrated • 62 instruments • Hundreds of distinct data collections • 10s of millions of records • Terabytes of Data
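A federated search sends one query to every member archive and merges what comes back. The following is a minimal sketch of that idea only, not the VSO's actual interface: the archive names, the `Record` layout, and the in-memory "providers" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    provider: str      # hypothetical archive name
    instrument: str
    start: datetime    # observation start time


# A provider is modelled as a function from a time range to matching records.
Provider = Callable[[datetime, datetime], List[Record]]


def make_provider(name: str, holdings: List[Tuple[str, datetime]]) -> Provider:
    """Build a toy in-memory provider from (instrument, start) pairs."""
    records = [Record(name, instrument, start) for instrument, start in holdings]

    def search(t0: datetime, t1: datetime) -> List[Record]:
        return [r for r in records if t0 <= r.start < t1]

    return search


def federated_search(providers: Dict[str, Provider],
                     t0: datetime, t1: datetime) -> List[Record]:
    """Send the same query to every provider and merge the results by time."""
    merged: List[Record] = []
    for search in providers.values():
        try:
            merged.extend(search(t0, t1))
        except Exception:
            # An unreachable archive should not fail the whole search.
            continue
    return sorted(merged, key=lambda r: r.start)


# Hypothetical holdings for two made-up archives.
providers = {
    "archive_a": make_provider("archive_a", [
        ("EIT", datetime(2007, 1, 1, 0, 0)),
        ("EIT", datetime(2007, 1, 2, 0, 0)),
    ]),
    "archive_b": make_provider("archive_b", [
        ("LASCO", datetime(2007, 1, 1, 6, 0)),
    ]),
}
results = federated_search(providers, datetime(2007, 1, 1), datetime(2007, 1, 2))
```

The user sees one merged, time-ordered list and does not need to know which archive holds which record.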

  6. The data is growing … • STEREO • Launched Oct 2006 • Over 1.5 million images @ up to 8 MB • Hinode (Sunrise aka Solar-B) • Launched Sept 2006 • Over 3 million images @ up to 8 MB • SDO • Scheduled to launch Aug 2008 • 1 image per second @ 32 MB • 1.5 TB/day dedicated connection
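The figures on this slide can be turned into rough upper bounds with simple multiplication. This is a back-of-the-envelope sketch only: it assumes decimal units (1 TB = 1,000,000 MB), treats "up to 8 MB" as the size of every image, and ignores compression, which the slide does not mention. The raw SDO product therefore need not match the quoted 1.5 TB/day link.

```python
MB_PER_TB = 1_000_000      # decimal units, an assumption
SECONDS_PER_DAY = 86_400


def upper_bound_tb(image_count: int, max_image_mb: float) -> float:
    """Upper bound on volume in TB if every image were at its maximum size."""
    return image_count * max_image_mb / MB_PER_TB


stereo_tb = upper_bound_tb(1_500_000, 8)           # images so far, up to 8 MB each
hinode_tb = upper_bound_tb(3_000_000, 8)           # images so far, up to 8 MB each
sdo_tb_per_day = upper_bound_tb(SECONDS_PER_DAY, 32)  # 1 image per second at 32 MB

print(f"STEREO: at most {stereo_tb:.1f} TB")
print(f"Hinode: at most {hinode_tb:.1f} TB")
print(f"SDO:    about {sdo_tb_per_day:.2f} TB/day uncompressed")
```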

  7. Other disciplines have even more data • NVO : US National Virtual Observatory • LSST (Large Synoptic Survey Telescope) • Scheduled to start observing in 2012 • 7-10 TB/night, 3.2 Gpix images • ~10 PB/yr • EOS/DIS : Earth Observing System/Data Information System • About 2 TB/day, per satellite (8?) • Planned to be 16 PB

  8. … and we’re not the only ones • Heliospheric • Magnetospheric • Radiation Belt • ITM (upper atmosphere) • NVO / IVOA : nighttime astronomy • PDS : planetary • EOS : earth

  9. What is Scientific Data?

  10. How is Scientific Data Gathered? • Scientist thinks up a problem • Scientist (and Engineers) create an instrument to conduct an investigation • The instrument collects data via sensors • Data are calibrated • Data are written into scientifically useful formats • Data are distributed to the scientists

  11. But really, what is data? • There is no formal definition. • It’s as ambiguous as the term “book” • Data may be shorthand for: • Data Collection • Data Series • Data Set • Data Product • Data Granule

  12. The problem with “data” • Every investigation has different data needs • Each investigation organizes and catalogs the data to answer their scientific question • What is “good” data for one group may not be useful for another • Because data is being collected continuously, there may not be a consistent boundary on one “granule” of data • Some data is tracked as individual values, and only packaged upon request • Mostly time-series data, not images
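The last two points can be illustrated with a small sketch: when values are stored individually and packaged only on request, the "granule" is whatever time range the requester asks for, so two requests can yield overlapping granules with no fixed boundary. The `(time, value)` layout and the sample numbers are assumptions for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since an arbitrary epoch, measured value)


def package_granule(samples: List[Sample], t0: float, t1: float) -> List[Sample]:
    """Return the samples with t0 <= time < t1; samples must be sorted by time."""
    times = [t for t, _ in samples]
    lo = bisect_left(times, t0)
    hi = bisect_left(times, t1)
    return samples[lo:hi]


# Made-up time series, one value per second.
series = [(0.0, 1.2), (1.0, 1.3), (2.0, 1.1), (3.0, 1.4), (4.0, 1.0)]

granule_a = package_granule(series, 0.0, 3.0)
granule_b = package_granule(series, 2.0, 5.0)
overlap = [s for s in granule_a if s in granule_b]
```

Both granules are valid packages of the same series, yet they share a sample, which is why a catalog cannot always list "the" granules of such a collection in advance.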

  13. Types of Data Archives • Instrument Archives • Maintained by the PI team • Little or no consideration towards re-use • Resident Archive • Maintained by a specific discipline • Re-use within the given discipline • Long-Term Archive • Required for federally funded studies • Focus on preservation, not use of data

  14. Active Archives • Still changing • May be ingesting from an active mission • May still be processing their data • May serve multiple editions or processed states of the data • Final Data in “Physical Units” typically isn’t available until one or more years after the mission • Not directly comparable with data from other instruments until then

  15. Isn’t this just Knowledge Management? • There is no knowledge in the raw data • But there is knowledge in the design of the instruments & sensors • What spectral range are the instruments sensitive to? • What are the instrument’s possible operating modes? • Knowledge of the instruments & sensors affects how the scientists interpret data • The scientists have to interpret the results to determine the knowledge • May be reluctant to have others catalog their data, as it requires understanding the science

  16. Multiple Operating Modes: Filters on SOHO/EIT • 171Å • 195Å • 284Å • 304Å

  17. Known Sensor Issues: SOHO/LASCO

  18. Knowledge Mgmt, cont’d • We do have ‘Event’ and ‘Feature’ Catalogs • Scientists will record when/where they think something interesting is occurring, and share with others.

  19. Data Processing : Raw Image (Linear)

  20. Data Processing : Calibrated (Greyscale)

  21. Data Processing : Before Calibration

  22. Data Processing : Best Calibration

  23. Data Processing : CCD Aging

  24. CCD Calibration • 195Å • 171Å • 304Å • 284Å

  25. Higher Level Data

  26. The Problems … • Cross discipline translation is difficult • Concepts of what makes data useful differ between disciplines • Different disciplines use different search parameters • VSO : time, spectral range, location on sun • Always looking at the same object • VHO : location of observer, time, spectral range • Observatories are moving, in situ measurements • EOS : location of object observed • NVO : direction of pointing (assumed from earth)
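One way to see why cross-discipline translation is hard is to write each discipline's search parameters as a record type and ask which fields two of them share. This sketch uses only the parameters listed on the slide; the class and field names are assumptions, and real systems have many more parameters.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class VSOQuery:
    time_range: Tuple[str, str]
    spectral_range: Tuple[float, float]
    location_on_sun: Tuple[float, float]


@dataclass
class VHOQuery:
    observer_location: Tuple[float, float, float]
    time_range: Tuple[str, str]
    spectral_range: Tuple[float, float]


@dataclass
class EOSQuery:
    object_location: Tuple[float, float]


@dataclass
class NVOQuery:
    pointing_direction: Tuple[float, float]


def shared_parameters(a: type, b: type) -> List[str]:
    """Names of the search parameters that two query types have in common."""
    names_a = {f.name for f in fields(a)}
    names_b = {f.name for f in fields(b)}
    return sorted(names_a & names_b)


vso_vho = shared_parameters(VSOQuery, VHOQuery)
vso_nvo = shared_parameters(VSOQuery, NVOQuery)
```

In this toy model a VSO query can be partly expressed in VHO terms (time and spectral range), while VSO and NVO have no listed parameter in common, so a shared search needs an agreed mapping rather than a simple field rename.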

  27. Problems, cont’d. • Even when there is agreement, there are still problems • Which time is important? • Start time? • Average time? • Spacecraft time? • Which coordinate system is used?
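The "which time?" question matters because one observation has several defensible timestamps. A minimal sketch, assuming an exposure described by a start time and a duration (the function name and example values are hypothetical; spacecraft clock time is not modelled here):

```python
from datetime import datetime, timedelta
from typing import Dict


def observation_times(start: datetime, exposure: timedelta) -> Dict[str, datetime]:
    """Candidate timestamps that a catalog might record for one exposure."""
    return {
        "start": start,
        "average": start + exposure / 2,   # midpoint of the exposure
        "end": start + exposure,
    }


times = observation_times(datetime(2007, 11, 20, 12, 0, 0), timedelta(seconds=10))

# Two catalogs describing the same exposure will disagree on an exact-match
# search if one stores the start time and the other the average time.
catalogs_agree = times["start"] == times["average"]
```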

  28. Problems, still cont’d • Each discipline is working on solutions within their field • Build systems that suit the needs of their community • Each discipline has different “first class data” • Currently working on metadata standards so data can be discovered and used by other disciplines • SPASE; MMI; GEON • Some work on ontologies to help with discovery and use • VSTO; SWEET; GEON; SESDI

  29. Lots of Permutations

  30. I know what you’re thinking…

  31. And it mostly works

  32. How does this affect libraries? • The library is a changing organism • Data is relatively unanalyzed in LIS • Data connects to bibliographic records, and vice versa • What data was used in this journal article? • Where can I get documentation on using this data? • Has anyone published anything using this data? • Data connects to other data • What other instruments observed a given event? • Is there an alternate version that better meets my needs?
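The two-way connection between data and bibliographic records can be sketched as an index that answers both "what data was used in this article?" and "has anyone published using this data?". The class, method names, and identifiers below are hypothetical; no existing catalog system is implied.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LinkIndex:
    """Toy bidirectional index between article IDs and dataset IDs."""

    def __init__(self) -> None:
        self._data_by_article: Dict[str, Set[str]] = defaultdict(set)
        self._articles_by_data: Dict[str, Set[str]] = defaultdict(set)

    def link(self, article_id: str, dataset_id: str) -> None:
        """Record that an article used a dataset, in both directions."""
        self._data_by_article[article_id].add(dataset_id)
        self._articles_by_data[dataset_id].add(article_id)

    def data_used_in(self, article_id: str) -> List[str]:
        return sorted(self._data_by_article.get(article_id, set()))

    def articles_using(self, dataset_id: str) -> List[str]:
        return sorted(self._articles_by_data.get(dataset_id, set()))


# Made-up identifiers for illustration.
index = LinkIndex()
index.link("article:001", "data:eit-195")
index.link("article:001", "data:lasco-c2")
index.link("article:002", "data:eit-195")

used_in_001 = index.data_used_in("article:001")
citing_eit = index.articles_using("data:eit-195")
```

Storing the link once and indexing it in both directions keeps the two questions consistent with each other.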

  33. There’s funding for research • NSF: • CDI : Cyber-Enabled Discovery and Innovation • INTEROP : Community-based Data Interoperability Networks • IIS : Information and Intelligent Systems • DataNet : Sustainable Digital Data Preservation and Access Network Partners • NASA: • AISR : Advanced Info. Systems Research • ACCESS : Advancing Collaborative Connections for Earth Science Access

  34. Sunspot on 15 July 2002 from the Swedish 1-m Solar Telescope on La Palma

  35. http://virtualsolar.org/ http://stereo.gsfc.nasa.gov joseph.a.hourcle@nasa.gov