
The Path Toward Data System Integration

The Path Toward Data System Integration. Raymond J. Walker, Todd A. King, Steven P. Joy. Science Archives in the 21st Century, University of Maryland, April 25-26, 2007. “Yes, Haven, most of us enjoy preaching, and I have such a bully pulpit!” (Theodore Roosevelt). Prepare for a sermon!


Presentation Transcript


  1. The Path Toward Data System Integration. Raymond J. Walker, Todd A. King, Steven P. Joy. Science Archives in the 21st Century, University of Maryland, April 25-26, 2007

  2. “Yes, Haven, most of us enjoy preaching, and I have such a bully pulpit!” (Theodore Roosevelt). Prepare for a sermon!

  3. A Persistent Dream: a global data environment in which all Earth and space science data are organized in a common way, with “one-stop shopping” for any data product. After decades of trying, we are still not very close to achieving that dream.

  4. Goals for Science Data Systems • Help scientists locate the data required for a given study. • Provide scientists with access to those data. • Assure that those data are usable. • Preserve the data forever. • Aid scientists in using the data.

  5. A Few Realities • Don’t try to build a centralized system. The data are distributed and will remain so. • It is all about science. Allow the science needs and the scientific community to drive the system. • Adopt community-wide standards. The key to interoperability within a data system is the metadata. • No data model is perfect. New requirements emerge continually. • Leverage what already exists. There are a variety of valuable community assets.

  6. Can there be a Single Solution? Not yet, because: • Each community has distinct needs. • Each community has a unique history. • Science must continue while systems are deployed. • Changes must be evolutionary. • Leverage existing systems and assets. • The resources are limited. • Revolutions are expensive.

  7. Plus… The Data are Found Worldwide • More nations are active participants in space. • Each mission enhances and complements the current body of data. • All data are important. • Answers to current science questions require data from multiple sources. • Individual communities need autonomy for: • Governmental reasons • Project organization • Efficiency

  8. A Mature Data System (Planetary) • The Planetary Data System (PDS) • Serving the NASA planetary science community for almost 20 years. • Mature data model. • Well suited for archiving. • IPDA (International Planetary Data Alliance) • Participants: U.S., EU, Japan, China, Russia, and India. • Formed in 2006 to define a standard based on (inspired by) the PDS data model. • A draft is expected to be vetted by the community in late 2007.

  9. What do you do if there are no accepted metadata standards and the data are highly distributed?

  10. An Emerging Data System (Heliophysics) • Needed a way to connect existing systems. • A new model that would act as an “interlingua” was required. • SPASE (Space Physics Archive Search and Extract) • Concept defined in 2002. • Group formed in 2003 to define a data-exchange standard for space physics. • International participation (U.S., France, Britain, Japan, Canada); open to all. • Releases: • Version 1.0.0 released in November 2005 • Version 1.1.0 released in August 2006 • Version 1.2.0 release is imminent

  11. The Heliophysics Data Environment – December 2007 [Diagram: Resident Archive, VxO, VMO, Individual Researcher]

  12. Implementing a Space Physics Data System • Virtual Observatories • Provide standards-based access by sub-discipline. • Aid data providers in making their resources available. • Serve as an integrating portal to existing data repositories: • Mission databases • Resident archives • Researcher data sets • Serve as an integrating portal to services. • See CoSEC (Collaborative Sun-Earth Connector) for a functioning example.

  13. When is Enough, Enough? • The metadata have to be relevant to the purpose. • Initially, PDS developed a metadata standard that was very rich scientifically. • Data providers complained that it was too rich: the effort required to generate the metadata was too large. • PDS then modified the metadata standards to be more in keeping with what the data providers could support. • Clarity comes from usage. • The threshold of participation must be kept low.

  14. Assure that the Data are Usable. Data Quality: • The data systems have a responsibility to provide the best-quality data available. • You only learn about data by trying to do science with it. • Peer review has proven to be an important tool for assuring data quality. Data Processing: • Most science is done with highly processed data. • Frequently only raw data plus algorithms and calibrations (or software) are archived. • Data need to be readily usable; secondary users often don’t have the resources to process raw data into physical units, even with well-documented data or software.

  15. Formats! Formats! Formats! • One of the most contentious issues during the PDS’s 20-year history concerns data formats. • Long ago the astronomy community settled on the Flexible Image Transport System (FITS) as its main format. • FITS is inappropriate for many types of data (e.g., time-series tables). • Many planetary scientists objected to FITS, so PDS accepts most formats (provided they can be described by the PDS metadata standard). • The result is many formats, and even accepted formats are sometimes hard to describe. • Is it possible and desirable to limit this ever-growing list? YES!!!

  16. Preserve the Data Forever • Long-term preservation remains a serious issue. • PDS has been archiving the data on “hard media” (CD, DVD). • They have media as old as 20 years. • They have found problems with media (both stamped and write-once) that are only a couple of years old (M. Martin and B. Harris). • The decay time is short compared to the interval over which we had planned to renew the media. • Tape media have even shorter lifetimes than CD and DVD. • Long-term preservation is a common concern! • No current or anticipated hard media meet our durability and capacity requirements. • How do you build an adequate preservation system?

  17. What should we do? For your discipline: • Endorse and adopt a standard. • Don’t start over for each mission and project. • If the metadata standard in your discipline is inadequate, work to improve it. • The standards keepers must be willing to work with the community and respond quickly. • Pick a minimal set of data formats. • One size fits all does not work, but neither does allowing all formats. • One for each type of data. • Retrofit existing tools. • We can’t afford to continually start over; we must evolve. • Make compliance a contractual obligation.

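Slides 10 and 12 describe SPASE as an “interlingua”: existing systems keep their own holdings and vocabularies, and virtual observatories act as integrating portals over them. The following is a minimal Python sketch of that idea, assuming each provider supplies one small translator from its native metadata into a shared record. The field names (`resource_id`, `observatory`, ...), both provider vocabularies, the sample holdings, and the function names are hypothetical illustrations, not the actual SPASE schema or any VxO interface.

```python
"""Sketch: mapping provider-specific metadata onto a shared "interlingua"
record, then searching all providers through that common model.

Everything here is hypothetical: the field names, the two provider
vocabularies, and the sample holdings are illustrative only and are NOT
the actual SPASE schema.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

NativeRecord = Dict[str, str]


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical minimal description that every provider can supply."""
    resource_id: str
    observatory: str
    instrument: str
    start_date: str  # ISO 8601 date, e.g. "2004-01-15"
    stop_date: str


def from_provider_a(native: NativeRecord) -> CommonRecord:
    """Translator for hypothetical provider A's native field names."""
    return CommonRecord(
        resource_id=native["id"],
        observatory=native["spacecraft"],
        instrument=native["instr"],
        start_date=native["t_begin"],
        stop_date=native["t_end"],
    )


def from_provider_b(native: NativeRecord) -> CommonRecord:
    """Translator for hypothetical provider B, which stores its time
    coverage as a single "start/stop" string."""
    start, stop = native["Coverage"].split("/")
    return CommonRecord(
        resource_id=native["DatasetName"],
        observatory=native["Platform"],
        instrument=native["Sensor"],
        start_date=start,
        stop_date=stop,
    )


# A source pairs a provider's translator with that provider's holdings.
Source = Tuple[Callable[[NativeRecord], CommonRecord], List[NativeRecord]]


def federated_search(sources: List[Source], observatory: str, day: str) -> List[CommonRecord]:
    """Return records from `observatory` whose time span covers `day`.

    Providers keep their own holdings; only the translation step is shared.
    Same-format ISO 8601 date strings compare correctly as plain strings.
    """
    hits = []
    for translate, holdings in sources:
        for native in holdings:
            rec = translate(native)
            if rec.observatory == observatory and rec.start_date <= day <= rec.stop_date:
                hits.append(rec)
    return hits


# Hypothetical sample holdings, one per provider.
PROVIDER_A = [{"id": "A-MAG-1", "spacecraft": "Spacecraft-X", "instr": "MAG",
               "t_begin": "2001-01-01", "t_end": "2005-12-31"}]
PROVIDER_B = [{"DatasetName": "B-PLASMA-7", "Platform": "Spacecraft-X",
               "Sensor": "PLASMA", "Coverage": "2003-06-01/2004-06-01"}]

if __name__ == "__main__":
    sources = [(from_provider_a, PROVIDER_A), (from_provider_b, PROVIDER_B)]
    for rec in federated_search(sources, "Spacecraft-X", "2004-01-15"):
        print(rec.resource_id, rec.instrument)
```

Bringing a new provider into this sketch means writing one translator; no other provider has to change its system. This matches the talk's emphasis on evolutionary change and a low threshold of participation.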

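Slide 16 notes that media decay faster than the planned renewal cycle. A commonly used safeguard, though not one the talk itself prescribes, is a periodic fixity check. The archive records a cryptographic digest for each file at ingest, then recomputes and compares the digests later, so decayed copies are detected and can be restored from another replica. Below is a minimal Python sketch, assuming the files sit under one directory tree and the manifest is stored separately from the data. The names `build_manifest` and `verify` are hypothetical.

```python
"""Sketch of a periodic fixity check for archived files.

Assumption: a manifest of SHA-256 digests is recorded at ingest and stored
separately from the data; re-running `verify` later flags any file that is
missing or whose bytes have changed (e.g. through media decay).
"""
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Digest every file under `root`, keyed by its path relative to `root`."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    # Demo on a throwaway directory: archive two files, then "decay" one.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "orbit.dat").write_bytes(b"original bytes")
        (root / "field.dat").write_bytes(b"more bytes")
        manifest = build_manifest(root)
        print(verify(root, manifest))  # prints []
        (root / "orbit.dat").write_bytes(b"original bytez")
        print(verify(root, manifest))  # prints ['orbit.dat']
```

A fixity check only detects damage. It does not answer the slide's larger question about durable media. In practice it is paired with multiple independent copies, so a file flagged by `verify` can be replaced from a good replica before that replica decays too.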