Regional Databases and Archives: the Effects of Scale…

Regional Databases and Archives: the Effects of Scale…. A Presentation for “Scalable Information Networks for the Environment Workshop” October 31, 2001 San Diego, California Raymond McCord Oak Ridge National Laboratory*


Presentation Transcript


  1. Regional Databases and Archives: the Effects of Scale… A Presentation for “Scalable Information Networks for the Environment Workshop”, October 31, 2001, San Diego, California. Raymond McCord, Oak Ridge National Laboratory* (*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725)

  2. Credits • Concepts are derived from managing data for environmental projects over the past 25 years. • Variations of the concepts have been observed in these disciplines: • Plant community research • Impact assessment in marine systems • National acid rain surveys • Environmental monitoring and cleanup projects at DOE facilities • Military land use assessment • Climate change research (atmospheric research) • Ideas are freely traded with Dick Olson (ORNL)

  3. Presentation Strategy • Motivation and concerns • Archive overview • Definition, components, functions, why & why not, examples • Archives and scale • Effects of scale • Mitigate scale effects • Generate and manage metadata • Future: Archive issues to resolve

  4. My Motivation & Concerns The enemy is our behavior. Will we change or whine??? • Motivation • Describe observations about the effects of scale on Archives • Describe remedies to minimize scale effects • Minimize remedy pain • Concerns • Preaching to the choir!! • Nothing new will happen!! • Continuing unnecessary limits to future science!!

  5. “You can’t keep running in here and demanding data every two years.” Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving. Source: American Scientist, Vol 886 p 525.

  6. Archives and Scale: Presumptions • Regional data live in Archives • Information sharing is important • The archiving can be improved • Archive “neurons” are metadata • Multidisciplinary data will foster broader ecological discoveries • The limited number of permanent data archives for ecological data will increase

  7. What Is an Archive?

  8. What Is a Data Archive? • A data archive is a permanent, electronic collection of datasets with accompanying metadata such that users of the data can acquire, understand, and use the data. • More than a long-term backup • More than an index or catalog with pointers to datasets stored elsewhere • For more details, see Michener, W. K. and J. W. Brunt. 2000. Ecological Data: Design, Management and Processing. Blackwell Science. 180 pp.

  9. Components of an Archive • Data and metadata • Storage devices • Information system • Network connections • Staff • Data/metadata preparation and review • Systems development and maintenance • User support

  10. Archive Functions • Store data • Submitted by others • Build catalog and structure • Maintain storage across technology generations • Review new data (QA, metadata) • “Advertise” contents • Find data for users • Query and browse logic • Distribute data • Provide access to data • References to documentation

  11. Data Centers at ORNL • CDIAC - Carbon Dioxide Information Analysis Center • ARM Archive - Atmospheric Radiation Measurement Program • ORNL DAAC - Distributed Active Archive Center for Biogeochemical Dynamics • NARSTO - tropospheric air pollution information for North America • OREIS - Oak Ridge Environmental Information System

  12. Atmospheric Radiation Measurement (ARM) Program • ARM research questions: • What happens to all of the sunlight energy? • How is light absorbed by clouds? • What does partly cloudy mean? Statistically? Spatially? • What types of clouds form? When and How? • ARM is a ‘once in a lifetime’ research adventure for atmospheric scientists • ARM research includes instrumentation, system development, data analysis, and modeling (climate and process)

  13. ARM Measurements Scope All data collection is highly automated -- a REAL BLAST!! Data collection is now a peer outcome with scientific discovery

  14. ARM Archive • ARM Archive stores and provides access to the entire accumulation of data • Currently 5 million files and 14,000 GB, and growing • The ARM data in the Archive will be accessed for research for many years (decades) • Currently distributes 50,000-100,000 files per month (100-200 GB) • More information: • ARM Program www.arm.gov • ARM Archive www.archive.arm.gov
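The figures on the ARM Archive slide imply a typical file size and a monthly turnover rate. The arithmetic below is a rough sketch, not a calculation from the presentation; it assumes decimal units (1 GB = 1,000 MB) and assumes the low and high ends of the monthly ranges pair with each other.

```python
# Figures quoted on the ARM Archive slide (2001 values).
TOTAL_FILES = 5_000_000
TOTAL_GB = 14_000
MB_PER_GB = 1_000  # assumption: decimal units (1 GB = 1,000 MB)


def mean_file_size_mb(gigabytes, files):
    """Average size per file in MB for a given volume and file count."""
    return gigabytes * MB_PER_GB / files


# Average size of a file held in the Archive.
archive_mean_mb = mean_file_size_mb(TOTAL_GB, TOTAL_FILES)

# Average size of a distributed file, pairing the low ends and the high ends
# of the quoted monthly ranges (an assumption about how the ranges line up).
monthly_low_mean_mb = mean_file_size_mb(100, 50_000)
monthly_high_mean_mb = mean_file_size_mb(200, 100_000)

# Share of the holdings (by file count) distributed in one month.
monthly_share_low = 50_000 / TOTAL_FILES
monthly_share_high = 100_000 / TOTAL_FILES

print(f"mean archived file:    {archive_mean_mb:.1f} MB")
print(f"mean distributed file: {monthly_low_mean_mb:.1f}-{monthly_high_mean_mb:.1f} MB")
print(f"monthly share of files: {monthly_share_low:.0%}-{monthly_share_high:.0%}")
```

Under these assumptions the average archived file is about 2.8 MB, distributed files average about 2 MB, and roughly 1-2% of the file count is distributed each month.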

  15. ARM Archive Schematic: “Archive Input & Output” (diagram). Labels: Archive web User Interface; user; query specifications; measurement; location; date; catalog metadata; file list; Data Retrieval; requested files; copy; Incoming Data Files; Other ARM Systems; Data Reception; Mass Storage System; backup data files; operations metadata

  16. Core archive functions (diagram). Labels: Data and Metadata Submission; Data/Metadata Ingest; Backup, Security, Migration; Archive Development and Maintenance; User Support; Request pathways; User Request; Archive support; User interactions; Data Flow; Data; Metadata; User Interface; Network

  17. Why Archive?? “I am doing Science. Trust me.”

  18. Cycles of Research: “An Information View” (diagram). Labels: Archive of Data; Publications; Automation and review; Selection and extraction; Analysis and modeling; Information review; Measurement; Collection; Original Observations; Secondary Observations; 200 yrs; 20 yrs; Planning; Problem Definition (Research Objectives)

  19. Why Don’t I Archive My Data? • No incentives - what’s in it for me? • No acknowledgment - does a dataset = paper? • Give up publication rights - will somebody scoop me? • Poor planning - it was not in “the Plan” • No resources - who’s going to pay for it? • Lack of training - what do I do first? • Unsure about metadata content - how much is enough?

  20. Why Should I Archive My Data? (management hints!!) • Career advancement (give them credit) • you will get some recognition • you can publish a data paper in ESA Ecological Archives • it may help you do science with broader scope • Professional incentives (give them training) • good scientific practice (create peer pressure) • Institutional incentives (have expectations) • required by the sponsor • Technological advances (give them systems) • it's easier and there are more options

  21. Archiving Supports Science • Metadata required for archiving will improve data quality • Extends data usefulness • Increases your information base for doing research: • data volume and diversity • Permits replication of results: a KEY concept of Science

  22. The Effects of Project Scale on Archives “Metadata are archive neurons??”

  23. Metadata Depends on Your “World View” • Investigator • Doesn’t need extensive formal metadata • Project • Metadata needed for project integration and modeling activities • Project data manager may help write metadata • Data archive • More detailed metadata (e.g., spatial coordinates) • More standardization (e.g., keywords) to communicate clearly with future users • Who writes the metadata?

  24. (In the beginning, was the measurement. It was formless and desolate. Without context…) Measurement

  25. Single Experiment View (diagram). Measurement with: parameter name, sample ID, location, date

  26. Research Project View (diagram). Measurement with: parameter name, media, QA flag, sample ID, location, date

  27. Long-term or Multidisciplinary View (diagram). Measurement with: method, parameter name, units, media, QA flag, records, generator, sample ID, location, date
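The three views above can be read as growing sets of required fields. The sketch below takes the field lists from the three slides; the snake_case field names, the `missing_fields` helper, and the sample record are hypothetical and only illustrate how a record that is complete for one view falls short for the next.

```python
# Fields shown on the three "view" slides, expressed as cumulative sets.
SINGLE_EXPERIMENT = {"measurement", "parameter_name", "sample_id", "location", "date"}
RESEARCH_PROJECT = SINGLE_EXPERIMENT | {"media", "qa_flag"}
LONG_TERM = RESEARCH_PROJECT | {"method", "units", "records", "generator"}

VIEWS = {
    "single experiment": SINGLE_EXPERIMENT,
    "research project": RESEARCH_PROJECT,
    "long-term or multidisciplinary": LONG_TERM,
}


def missing_fields(record, view):
    """Return the fields a view expects that the record leaves empty."""
    present = {key for key, value in record.items() if value not in (None, "")}
    return sorted(VIEWS[view] - present)


# Hypothetical record as an individual investigator might keep it.
record = {
    "measurement": 12.5,
    "parameter_name": "soil carbon",
    "sample_id": "S-001",
    "location": "plot 7",
    "date": "2001-10-31",
}

for view in VIEWS:
    print(view, "->", missing_fields(record, view))
```

The record is complete for a single experiment, lacks `media` and `qa_flag` at project scale, and lacks six fields for long-term or multidisciplinary use.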

  28. Integrated System & Archive View (diagram). Measurement with: method, parameter name, units, media, QA flag, records, generator, sample ID, location, date. Definition boxes: Parameter def.; Method def.; Units def.; QA def.; Sample def.; Record system; GIS; org. Attribute labels: words; units; method; lab; field; type; name; custodian; address, etc.; coord.; elev.; depth; date; location; generator
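One way to read the integrated view is as a measurement record that points to separate definition records. The sketch below is an assumption-laden illustration: which attributes belong to which definition box cannot be recovered from the flattened diagram, so the class layouts, the `describe` helper, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterDef:
    name: str
    description: str  # stands in for the "words, words" label on the slide
    units: str
    method: str


@dataclass(frozen=True)
class SampleDef:
    sample_id: str
    sample_type: str
    date: str
    location: str
    generator: str


@dataclass(frozen=True)
class Measurement:
    value: float
    parameter_name: str  # key into the parameter definitions
    sample_id: str       # key into the sample definitions
    qa_flag: str


def describe(m, parameters, samples):
    """Join a measurement with its definitions so it carries its own context."""
    p = parameters[m.parameter_name]
    s = samples[m.sample_id]
    return (f"{p.name} = {m.value} {p.units} ({p.method}) "
            f"at {s.location} on {s.date}, QA={m.qa_flag}")


# Hypothetical definitions and one measurement.
parameters = {"soil_carbon": ParameterDef("soil_carbon", "organic carbon in soil",
                                          "g/kg", "dry combustion")}
samples = {"S-001": SampleDef("S-001", "soil core", "2001-10-31", "plot 7", "field crew")}
m = Measurement(12.5, "soil_carbon", "S-001", "ok")

print(describe(m, parameters, samples))
```

Keeping definitions in their own records means units, methods, and sample details are written once and shared by every measurement that refers to them.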

  29. Another View of Scale

  30. Project Scale and Recorded Metadata (diagram). Increasing user scope: PI, Group, Program, Archive. Metadata: • Units • Method • QA flag • Media • Parameter name • Measurement • Date • Sample ID • Location • Generator • Records

  31. Data Maturation and Scale • Individual Investigators • collect data, quality assure, document, analyze, publish • Groups or Science Teams • collate data, enhance, synthesize, model, publish • Project Information System • collate data, review completeness, maintain data for project • Data Distribution and Archive Center • long-term archive, distribute freely to users • Master Data Directory • searchable index with pointers to data

  32. Preparing for Archiving: “I will not wait. I will not wait. I will not wait. I will not …”

  33. Generic Environmental Data Model (Which Piece Is First…?) (diagram: same data model and labels as slide 28, Integrated System & Archive View)

  34. Sequence of Information Birth (diagram: same data model and labels as slide 28, Integrated System & Archive View)

  35. Research ~ Publishing ~ Metadata • Metadata design can be a “checklist” for research planning • Metadata preparation can be integrated with publication process • Metadata are an investment in current and future science

  36. Where to Archive Data?

  37. Archive Choices • What determines your options? • Sponsor requirements • Repository access • Metadata requirements • Scalable storage • Personal web pages and files • Project or network data centers • Federal data centers • Links “transcend” storage structures • Master directory • Mercury

  38. Personal Web Page • It's fun, rewarding, relatively easy, can share data quickly, can control access to data • Data issues?? • complete metadata • QA checks • Connected to basic archival center functions?? • ready access to data (24 h/d, 7 d/wk) • user support • data available on multiple media • secure, backed-up, long-term storage

  39. ESA Ecological Archives • Publishing datasets as peer reviewed, citable papers (with volume and page numbers) • Data papers are announced in abstract form in a print journal with data available electronically • Citation example • Esser, G., H.F.H. Lieth, J.M.O. Scurlock and R.J. Olson. 2000. Osnabrück net primary productivity data set. (Ecological Archives data paper E081-011). Ecology 81, 1177-1177. • Bill Michener, Editor • http://esa.sdsc.edu/esapubs/Journals_main.htm

  40. Master Data Directory • Provides search capability and pointers to a source of the data (Center does not archive data) • Maintains standard keywords/indices • Collects metadata from many sources • Examples • Global Change Master Directory (GCMD) http://gcmd.gsfc.nasa.gov • ORNL DAAC Mercury System http://mercury.ornl.gov

  41. What is Mercury? Mercury is used to assist an investigator with documenting data and making these data available to others. 1. The data provider uses the Metadata Editor to create a metadata file containing links to the data and documentation. 2. Mercury harvests the metadata and builds an index. 3. Users query the index. 4. Full metadata are returned to the user, including links back to the data provider. 5. User links to data provider’s server. 6. Data and documentation are downloaded directly from the data provider. (Diagram labels: Data and documentation; User; NASA / ORNL Metadata Index)
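The harvest, index, and query steps on the Mercury slide can be illustrated with a toy in-memory index. This is a minimal sketch and not Mercury's implementation: the record layout, the `build_index` and `query` functions, and the example.org URLs are all hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records, as if harvested from three data providers.
harvested = [
    {"title": "Net primary productivity plots", "keywords": ["NPP", "vegetation"],
     "data_url": "https://provider-a.example.org/npp/"},
    {"title": "Soil carbon survey", "keywords": ["carbon", "soil"],
     "data_url": "https://provider-b.example.org/soil/"},
    {"title": "Forest carbon flux", "keywords": ["carbon", "flux"],
     "data_url": "https://provider-c.example.org/flux/"},
]


def build_index(records):
    """Map each lowercased title word and keyword to the records that use it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        words = record["title"].split() + record["keywords"]
        for word in words:
            index[word.lower()].add(position)
    return index


def query(index, records, *terms):
    """Return the full metadata of records matching every term."""
    hits = None
    for term in terms:
        found = index.get(term.lower(), set())
        hits = found if hits is None else hits & found
    return [records[i] for i in sorted(hits or [])]


index = build_index(harvested)
for record in query(index, harvested, "carbon"):
    # The index holds only metadata; the data stay on the provider's server.
    print(record["title"], "->", record["data_url"])
```

The index stores pointers rather than data, which matches the slide's point that data and documentation are downloaded directly from the data provider.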

  42. Regional Archives

  43. Sources of Regional Data • Carbon Dioxide Information Analysis Center • National Geophysical Data Center • National Environmental Satellite, Data, and Information Service • National Soils Data Access Facility • National Water Information System • Forest Inventory and Analysis • Breeding Bird Survey • Threatened and Endangered Species • Global Change Master Directory

  44. NASA EOSDIS Distributed Active Archive Centers • GSFC: Upper Atmosphere, Global Biosphere, and Geophysics • EDC: Land Processes • SEDAC: Socio-economic • U. Colorado: Cryosphere • JPL: Ocean Circulation and Air-sea Interaction • U. Alaska: Sea Ice and Polar Processes • LaRC: Atmospheric Processes • ORNL: Biogeochemical Dynamics

  45. Global scale, 280 parameters: surface, atmospheric, fluxes. Map panels: Precipitation; Topography; Soil Carbon; Cloud Amount II; Clear-Sky Albedo; LW Radiation; Fossil Fuel Emissions; Vegetation Biophysics (fPAR)

  46. Future: Issues to Resolve • Size, diversity, and longevity • Accommodating change • Teaching good practices

  47. Issues: Size, Diversity, Longevity • Size • Online vs. Offline • Database vs. File structure • Multiple institutions • Too big for technology migration?? • Diversity • Increased logic and documentation for “finding data” • Spatial distribution • Increased potential for uniqueness conflicts • Longevity • Too old to explain or decode • Too much evolution of methods and practices • Asynchronous change in data and metadata

  48. Issues: Planning and Requirements • Plan for archiving early and ongoing • Avoids missing metadata • Avoids panic • Improves overall data quality and consistency • Consider the timing of requirements • Requirements • Standards: “to be or not to be?” • Documentation expectations • Accessibility: “It's mine!! It's my data!! You CAN’T have it!!”

  49. Research Implies Change … (diagram). Cycle labels: Research; Discovery; New questions; New information requirements; repeat… Note: Not always true for other information systems

  50. Issues: Accommodating Change • Change must be considered in the design • Things that will change • Access expectations • Logical hierarchy of information scope • New parameters • New disciplines • New study sites • New data sources or methods
