
What Makes a Data Archive Tick: Marrying Content and User Support

What Makes a Data Archive Tick: Marrying Content and User Support. Steven Worley, National Center for Atmospheric Research, Computational and Information Systems Laboratory. May 17-21, 2010, Summer Institute for Data Curation for Earth and Environmental Science.



Presentation Transcript


  1. What Makes a Data Archive Tick: Marrying Content and User Support • Steven Worley • National Center for Atmospheric Research, Computational and Information Systems Laboratory • May 17-21, 2010 • Summer Institute for Data Curation for Earth and Environmental Science • Graduate School of Library and Information Science, University of Illinois, Urbana-Champaign

  2. How to make and keep the archive content relevant to the users? • How to engage the users?

  3. How to make and keep the archive content relevant to the users? Know your users • Define your focus community • Cannot serve everyone • Design service not to limit others • At decision points (e.g. changes in service) ask: • “Is this a significant benefit for my users?” • The case @ NCAR • Atmospheric, oceanic, and some related geo-science research • Graduate students and higher education • NCAR scientists, researchers @ universities with graduate degree programs in meteorology and oceanography • Over 50% of 6000+ unique users, annually, are outside focus group

  4. How to make and keep the archive content relevant to the users? Understand their science, currently, and trends • Attend seminars, symposia, meetings where they present their work • Corollary: Have science-educated staff • The case @ NCAR – Research Data Archive • All have MS degrees, or greater • meteorology (6) • oceanography (2) • computing science (1) • exception – admin. (1)

  5. How to make and keep the archive content relevant to the users? Understand their science, currently, and trends • Routinely review journals, bulletins, and relevant newsletters • Search for science strongly dependent on your data focus • Contact authors, offer data sharing service • @ NCAR

  6. How to make and keep the archive content relevant to the users? Understand their science, currently, and trends • Develop close contacts with a few key users • Seek ‘honest’ opinions about your service • Make your service known – presentations, publications • @ NCAR

  7. How to make and keep the archive content relevant to the users? Know how your users work • How do they prefer to handle data? • Digital files – write and run program codes to evaluate content • Digital files – specific formats that are application friendly • E.g. netCDF, GIS, WMO • ASCII text convenient for worksheets • Images of analyses (charts, line graphs, 2D/3D contoured plots) • @NCAR • Digital files are key • Some images for discovery, but not critical • Design the systems to deliver what users want

  8. How to make and keep the archive content relevant to the users? Choosing the content • At decision points (e.g. adding a new dataset) ask: • “Can we handle this efficiently?” • Does it supplement or extend the central data foci? • Does it address a new need or trend? • Are the formats aligned with user preferences? • If not, can we make a cost-effective conversion? • Do you have staff (data scientists / stewards) who can understand the scientific content? • @ NCAR • Atmospheric, oceanic, related geo-sciences observations or analyses derived from observations to support climate and weather research.

  9. How to make and keep the archive content relevant to the users? Choosing the content • Evaluate user metrics • What datasets are most popular? • Who is using what – can you distinguish your focus group? • Are there any trends? • Caution: this is only part of the story • @ NCAR • Our user registration allows us to track this • Examples

  10. Unique Users by service path Users in four service categories • MSS to CISL HPC environment • Web to world-wide community • Orders – one-off, consulting-assisted data preparation • TIGGE • About 6 thousand users annually • FY09: MSS=266, Web=5649, Orders=196, TIGGE=44

  11. Amount of data by service path Users in four service categories • MSS to CISL HPC environment • Web to world-wide community • Orders – one-off, consulting-assisted data preparation • TIGGE • 162 TB in FY09 • FY09 (TB): MSS=31, Web=120, Orders=9, TIGGE=2
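The FY09 figures on slides 10 and 11 can be combined to compare the service paths. The following is a minimal Python sketch that uses only the numbers quoted on those slides; the function and variable names are illustrative, and because a user may appear in more than one path, the shares should be read as approximate.

```python
# FY09 figures quoted on slides 10 and 11.
users = {"MSS": 266, "Web": 5649, "Orders": 196, "TIGGE": 44}
terabytes = {"MSS": 31, "Web": 120, "Orders": 9, "TIGGE": 2}


def summarize(users, terabytes):
    """Return per-path user share, data share, and average GB per user."""
    total_users = sum(users.values())
    total_tb = sum(terabytes.values())
    rows = {}
    for path in users:
        rows[path] = {
            "user_share": users[path] / total_users,
            "data_share": terabytes[path] / total_tb,
            # 1 TB is taken as 1000 GB for this rough comparison.
            "gb_per_user": terabytes[path] * 1000 / users[path],
        }
    return rows


summary = summarize(users, terabytes)
for path, row in summary.items():
    print(f"{path:7s} users {row['user_share']:6.1%}  "
          f"data {row['data_share']:6.1%}  "
          f"{row['gb_per_user']:7.1f} GB/user")
```

Read this way, the Web path carries most of the users while the MSS path delivers far more data per user, which is the kind of distinction the "who is using what" question on slide 9 is after.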

  12. User ranked popular datasets • Top 10 datasets/groups, FY09 • ~6000 unique users annually • NCAR-CSM Symposium on Climate and Energy

  13. How to make and keep the archive content relevant to the users? Remain flexible – expect constant change • Be ready to take opportunities when they come along • Re-adjust priorities • Resist ‘tight’ mission control • Take advice from advisory groups, but don’t depend on them exclusively • Use holistic approach • @ NCAR, an unplanned example • Arctic System Reanalysis – NSF sponsored research critical to assess the changes happening in the Arctic • Need controlled access to first prototype data – We do this!

  14. How to make and keep the archive content relevant to the users? Sustaining for the long-term • Richness and data value grow over time • Data assets tend to complement each other – add value to many different research questions • Scientific publications lead to broader and increased interest • Definitive data citation is a work in progress • Staffing needs to be base/core funded • Grant directed funding can lead to a fractured, ad hoc, incomplete archive • Can be a major frustration for users • @ NCAR – the Research Data Archive • Began 40+ years ago • Today sustained by 9 persons

  15. How to make and keep the archive content relevant to the users? Collaborations • Participate/volunteer for committees and panels that tackle data issues (all sorts) • Learn from others, share knowledge • Share efforts and data with other organizations • No one group can do it all (don’t have resources and all expertise required) • @ NCAR (conf. like SIDC for EES) • Volunteerism: NAS, AMS, NOAA, WMO, NASA • National and International data agreements with: • European Centre for Medium-Range Weather Forecasts • Japan Meteorological Agency • U.S. National Weather Service, National Centers for Environmental Prediction

  16. How to Engage the Users? Data Discovery – how can people find you? • All 600+ RDA Datasets have metadata in GCMD • Automatically exported via OAI-PMH • Similarly: RDA > CDP@NCAR > BADC in UK
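To make the OAI-PMH export on slide 16 concrete, here is a small Python sketch of what a harvester on the receiving side might do. It assumes the baseline `oai_dc` (Dublin Core) metadata format that every OAI-PMH repository must support; the base URL, identifier, and title are invented for illustration, and nothing here is taken from the actual RDA or GCMD configuration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Yield (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        yield identifier, title


# Hand-written sample response; the identifier and title are made up.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:ds000.0</identifier>
        <datestamp>2010-05-17</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://archive.example.org/oai"))
print(list(parse_records(SAMPLE)))
```

The sketch parses a canned response so that it runs without network access; a real harvester would fetch the URL and also follow the protocol's resumption tokens for long record lists.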

  17. How to Engage the Users? Design your portal to evolve – it will/should • 2002 • Search • Navigation • List of menus • Unique layout of links • Picture of people

  18. How to Engage the Users? 2008 • Search • Two ways • Navigation • Links • News • Text • People

  19. How to Engage the Users?

  20. How to Engage the Users? • Primary design feature for web portal • Data Discovery – Find Data! • 2010 • All about search • Gone from top • people • text • news

  21. How to Engage the Users? Navigation once they arrive • Working principles • Uniform across web portal • Keep organizational elements out of prime visual territory • @ NCAR • Have user registration – only required to get data • All discovery metadata open – unlimited searching

  22. How to Engage the Users? The complete data knowledge package, and data cycle • What is a complete data knowledge package? • Rich metadata plus the data files! • One example • http://dss.ucar.edu/datasets/ds277.0/

  23. How to Engage the Users? The pieces that make rich metadata • Dataset navigation (Access, Documentation, Software) • Title • Summary

  24. How to Engage the Users? The pieces that make rich metadata • Period of data record • Update cycle • Scientific parameters (Variables) • Earth reference levels

  25. How to Engage the Users? The pieces that make rich metadata • Times – temporal increment • Data types – points or grids • Geo-spatial coverage • Source organizations

  26. How to Engage the Users? The pieces that make rich metadata • Related Internet sites • Publications • Acknowledgement statement

  27. How to Engage the Users? The pieces that make rich metadata • Volume – size of the dataset • Data formats • Related datasets in the NCAR collection • Consulting contact (email and phone) • A 2nd pointer to Data Access
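The metadata pieces listed on slides 23 to 27 can be gathered into one record type. The Python sketch below is only one possible arrangement: the field names, types, and the completeness check are assumptions made for illustration, not the schema used by the NCAR archive.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Discovery metadata for one dataset, following slides 23 to 27."""
    title: str
    summary: str
    period_start: str            # period of data record, ISO 8601 date
    period_end: str
    update_cycle: str            # e.g. "monthly" or "static"
    variables: list = field(default_factory=list)
    reference_levels: list = field(default_factory=list)
    temporal_increment: str = ""
    data_types: list = field(default_factory=list)   # "point" or "grid"
    spatial_coverage: tuple = (-90.0, 90.0, -180.0, 180.0)  # S, N, W, E
    source_organizations: list = field(default_factory=list)
    related_sites: list = field(default_factory=list)
    publications: list = field(default_factory=list)
    acknowledgement: str = ""
    volume_bytes: int = 0
    data_formats: list = field(default_factory=list)
    related_datasets: list = field(default_factory=list)
    consulting_contact: str = ""

    def missing_fields(self):
        """Names of fields left empty, as a simple completeness check."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative record only; the values are invented.
record = DatasetMetadata(
    title="Example sea surface temperature analysis",
    summary="Illustrative record only.",
    period_start="1981-01-01",
    period_end="2009-12-31",
    update_cycle="monthly",
    variables=["sea surface temperature"],
    data_types=["grid"],
    data_formats=["netCDF"],
)
print(record.missing_fields())
```

A check like `missing_fields` gives a data steward a quick list of what still has to be filled in before a dataset page counts as a complete data knowledge package.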

  28. How to Engage the Users? The complete data knowledge package, and data cycle Data Cycle Facts • Datasets are re-published – new versions. • Datasets are corrected and extended in time or space. • Scientific analysis and publication will occur randomly along the data cycle. Data referencing is more challenging than traditional publication referencing because of the data cycle. How can you accurately trace/recover what has been used for publication?

  29. How to Engage the Users? The complete data knowledge package, and data cycle • @ NCAR • Don’t have systematic (organization-wide) way to handle the data cycle • We do not discard/delete old versions of data • Ad hoc approach • Currently building version-tracking software • Versioning will be included in DOI implementation
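One way to picture version tracking for the data cycle is an append-only log keyed by file checksum. The Python sketch below is hypothetical and is not the software mentioned on the slide; the class names and the dataset identifier are invented. It keeps every published version and lets a checksum recorded at publication time be traced back to the exact version that was used.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    dataset_id: str   # hypothetical dataset identifier
    filename: str
    version: int
    sha256: str       # checksum of the file content for this version


class VersionLog:
    """Append-only log: old versions are never deleted."""

    def __init__(self):
        self._entries = []

    def history(self, dataset_id, filename):
        """All recorded versions of one file, oldest first."""
        return [e for e in self._entries
                if e.dataset_id == dataset_id and e.filename == filename]

    def publish(self, dataset_id, filename, content):
        """Record a new version unless the content is unchanged."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self.history(dataset_id, filename)
        if previous and previous[-1].sha256 == digest:
            return previous[-1]
        entry = FileVersion(dataset_id, filename, len(previous) + 1, digest)
        self._entries.append(entry)
        return entry

    def find_by_checksum(self, sha256):
        """Trace a cited checksum back to the version(s) it identifies."""
        return [e for e in self._entries if e.sha256 == sha256]


log = VersionLog()
v1 = log.publish("ds000.0", "sst.nc", b"first release")
v2 = log.publish("ds000.0", "sst.nc", b"corrected release")
print(v1.version, v2.version, log.find_by_checksum(v1.sha256))
```

Keying on content rather than on dates means a re-upload of identical bytes does not create a spurious new version.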

  30. How to Engage the Users? Consultation Critical two-way communication • 1. Benefits for the user • Guidance to best available datasets • Consolidate research ideas into required data sources • Software assistance • Customized data preparation if necessary • 2. Benefits to the archive stewardship • Detect ways to improve our search process • Learn about data requirement trends • Occasionally, acquire new data resources from scientific efforts • Learn about data problems we might have

  31. How to Engage the Users? Provide research tool support and documentation • Provide users a starting point for data evaluation • Simple access programs – the languages used by the focus community • Pointers to applications (IDL, MATLAB, NCL, NCO, etc.) • Specific examples are VERY helpful! • Must maintain software/applications and documentation for the long-term. • Guarantee users will understand the meaning and have access.

  32. How to Engage the Users? Provide research tool support and documentation • @ NCAR • Remain aware of proprietary software traps • E.g. for documents • Will .xls be viable 50 years from now, when .xlsx is already the standard? Is .pdf any better? • Prefer data file formats that define everything to the byte/bit level • Computer code could always be written to access these. • All kinds of reports, project descriptions, and documents that explain the intent of the data are vital for the long-term. • Use dedicated document directories for each dataset
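The point about formats defined to the byte/bit level can be illustrated with a short Python sketch. The record layout below is entirely hypothetical and exists only to show that, once a layout is documented, a small access program can always be written against it.

```python
import struct

# Hypothetical fixed-length record, documented to the byte:
#   bytes 0-3   station id        unsigned 32-bit integer, big-endian
#   bytes 4-7   observation time  unsigned 32-bit integer,
#                                 seconds since 1970-01-01 UTC
#   bytes 8-11  temperature       32-bit IEEE 754 float, kelvin
RECORD = struct.Struct(">IIf")


def encode(station_id, epoch_seconds, temperature_k):
    """Pack one observation into the 12-byte record layout."""
    return RECORD.pack(station_id, epoch_seconds, temperature_k)


def decode(raw):
    """Unpack one 12-byte record into a dictionary."""
    station_id, epoch_seconds, temperature_k = RECORD.unpack(raw)
    return {
        "station_id": station_id,
        "epoch_seconds": epoch_seconds,
        "temperature_k": temperature_k,
    }


raw = encode(72469, 1274054400, 288.5)
print(len(raw), decode(raw))
```

The comment block is the important part: it is the documentation that lets someone decades later write the same dozen lines in whatever language is then in use.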

  33. How to Engage the Users? Follow-up aid • Notification service for significant dataset changes • If an error is corrected – should notify all users of the data • Subscription service • Inform users when new data is available • Prepare special products based on user determined template – e.g. past requests • @ NCAR • We have automated notification service • Provided users register accurately • We do not have a subscription service yet
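A notification service of the kind described on slide 33 depends on knowing which registered users obtained which dataset. The Python sketch below shows that core lookup under simple assumptions: the class, function, email addresses, and dataset identifiers are all invented, and it stops at building the notices rather than sending mail.

```python
from collections import defaultdict


class DownloadRegistry:
    """Remembers which registered users downloaded which dataset."""

    def __init__(self):
        self._users_by_dataset = defaultdict(set)

    def record_download(self, email, dataset_id):
        self._users_by_dataset[dataset_id].add(email)

    def users_of(self, dataset_id):
        """Registered users of a dataset, in a stable sorted order."""
        return sorted(self._users_by_dataset.get(dataset_id, set()))


def build_notices(registry, dataset_id, change_summary):
    """Return one (email, subject, body) notice per user of the dataset."""
    subject = f"Change to dataset {dataset_id}"
    return [(email, subject, change_summary)
            for email in registry.users_of(dataset_id)]


registry = DownloadRegistry()
registry.record_download("a.student@example.edu", "ds000.0")
registry.record_download("b.researcher@example.edu", "ds000.0")
registry.record_download("b.researcher@example.edu", "ds000.1")

notices = build_notices(registry, "ds000.0", "Corrected bad values in 1998 files.")
for notice in notices:
    print(notice)
```

The lookup is only as good as the registration data behind it, which is why the slide adds "provided users register accurately".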

  34. How to make and keep the archive content relevant to the users? • How to engage the users? http://dss.ucar.edu/
