Learn why metadata is essential for data scientists, with examples of metadata standards and tips on writing quality metadata records. Explore how metadata aids in open science, enhances data discovery and evaluation, and ensures the long-term preservation of data. Discover the value of metadata for data developers, users, and organizations.
Science Metadata Viv Hutchison US Geological Survey Core Science Analytics Synthesis & Libraries Denver, CO vhutchison@usgs.gov 5th NACP Principal Investigator’s Meeting Washington, DC January 25, 2015
Presenter: Viv Hutchison • US Geological Survey • Core Science Analytics Synthesis & Libraries program • Branch Chief, Science Data Management • Lead a team that works on application of the science data lifecycle for USGS scientists through best practices, tools, training • vhutchison@usgs.gov ORNL, Oak Ridge, TN
Topics • Why metadata? • Examples of metadata standards and how to choose one to use • Tips on how to write quality metadata records • Publishing metadata CC image by Alec Couros on Flickr
Data Collection CC image by Justin See on Flickr CC image by CIMMYT on Flickr CC image by SEDAC on Flickr CC image by acordova on Flickr CC image by ISAS on Flickr CC image by kukkurovaca on Flickr
From Field Notes to Datasets Average Temperature of Observation for Each Species
From Datasets to Published Papers CC image by Heather Kennedy on Flickr
Metadata is a critical part of the data picture CC image by I like on Flickr
Why Care About Metadata? • Fourth Paradigm: scientific breakthroughs will increasingly be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. • “Metadata must be preserved when scientific data is generated…” -- Jim Gray, The Fourth Paradigm • The greater the distance in time and space between the data producer and the person re-using the data, the more detailed the metadata must be.
Metadata: Why Care? “Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it. Several times, I've seen colleagues called into court in order to testify about conditions they have observed. Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like to back up their testimony. It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.” Nelson Williams Eastern Region USGS Water
Metadata: Why Care? Senior climatologists were accused of manipulating important global temperature data The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work. Investigations emphasized need for data to be more open to ensure credibility and avoid future misguided controversy Metadata aids in open science
Metadata: Why Care? “Planet hidden in Hubble archives” Science News (Feb. 27, 2009) A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. D. Lafrenière et al., Astrophysical Journal Letters “The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble. …Metadata is critical in maintaining data in archives and for understanding the data you discover
The Value of Metadata Metadata helps… • Data developers • Data users • Organizations
What is the Value to Data Developers? • Metadata allows data developers to: • Avoid data duplication • Share reliable information • Publicize efforts – promote the work of a scientist and his/her contributions to a field of study • Reduce Workload CC image by US Embassy Guyana on Flickr
What is the Value to Data Users? • Metadata gives a user the ability to: • Search, retrieve, and evaluate data set information from both inside and outside an organization • Find data: Determine what data exists for a geographic location and/or topic • Determine applicability: Decide if a data set meets a particular need • Discover how to acquire the dataset you identified; process and use the dataset CC image by ASEE on Flickr
What is the Value to Organizations? • Metadata helps ensure an organization’s investment in data: • Documentation of data processing steps, quality control, definitions, data uses, and restrictions • Ability to use data after initial intended purpose • Transcends people and time: • Offers data permanence • Creates institutional memory • Advertises an organization’s research: • Creates possible new partnerships and collaborations through data sharing CC image by mambol on Flickr
When data isn’t well managed… [Figure: Information Content plotted against Time. Labels: Time of publication; Specific details; General details; Retirement or career change; Accident; Death] (Michener et al. 1997)
Memory Check i checked my 2002 email archives, and here is what i found out: it appears that the current 3rd generation algorithm was implemented into operations around Oct-Nov 2002 time frame. cannot say more precisely, as all email correspondence i am looking at, talks about this indirectly. (maybe it's what's refered to as the Phase II algorithm.) At the same time, we had implemented quite a few other changes fixing data bugs and formats: view angle problem, increased digitization in all channel's reflectances and AODs, etc. The jump is deemed due to introducing 3rd generation algorithm, which replaced the 2nd generation. The new numbers (~0.08) look more realistic than the previous ones (~0.05 or so). The changes seen in the data is close to the expected effect of this change. The 3rd gen alg takes into account the exact spectral response, whereas the 2nd gen is generic ("one size fits all"). hopefully this settles the issue.. Why? 50% change in global average
Information Entropy Sound information management, including metadata development, can arrest the loss of dataset detail. [Figure: Data Details plotted against Time]
Still…There are Occasional Concerns About Creating Metadata Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data. CC image by waterlilysage on Flickr
Choosing a Metadata Standard Many standards collect similar information… factors to consider: Your data type: • Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset? • If so, consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile. • Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling? • If so, consider the ISO 19115-2 standard • Are you mainly working with ecological data? • If so, consider Ecological Metadata Language (EML)
Choosing a Metadata Standard • Your organization’s policies: do they state which standard to use? • What tools are available to create metadata? Examples of Tools: FGDC CSDGM: • Mermaid (NOAA) • Metavist (Forest Service) • Online Metadata Editor (USGS) EML: • Morpho (KNB) ISO: • XML Spy or Oxygen • CatMD Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats
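The selection criteria on these two slides can be read as a simple lookup. The sketch below is only an illustration of that reading: the category keys, the function name `suggest_standard`, and the rule that an organizational policy takes priority are all assumptions made for the example, not part of any standard or tool.

```python
from typing import Optional

# Hypothetical category keys mapped to the standards named on the slides.
SUGGESTIONS = {
    "gis": "FGDC Content Standard for Digital Geospatial Metadata (CSDGM)",
    "biological": "FGDC CSDGM with the Biological Data Profile",
    "shoreline": "FGDC CSDGM with the Shoreline Data Profile",
    "instrument": "ISO 19115-2",
    "geospatial_service": "ISO 19115-2",
    "ecological": "Ecological Metadata Language (EML)",
}


def suggest_standard(data_type: str, org_policy: Optional[str] = None) -> Optional[str]:
    """Suggest a metadata standard for a data type.

    Assumption: if the organization mandates a standard, that choice wins.
    Returns None when the data type is not one of the listed categories.
    """
    if org_policy:
        return org_policy
    return SUGGESTIONS.get(data_type.lower())


if __name__ == "__main__":
    print(suggest_standard("ecological"))
    print(suggest_standard("gis", org_policy="ISO 19115-2"))
```

A lookup like this covers only the data-type and policy questions; tool availability, human support, and controlled vocabularies still need a human judgment.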
Steps to Create Quality Metadata • Organize your information • Did you write a project abstract to obtain funding for your proposal? Re-use it in your metadata! • Did you use a lab notebook or other notes during the data development process that define measurements and other parameters? • Do you have the contact information for colleagues you worked with? • What about citations for other data sources you used in your project? CC image by on Google Images
Steps to Create Quality Metadata • Write your metadata using a metadata tool • Submitting to the DAAC? A metadata creation process is in place for you.
Steps to Create Quality Metadata • Review for accuracy and completeness • Have someone else read your record • Revise the record, based on comments from your reviewer • Review once more before you publish CC image by Shelly Munkberg on Flickr CC image by mujalifah on Flickr
Tips for Writing Quality Metadata • Do not use jargon -- define technical terms and acronyms: • CA, LA, GPS, GIS : what do these mean? • Clearly state data limitations • E.g., data set omissions, completeness of data • Express considerations for appropriate re-use of the data • Use “none” or “unknown” meaningfully • None usually means that you knew about data and nothing existed (e.g., a “0” cubic feet per second discharge value) • Unknown means that you don’t know whether that data existed or not (e.g., a null value) CC image by kruuscht on Flickr
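The distinction between "none" and "unknown" can be made concrete in code. This is a minimal sketch, assuming a discharge value is stored as a number when it was measured and as `None` when no measurement exists; the function name `describe_discharge` is hypothetical.

```python
from typing import Optional


def describe_discharge(value_cfs: Optional[float]) -> str:
    """Describe a discharge value, keeping 'none' and 'unknown' distinct."""
    if value_cfs is None:
        # Unknown: we cannot say whether there was any discharge.
        return "unknown (no measurement recorded)"
    if value_cfs == 0:
        # None: a measurement was taken and there was no discharge.
        return "none (measured, 0 cubic feet per second)"
    return f"{value_cfs} cubic feet per second"


if __name__ == "__main__":
    for reading in (None, 0.0, 12.5):
        print(describe_discharge(reading))
```

The point of the design is that a measured zero and a missing value never collapse into the same representation.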
Tips for Writing Quality Metadata Titles, Titles, Titles… • Titles are critical in helping readers find your data • When individuals search for the most appropriate data sets, they will most likely use the title as the first criterion for deciding whether a dataset meets their needs. • Treat the title as the opportunity to sell your dataset. • A complete title includes: What, Where, When, Who, and Scale • An informative title includes: topic, timeliness of the data, specific information about place and geography
Tips for Writing Quality Metadata • A Clear Choice: Which title is better? • Rivers OR • Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983) Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when) CC image by dolfi on Flickr
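One way to make the What, Where, When, Who, and Scale checklist routine is to assemble the title from those five parts and refuse to build it when one is missing. The sketch below is illustrative only: `TitleParts`, `missing_parts`, and `compose_title` are hypothetical names, and the word order simply mirrors the Greater Yellowstone example above.

```python
from dataclasses import dataclass


@dataclass
class TitleParts:
    what: str
    where: str
    when: str
    who: str
    scale: str


def missing_parts(parts: TitleParts) -> list:
    """Return the names of any title elements left blank."""
    return [name for name, value in vars(parts).items() if not value.strip()]


def compose_title(parts: TitleParts) -> str:
    """Build a title in the order used by the slide's example."""
    missing = missing_parts(parts)
    if missing:
        raise ValueError("title is missing: " + ", ".join(missing))
    return f"{parts.where} {parts.what} from {parts.scale} {parts.who} {parts.when}"


if __name__ == "__main__":
    example = TitleParts(
        what="Rivers",
        where="Greater Yellowstone",
        when="Visitor Maps (1961-1983)",
        who="U.S. Forest Service",
        scale="1:126,700",
    )
    print(compose_title(example))
```

A title such as "Rivers" alone would fail this check, because four of the five elements are blank.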
Tips for Writing Quality Metadata • Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner. Vague: We checked our work and it looks complete. Specific: We checked our work using a random sample of 5 monitoring sites reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections. CC image by PNASH on Flickr
Tips for Writing Quality Metadata • Use descriptive and clear writing • Fully qualify geographic locations • Select keywords wisely - use thesauri for keywords whenever possible Example: USGS Biocomplexity Thesaurus (over 9,500 terms) CC image by Marco Arment on Flickr
Tips for Writing Quality Metadata • Remember: a computer will read your metadata • Do not use symbols that could be misinterpreted: Examples: ! @ # % { } | / \ < > ~ • Do not use tabs, indents, or line feeds/carriage returns • When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters CC image by Ben on Google Images
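The cleanup described on this slide can be partly automated before text is pasted into a metadata field. The following is a rough sketch using only the Python standard library; the function names are hypothetical, and the symbol list is copied from the examples above rather than from any standard.

```python
import re

# Symbols the slide lists as open to misinterpretation by software.
RISKY_SYMBOLS = set("!@#%{}|/\\<>~")


def clean_metadata_text(text: str) -> str:
    """Flatten tabs and line breaks to single spaces and drop hidden characters."""
    # Any run of whitespace (tabs, line feeds, carriage returns) becomes one space.
    text = re.sub(r"\s+", " ", text)
    # Remove non-printable characters such as zero-width spaces.
    text = "".join(ch for ch in text if ch.isprintable())
    return text.strip()


def find_risky_symbols(text: str) -> list:
    """List the risky symbols present so a person can decide how to reword."""
    return sorted({ch for ch in text if ch in RISKY_SYMBOLS})


if __name__ == "__main__":
    pasted = "Stream\tflow\r\nat gauge\u200b 12"
    print(clean_metadata_text(pasted))
    print(find_risky_symbols("pH < 7 and cover > 20%"))
```

The symbol check only reports what it finds; it does not rewrite anything, because replacing a symbol such as "<" with words is an editorial decision.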
Tips for Writing Quality Metadata • Fully define entities, attributes, units of measure • Resist the temptation to fill in only the mandatory fields: sections of a metadata standard labeled “mandatory if applicable” or “optional” are often critical portions of the standard • Example (FGDC CSDGM): Seven Major Metadata Sections: Section 1 - Identification Information* Section 2 - Data Quality Information Section 3 - Spatial Data Organization Information Section 4 - Spatial Reference Information Section 5 - Entity and Attribute Information Section 6 - Distribution Information Section 7 - Metadata Reference Information* Three Supporting Sections: Section 8 - Citation Information* Section 9 - Time Period Information* Section 10 - Contact Information* * Minimum required metadata
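A quick completeness audit can show which sections of a record are still empty, including the ones that are not strictly required. This sketch assumes a record is held as a plain dictionary keyed by the section numbers listed above; that layout and the name `audit_sections` are assumptions for illustration and do not match the format of any particular metadata tool.

```python
# Section numbers marked with an asterisk on the slide (minimum required).
REQUIRED_SECTIONS = (1, 7, 8, 9, 10)
ALL_SECTIONS = tuple(range(1, 11))


def audit_sections(record: dict) -> dict:
    """Report which numbered sections are empty or absent in a record."""
    filled = {
        number
        for number, text in record.items()
        if isinstance(text, str) and text.strip()
    }
    return {
        "missing_required": [n for n in REQUIRED_SECTIONS if n not in filled],
        "missing_other": [
            n for n in ALL_SECTIONS
            if n not in REQUIRED_SECTIONS and n not in filled
        ],
    }


if __name__ == "__main__":
    draft = {1: "Identification text", 5: "Attribute definitions", 7: "Metadata contact"}
    print(audit_sections(draft))
```

Listing the non-required gaps separately keeps them visible, which is the slide's point: a record that passes the minimum can still be missing critical detail.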
Share Your Metadata: Distribution • Share your metadata with other researchers Examples of metadata search portals: • DAAC • Distributed Active Archive Center for Biogeochemical Dynamics http://daac.ornl.gov/index.shtml • Data.gov • Federal e-gov geospatial data portal http://www.geo.data.gov • Metacat • Repository for data and metadata http://knb.ecoinformatics.org/index.jsp • DataONE • NSF-funded data infrastructure http://dataone.org
Summary • Metadata is documentation of data • A metadata record captures critical information about the content of a dataset • Metadata allows data to be discovered, accessed, and re-used • A metadata standard provides structure and consistency to data documentation • Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources • Metadata is of critical importance to data developers, data users, and organizations • Writing quality metadata is important because records are expected to last with the data over decades • Metadata completes a dataset. Creating robust metadata is in your OWN best interest!