Measurement Data Archive GEC10 March 2011

Larry Lannom, Corporation for National Research Initiatives, http://www.cnri.reston.va.us/

Presentation Transcript


  1. Measurement Data Archive, GEC10, March 2011. Larry Lannom, Corporation for National Research Initiatives, http://www.cnri.reston.va.us/

  2. Why Archive Experimental Results?
     • The obvious: for use by others or by yourself in the future
     • The Fourth Paradigm
       • Data-intensive science
       • Emergent phenomena
     • Funding bodies increasingly asking for data plans
     • Citations from journal articles to data sets on the rise
     • Consistent archiving standards enhance the use of data over time and within a domain

  3. What is Metadata and Why Do I Need It?
     • Lots of miscommunication because
       • Metadata is not a type of data
       • Metadata is a type of relationship between two pieces of data
     • Needed for Understanding and Finding
       • Understanding (sometimes called Descriptive MD)
         • How do I parse this?
         • How do I interpret this?
       • Finding (sometimes called Subject MD)
         • Finding one item in a population of 10 is easy
         • Finding one item in a population of 1M is impossible w/o some way to distinguish them
     • Generally requires a human in the loop at some level
     • Sometimes the object is self-describing (journal article)
     • Automatic indexing/classification works for some domains
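The point that metadata is a relationship between two pieces of data, not a kind of data, can be sketched in Python. Everything here is an assumption made for illustration: the `MetadataLink` type, the identifiers, and the two role names (taken loosely from the slide's "Descriptive" and "Subject" labels) do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataLink:
    """Hypothetical record: `describing` acts as metadata about `described`."""
    describing: str  # identifier of the piece of data doing the describing
    described: str   # identifier of the piece of data being described
    role: str        # "descriptive" (understanding) or "subject" (finding)


# Illustrative identifiers only; none of these appear in the presentation.
links = [
    MetadataLink("format-note-1", "trace-42", "descriptive"),
    MetadataLink("keywords-1", "trace-42", "subject"),
    MetadataLink("keywords-2", "format-note-1", "subject"),
]


def metadata_for(item: str, role: str) -> list[str]:
    """Return identifiers of everything that describes `item` in the given role."""
    return [link.describing for link in links
            if link.described == item and link.role == role]


print(metadata_for("trace-42", "descriptive"))
print(metadata_for("format-note-1", "subject"))
```

In this sketch `format-note-1` is metadata in one link and the described object in another, which is the sense in which "metadata" names a role in a relationship.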

  4. Why is Metadata Hard?
     • To be effective it must be consistent, and consistently applied, within a given domain
       • What is the scope of the domain?
       • What aspects of the object need to be described?
       • What is the vocabulary? Is it open or closed?
     • Even within a defined domain, there are many points of view
       • Especially true for any sort of subject description
       • May have to allow for multiple metadata objects for a single described object
     • Spending time on creating good metadata is Good For You
       • The best sources for good metadata are the creators/owners of the described object, but they may lack interest and training
       • Some types of metadata are difficult to automate, e.g., a good title
     • Keep it simple: favor consistency and coverage over depth
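One way to apply a closed vocabulary consistently is to normalize every term at entry time and reject anything unknown. The sketch below assumes a made-up vocabulary and variant map for a hypothetical "measurement type" field; the terms and the `normalize` helper are illustrative and not part of any archive's actual schema.

```python
# Hypothetical closed vocabulary for a "measurement type" metadata field.
VOCABULARY = {"throughput", "latency", "packet-loss"}

# Hypothetical map from common variants to the preferred term.
VARIANTS = {"delay": "latency", "loss": "packet-loss"}


def normalize(term: str) -> str:
    """Map a free-text term onto the closed vocabulary, or raise ValueError."""
    key = term.strip().lower().replace(" ", "-")
    key = VARIANTS.get(key, key)  # replace a known variant with its preferred term
    if key not in VOCABULARY:
        raise ValueError(f"{term!r} is not in the vocabulary")
    return key


print(normalize(" Packet Loss "))
print(normalize("Delay"))
```

An open vocabulary would instead accept the unknown term, perhaps flagging it for later review, at the cost of the consistency the slide asks for.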

  5. Misc Points
     • Precision and Recall are useful concepts in searching
       • Precision: what % of the search results are on target?
       • Recall: what % of the correct result set did my search retrieve?
       • Desirable tradeoff is situational
     • Consider University Libraries as reliable archive holders
     • Variety of approaches to managing a useful vocabulary of terms
       • Controlled vocabulary: set of terms; use these instead of slight variations
       • Taxonomy: parent-child relationships
       • Ontologies: introduce other types of relationships
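The two search measures above follow the standard definitions: precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. A minimal Python sketch follows, using invented item identifiers and returning 0.0 for an empty set by convention (that convention is an assumption, not something the slide specifies).

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search, as fractions in [0, 1]."""
    hits = len(retrieved & relevant)  # results that are on target
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data: the search returned four items, three items were correct,
# and two of the returned items are among the correct ones.
retrieved = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning more items can only keep recall the same or raise it, but it tends to lower precision, which is why the desirable tradeoff depends on the situation.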
