Measurement Data Archive
GEC10, March 2011
Larry Lannom, Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
Why Archive Experimental Results?
• The obvious: for use by others or by yourself in the future
• The Fourth Paradigm
  • Data-intensive science
  • Emergent phenomena
• Funding bodies increasingly asking for data plans
• Citations from journal articles to data sets on the rise
• Consistent archiving standards enhance the use of data over time and within a domain
What is Metadata and Why Do I Need It?
• Lots of miscommunication because
  • Metadata is not a type of data
  • Metadata is a type of relationship between two pieces of data
• Needed for Understanding and Finding
  • Understanding (sometimes called Descriptive MD)
    • How do I parse this?
    • How do I interpret this?
  • Finding (sometimes called Subject MD)
    • Finding one item in a population of 10 is easy
    • Finding one item in a population of 1M is impossible without some way to distinguish them
• Generally requires a human in the loop at some level
  • Sometimes the object is self-describing (journal article)
  • Automatic indexing/classification works for some domains
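The idea that metadata is a relationship between two pieces of data, used for understanding and for finding, can be sketched in Python. This is an illustration only: the `MetadataLink` class, its field names, and the sample identifiers are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataLink:
    """A 'describes' relationship between two pieces of data (hypothetical model)."""
    described_id: str    # the object being described
    description_id: str  # the data that describes it
    role: str            # "descriptive" (understanding) or "subject" (finding)


# Both ends of every link are ordinary data objects (made-up sample values).
objects = {
    "trace-001": "raw packet capture bytes",
    "trace-001-format": "pcap, little-endian, timestamps in UTC",
    "trace-001-topics": "wireless, latency, handoff",
}

links = [
    MetadataLink("trace-001", "trace-001-format", "descriptive"),
    MetadataLink("trace-001", "trace-001-topics", "subject"),
]


def metadata_for(described_id, role):
    """Return the data that stands in the given metadata role to an object."""
    return [
        objects[link.description_id]
        for link in links
        if link.described_id == described_id and link.role == role
    ]


def find_by_subject(term):
    """Find described objects whose subject metadata lists the given term."""
    return sorted({
        link.described_id
        for link in links
        if link.role == "subject"
        and term in objects[link.description_id].split(", ")
    })
```

Here `metadata_for("trace-001", "descriptive")` answers "how do I parse this?", while `find_by_subject("latency")` answers "which objects are about this?".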
Why is Metadata Hard?
• To be effective it must be consistent, and consistently applied, within a given domain
  • What is the scope of the domain?
  • What aspects of the object need to be described?
  • What is the vocabulary, and is it open or closed?
• Even within a defined domain, there are many points of view
  • Especially true for any sort of subject description
  • May have to allow for multiple metadata objects for a single described object
• Spending time on creating good metadata is Good For You
• The best sources for good metadata are the creators/owners of the described object, but they may lack interest and training
• Some types of metadata are difficult to automate, e.g., a good title
• Keep it simple: favor consistency and coverage over depth
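A minimal sketch of two points from this slide, a closed vocabulary and several metadata objects describing one object, might look like the following. The vocabulary terms, record layout, and function names are all assumptions made for illustration.

```python
# Hypothetical closed vocabulary; a real one would be agreed on within the domain.
CLOSED_VOCABULARY = {"wireless", "latency", "routing", "throughput"}


def check_terms(terms, vocabulary=CLOSED_VOCABULARY):
    """Split terms into (accepted, rejected) against a closed vocabulary."""
    normalized = [t.strip().lower() for t in terms]
    accepted = [t for t in normalized if t in vocabulary]
    rejected = [t for t in normalized if t not in vocabulary]
    return accepted, rejected


# Several metadata objects (points of view) may describe a single object.
metadata_by_object = {
    "trace-001": [
        {"author": "experimenter", "subjects": ["Wireless", "latency"]},
        {"author": "archivist", "subjects": ["routing", "wifi"]},
    ],
}


def all_subjects(object_id):
    """Merge the accepted subject terms from every metadata object."""
    merged = set()
    for record in metadata_by_object[object_id]:
        accepted, _rejected = check_terms(record["subjects"])
        merged.update(accepted)
    return sorted(merged)
```

With an open vocabulary the rejected terms would instead be candidates for addition, which is where consistency becomes harder to maintain.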
Misc Points
• Precision and Recall are useful concepts in searching
  • Precision: what % of the search results are on target?
  • Recall: what % of the correct result set did my search retrieve?
  • Desirable tradeoff is situational
• Consider University Libraries as reliable archive holders
• Variety of approaches to managing a useful vocabulary of terms
  • Controlled vocabulary: a set of terms; use these instead of slight variations
  • Taxonomy: parent-child relationships
  • Ontologies: introduce other types of relationships
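Precision and recall as defined above can be computed directly from two sets. The sketch below uses made-up result identifiers, and the function name `precision_recall` is chosen here for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: the search returned 4 items, 2 of which are among the 3 correct ones.
retrieved = ["a", "b", "c", "d"]
relevant = ["a", "b", "e"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning more results tends to raise recall and lower precision, which is why the desirable tradeoff depends on the situation.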