Explore quantification methods for metadata cost and value over time. Focus on self-generated health information and its impact on prediction accuracy. Consider the cost/value relationship and limitations of metadata.
Metadata Capital: Simulating the Predictive Value of Self-Generated Health Information (SGHI) • IEEE Big Data 2014 • Jane Greenberg, Adrian Ogletree, CCI/Drexel University, Metadata Research Center • Angela P. Murillo, Thomas P. Caruso, Herbie Huang, University of North Carolina at Chapel Hill
Metadata Capital • Metadata duplication is inefficient and tedious • An economic concept (Weber, 1905; Smith, 1776) • Business and operations (net gains or losses) • Finances, goods and services, and public needs • Intellectual capital (Marr, 2005) • Social capital • A tangible result; value can increase, or… • Metadata as an asset, a product • Reuse of good-quality metadata increases the value of the initial investment • Goals: discover and advance the application of methods for quantifying the cost and value of metadata over time; raise dialog … • Metadata capital: incremental, successive growth rates
Modified Capital-sigma Notation • Cost / value • Reuse
Modified Capital-sigma Notation • Cost / value • Robust metadata reuse: $a_1$ to $a_{24}$ • Reuse of metadata
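The slides do not reproduce the exact modified formula, so the following is only a hedged sketch of what a capital-sigma cost/value tally could look like. It assumes that each reuse $i$ of a metadata record contributes a value $a_i$ (24 reuses, $a_1$ to $a_{24}$), and that metadata capital is the summed reuse value minus a one-time creation cost. The function names `metadata_capital` and `cumulative_capital` and all numbers are hypothetical.

```python
from itertools import accumulate

def metadata_capital(creation_cost, reuse_values):
    """Hypothetical tally: sum of per-reuse values a_1..a_n minus the one-time creation cost."""
    return sum(reuse_values) - creation_cost

def cumulative_capital(creation_cost, reuse_values):
    """Running balance after each reuse, showing when the initial investment is recovered."""
    return [total - creation_cost for total in accumulate(reuse_values)]

# Illustrative numbers only: creating the record costs 10 units; each of 24 reuses yields 1 unit.
reuse_values = [1.0] * 24
print(metadata_capital(10.0, reuse_values))  # 14.0

balance = cumulative_capital(10.0, reuse_values)
# First reuse (1-based) at which the running balance is no longer negative.
breakeven = next(i for i, b in enumerate(balance, start=1) if b >= 0)
print(breakeven)  # 10
```

Because every term enters the sum with the same weight, this plain tally also shows why a single summation is one-dimensional: it cannot distinguish a valuable property from a trivial one.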
What about a successive growth rate tied to a concept? A concept can • come in (move from vernacular to canonical) • fall by the wayside (become less popular) • go out (be deprecated)
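One way to make a successive growth rate concrete is a sketch that assumes a concept's usage is counted per period (for example, the number of records tagged with it). Under that assumption, the period-over-period rate is $g_t = (u_t - u_{t-1}) / u_{t-1}$. The status labels below map onto the three trajectories on the slide, but the decision rule (latest rate positive means "in", zero current usage means "out") is a hypothetical choice, not the authors' method.

```python
def successive_growth_rates(counts):
    """Period-over-period growth g_t = (u_t - u_{t-1}) / u_{t-1} for a usage series."""
    rates = []
    for prev, curr in zip(counts, counts[1:]):
        if prev == 0:
            rates.append(None)  # undefined when the previous period had no usage
        else:
            rates.append((curr - prev) / prev)
    return rates

def concept_status(counts):
    """Hypothetical labels for the three concept trajectories."""
    if counts[-1] == 0:
        return "out (deprecated)"
    rates = [r for r in successive_growth_rates(counts) if r is not None]
    if rates and rates[-1] > 0:
        return "in (gaining adoption)"
    return "falling by the wayside"

print(successive_growth_rates([100, 120, 150, 150]))  # [0.2, 0.25, 0.0]
print(concept_status([100, 120, 150]))  # in (gaining adoption)
print(concept_status([150, 90, 40]))    # falling by the wayside
print(concept_status([40, 10, 0]))      # out (deprecated)
```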
The Metadata Capital Initiative ~MetaDataCAPT’L~ • Explore methods for quantifying metadata cost and value over time • Metadata capital treats metadata as an asset containing contextual knowledge about data content • Environments: • Ontology development in collaboration with the National Institute of Environmental Health Sciences (NIEHS) • Self-generated health information (SGHI) monitoring daily activity, in collaboration with the Research Triangle Institute (RTI)
The Team • Discover and advance the application of methods for quantifying the cost and value of metadata over time; raise dialog • Advance nascent work on “metadata capital” for data science • Actively engage with the NCDS community • Connect NCDS metadata efforts with the Research Data Alliance
Self-Generated Health Information (SGHI) • “SGHI, or Self-Generated Health Information, is information generated by mobile health (mHealth) apps, wearables, and smartwatches. Quantified Self efforts tend to provide examples of the use of this information.” (T. Caruso)
Data facets: Validic API • “Simple, standardized connection between healthcare companies and mobile health...” • Categories: Fitness, Routine, Nutrition, Sleep, Weight, Diabetes, Biometrics, Tobacco Cessation • Vendors/brands: BodyMedia, DailyMile, FatSecret, Fitbit, Fitbug, Fleetly, Garmin, Glooko, iHealth, Jawbone Up, ManageBGL, MapMyFitness, Moveable, MovesApp, MyGlucoHealth, Nike+, Omron, RunKeeper, Strava, VitaDock, Withings
Total Fields Referenced (Fitbit), toward SGHIx • X (available): 39 • P (pending): 3 • NA (not available): 42 • (Caruso & Ogletree)
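The field counts above can be turned into shares with simple arithmetic. This sketch assumes the three statuses are disjoint and together cover every referenced Fitbit field; the dictionary keys are illustrative names, not part of any Validic or Fitbit API.

```python
# Field counts reported on the slide for Fitbit (Caruso & Ogletree).
field_counts = {"available": 39, "pending": 3, "not available": 42}

# Assumes the statuses are disjoint and exhaustive, so their sum is the total.
total = sum(field_counts.values())
shares = {status: count / total for status, count in field_counts.items()}

print(total)  # 84
for status, share in shares.items():
    print(f"{status}: {share:.1%}")
# available: 46.4%
# pending: 3.6%
# not available: 50.0%
```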
Conclusion: Other Valuation Approaches • Market capitalization of Facebook per user: $40 – $300 • Revenues per record per user: $4 – $7 per year (Facebook, Experian) • Market prices of personal data: • $0.50 for a street address • $2.00 for a date of birth • $8 for a Social Security number • $3 for a driver’s license number • $35 for a military record • SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. OECD Publishing, 2013.
In the Fitbit data scenario, if a patient’s exercise data and environmental quality data can be combined with asthma condition data, we will get a better prediction of how the patient’s asthma evolves. [Figure: Prediction error in the model when including the Fitbit data]
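The idea of simulating predictive value can be illustrated with a small experiment on synthetic data. This is not the authors' model. It assumes a hypothetical asthma score that depends linearly on an air-quality measure and an exercise measure (a stand-in for Fitbit data) plus noise. It then fits ordinary least squares with and without the exercise feature and compares mean squared error on held-out rows. All coefficients, sample sizes, and names are illustrative assumptions.

```python
import numpy as np

def simulate_patients(n=1000, seed=0):
    """Synthetic data only: hypothetical exercise and air-quality features and an asthma score."""
    rng = np.random.default_rng(seed)
    exercise = rng.normal(size=n)      # stand-in for a Fitbit activity measure
    air_quality = rng.normal(size=n)   # stand-in for an environmental quality measure
    noise = rng.normal(scale=0.5, size=n)
    # Assumed ground truth: the score depends on both features plus noise.
    asthma_score = 1.5 * air_quality - 1.0 * exercise + noise
    return exercise, air_quality, asthma_score

def holdout_mse(features, target, split):
    """Fit OLS on the first `split` rows; return mean squared error on the remaining rows."""
    X = np.column_stack([np.ones(len(target)), features])  # intercept column + features
    coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
    residuals = target[split:] - X[split:] @ coef
    return float(np.mean(residuals ** 2))

exercise, air_quality, asthma_score = simulate_patients()
split = 700
mse_without = holdout_mse(air_quality, asthma_score, split)
mse_with = holdout_mse(np.column_stack([air_quality, exercise]), asthma_score, split)
print(f"held-out MSE without Fitbit feature: {mse_without:.3f}")
print(f"held-out MSE with Fitbit feature:    {mse_with:.3f}")
```

Under these assumptions, the drop in held-out error is one possible proxy for the predictive value that the Fitbit metadata adds. With real SGHI data, the gain would depend on how strongly activity actually relates to asthma outcomes.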
Limitations • The modified capital-sigma notation is only one-dimensional; not all metadata properties/concepts are equal • The cost/value relationship is not 1:1 • “Metadata is only as good as your data” is not always true • Successive growth rates may be the way to go
Concluding remarks • Interest… traction • Limitations: bad data, cost/value, more metadata • We should care about cost • Metadata capital can contextualize the discussion and provide a foundation • Generic formula for further research • Proof
The Team / acknowledgments • Tom Caruso, Health Information Liaison Research Associate, UNC-SILS/RTI • Self-generated Health Information (SGHI) • Rebecca Boyles, Data Scientist, NIEHS • Common Core Vocabulary • Jane Greenberg, SILS/UNC, MRC • Herbie Huang, Ph.D. student, Department of Economics, UNC • Austin Mathews, BSIS student, SILS/UNC, MRC • Angela Murillo, Ph.D. student, SILS/UNC, MRC • Adrian Ogletree, MSIS student, SILS/UNC, MRC • Erik Scott, Senior Software Developer, RENCI (Renaissance Computing Institute)