The Brave New World of Data Harmonization in the Age of Informatics
Naveen Ashish, UC Irvine, Dec 1 2012
Introduction
• Investigators will greatly benefit from integrated datasets from multiple institutions
  • Each institution has few subjects (tests)
  • Need a certain cohort size
• Why do we have heterogeneity at all?
  • Illustrate through actual datasets
• True alignment is a challenge
• Data Harmonization approach to the problem
Examples
• Why is AcKcal reported in one dataset but not the other?
• Similarly for SpO2
• VO2 units are different
Examples
• HR values are in entirely different ranges
• VO2 and VCO2 values are in entirely different ranges
Domain Description
• Key entities are:
  • Subjects
  • Tests
[Diagram: SUBJECT, TEST]
Subject
• Issues
  • Reported/not reported
  • Set of attributes
  • Units
[Diagram: SUBJECT with attributes HEIGHT, WEIGHT, AGE, SEX, RACE, BSA]
Tests
• Multiple aspects
  • Time
  • Work
  • Ventilation
  • Cardiac
[Diagram: TEST with aspects TIME, WORK, VENTILATION, CARDIAC. Variables shown: Time, Work, VO2, VCO2, RER, O2, RR, Vt, VE, BR, V/Q]
Approach
• “Standard” set of attributes
  • Recipe for failure
• Proposed
Documented Information
• Choice of variables
  • Why do we have this particular set of attributes?
  • Reasons for excluding the attributes left out
• Units
• State
• Key points anchored to a variable
  • E.g., when heart rate reaches a value of ….
• Demographics (subjects)
• Special conditions
• ….
Detail on Each Parameter
• Details
  • Definition
  • Synonyms
  • Explanation
  • ….
Example: RER, also known as Respiratory Quotient (RQ)
This value, which is also sometimes known as the Respiratory Quotient (RQ), is the ratio of CO2 production to oxygen consumption (VCO2/VO2). At an RER of 0.8, fat is the primary fuel source. As exercise intensity increases, more carbohydrates are burned for energy. At an RER of 1.0, the individual is burning mostly carbohydrate. It is important for a good max test that an RER of 1.1 is reached, to signify a good effort by the athlete.
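A minimal sketch of the RER definition above, as a worked calculation. The function names and the sample gas-exchange values are hypothetical; only the ratio VCO2/VO2 and the 1.1 effort threshold come from the slide.

```python
def respiratory_exchange_ratio(vco2: float, vo2: float) -> float:
    """Return RER = VCO2 / VO2. Both inputs must use the same units."""
    if vo2 <= 0:
        raise ValueError("VO2 must be positive")
    return vco2 / vo2


def reached_good_effort(rer: float, threshold: float = 1.1) -> bool:
    """True if RER reached the threshold the slide cites for a good max test."""
    return rer >= threshold


# Hypothetical readings in L/min, for illustration only.
rest_rer = respiratory_exchange_ratio(vco2=0.24, vo2=0.30)
peak_rer = respiratory_exchange_ratio(vco2=3.6, vo2=3.0)
print(round(rest_rer, 2), round(peak_rer, 2), reached_good_effort(peak_rer))
```

Documenting the formula next to the parameter makes it possible to check whether a site reports the ratio the same way round.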
Units
• Related efforts in caBIG
  • CDE (Common Data Elements)
• UCUM
  • Unified Code for Units of Measure
  • Approved 2008
• Goal
  • Harmonize existing caDSR value domains
    • Lab and Agent Unit of Measure
    • Forms Curation and UML Model applications
• Purpose of Standard
  • To capture units of measure using UCUM expressions that are associated with a qualitative laboratory outcome or agent dose administration.
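Once each site documents its units, differing VO2 units can be brought to one target unit. The sketch below is an assumption-laden illustration: the function name is hypothetical, the unit strings are written in a UCUM-like style and should be checked against the UCUM specification, and mL/min/kg is chosen as the target only for the example.

```python
from typing import Optional


def vo2_to_ml_per_min_per_kg(value: float, unit: str,
                             weight_kg: Optional[float] = None) -> float:
    """Convert a VO2 reading to mL/min/kg (target unit assumed for this sketch)."""
    if unit == "mL/min/kg":
        return value  # already in the target unit
    if weight_kg is None or weight_kg <= 0:
        raise ValueError("a positive body weight is needed for absolute VO2")
    if unit == "L/min":
        return value * 1000.0 / weight_kg
    if unit == "mL/min":
        return value / weight_kg
    raise ValueError(f"undocumented unit: {unit!r}")


# Hypothetical readings from two sites for a 70 kg subject.
site_a = vo2_to_ml_per_min_per_kg(3.5, "L/min", weight_kg=70.0)
site_b = vo2_to_ml_per_min_per_kg(50.0, "mL/min/kg")
print(site_a, site_b)
```

An undocumented unit raises an error instead of being guessed, which matches the point that units must be recorded with the data.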
Data
• Above information can help in alignment
[Diagram: VO2 with UNIT, RANGE, SALIENT, PROGRESSION, CORRELATION]
Issues
• Range
  • Is the data in the “expected” range?
• Progression
  • Do the values change in the expected fashion?
• Salient points
  • “..stop at heart rate of 150 and ….”
• Correlation
  • Expected correlation with other variables
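The four checks above can be sketched as small functions. Everything concrete here is an assumption made for illustration: the function names, the sample series, the 40 to 220 heart-rate range, and the idea that heart rate should not decrease during a test.

```python
from math import sqrt


def in_expected_range(values, low, high):
    """Range: every value lies within [low, high]."""
    return all(low <= v <= high for v in values)


def is_non_decreasing(values, tolerance=0.0):
    """Progression: each value is at least the previous one, minus a tolerance."""
    return all(b >= a - tolerance for a, b in zip(values, values[1:]))


def index_where_reached(values, threshold):
    """Salient point: index of the first value at or above threshold, else None."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


def pearson(xs, ys):
    """Correlation: Pearson coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical test series: heart rate (bpm) and VO2 (L/min).
hr = [90, 110, 130, 150, 165]
vo2 = [0.8, 1.3, 1.9, 2.4, 2.9]
print(in_expected_range(hr, 40, 220), is_non_decreasing(hr),
      index_where_reached(hr, 150), pearson(hr, vo2) > 0.9)
```

A dataset that fails one of these checks is a candidate for a unit mismatch or a differently defined variable, not necessarily bad data.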
Environment
[Diagram: ENVIRONMENT and its attributes. Labels as extracted: ORGANIZATION LOCATION LAB TEST DEGREE TEST PROTOCOL DEVICE INSP. TEMPERATURE EXP. TEMPERATURE PRESSURE INSP O2 INSP CO2 FLOWMETER STPD To BTPS Base O2]
An “ontology”?
• Concepts
• Variables
• Characteristics
• Relationships
• Check for similarity
• Outliers
• Progression
• ….
Modeling
• Protégé (http://protege.stanford.edu)
• Concepts
• Attributes
• Descriptions
• Units
• Relationships
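To make the modeling elements concrete, here is a small in-memory sketch of concepts, attributes, descriptions, units and relationships. It is not Protégé or OWL; the class names, fields and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Variable:
    name: str
    description: str = ""
    unit: str = ""
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Concept:
    name: str
    variables: Dict[str, Variable] = field(default_factory=dict)

    def add(self, variable: Variable) -> None:
        self.variables[variable.name] = variable

    def find(self, label: str) -> Optional[Variable]:
        """Look a variable up by name or synonym, ignoring case."""
        wanted = label.lower()
        for var in self.variables.values():
            if wanted == var.name.lower() or wanted in (s.lower() for s in var.synonyms):
                return var
        return None


# Relationships as simple (subject, predicate, object) triples.
relationships: List[Tuple[str, str, str]] = [("SUBJECT", "undergoes", "TEST")]

test = Concept("TEST")
test.add(Variable("RER", description="VCO2 / VO2", synonyms=["RQ"]))
test.add(Variable("HR", description="Heart rate", unit="/min"))
print(test.find("rq").name)
```

Storing synonyms with each variable lets a column labelled RQ at one site resolve to the same variable as RER at another.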
Transformation
• Information Mediator
  • Transformation of data (sets)
  • For alignment
• Data-driven mappings
  • KARMA
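As an illustration of transforming a source dataset for alignment, the sketch below applies a hand-written mapping from source columns to common-model names with a scale factor. It does not represent how KARMA or any particular mediator works; the column names, the mapping table and the target unit (mL/min for VO2) are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: source column -> (common-model name, scale factor).
MAPPING: Dict[str, Tuple[str, float]] = {
    "VO2 (L/min)": ("VO2", 1000.0),  # convert to mL/min
    "HR": ("HR", 1.0),
    "Time (s)": ("Time", 1.0),
}


def transform(row: Dict[str, float],
              mapping: Dict[str, Tuple[str, float]]) -> Tuple[Dict[str, float], List[str]]:
    """Rename and rescale mapped columns; report columns with no mapping."""
    aligned: Dict[str, float] = {}
    unmapped: List[str] = []
    for column, value in row.items():
        if column in mapping:
            target, factor = mapping[column]
            aligned[target] = value * factor
        else:
            unmapped.append(column)
    return aligned, unmapped


# Hypothetical source row; AcKcal has no counterpart in the mapping.
aligned, unmapped = transform({"VO2 (L/min)": 2.0, "HR": 120, "AcKcal": 5.1}, MAPPING)
print(aligned, unmapped)
```

Returning the unmapped columns keeps attributes reported by only one site visible, so they can be reviewed instead of being silently dropped.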
Scientific Variables
• IICurate
  • Burns et al. (USC Information Sciences Institute)
  • System for curation and documentation of scientific variables
• Focus on information integration!
Vision
[Diagram labels as extracted: COMMON DATA MODEL, ALIGNMENT ENGINE, ALIGNED DATA, DATABASE, FEEDBACK LOOP, Data Source, Administrator]