Harmonization of multiple observational studies: Non-identical ordered scales October 2014 Edwin R. van den Heuvel Professor of Statistics e.r.v.d.heuvel@tue.nl Department of Mathematics and Computer Science Eindhoven University of Technology
Content • Introduction • Content Equivalence • Motivating Example • Principles • General Model • Analysis Motivating Example
Introduction • Epidemiology is concerned with finding associations between risk factors and health outcomes • For some diseases or some risk factors, large cohort studies are required • Such research can only be performed by combining several already existing population studies, e.g. • LifeLines in the Netherlands • HUNT in Norway • KORA in Germany
Introduction • Multiple cohort studies may not measure the same characteristic with the same instruments • Physical Activity • Memory • Quality of Life • Gene expression • Income • Combining individual participant data is therefore a challenge: variables cannot simply be pooled
Introduction • Example Smoking • Study 1: Have you ever smoked (Yes/No)? • Study 2: Have you smoked in the last year (Yes/No)? • Example Income • Study 1: What is your gross income per month? • Study 2: Indicate your yearly salary category: <15000, [15000-30000], ….., ≥100000 • Variables from different studies • May not contain the exact same information • Require some form of manipulation to make them equivalent
Content Equivalence Retrospective harmonization: • Core variables • Are the primary units of interest used in a statistical analysis • Depend strongly on the research question • A study can be viewed as harmonized if the assessment items in that study can be used to generate a ‘valid’ equivalent to the required core variable • The goal is to achieve comparability or content equivalence • Heterogeneity between studies that is caused by differences between groups of individuals should not be eliminated
Content Equivalence DataShaper: • Provides an overview and algorithms for core variables that can be harmonized from different studies (http://www.datashaper.org/) • Example smoking: • The core variable could be “smoked last year” • Study 2 provides direct information • Study 1 requires an additional item or question: if and when individuals stopped smoking • Example income: • The income variable in study 1 will most likely be reduced to a categorical variable that matches the categories of the income variable in study 2
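The two deterministic examples above can be sketched in code. This is a minimal illustration, not a DataShaper algorithm: the function names, the `years_since_quit` item, the rule that an ever-smoker with no quit date still smokes, and the handling of category boundaries are all assumptions. The intermediate income categories are elided in the source, so they are collapsed here into one placeholder label.

```python
from typing import Optional


def smoked_last_year(ever_smoked: bool, years_since_quit: Optional[float]) -> bool:
    """Derive the core variable 'smoked last year' from Study 1 items.

    years_since_quit is the hypothetical additional item; None is taken to
    mean that the participant has not stopped smoking (an assumption).
    """
    if not ever_smoked:
        return False
    if years_since_quit is None:
        return True
    return years_since_quit < 1.0


def yearly_income_category(monthly_gross: float) -> str:
    """Map Study 1 monthly gross income onto Study 2 yearly categories.

    Only the categories visible in the source are named; everything between
    30000 and 100000 gets a placeholder because the source elides it.
    """
    yearly = 12 * monthly_gross
    if yearly < 15_000:
        return "<15000"
    if yearly <= 30_000:
        return "[15000-30000]"
    if yearly < 100_000:
        return "intermediate (categories elided in source)"
    return ">=100000"
```

Note that the income mapping discards information: the continuous Study 1 variable is coarsened to the least detailed scale, which is the usual price of deterministic harmonization.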
Content Equivalence • Complex constructs cannot be harmonized with deterministic algorithms • Cognition (e.g. memory) • Physical activity • Etc. • Complex constructs are typically formulated from items of questionnaires: ordered scales • Currently, pooling individual data from different studies is performed on the basis of linear scaling methods: • Z-scores: scales are standardized within each study to the same mean and standard deviation • C-scores: scales are standardized within each study with respect to a reference or control group available in all studies
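The two linear scaling methods can be written in a few lines. This is a sketch under assumptions: the function names and the boolean `is_reference` flag are hypothetical, and the sample standard deviation is used, which the slides do not specify.

```python
import statistics
from typing import List, Sequence


def z_scores(scores: Sequence[float]) -> List[float]:
    """Standardize a scale within one study to mean 0 and SD 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


def c_scores(scores: Sequence[float], is_reference: Sequence[bool]) -> List[float]:
    """Standardize all scores of one study against its reference group."""
    reference = [x for x, ref in zip(scores, is_reference) if ref]
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(x - mean) / sd for x in scores]
```

Z-scores force every study to the same mean and spread, so genuine differences between study populations disappear. C-scores keep such differences relative to the reference group, provided that group is comparable across studies.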
Content Equivalence • There is only limited statistical literature on harmonizing constructs or scales • Two publications provide a latent variable model for combining the items from questionnaires • Van Buuren et al. (2005) • Bauer and Hussong (2009) • They use bridge items to connect the different studies with each other • They essentially assume that the observed items provide adequate information on the same construct • The current literature does not describe concepts and criteria for harmonization of complex constructs
Motivating Example • Three Canadian studies: • CCHS: Canadian Community Health Survey, 10,263 older adults (≥ 65 years) across Canada • CSHA: Canadian Study of Healthy Aging, 130,000 participants (≥ 45 years) from 10 Canadian provinces • NuAge: Quebec Longitudinal Study on Nutrition and Aging, 1,793 older adults (68-82 years) living in Quebec • Measurements of memory on individuals • Health Utility Index (HUI): an indirect measure of memory • Rey Auditory Verbal Learning Test (RAVLT): short-term memory • Buschke Cued Recall Procedure (BCRP): free and cued memory • Memory was only observed on subsets of participants in the Canadian studies
Motivating Example • Rey Auditory Verbal Learning Test: • Participants are read a list of 15 words • They should listen carefully and are then asked to recall the words • The number of correctly recalled words is counted • Buschke Cued Recall Procedure: • Participants are shown a picture sheet with four images of objects belonging to four different categories • Participants need to point out and name the objects • The sheet is taken away and the examiner ensures that the words are properly encoded • After a distraction activity the participants are asked to recall the words: both free recall and cued recall • CSHA used 12 words and NuAge used 16 words • Can we harmonize memory from the three studies?
Principles Length measurements: • The USA measures length in inches and Europe measures length in centimeters • Harmonization of length requires a change from one unit to the other • The relation is linear: 1 inch = 2.54 cm • This relation varied many times in the past and was settled only in 1959 (Astin et al., 1959) Principle 1: Existence of calibration model There exists a one-to-one mathematical relationship between the different true variables
Principles Measurement error: • Measurements are not without noise • Measurement errors for length are relatively small: • Error variation: RSD (%) = 0.1% • Subject variation: RSD (%) = 5.5% • The intraclass correlation coefficient between length in inches and in centimeters on one person is ICC > 0.99 Principle 2: Predictability An individual measurement in one variable can be predicted precisely from the measurement of the other variable
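The ICC bound quoted above follows from the two RSD values if one assumes a simple variance-components model: after unit conversion, both measurements share the same between-subject variance and carry independent errors of equal size. Under that assumption (not stated in the slides) the ICC is the between-subject variance divided by the total variance. The function name is hypothetical.

```python
def icc_from_rsd(subject_rsd: float, error_rsd: float) -> float:
    """ICC = between-subject variance / (between-subject + error variance).

    Both inputs are relative standard deviations on the same scale (here
    percent), so their squares are proportional to the variances.
    """
    between = subject_rsd ** 2
    error = error_rsd ** 2
    return between / (between + error)


# Values from the slide: 5.5% subject variation, 0.1% measurement error.
icc_length = icc_from_rsd(5.5, 0.1)
```

With these inputs the ratio is 30.25 / 30.26, which is well above 0.99. For a questionnaire scale the error variance is typically far from negligible, so the same calculation would give a much lower value.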
Principles Length is universal: • For length measurements there exists one unique reference standard: • The meter in Paris (although the definition is currently artifact free) • The measurement errors are • Small across the whole range of relevant outcomes • Almost independent of any characteristic (e.g. length, time, measurement devices, etc.) Principle 3: Invariance The calibration model and predictability are invariant or consistent across the world
General Model • Assume measurements Y1,i and Y2,i on subject i are obtained with instruments 1 and 2 • Assume that the measurements try to capture the latent variables Z1,i and Z2,i • The conditional distribution of Yh,i given Zh,i describes the measurement uncertainty given the true value • Let Xi be a vector of p covariates for subject i • The conditional distribution of Zh,i given Xi describes the distribution of the latent variable in a population
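One way to write the two conditional distributions of this slide is given below. The symbols F_h and G_h are assumptions introduced for illustration; the notation used in the talk is not recoverable from the text.

```latex
% Measurement model: uncertainty of instrument h given the true value
F_h(y \mid z) = \Pr\left(Y_{h,i} \le y \mid Z_{h,i} = z\right), \qquad h = 1, 2
% Population model: distribution of the latent variable given covariates
G_h(z \mid x) = \Pr\left(Z_{h,i} \le z \mid X_i = x\right), \qquad h = 1, 2
% Implied distribution of an observed measurement given covariates
\Pr\left(Y_{h,i} \le y \mid X_i = x\right) = \int F_h(y \mid z)\, \mathrm{d}G_h(z \mid x)
```

The last line assumes that the measurement Y_{h,i} depends on the covariates only through the latent variable Z_{h,i}.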
General Model • Two instruments measure the same characteristic when there exists a monotone continuous function ψ such that Z2,i = ψ(Z1,i) • The function ψ is referred to as the calibration model • The instruments measure the same thing but in different units • Typical forms for ψ are linear or log-linear functions • Note that ψ is given by • The calibration model implies a form of measurement invariance (Meredith, 1993) in the sense that
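A possible reading of the calibration relation, consistent with Principle 1, is sketched below. The notation G_h for the distribution of the latent variable given covariates is an assumption, as are the restrictions to an increasing ψ and to continuous, strictly increasing G_h; the expression shown in the talk may differ.

```latex
% Calibration model between the true (latent) variables
Z_{2,i} = \psi\left(Z_{1,i}\right)
% With G_h(z \mid x) = \Pr(Z_{h,i} \le z \mid X_i = x) and \psi increasing:
G_1(z \mid x)
  = \Pr\left(\psi(Z_{1,i}) \le \psi(z) \mid X_i = x\right)
  = G_2\left(\psi(z) \mid x\right)
% Solving for \psi
\psi(z) = G_2^{-1}\left(G_1(z \mid x) \mid x\right)
```

For ψ to be a single function, the right-hand side must not depend on x. That requirement is one way to see why a calibration model carries an invariance condition.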
General Model • Assume for now that the function ψ is known • From the observations we can only calculate ψ(Y1,i) • This leads to a joint distribution for (Y2,i , ψ(Y1,i)): • If Y1,i and Y2,i are independent conditioned on the latent variables • The joint distribution H is a measure for how well Y2,i can be predicted from Y1,i • The joint distribution H equals the marginal distribution of ψ(Y1,i) or Y2,i when Y2,i = ψ(Y1,i) : perfect prediction • A simpler predictability measure is the correlation coefficient between Y2,i and ψ(Y1,i)
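The correlation between Y2,i and ψ(Y1,i) as a predictability measure can be explored by simulation. The sketch below assumes a linear ψ, normally distributed latent values, and additive normal measurement errors of equal size on both instruments. All parameter values and function names are hypothetical.

```python
import math
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predictability(error_sd: float, n: int = 2000, seed: int = 0) -> float:
    """Correlation between Y2 and psi(Y1) for a known linear psi."""
    rng = random.Random(seed)
    a, b = 1.0, 2.0                       # hypothetical calibration: psi(z) = a + b*z
    y2, psi_y1 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)          # latent value on scale 1
        z2 = a + b * z1                   # same characteristic on scale 2
        y1 = z1 + rng.gauss(0.0, error_sd)
        y2.append(z2 + rng.gauss(0.0, error_sd))
        psi_y1.append(a + b * y1)         # psi applied to the observed value
    return pearson(y2, psi_y1)


r_precise = predictability(error_sd=0.1)
r_noisy = predictability(error_sd=1.0)
```

Small measurement error gives a correlation close to 1, the situation of the length example. Larger error lowers it even though the calibration model itself is exact, which is why predictability is a separate principle from the existence of ψ.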
General Model • In a few studies separate calibration models ψ1, ψ2, ….., ψk may be estimable from the data • The invariance principle would require that all calibration models are identical: ψ1 = ψ2 = ….. = ψk • This form of invariance does not imply that the latent variable distribution is identical across studies: • With subject i(k1) from study k1 and subject i(k2) from study k2 • Factorial invariance in measurement reliability (Meredith & Teresi, 2006) does require equality • If two subjects have the same ability, they have the same probability of correctly answering the items
Analysis Motivating Example
Study   HUI   RAVLT   Free BCRP   Cued BCRP
CCHS    5     15      NA          NA
CSHA    NA    15      12          12
NuAge   NA    NA      16          16
Statistical Model • Let Zi be a latent variable for memory on subject i • Consider covariates sex (x1), age (x2), and education (x3) • Assume a binomial distribution with logit link function
Analysis Motivating Example Statistical Model • Assume a linear calibration model • This calibration model leads to • The invariance principle within a study implies that a and b are independent of subject i, which leads to gh,0 = g0 • Another aspect of invariance is that the calibration model is independent of studies
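A binomial model with logit link and a linear calibration on the latent scale can be illustrated by simulation. This is a sketch under stated assumptions, not the model fitted in the talk: the coefficient values, the centering of age and education, the reading of `a` and `b` as an instrument-specific intercept and slope on the logit scale, and all function names are hypothetical.

```python
import math
import random


def latent_memory(sex: int, age: float, education: float, rng: random.Random) -> float:
    """Latent memory Z_i: linear in the covariates plus a normal residual.

    The coefficients below are made up for illustration only.
    """
    beta0, beta_sex, beta_age, beta_edu = 0.0, 0.2, -0.05, 0.1
    linear = beta0 + beta_sex * sex + beta_age * (age - 75) + beta_edu * (education - 10)
    return linear + rng.gauss(0.0, 1.0)


def recall_probability(z: float, a: float = 0.0, b: float = 1.0) -> float:
    """Logit link with a linear calibration: logit(p) = a + b * z."""
    return 1.0 / (1.0 + math.exp(-(a + b * z)))


def simulate_score(z: float, n_items: int, rng: random.Random,
                   a: float = 0.0, b: float = 1.0) -> int:
    """Binomial count of correctly recalled words out of n_items."""
    p = recall_probability(z, a, b)
    return sum(rng.random() < p for _ in range(n_items))


rng = random.Random(1)
z = latent_memory(sex=1, age=70, education=12, rng=rng)
rey_score = simulate_score(z, 15, rng, a=-0.5)        # a harder instrument
free_bcrp_score = simulate_score(z, 16, rng, a=0.3)   # an easier instrument
```

Because `a` and `b` are arguments rather than properties of a subject or a study, the sketch encodes the invariance assumption directly: two subjects with the same latent value have the same recall probability on a given instrument.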
Analysis Motivating Example Baseline Characteristics • Education is measured in “years of education” • Substantial difference between studies
Analysis Motivating Example Estimates Binomial latent variable model
Analysis Motivating Example • Rey is more difficult than the free recall of Buschke • Free recall of Buschke is easier than the cued recall • Older age reduces memory: consistent in all studies • There is an effect of gender on memory performance in the CCHS and NuAge, but not in CSHA • The Buschke test discriminates between genders in the CSHA study but not in the CCHS and NuAge • The gain in memory in the latent variable from the free Buschke to the cued Buschke differs between studies • Thus the Buschke test is not consistent across studies, violating the invariance principle
Analysis Motivating Example • Harmonization of Rey and Free Buschke seems more realistic, although we cannot test for invariance • A subject-specific ICC was calculated for Rey and Free Buschke • Thus predictability of the memory scores for Rey and Free Buschke is quite good
References • Astin, A.V., Karo, H.A., and Mueller, F.H. (1959), "Refinement of values for the yard and the pound", US Federal Register. • Bauer, D.J., and Hussong, A.M. (2009), "Psychometric approaches for developing commensurate measures across independent studies: traditional and new models", Psychological Methods, 14(2), 101-125. • Meredith, W. (1993), "Measurement invariance, factor analysis and factorial invariance", Psychometrika, 58(4), 525-543. • Meredith, W., and Teresi, J.A. (2006), "An essay on measurement and factorial invariance", Medical Care, 44, S69-S77. • Van Buuren, S., Eyres, S., Tennant, A., and Hopman-Rock, M. (2005), "Improving comparability of existing data by response conversion", Journal of Official Statistics, 21(1), 53-72.