Cohort and longitudinal studies: statistics

Kath Bennett, 25th June 2007

Outline
• Statistical methods
• Prevalence, incidence
• Rates, standardised rates, case fatality rate
• Relative rates/risks
• Attributable risk and attributable risk fraction
• Regression and beta coefficients

Measuring disease/drug frequency
• Prevalence is the proportion of a population that are cases at one time.
• Prevalence = (number of existing cases with the disease, or on the drug) / (total population in the same time period)
• e.g. birth defect prevalence rate = (number of births with a given abnormality) / (total number of live births)

Measuring disease/drug frequency
• Incidence is the rate at which new cases occur in a population during a specified period.
• Incidence = (number of new cases, or people starting on therapy, in the period) / (population at risk in the period)
• e.g. bacteremia among oral contraceptive (OC) users: in a study of 482 OC users, 27 developed bacteremia, so incidence = (27/482) × 100 = 5.6%

Inter-relation between incidence and prevalence
• Prevalence = incidence × average duration of the disease (an approximation that holds when incidence and duration are stable over time)

Rates commonly used in epidemiology
• Crude rates
  • For the entire population, e.g. total deaths/population
• Category-specific rates (e.g. age-specific, gender-specific)
  • e.g. cancer rates by age category: 35-44, 45-54
• Age-adjusted or standardised rates
  • Allow appropriate comparison when differing populations are being studied, by reducing distortions due to age differences between the populations.

Rates
• Used to quantify the risk in a population.
• Numerator is the number of events.
• Denominator is either
  • the population at risk, or
  • person-years at risk or of exposure.
• Usually presented per 1000 persons or per 1000 person-years at risk.

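Rates
• Provided the disease/event occurs randomly and independently, it is reasonable to assume that the number of events follows a Poisson distribution.
• Estimated standard error of the rate: $SE(\text{rate}) = \sqrt{E} / \text{population at risk}$, where $E$ is the number of events.
• Approximate 95% CI for the number of events: $E_L = (1.96/2 - \sqrt{E})^2$, $E_U = (1.96/2 + \sqrt{E+1})^2$
• Approximate 95% CI for the rate: lower limit $= E_L / \text{population}$, upper limit $= E_U / \text{population}$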

Rates
• The number of deaths from CHD in Scottish men in 1995 was 1080, in a total population of 612955 men.
• Rate of CHD deaths per 1000: (1080/612955) × 1000 = 1.76
• 95% CI for the number of deaths: $E_L = (1.96/2 - \sqrt{1080})^2 = 1016.55$, $E_U = (1.96/2 + \sqrt{1081})^2 = 1146.4$
• 95% CI for the rate per 1000: (1016.55/612955) × 1000 to (1146.4/612955) × 1000, i.e. (1.66, 1.87)
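The CHD calculation can be reproduced with a short Python sketch. It assumes the square-root form of the approximate limits, E_L = (1.96/2 - sqrt(E))^2 and E_U = (1.96/2 + sqrt(E+1))^2, which is the form that matches the figures quoted on the slide. The function name `rate_with_ci` and its parameters are illustrative, not part of the original slides.

```python
import math

def rate_with_ci(events, population, per=1000, z=1.96):
    """Rate per `per` persons with an approximate 95% CI (Poisson counts)."""
    # Approximate limits for the number of events, on the square-root scale.
    lower_events = (z / 2 - math.sqrt(events)) ** 2
    upper_events = (z / 2 + math.sqrt(events + 1)) ** 2
    scale = per / population
    return events * scale, lower_events * scale, upper_events * scale

# CHD deaths in Scottish men, 1995 (figures from the slide).
rate, lower, upper = rate_with_ci(1080, 612955)
print(f"{rate:.2f} per 1000 (95% CI {lower:.2f} to {upper:.2f})")
# prints: 1.76 per 1000 (95% CI 1.66 to 1.87)
```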

Case fatality rate
• Usually expressed as the percentage of persons diagnosed as having a specified disease who die as a result of that illness within a given period.
• 30-day or 1-year case-fatality rates are sometimes quoted.
• The case-fatality rate is not the same as the mortality rate (whose denominator is the whole population at risk, not only diagnosed cases).
• Also referred to as the fatality rate or case-fatality ratio.

Relative rate
• Relative rate is the ratio of two rates, i.e. (rate in group 1)/(rate in group 2).
• Also known as the rate ratio.
• Example: the rate for CHD in 1995 was calculated as 1.76 per 1000 in men and 0.483 per 1000 in women.
• Relative rate = 1.76/0.483 ≈ 3.65 (from the unrounded male rate of 1.762; the rounded 1.76 gives 3.64)

Mortality from lung cancer by 5-year age-bands

Standardised rates
• Direct method
  • Requires a standard population to which age-specific rates are applied.
  • Multiply the age-specific rates in populations A and B by the standard population, then compare.
• Indirect method
  • Used if age-specific rates for the populations of interest are not available.
  • Based on applying age-specific rates for the standard population to the population of interest to determine the ‘expected’ number of deaths.
  • Standardised Mortality Ratio (SMR) = total observed deaths / total expected deaths
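A minimal sketch of the direct method in Python follows. The age bands, rates and standard population are hypothetical numbers chosen only to show the arithmetic; they are not taken from the slides, and the function name is illustrative.

```python
def directly_standardised_rate(age_specific_rates, standard_population):
    """Weighted average of age-specific rates, weighted by the standard population."""
    expected_events = sum(
        rate * n for rate, n in zip(age_specific_rates, standard_population)
    )
    return expected_events / sum(standard_population)

# Hypothetical illustration: three age bands, rates in deaths per 1000.
standard_population = [5000, 3000, 2000]
rates_a = [1.0, 4.0, 10.0]  # population A
rates_b = [2.0, 4.0, 8.0]   # population B

dsr_a = directly_standardised_rate(rates_a, standard_population)
dsr_b = directly_standardised_rate(rates_b, standard_population)
print(f"A: {dsr_a:.2f} per 1000, B: {dsr_b:.2f} per 1000")
```

Because both populations are weighted by the same standard age structure, any remaining difference between the two standardised rates is not due to age distribution.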

Direct standardisation – raw data

Direct standardisation

Indirect method

Indirect method
• Observed 1781 deaths, against 2032.5 expected
• SMR = 1781/2032.5 = 0.876
• SMR > 1: more deaths than expected
• SMR < 1: fewer deaths than expected
• Sometimes multiplied by 100.
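The indirect method can be sketched in Python as below. The observed and expected totals (1781 and 2032.5) are the slide's figures; the age-specific standard rates and study population used to show how an expected count is built up are hypothetical, as are the function names.

```python
def expected_deaths(standard_rates, population_by_age):
    """Apply standard-population rates to the study population's age structure."""
    return sum(rate * n for rate, n in zip(standard_rates, population_by_age))

def smr(observed, expected):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    return observed / expected

# Figures from the slide.
slide_smr = smr(1781, 2032.5)
print(f"SMR = {slide_smr:.3f}")

# Hypothetical illustration of how the expected count is obtained.
standard_rates = [0.001, 0.005, 0.020]   # deaths per person per year, by age band
study_population = [10000, 8000, 4000]   # persons in each age band
hypothetical_expected = expected_deaths(standard_rates, study_population)
hypothetical_smr = smr(117, hypothetical_expected)
```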

Examples of cohort studies
• Framingham study. Cohort of 5,209 men and women, aged between 30 and 62 in 1948, from the town of Framingham, Massachusetts, set up to examine CVD development, with participants returning every two years.
• British doctors study. Cohort of male doctors started in 1951 to examine smoking and related mortality (Doll and Hill).
• Million Women Study. Cohort of 1 million women asked about HRT use and followed up for cancer incidence.

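Data from a cohort study are expressed as a 2 × 2 table. Follow-up to see whether:

              Disease develops   Disease does not develop   Total
Exposed       a                  b                          a + b
Unexposed     c                  d                          c + d

Incidence in exposed ($I_E$) = a/(a+b)
Incidence in unexposed ($I_O$) = c/(c+d)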

Data from cohort studies are analyzed in terms of:
1. Relative risk (RR) = (incidence rate in the exposed group, $I_E$) / (incidence rate in the non-exposed group, $I_O$)
• Relative risks significantly higher than 1 imply that the factor under study is associated with an increased risk of disease.
• Relative risks significantly lower than 1 imply that the factor is associated with a decreased risk of disease.
• The magnitude of the relative risk indicates the strength of the association.

Attributable Risk
• Attributable risk recognises that not all of the disease incidence in the exposed group is due to the exposure, as some non-exposed individuals also develop the disease.
  $I_O$ (incidence in non-exposed group) = "background incidence"
  $I_E$ (incidence in exposed group) = "background incidence" + incidence due to the exposure
• Therefore, the incidence in the exposed group that is attributable to the exposure can be calculated by subtracting:
  Attributable risk (AR): $AR = I_E - I_O$

Attributable risk percent: The percentage of the total incidence in the exposed group which is attributable to the exposure can be calculated by:
$AR\% = \frac{I_E - I_O}{I_E} \times 100$
Short cut: $AR\% = \frac{RR - 1}{RR} \times 100$
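The short cut follows from dividing numerator and denominator by the incidence in the unexposed group. A brief derivation, writing $I_E$ and $I_O$ for the slides' IE and IO and using $RR = I_E / I_O$:

```latex
\begin{align*}
AR\% &= \frac{I_E - I_O}{I_E} \times 100 \\
     &= \frac{I_E / I_O - 1}{I_E / I_O} \times 100 \\
     &= \frac{RR - 1}{RR} \times 100
\end{align*}
```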

The Population is not all exposed
• How many cases per 1000 population are attributable to the exposure?
• Do we know the incidence in the total population ($I_t$)?
• $I_t = I_E \times P_e + I_O \times (1 - P_e)$
• $P_e$ is the prevalence of the exposure
• Population Attributable Risk: $PAR = I_t - I_O$
• Short cut: $PAR = AR \times P_e$
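The short cut can be checked by substituting the expression for the total-population incidence into the definition of PAR. A brief derivation, writing $I_t$, $I_E$, $I_O$ and $P_e$ for the slides' It, Ie, Io and Pe:

```latex
\begin{align*}
PAR &= I_t - I_O \\
    &= I_E P_e + I_O (1 - P_e) - I_O \\
    &= P_e (I_E - I_O) \\
    &= AR \times P_e
\end{align*}
```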

The Population
• What proportion of the risk in the total population is attributable to the exposure?
• Population Attributable Risk % (PAR%), also called the population attributable fraction (PAF)
• The denominator is the risk in the total population ($I_t$). The numerator is the “extra” risk (PAR), which is $I_t - I_O$.
• $PAR\% = \frac{I_t - I_O}{I_t} \times 100$
• $PAR\% = \frac{P_e (RR - 1)}{P_e (RR - 1) + 1} \times 100$
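The second form of PAR% follows from expressing the total-population incidence relative to the incidence in the unexposed. A brief derivation, writing $I_t$, $I_E$, $I_O$ and $P_e$ for the slides' It, Ie, Io and Pe, and using $RR = I_E / I_O$:

```latex
\begin{align*}
\frac{I_t}{I_O} &= \frac{I_E P_e + I_O (1 - P_e)}{I_O} = P_e (RR - 1) + 1 \\
PAR\% &= \frac{I_t - I_O}{I_t} \times 100
       = \frac{I_t / I_O - 1}{I_t / I_O} \times 100
       = \frac{P_e (RR - 1)}{P_e (RR - 1) + 1} \times 100
\end{align*}
```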

Example: Cohort study of smoking and coronary heart disease
Cohort of initially healthy people

              Develop CHD   Don't develop CHD   Total
Smokers       84            2916                3000
Non-smokers   87            4913                5000

Incidence in smokers = 84/3000 = 28.0 per 1,000/year
Incidence in non-smokers = 87/5000 = 17.4 per 1,000/year
RR = (incidence in smokers)/(incidence in non-smokers) = (28.0/1000/yr)/(17.4/1000/yr) = 1.6
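The relative risk for the smoking example can be computed directly from the four cell counts. This is a minimal sketch; the function name and the a, b, c, d argument order (exposed with and without disease, then unexposed with and without disease) are assumptions made for illustration.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 cohort table.

    a, b: exposed who did / did not develop the disease
    c, d: unexposed who did / did not develop the disease
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed

# Smokers: 84 of 3000 developed CHD; non-smokers: 87 of 5000.
incidence_smokers = 1000 * 84 / 3000      # per 1000 per year
incidence_nonsmokers = 1000 * 87 / 5000   # per 1000 per year
rr = relative_risk(84, 2916, 87, 4913)
print(f"{incidence_smokers:.1f} vs {incidence_nonsmokers:.1f} per 1000/yr, RR = {rr:.2f}")
```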

Example: Attributable risk
AR (incidence in exposed group attributable to the exposure) = incidence in smokers - incidence in non-smokers = 28.0 - 17.4 = 10.6/1000/year
AR% (% of total incidence in exposed group attributable to exposure) = [(incidence in smokers - incidence in non-smokers)/(incidence in smokers)] × 100 = [(28.0 - 17.4)/28.0] × 100 = (10.6/28.0) × 100 = 37.9%
37.9% of the morbidity from CHD among smokers may be attributable to smoking.
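The attributable risk figures for the smoking example can be checked with a few lines of Python. This is a sketch only; the function names are illustrative, and the incidences are the per-1000-per-year values from the example.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Excess incidence in the exposed group (same units as the inputs)."""
    return incidence_exposed - incidence_unexposed

def attributable_risk_percent(incidence_exposed, incidence_unexposed):
    """Share of incidence in the exposed group attributable to the exposure."""
    return 100 * (incidence_exposed - incidence_unexposed) / incidence_exposed

i_smokers, i_nonsmokers = 28.0, 17.4   # per 1000 per year
ar = attributable_risk(i_smokers, i_nonsmokers)
ar_percent = attributable_risk_percent(i_smokers, i_nonsmokers)

# The short cut (RR - 1) / RR x 100 gives the same percentage.
rr = i_smokers / i_nonsmokers
ar_percent_shortcut = 100 * (rr - 1) / rr
print(f"AR = {ar:.1f} per 1000/yr, AR% = {ar_percent:.1f}%")
```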

Example: Population AR (PAR)
Population Attributable Risk: $PAR = I_t - I_O = AR \times P_e$
Suppose the prevalence of smoking in the population is 40%.
We can calculate $I_t = I_E \times P_e + I_O \times (1 - P_e)$ = 28.0 × 0.4 + 17.4 × 0.6 = 21.6/1000
PAR = 21.6/1000 - 17.4/1000 = 4.2/1000
Also PAR = AR × $P_e$ = (10.6/1000) × 0.4 = 4.2/1000
Population Attributable Risk %: PAR% = [($I_t - I_O$)/$I_t$] × 100 = [(21.6 - 17.4)/21.6] × 100 = 19.4%
Or PAR% = $P_e(RR-1)/[P_e(RR-1)+1]$ × 100 = 0.4 × (1.6 - 1)/[0.4 × (1.6 - 1) + 1] × 100 = 19.4%
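The population attributable risk example can be sketched in Python as follows. The function name is illustrative. Note that carrying full precision gives a PAR% of about 19.6%, slightly above the 19.4% on the slide, which is obtained from rounded intermediate values (21.6 and RR = 1.6).

```python
def population_attributable_risk(incidence_exposed, incidence_unexposed, prevalence):
    """Return (total-population incidence, PAR, PAR%)."""
    # Incidence in the whole population is a prevalence-weighted average.
    incidence_total = (
        incidence_exposed * prevalence + incidence_unexposed * (1 - prevalence)
    )
    par = incidence_total - incidence_unexposed
    par_percent = 100 * par / incidence_total
    return incidence_total, par, par_percent

# Smoking example: incidences per 1000 per year, 40% of the population smoke.
i_total, par, par_percent = population_attributable_risk(28.0, 17.4, 0.4)

# Short-cut form using the relative risk.
rr = 28.0 / 17.4
par_percent_shortcut = 100 * 0.4 * (rr - 1) / (0.4 * (rr - 1) + 1)
print(f"It = {i_total:.1f}, PAR = {par:.1f} per 1000/yr, PAR% = {par_percent:.1f}%")
```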

Methods to account for varying lengths of follow-up
Since participants may enter or leave the study at various times due to death, emigration or loss to follow-up, the time of observation is usually not uniform. This is accounted for by three main methods:
1. Person-years of observation
2. Life-table method
3. Survival analysis, Cox proportional hazards
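The first of these methods, person-years of observation, can be illustrated with a small Python sketch. The follow-up records below are hypothetical and exist only to show how unequal observation times enter the denominator; the function name is also illustrative.

```python
def incidence_rate_per_1000_person_years(follow_up):
    """follow_up: list of (years_observed, had_event) pairs, one per participant."""
    person_years = sum(years for years, _ in follow_up)
    events = sum(1 for _, had_event in follow_up if had_event)
    return events, person_years, 1000 * events / person_years

# Hypothetical cohort: each participant contributes only the time they were observed.
follow_up = [
    (5.0, False),  # followed for the full study
    (2.5, True),   # event after 2.5 years
    (4.0, False),  # lost to follow-up at 4 years
    (1.5, True),   # event after 1.5 years
    (5.0, False),
    (2.0, False),  # emigrated at 2 years
]

events, person_years, rate = incidence_rate_per_1000_person_years(follow_up)
print(f"{events} events in {person_years} person-years: {rate:.0f} per 1000 person-years")
```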

Regression

Is the linear relationship reasonable?

Regression
• In broad terms, regression can be thought of as a statistical model that helps us get the ‘best guess’.
• Formally, we assume that there is an underlying linear relationship between the variables, and that our observations lie scattered about that line.
• The actual value of ‘Y’ is ‘scattered’ about the expected or predicted value of Y.

Linear Model Illustrated
• The line is the ‘best’ fit in a statistical sense.
• The scatter about the line is called residual variability.

Regression equations
• The equation of a line can be written as E[Y] = a + bX.
• a = intercept (where the line cuts the Y axis)
• b = slope of the line (can be positive or negative)
• Software fits the ‘best guess’ for a and b.

Regression Analysis: Leaving versus Mock
The regression equation is Leaving = 7.75 + 0.817 Mock

Predictor   Coef      SE        T       P
Constant    7.749     3.122     2.48    0.023
Mock        0.81749   0.04284   19.08   0.000
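As a rough illustration of what the software does, the sketch below fits a and b by ordinary least squares in plain Python. The mock and leaving scores used for the fit are hypothetical, since the slide's underlying data are not given; the only slide figures used are the coefficients and standard errors, to show that each T value in the output is the coefficient divided by its SE.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b in E[Y] = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical scores, for illustration only (not the slide's data).
mock = [40, 50, 60, 70, 80]
leaving = [41, 48, 57, 65, 74]
a, b = fit_line(mock, leaving)
print(f"Fitted line: Leaving = {a:.2f} + {b:.2f} Mock")

# From the slide's output: T is the coefficient divided by its standard error.
t_constant = 7.749 / 3.122
t_mock = 0.81749 / 0.04284

# Prediction from the slide's fitted equation for a mock score of 60.
predicted_leaving = 7.75 + 0.817 * 60
```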

More questions
• Is there an effect of gender?
• That is, can one consider different lines for red and blue?
• Should these lines be parallel or not?
• What about ‘adjusting’ for other covariates such as age or ‘intelligence’?
• More than one variable = multiple linear regression
