A quick survey of Epidemiology and its methods
Seminar by: Diego Villarreal
Outline
• What is epidemiology and what is it used for
• A quick glance at statistical hypotheses
• The normal distribution and p-values
• Example of an epidemiological survey conducted in India
• Concluding remarks
What is Epidemiology? Epidemiology is often defined as the study of the factors that determine the occurrence and distribution of diseases in a population. Its findings are used to guide medical treatment, public health policy, and scientific research. Areas in which epidemiology plays an important role include nutrition, environmental health, and HIV. Jekel J.F., Katz D., Elmore J. Epidemiology, Biostatistics, and Preventive Medicine. W.B. Saunders Company. Philadelphia, USA: 2001.
Epidemiology is often divided into two branches: classical epidemiology and clinical epidemiology. Classical epidemiology is population oriented and studies the risk factors associated with disease in a population (HIV, Pb levels in blood, etc.). Clinical epidemiology studies patients in health care settings in order to improve the diagnosis and treatment of various diseases (drugs, nutrition, etc.).
How do we know that the observations we make of a specific population are correct? Are they always the same? Can we quantify our “correctness”? Epidemiologists use tools such as probability and statistics to assess and validate their findings.
A little Probability and Statistics Before testing a hypothesis, we must state it in a quantitative manner. The measurements made in epidemiological studies must be numbers of some sort (e.g. the number of patients who did not receive a drug and died, mean blood pressure in HIV patients, Pb levels of children in India, etc.). In statistics, there are usually two types of variables: continuous and discrete. Continuous variables can assume any value within a range, and so have infinitely many possible values (Pb levels, blood pressure, age, height). Discrete variables can only assume a fixed number of distinct values, often categories coded as numbers (sex, pregnancy status, etc.). Montgomery D.C., Runger G.C. Applied Statistics and Probability for Engineers. John Wiley & Sons. New York, NY: 2003.
The Null Hypothesis A hypothesis that is tested statistically is called a null hypothesis (H0). The null hypothesis usually takes the form of a “no difference” hypothesis, and we try to reject it with the gathered data. Example: We want to test the efficacy of a drug intended to reduce the death rate, by comparing a group that receives it (A) with a group that does not (B). H0: Death rate group A = Death rate group B, or equivalently H0: Death rate group A - Death rate group B = 0
The null hypothesis is tested against an alternative hypothesis HA. HA: Death rate A - Death rate B ≠ 0. H0 is rejected if the observed difference in the death rates of the two groups is too large to be plausibly explained by chance alone. By rejecting H0 we accept HA. Notice that HA is two-sided: it does not specify whether death rate A is greater or smaller than death rate B. Wassertheil-Smoller S. Biostatistics and Epidemiology. Springer. New York, NY: 2004.
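One common way to test a null hypothesis of this form is a two-proportion z-test. The sketch below is an illustration only: the counts are hypothetical, the function name is made up for this example, and the slides do not say which test the drug example would use.

```python
import math

def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """Two-sided z-test of H0: death rate A = death rate B."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    # Under H0 both groups share one death rate, so pool the data to estimate it.
    p_pool = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 30 of 200 patients die in group A, 50 of 200 in group B.
z, p = two_proportion_z_test(30, 200, 50, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

Because HA is two-sided, the p-value counts large differences in either direction, which is why the absolute value of z is used.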
Errors associated with accepting or rejecting a hypothesis: Type I error: rejecting the null hypothesis when it is actually true. Type II error: failing to reject the null hypothesis when it is actually false.
It is important to understand that we can never eliminate the risk of making one of these errors; we can only lower their probabilities. The probability of making a Type I error is known as the significance level of a statistical test. For a fixed sample size, lowering the probability of a Type I error increases the probability of a Type II error. To lower both probabilities, one must increase the sample size.
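The trade-off between the two error types can be made concrete with a small calculation. This is a minimal sketch that assumes a two-sided z-test on a mean with known standard deviation; the effect size, standard deviation, and sample sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def type_ii_error(alpha, effect, sigma, n):
    """Probability of a Type II error for a two-sided z-test on a mean.

    alpha  : significance level (probability of a Type I error)
    effect : true difference between the mean and its value under H0
    sigma  : known standard deviation of one observation
    n      : sample size
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma
    # H0 is not rejected when the test statistic falls inside (-z_crit, z_crit).
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)

for alpha in (0.05, 0.01):
    for n in (25, 100):
        beta = type_ii_error(alpha, effect=2.0, sigma=10.0, n=n)
        print(f"alpha = {alpha:.2f}  n = {n:3d}  P(Type II) = {beta:.3f}")
```

At a fixed sample size the Type II probability rises when alpha is lowered from 0.05 to 0.01, and at a fixed alpha it falls when the sample grows from 25 to 100.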
The most widely used model for the distribution of a random variable is the normal distribution, which is described by the following mathematical expression:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Where: µ = Mean, σ = Standard Deviation
It is important to notice the following facts about normally distributed data:
• About 68% of the observations fall within 1 standard deviation of the mean, that is, between µ - σ and µ + σ.
• About 95% of the observations fall within 2 standard deviations of the mean, that is, between µ - 2σ and µ + 2σ.
• About 99.7% of the observations fall within 3 standard deviations of the mean, that is, between µ - 3σ and µ + 3σ.
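The normal density and the three coverage figures above can be checked numerically. The sketch below uses only the Python standard library; the function names are chosen for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X; independent of mu and sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.1f}%")
```

The exact values are slightly different from the rounded figures on the slide (for example, a little over 95% for two standard deviations), which is why the rule is usually stated as an approximation.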
P-values The p-value is an index of the strength of the evidence against the null hypothesis. It is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; a small p-value means the data would be unlikely to arise from mere chance. By convention, if the p-value is greater than 0.05, we say that the result is NOT statistically significant, and we fail to reject the null hypothesis.
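One way to see what a p-value measures is an exact permutation test, which counts how often a difference at least as large as the observed one appears when group labels are assigned by chance. This technique is not used in the slides; it is swapped in here purely as an illustration, and the measurements are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    extreme = 0
    count = 0
    # Under H0 the labels are arbitrary, so every way of picking n_a values
    # to call "group A" is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-9:
            extreme += 1
    return extreme / count

# Hypothetical measurements for two small groups.
group_a = [12.1, 14.3, 11.8, 15.2, 13.9]
group_b = [9.4, 10.2, 8.8, 11.1, 9.9]
p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
print("statistically significant" if p_value <= 0.05 else "not statistically significant")
```

Here every value in group A exceeds every value in group B, so only two of the 252 possible relabelings are as extreme as the observed split.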
Blood lead levels in Bombay: A Case Study As stated at the beginning, epidemiology studies the occurrence and distribution of diseases in a population. In 2002 a study was done to measure the blood lead levels (BLL) of children in Bombay after the use of lead in gasoline was prohibited. The data collected were compared against existing data from 1997 (before Pb was prohibited).
In 1997 the Georges Foundation conducted a study of 291 children (ages 6-10) in Bombay to measure the BLL of children in metropolitan areas in India. This study showed the following:
• 61.8% (n=180) had BLL > 10 μg/dL
• 14.7% (n=43) had BLL > 20 μg/dL
• 2.7% (n=8) had BLL > 30 μg/dL
• 0.6% (n=2) had BLL > 40 μg/dL
The study also pointed out that the mean BLL was in the range of 8.6-14.4 μg/dL. Concentrations of lead in air in various locations ranged from 0.10 to 1.18 μg/m3. At the time of this study, seasonal variations (monsoon vs. non-monsoon season) were not studied. Nichani, V; Li-W.I; Smith M.A; Noonan G; Kulkarni M.; Kodavor M; Naeher L.P; Science of the Total Environment 2005 (In Press).
In the 2002 study, measurements were made in two different campaigns (monsoon and non-monsoon season). A total of 754 children under 12 years of age were sampled: 276 (36.6%) during the non-monsoon season and 478 (63.4%) during the monsoon season. BLL were measured using an ESA Lead Care Portable Analyzer. The study locations were Panchseel Hospital (Mulund) and low socioeconomic areas in Mulund and Thane. This was done to include children of different socioeconomic status (SES). SES was determined by parental occupation and geographic location.
The dependent variable used in the analysis was BLL. Independent variables were age, sex, SES, and season. Sex, SES and season were treated as discrete variables, while age and BLL were treated as continuous variables. Since the distribution of BLL was not normal, the data were transformed toward normality using a Box-Cox transformation.
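The Box-Cox transformation maps each positive value x to (x^λ - 1)/λ, or to ln(x) when λ = 0. The sketch below shows its effect on skewness. The BLL values are hypothetical, not data from the study, and the slides do not report which λ was used; in practice λ is usually estimated from the data (for example by maximum likelihood, as `scipy.stats.boxcox` does), rather than tried by hand as here.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def skewness(values):
    """Sample skewness (third standardized moment); 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed BLL values in ug/dL (NOT data from the study).
bll = [4.2, 5.1, 6.0, 6.8, 7.5, 8.9, 10.4, 12.7, 16.3, 24.8, 38.5]

for lam in (1, 0.5, 0):
    transformed = box_cox(bll, lam)
    print(f"lambda = {lam}: skewness = {skewness(transformed):.2f}")
```

With λ = 1 the data are only shifted, so the skewness is unchanged; smaller values of λ pull in the long right tail and bring the distribution closer to symmetric.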
Sample t-tests showed that differences in BLL across SES were statistically significant (t=-5.9; p<0.0001), with lower SES having higher BLL. Differences in BLL across seasons were also statistically significant (t=5.4; p<0.0001), with higher BLLs in the monsoon season. Differences in BLL between boys and girls were not statistically significant (t=1.1; p=0.28). Age was associated with increasing BLLs to a small degree (p=0.014). Nichani, V; Li-W.I; Smith M.A; Noonan G; Kulkarni M.; Kodavor M; Naeher L.P; Science of the Total Environment 2005 (In Press).
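As a rough consistency check, the reported t statistics can be converted to two-sided p-values. The sketch below assumes the samples are large enough for the t distribution to be approximated by the standard normal; the degrees of freedom and the unrounded statistics are not given here, so the results should only roughly match the reported p-values.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p-value for a test statistic, using a normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

# t statistics as reported on the slide.
reported = {"SES": -5.9, "season": 5.4, "sex": 1.1}

for name, t_stat in reported.items():
    p = two_sided_p_normal(t_stat)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{name:>6}: t = {t_stat:5.1f}, approximate p = {p:.2g} ({verdict})")
```

Under this approximation the SES and season comparisons fall far below 0.0001 and the boy-girl comparison lands well above 0.05, in line with the conclusions drawn on the slide.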
From the study done in India many things can be concluded.
1. Eliminating Pb from gasoline is extremely important in lowering the BLL of individuals.
2. Children of lower SES are more susceptible to lead poisoning.
3. During the monsoon season, children's BLLs tend to be higher.
4. Developing countries around the world MUST prohibit the use of Pb in gasoline in order to secure the health of their citizens.
Conclusion
• Epidemiology is a great tool for shaping public health policy
• Statistics help epidemiologists determine whether or not observations within a population are relevant and significant
• Epidemiological results MUST be used not only by scientists and doctors, but also by politicians, in order to have a healthy and productive society.