Lecture 3: Methodology 1
Understand Research/Articles
Assess the articles to find the right answer to your question:
• Where do the subjects come from (target population)?
• Is the method right?
• Can the variables evaluate the research question?
• What are the biases and confounders?
Bias
Bias is defined as any tendency that prevents unprejudiced consideration of a question. It is a systematic error (an inclination, predisposition, partiality, prejudice, preference, or predilection) that misleads the results. Bias can arise in the design, measurement, sampling, procedures, or choice of the problem studied. When bias is present, a study's results are artificially increased or decreased instead of reflecting the true effect.
Types of Bias
• Selection bias: the sample is not representative of the target population (e.g. a study in Cleveland is not representative of the US).
• Measurement bias: the research design and measurements do not match the research question, so the variables are systematically mismeasured.
• Expectancy bias (experimenter-expectancy effect, observer bias): the researcher's expectations affect the outcome of a study.
• Social desirability bias: the tendency of respondents to answer questions in a manner that will be viewed favorably by others.
• Observation bias: includes recall (memory) bias, information bias, interviewer bias, and misclassification.
• Loss-to-follow-up bias: if the researcher loses many subjects during follow-up, the subjects who remain in the analysis may be a special, non-representative group.
• Response and non-response bias: subjects answer what they want to say, and do not answer what they do not want to say.
• Reporting bias: if researchers find that their hypothesis was wrong, they may not publish the results.
How to control bias
• Meta-analysis
• Random sampling / random assignment
• Blinding
  • Single blind (subjects)
  • Double blind (subjects and evaluators)
  • Triple blind (subjects, evaluators, and statistician)
Random Sampling
Random sampling yields results with minimized selection bias.
• Draw from a population.
• Use random number generators (random number table, Excel, coin, etc.).
Population (the target people in your project)
• In your project, if you use your clinical patients, the population is only your clinical patients, not "patients" or "people" in general.
• If you want to know about the US population, take a random sample of the US population.
• If you want to know about people in general, take a random sample of people from all over the world.
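A minimal Python sketch of simple random sampling, assuming you already have a list (sampling frame) of everyone in your target population; the patient IDs below are hypothetical placeholders. `random.sample` plays the role of the random number table or Excel:

```python
import random

# Hypothetical sampling frame: an ID for every patient in the target population
# (here, 200 patients of one clinic). Replace with your real list.
population = [f"patient_{i:03d}" for i in range(1, 201)]

rng = random.Random(2024)  # fixed seed so the draw can be reproduced and audited

# Simple random sampling without replacement: every patient has the same chance
# of being selected, which minimizes selection bias within this population.
sample = rng.sample(population, k=20)

print(len(sample), len(set(sample)))  # 20 20 (no patient drawn twice)
print(sample[:5])
```

Because this frame is one clinic's patients, the sample supports conclusions about those patients only, not about patients in general.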
Study flow (diagram)
Population: Who are they? Patients? Low-income? US population?
↓ Randomized sampling
Sample
↓ Randomized assignment
Control group (no intervention / no exposure) → Outcome
Study group (intervention / exposure) → Outcome
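Random Assignment (Randomization)
In Excel, enter =RANDBETWEEN(1,2) for each subject to assign the subject at random to group 1 or group 2, rather than by the researcher's prejudice. Then use =COUNTIF(B2:B11,1) to count how many of the subjects in cells B2:B11 were assigned to group 1. RANDBETWEEN recalculates whenever the sheet changes, so once the assignment is final, copy the cells and paste them as values.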
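The same idea as a hedged Python sketch, assuming 10 hypothetical subjects (matching the 10 cells B2:B11). The first part mirrors =RANDBETWEEN(1,2) and =COUNTIF; the shuffle-and-split variant is an alternative not on the slide that guarantees equal group sizes. Which group number is the control group is your choice.

```python
import random

# Hypothetical subject IDs; the Excel example has 10 subjects (cells B2:B11).
subjects = [f"S{i:02d}" for i in range(1, 11)]
rng = random.Random(7)  # fixed seed keeps the allocation reproducible

# Simple randomization, like =RANDBETWEEN(1,2): each subject independently gets
# group 1 or group 2 with probability 1/2, so the group sizes can be unequal.
simple = {s: rng.randint(1, 2) for s in subjects}
n_group1 = sum(1 for g in simple.values() if g == 1)  # like =COUNTIF(B2:B11,1)

# Balanced alternative (not on the slide): shuffle, then split the list in half.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
balanced = {s: 1 if i < half else 2 for i, s in enumerate(shuffled)}

print(simple, "group 1 size:", n_group1)
print(sum(1 for g in balanced.values() if g == 1))  # 5
```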
Confounders
A confounder correlates with both the exposure (independent variable) and the outcome (dependent variable).
(Diagram: Confounder (F) linked to both the Exposure (brushing) and the Outcome (DMFT: decayed, missing, and filled teeth).)
• Known confounders: include these in the list of variables.
• Unknown confounders: may be seen after a preliminary analysis, so conduct more research or statistically control the results.
Confounders?
(Diagram: are the child's age and the amount of meals confounders for the outcome (DMFT)?)
Reduce the Effects of Confounders
• Distribute confounders equally between the study group and the control group.
• Use subjects with similar characteristics (matching).
• Statistically control the results (regression analysis).
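To illustrate the third bullet, here is a small simulation sketch in Python. The data are entirely hypothetical: age (the confounder) is assumed to raise both brushing frequency and DMFT, while brushing is given no effect at all. The crude slope therefore shows a spurious association, and adjusting for age in a regression removes it. The adjustment uses the Frisch-Waugh-Lovell result (residualize both variables on age, then regress the residuals), which gives the same brushing coefficient as the multiple regression DMFT ~ brushing + age.

```python
import random
from statistics import fmean

def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    mx, my = fmean(x), fmean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

rng = random.Random(1)
n = 5000
# Hypothetical simulation: age (the confounder) raises both brushing and DMFT,
# while brushing itself has NO effect on DMFT.
age = [rng.uniform(3, 12) for _ in range(n)]
brushing = [0.5 * a + rng.gauss(0, 1) for a in age]
dmft = [0.8 * a + rng.gauss(0, 1) for a in age]

crude = slope(brushing, dmft)  # ignores age: a spurious effect of about 1

# Adjusted effect = coefficient of brushing in the regression DMFT ~ brushing + age,
# computed as the slope between the residuals of DMFT and brushing on age.
adjusted = slope(residuals(age, brushing), residuals(age, dmft))  # close to 0

print(f"crude slope = {crude:.2f}, age-adjusted slope = {adjusted:.2f}")
```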
Data Quality Control
During research procedures, ensure high data quality: even with a high-quality study design, all your efforts will be wasted if the data quality is poor.
Major types of problems:
• Missing data (e.g. subjects could not answer)
• Incorrect data (e.g. data entry problems)
• Excess variability (e.g. lack of training or carelessness)
Missing data
• Missing Completely At Random (MCAR): the missingness is unrelated to the variable itself and to any other variable (there is no reason).
• Missing At Random (MAR) / Missing Conditionally At Random: the missingness is not related to the value of the missing variable itself, but it is related to another variable. If we control for this conditioning variable, the remaining data are a random subset (e.g. younger respondents tend to skip a question about their income, whatever their income is).
• Not Missing At Random (NMAR) / selective missing: there is a reason related to the missing value itself (e.g. low-income people tend to skip a question about their income).
  • Missing by design
  • Not applicable
  • Dropping subjects (refusal)
  • Inability to respond
Missing At Random (MAR) / Not Missing At Random (NMAR)
In the example dataset, there appears to be a relationship between the value of v5 and whether v6 is missing, so v6 is MAR. If the reason v6 is missing lies in the v6 question itself, the missing values are NMAR.
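A minimal sketch of how you might check for the v5/v6 pattern described above, using a small hypothetical dataset (None marks a missing v6). It compares v5 between rows where v6 is missing and rows where it is present. Note that this can reveal a link to another observed variable (MAR), but it cannot prove or rule out NMAR, because NMAR depends on the unobserved v6 values themselves.

```python
from statistics import mean

# Hypothetical dataset: v5 is always recorded, v6 has gaps (None = missing).
rows = [
    {"v5": 1, "v6": 3}, {"v5": 2, "v6": 4}, {"v5": 1, "v6": 2},
    {"v5": 2, "v6": 5}, {"v5": 4, "v6": None}, {"v5": 5, "v6": None},
    {"v5": 4, "v6": None}, {"v5": 3, "v6": 4}, {"v5": 5, "v6": None},
]

# Split the always-observed variable v5 by whether v6 is missing.
v5_when_v6_missing = [r["v5"] for r in rows if r["v6"] is None]
v5_when_v6_present = [r["v5"] for r in rows if r["v6"] is not None]

# A large gap suggests the missingness of v6 is related to v5 (MAR, not MCAR).
print(mean(v5_when_v6_missing), mean(v5_when_v6_present))  # 4.5 1.8
```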
Minimizing the Error in the Database
• Do a systematic literature review to find the variables you may need and the confounding factors you need to control.
• Select variables used in previous research so that you do not miss important variables.
• Make an analysis plan and test it with expected (mock) data to find any variables the statistics will need that you have not planned to collect.
• Conduct a pilot study; it may reveal unknown confounding factors and/or MAR/NMAR patterns.
Minimizing the Error in the Database
• Accurate data collection and entry: poor dataset quality will lead to poor results.
Data flow: Subject → Paper data sheet → Database
Minimizing Error in the Database
• From subjects to record forms
  • The researcher can cause the subject's mistakes. e.g. Questions with conditions can confuse subjects, so split a conditional question into two questions. Example: instead of asking "What was the last treatment procedure when you went to the dentist?", first ask "Did you go to the dentist? Y/N" and then ask about the procedure.
  • Subjects make mistakes: this is always possible.
• From record forms to database
  • The data entry form should be user-friendly.
  • Verify the data (double-check), or better, use double entry (most accurate!).
Minimizing the Error in the Database
Design a protocol and create a manual so that research is conducted and data are entered in a consistent manner.
• e.g. When the dentist is not sure whether enamel caries or dentin caries is present, record enamel caries.
• e.g. "Please circle your best answer: 1 2 3 4 5". If you have to choose between 2 and 3, take the conservative option: choose the one that is less significant for your hypothesis to avoid type I error, or omit this subject.
• What does a blank cell mean in a dataset? Not entered? Data missing? N/A? Or something else? Use explicit codes instead, e.g. missing data = -9, N/A = -8, …
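A short Python sketch of why explicit codes need handling before analysis, assuming the slide's codes (-9 = missing, -8 = N/A) and a hypothetical set of answers to a 1 to 5 item. If the codes are left in, they are averaged as if they were real answers.

```python
from statistics import mean

MISSING = -9         # subject did not answer (code from the protocol)
NOT_APPLICABLE = -8  # question did not apply to this subject

# Hypothetical answers to a 1-5 item, entered with the protocol's codes.
answers = [4, 5, -9, 3, -8, 2, 4]

# Keep only real answers; treating the codes as numbers would drag the mean down.
valid = [a for a in answers if a not in (MISSING, NOT_APPLICABLE)]

print(mean(answers))  # wrong: includes -9 and -8
print(mean(valid))    # 3.6
print(answers.count(MISSING), answers.count(NOT_APPLICABLE))  # 1 1
```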
Database
• MS Excel
• MS Access
• SPSS
• SAS
• REDCap: an encrypted, password-protected, web-based database (IRB loves it!) https://dcru.case.edu/redcap/
Considerations:
• Confidential information
• PHI (Protected Health Information)
• Jump drives and laptops (could be lost or stolen)
Data entry form
You have to accept that you will make mistakes.
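Double Entry
Enter your data twice (on sheets named first and second) and use a function to find your mistakes. In a third sheet, enter the following formula in A1 and fill it across the data range; any cell that shows ERROR differs between the two entries:
=IF('first'!A1='second'!A1,'first'!A1,"ERROR")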
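The same double-entry check can be sketched in Python, assuming each entry pass is stored as records keyed by subject ID; the subjects, fields, and values below are hypothetical. Every disagreement is listed so it can be checked against the paper data sheet.

```python
# Hypothetical records keyed by subject ID, typed in twice from the same paper forms.
first = {"S01": {"age": 7, "dmft": 2}, "S02": {"age": 9, "dmft": 0}, "S03": {"age": 8, "dmft": 3}}
second = {"S01": {"age": 7, "dmft": 2}, "S02": {"age": 6, "dmft": 0}, "S03": {"age": 8, "dmft": 5}}

def compare_entries(first, second):
    """Return (subject, field, first_value, second_value) for every disagreement."""
    errors = []
    for subject in sorted(set(first) | set(second)):
        a, b = first.get(subject, {}), second.get(subject, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):  # also catches a value missing in one entry
                errors.append((subject, field, a.get(field), b.get(field)))
    return errors

for err in compare_entries(first, second):
    print("ERROR", err)  # look these up on the paper data sheet
```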
Statistics: The Number of Samples (Sample Size)
• If you are able to investigate the entire population, or the population is relatively small, collect the variables from the entire population. e.g. If you want to know about your patients' dft, review all of your patients' charts.
• If not, calculate the appropriate sample size. It should be "large enough."
• Many websites and software packages can calculate sample size.
• Essential inputs for the sample size calculation: the testing procedure, the estimated mean, the estimated SD, and the P value (e.g. P = 0.05); the calculator then returns the number of subjects.
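As a hedged illustration, here is the common normal-approximation formula for comparing two means, written in Python. It uses the inputs named above (estimated means, estimated SD, P value = 0.05) plus a power of 0.80, which is an assumption not stated on the slide. Dedicated software based on the t distribution typically returns slightly larger numbers. The example means and SD are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group for comparing two means (two-sided test).

    Normal-approximation formula: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / difference^2
    sd         -- estimated standard deviation (from the literature or a pilot study)
    difference -- difference between the estimated means you want to detect
    alpha      -- significance level (the slide's P value = 0.05)
    power      -- assumed power (not given on the slide; 0.80 is a common choice)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
    return math.ceil(n)  # always round up to whole subjects

# Hypothetical example: estimated means 3.0 vs 2.5 dft, estimated SD 1.0
print(n_per_group(sd=1.0, difference=0.5))  # 63
```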
Large Sample Sizes Are Good!
Larger samples make it easier to detect a difference, and the sample estimate gets closer to the population's characteristic of interest. (Figure: a large sample compared with a small sample.)
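A small simulation sketch of the second point, using a hypothetical population of 10,000 values. It repeatedly draws samples of size 10 and size 1000 and measures how much the sample mean varies; the spread (the standard error) shrinks roughly like SD / sqrt(n).

```python
import random
from statistics import stdev

rng = random.Random(3)
# Hypothetical population of 10,000 values (mean about 50, SD about 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

def mean(values):
    return sum(values) / len(values)

def spread_of_sample_means(n, repeats=500):
    """SD of the sample mean over many random samples of size n (the standard error)."""
    return stdev([mean(rng.sample(population, n)) for _ in range(repeats)])

small = spread_of_sample_means(10)    # roughly 10 / sqrt(10), about 3
large = spread_of_sample_means(1000)  # roughly 10 / sqrt(1000), about 0.3
print(f"population mean={mean(population):.1f}  n=10: {small:.2f}  n=1000: {large:.2f}")
```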
Are Large Sample Sizes Good?
Problems:
• They take more time.
• They can lead to subject abuse (more subjects than necessary take part in the research).
• A statistically significant difference might not be a clinically significant difference. (The worldwide sex ratio at birth is roughly 105 boys to 100 girls. Does it matter?)
• Parametric analyses assume that the data are normally distributed. If the sample is very large, a goodness-of-fit test will find even small departures from normality statistically significant.
• "Large enough" should be an ethically acceptable number.
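A quick Python sketch of the sex-ratio point: the same 105:100 ratio is tested against an even 50/50 split with a one-sample z-test for a proportion, once with 205 births and once with 205,000 (both counts are hypothetical). The difference is identical, but only the huge sample makes it "significant".

```python
from statistics import NormalDist

def two_sided_p(boys, total, p0=0.5):
    """One-sample z-test: is the proportion of boys different from p0?"""
    p_hat = boys / total
    se = (p0 * (1 - p0) / total) ** 0.5
    z = (p_hat - p0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same 105:100 ratio (about 51.2% boys) in a small and a huge sample.
p_small = two_sided_p(boys=105, total=205)          # far above 0.05
p_large = two_sided_p(boys=105_000, total=205_000)  # effectively 0
print(f"n=205: p={p_small:.3f}   n=205,000: p={p_large:.1e}")
```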
Statistics: Types of Data
• Continuous data: e.g. age
• Ordinal data: e.g. not at all, some, a lot
• Categorical data: e.g. male, female
• Parametric data: assumes an underlying normal distribution and homogeneity of variance (SD). (However, statistical software offers adjustments, such as Welch's correction for the t-test, when the sample groups have different variances.)
• Non-parametric data: there is no assumption of an underlying normal distribution.
Statistics
• Parametric data analysis: uses the mean and SD of the values
  • Continuous data: e.g. age
  • Ordinal data (5 or more options): e.g. a 5-point scale from "not at all" to "a lot"
• Non-parametric data analysis: uses the order (or "rank") of the values, e.g. Gold, Silver, Bronze
  • Ordinal data (4 or fewer options): e.g. not at all, some, a lot
  • Categorical data: e.g. male, female
Descriptive analysis
• Parametric data analysis: use the mean and SD of the values
  • Mean, standard deviation (SD), standard error (SE), 95% confidence interval, …
• Non-parametric data analysis: use the order (or "rank") of the values
  • Percentage, median, mode, …
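A minimal Python sketch of both kinds of description, applied to the data from the bar graph vs. box plot example in this lecture. The 95% confidence interval uses the normal approximation (1.96 × SE) as a simplifying assumption; with only 11 values, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, median, stdev

# Data from the bar graph vs. box plot example in this lecture.
data = [35, 49, 51, 52, 55, 56, 57, 58, 59, 60, 65]

n = len(data)
m = mean(data)                   # parametric description
sd = stdev(data)                 # sample SD (n - 1 in the denominator)
se = sd / n ** 0.5               # standard error of the mean
z = NormalDist().inv_cdf(0.975)  # about 1.96
ci = (m - z * se, m + z * se)    # normal-approximation 95% CI (assumption)

print(f"mean={m:.1f} SD={sd:.1f} SE={se:.1f} 95% CI={ci[0]:.1f} to {ci[1]:.1f}")
print(f"median={median(data)}")  # non-parametric description: 56
```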
How to use tests of comparison
Did you do sampling?
• Yes, because I don't have access to the entire target population → You need a test for comparison.
• No, because I have access to the entire target population → You need descriptive analysis; simply compare your data.
Hypothesis
• To test your data, you need a hypothesis.
• Null hypothesis (H0): A = B; the mean/distribution of variable 1 is equal to that of variable 2.
• Alternative hypothesis (H1): A ≠ B; the mean/distribution of variable 1 is different from that of variable 2.
Comparisons
• Parametric data analysis: use the mean and SD of the values
  • t-test (two groups)
    • Independent t-test: e.g. male vs. female
    • Paired t-test: e.g. first visit vs. second visit
  • ANOVA (three or more groups), followed by post-hoc multiple comparisons (Bonferroni correction, Tukey test, …)
  • Repeated measures ANOVA (e.g. first visit vs. second visit vs. third visit)
(Figure: mean and SD of the values for groups A and B, and for groups A, B, and C.)
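To show what the two t-tests compute, here is a stdlib-only Python sketch of the t statistics (the independent version assumes equal variances, i.e. a pooled SD). The plaque scores are hypothetical. The p value then comes from the t distribution with the returned degrees of freedom; statistical software does this step, for example `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` in Python report both t and p.

```python
from statistics import mean, stdev, variance

def independent_t(a, b):
    """Student's t statistic for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, na + nb - 2  # (t, degrees of freedom)

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    d = [x - y for x, y in zip(before, after)]
    return mean(d) / (stdev(d) / len(d) ** 0.5), len(d) - 1

# Hypothetical plaque scores
male = [2.1, 2.5, 3.0, 2.8, 2.2]
female = [1.8, 2.0, 2.4, 1.9, 2.1]
print(independent_t(male, female))

first_visit = [3.0, 2.6, 2.9, 3.4, 2.8]
second_visit = [2.5, 2.4, 2.6, 3.0, 2.7]
print(paired_t(first_visit, second_visit))
```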
Comparisons
• Non-parametric data analysis: use the order (or "rank") of the values
  • Mann-Whitney test (two independent groups)
  • Wilcoxon signed-rank test (two paired groups)
  • Kruskal-Wallis test (three or more independent groups)
  • Friedman test (repeated measures)
(Figure: values for groups A and B, and for groups A, B, and C.)
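A stdlib-only Python sketch of what "use the rank of the values" means for the Mann-Whitney test: the two groups are pooled and ranked (ties share the average rank), and the U statistic is computed from the rank sum of one group. The ordinal scores are hypothetical. Software such as `scipy.stats.mannwhitneyu` in Python also reports the p value.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for group a, based only on ranks in the combined sample."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Hypothetical ordinal scores (1 = not at all, 2 = some, 3 = a lot)
group_a = [1, 2, 2, 3]
group_b = [2, 3, 3, 3]
print(average_ranks(group_a + group_b))  # [1.0, 3.0, 3.0, 6.5, 3.0, 6.5, 6.5, 6.5]
print(mann_whitney_u(group_a, group_b), mann_whitney_u(group_b, group_a))  # 3.5 12.5
```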
Box plot
• The line inside the box is the median.
• The box spans the inter-quartile range (IQR), from the lower quartile to the upper quartile. The lower whisker, the two halves of the box, and the upper whisker each cover about 25% of the data.
• The whiskers extend to the most extreme values that lie within 1.5 times the IQR of the upper or lower quartile.
• Points farther than 1.5 times the IQR beyond the upper or lower quartile are plotted individually (e.g. as asterisks). These points represent potential outliers.
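A minimal Python sketch of the box plot rules, applied to the data from the bar graph vs. box plot example. It uses `statistics.quantiles` with its default ("exclusive") method; other software may compute quartiles slightly differently, although for these data 35 is flagged as an outlier either way.

```python
from statistics import quantiles

# Data from the bar graph vs. box plot example in this lecture.
data = [35, 49, 51, 52, 55, 56, 57, 58, 59, 60, 65]

q1, med, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))  # ends of the lines extending from the box
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, med, q3, iqr, whiskers, outliers)  # 51.0 56.0 59.0 8.0 (49, 65) [35]
```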
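Bar graph vs. box plot
Data = 35, 49, 51, 52, 55, 56, 57, 58, 59, 60, 65
• Bar graph: the mean, with error bars at -1 SD and +1 SD.
• Box plot: the maximum, median, and minimum, but 35 is an outlier, so the minimum shown on the box plot (the end of the lower whisker) is 49.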
Comparisons
• Non-parametric data analysis: use the order (or "rank") of the values
  • Distribution comparison:
    • Chi-square test: e.g. a 2x2 table (example result: chi-square = 4.464, df = 1, p = 0.035)
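A stdlib-only Python sketch of the Pearson chi-square test for a 2x2 table (without continuity correction). The table below is hypothetical. For df = 1, the chi-square statistic is the square of a standard normal variable, so the p value can be computed from the normal distribution; applied to the example result above, chi-square = 4.464 gives p of about 0.035.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total  # count expected if rows and columns are independent
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """For df = 1, chi-square is a squared standard normal: p = 2 * (1 - Phi(sqrt(chi2)))."""
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))

# Hypothetical table: rows = brushes daily yes/no, columns = caries yes/no
table = [[10, 20], [20, 10]]
print(round(chi_square_2x2(table), 3))  # 6.667
print(round(p_value_df1(4.464), 3))     # 0.035
```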
Correlation
Correlation analysis measures the strength of the relationship between two variables: e.g. age and number of caries. (Figure: scatter plot of caries against production of sucrose.)
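A minimal Python sketch of the Pearson correlation coefficient, which measures the strength of a linear relationship on a scale from -1 to 1. The ages and caries counts are hypothetical. For ordinal or clearly non-normal data, the Spearman correlation (the Pearson formula applied to ranks) is the usual non-parametric alternative.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of the LINEAR relationship (-1..1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: age (years) and number of carious teeth for five children
age = [1, 2, 3, 4, 5]
caries = [2, 4, 5, 4, 5]
print(round(pearson_r(age, caries), 3))  # 0.775
```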
Regression
Regression analysis estimates the relationships among variables. Y (the dependent variable) is "driven by" multiple variables Xn (independent variables), and the effect of each Xn is scaled by a coefficient (a, b, …):
Y = aX1 + bX2 + cX3 + … + u
(Diagram: X1, X2, X3, …, Xn each pointing to Y, the number of caries.)
Regression
Hypothesis: "The number of children's caries is influenced by gender, parents' educational level, frequency of brushing, frequency of flossing, and frequency of dental visits."
Y = aX1 + bX2 + cX3 + … + u
Children's caries = a(gender) + b(parents' educational level) + c(frequency of brushing) + d(frequency of flossing) + e(frequency of dental visits) + u
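A stdlib-only sketch of how the coefficients of such a model can be estimated by least squares (solving the normal equations). It uses three of the hypothesis's variables on a handful of hypothetical children, with gender coded 0/1. The sketch adds an intercept term, an assumption since the slide's equation writes only u for the remaining term. The outcome is generated without noise, so the fit recovers the chosen coefficients exactly; real analyses would use statistical software, which also reports standard errors and p values.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y = intercept + a*X1 + b*X2 + ... via the normal equations.

    X -- list of rows, one per subject, each a list of independent variables
    y -- list of outcomes (dependent variable)
    Returns [intercept, a, b, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # column of 1s for the intercept
    k = len(rows[0])
    # Augmented matrix for the normal equations (X'X) beta = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical children: gender (0 = girl, 1 = boy), brushing per day, dental visits per year.
X = [[0, 2, 1], [1, 1, 0], [0, 0, 1], [1, 2, 2], [1, 3, 1], [0, 1, 0]]
true = [5.0, 0.5, -1.0, -0.8]  # intercept, a, b, c used to generate the outcome
y = [true[0] + true[1] * g + true[2] * br + true[3] * v for g, br, v in X]

coef = fit_linear_regression(X, y)
print([round(c, 6) for c in coef])  # [5.0, 0.5, -1.0, -0.8]
```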