Data Issues: Quality and Analysis By Ziyad Mahfoud, Ph.D. Associate Professor of Biostatistics Department of Public Health
Sample Size and Power
• The sample size of the study is usually based on several factors, such as the study design, the type of main outcome, the expected differences between cases and controls with regard to the main outcome, the significance level (usually set at 5%), and the power of the study (usually set somewhere between 80% and 90%).
• The power of the study is its ability to detect significant differences/associations when those differences/associations truly exist.
• If the power of the study drops, then even if significant associations exist, the study will have a diminished ability to detect them.
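To make the dependence of sample size on these factors concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions between equal-sized groups (for example, the frequency of a binary exposure in cases versus controls). It is an illustration only, not the study's actual calculation: the proportions 0.30 and 0.20 are hypothetical placeholders, and a matched case-control genetic study would typically use a more specialized method.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per group to compare two independent proportions
    with a two-sided test (normal approximation, equal group sizes)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                # pooled proportion under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical exposure frequencies: 30% in cases vs 20% in controls.
for pw in (0.80, 0.90):
    print(f"power {pw:.0%}: {n_per_group(0.30, 0.20, power=pw)} per group")
```

With these placeholder values the sketch gives 294 per group at 80% power and 392 at 90%, showing how a higher target power (or a smaller expected difference) quickly raises the required sample size.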
The most common reason for a drop in the power of the study is failure to reach the sample size needed for the analysis.
• Not reaching an "analyzable" number of patients could be due to:
  • Recruitment problems
  • Data quality problems
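The effect of falling short of the planned sample size can be seen by computing power at the number of participants actually analyzable. This is a minimal sketch under the same normal-approximation assumptions for two proportions; the planned size of 294 per group and the proportions 0.30 vs 0.20 are hypothetical placeholders, not figures from this study.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions,
    with n analyzable participants per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n)         # SE of difference under H0
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)  # SE of difference under H1
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Planned n = 294 per group (hypothetical); what if fewer are analyzable?
for n in (294, 250, 200, 150):
    print(f"n = {n} per group -> power = {power_two_proportions(0.30, 0.20, n):.0%}")
```

With these placeholder values, power falls from about 80% at the planned 294 per group to roughly 64% at 200 and about 52% at 150, whether the shortfall comes from recruitment or from unusable data.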
Recruitment Problem 1: The number of recruited people
• We will keep track of the number of recruited patients at each of the 5 sites.
• These numbers will be compared to those "expected" at the start of the study.
• Any slow recruitment will be noted, and the PIs will be informed so that corrective measures can be considered.
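The comparison of recruited versus expected numbers per site can be sketched as follows. The site names, counts, and the 80% threshold for "slow" recruitment are all hypothetical; the slides do not specify a threshold.

```python
from dataclasses import dataclass


@dataclass
class SiteProgress:
    site: str
    recruited: int
    expected: int  # number expected by now according to the plan; assumed > 0

    @property
    def ratio(self) -> float:
        return self.recruited / self.expected


def slow_sites(progress: list[SiteProgress], threshold: float = 0.8) -> list[str]:
    """Sites whose recruitment is below `threshold` times the expected number."""
    return [p.site for p in progress if p.ratio < threshold]


# Hypothetical counts for the 5 sites (illustrative only).
progress = [
    SiteProgress("Site A", 48, 50),
    SiteProgress("Site B", 30, 50),
    SiteProgress("Site C", 55, 50),
    SiteProgress("Site D", 20, 40),
    SiteProgress("Site E", 38, 45),
]
for p in progress:
    print(f"{p.site}: {p.recruited}/{p.expected} ({p.ratio:.0%})")
print("Slow recruitment, inform PIs at:", slow_sites(progress))
```

Here Site B (60%) and Site D (50%) would be flagged; changing `threshold` makes the rule stricter or looser.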
Recruitment Problem 2: The type of recruited people
• Ineligible patients: to be eligible, patients should be at least 18 years old, report "Arab" as their self-reported ancestry, and have RA diagnosed according to the American College of Rheumatology (ACR) criteria.
• Controls will be recruited simultaneously with the cases; that is, we will NOT wait until case recruitment ends to start recruiting controls.
• After recruitment, controls will be matched to cases based on age, gender, and self-reported ancestry.
• To avoid recruiting too many unnecessary controls, each center will be given track sheets on which the age, gender, and self-reported ancestry of recruited people (cases and controls) will be recorded. This will let each center see which recruited controls match which cases, and hence which types of controls are still missing and need to be recruited.
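The track-sheet logic (which case matches which control, and what kind of control is still missing) can be sketched with a simple greedy 1:1 matching. This is a simplification, not the study's matching procedure: the age tolerance of 5 years, the placeholder ancestry categories "A"/"B", and all labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    label: str
    age: int
    gender: str
    ancestry: str  # self-reported ancestry category


AGE_TOLERANCE = 5  # hypothetical: the slides do not say how close ages must be


def is_match(case: Person, control: Person) -> bool:
    return (case.gender == control.gender
            and case.ancestry == control.ancestry
            and abs(case.age - control.age) <= AGE_TOLERANCE)


def match_cases(cases: list[Person], controls: list[Person]) -> tuple[dict[str, str], list[Person]]:
    """Greedy 1:1 matching: each case takes the closest-in-age unused control
    that matches on gender and ancestry. Returns (pairs, unmatched cases)."""
    available = list(controls)
    pairs: dict[str, str] = {}
    unmatched: list[Person] = []
    for case in cases:
        candidates = [c for c in available if is_match(case, c)]
        if candidates:
            best = min(candidates, key=lambda c: abs(c.age - case.age))
            available.remove(best)
            pairs[case.label] = best.label
        else:
            unmatched.append(case)
    return pairs, unmatched


# Hypothetical track-sheet entries; ancestry categories "A"/"B" are placeholders.
cases = [Person("C001", 45, "F", "A"), Person("C002", 60, "M", "B"), Person("C003", 33, "F", "A")]
controls = [Person("K001", 47, "F", "A"), Person("K002", 35, "F", "A"), Person("K003", 52, "M", "B")]

pairs, unmatched = match_cases(cases, controls)
print("Matched:", pairs)
for c in unmatched:
    print(f"Still need a control for {c.label}: {c.gender}, ancestry {c.ancestry}, "
          f"age {c.age - AGE_TOLERANCE}-{c.age + AGE_TOLERANCE}")
```

In this example C002 has no control within 5 years of age, so the center would know it still needs a male control of ancestry B aged 55 to 65.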
Data Quality Problem 1: Missing data
• Missing data, especially on the main outcome, result in the participant not contributing to the main analysis of the study.
  • For example, if a participant's blood sample is missing.
• Missing data on the variables used for matching (age, gender, and self-reported ancestry) will leave us unable to find a match, and hence that patient will not contribute to the study analysis.
• Missing data on other covariates diminish our ability to find associations between such variables and the main outcome.
  • For example, cigarette smoking status.
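The rules above can be expressed as a small check: a participant is analyzable only if the main outcome and all matching variables are present, while missing covariates are counted separately. This is a minimal sketch; the field names (`blood_sample`, `smoking_status`) and the records are hypothetical, with `None` standing for a missing value.

```python
from typing import Any

MAIN_OUTCOME = "blood_sample"                 # hypothetical field name
MATCHING_VARS = ("age", "gender", "ancestry")
COVARIATES = ("smoking_status",)              # hypothetical; e.g. cigarette smoking


def is_analyzable(record: dict[str, Any]) -> bool:
    """True if the main outcome and all matching variables are present,
    i.e. the participant can contribute to the main (matched) analysis."""
    return all(record.get(v) is not None for v in (MAIN_OUTCOME, *MATCHING_VARS))


def missing_counts(records: list[dict[str, Any]]) -> dict[str, int]:
    """Number of records with a missing (None) value, per variable."""
    variables = (MAIN_OUTCOME, *MATCHING_VARS, *COVARIATES)
    return {v: sum(r.get(v) is None for r in records) for v in variables}


# Hypothetical records; None marks a missing value.
records = [
    {"id": "P01", "blood_sample": True, "age": 45, "gender": "F", "ancestry": "Arab", "smoking_status": "never"},
    {"id": "P02", "blood_sample": None, "age": 52, "gender": "M", "ancestry": "Arab", "smoking_status": "current"},
    {"id": "P03", "blood_sample": True, "age": None, "gender": "F", "ancestry": "Arab", "smoking_status": None},
    {"id": "P04", "blood_sample": True, "age": 38, "gender": "M", "ancestry": "Arab", "smoking_status": None},
]
print("Analyzable:", [r["id"] for r in records if is_analyzable(r)])
print("Missing per variable:", missing_counts(records))
```

P02 (no blood sample) and P03 (no age) drop out of the main analysis; P04 still counts, but its missing smoking status weakens any analysis of that covariate.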
Main outcome
• Blood
  • Make sure that you obtain blood from each patient.
  • Make sure that each sample contains the required amount of blood.
  • Make sure that the blood is properly preserved and shipped.
Data Quality Problem 2: Patient numbers/labels
• Each participant has:
  • a "Demographic and clinical" questionnaire
  • an "Ancestry" questionnaire
  • blood samples
• Labels on all three must be present (or entered) and must be exactly the same.
• Missing or mismatched labels (for the same patient) render the data unusable for analysis.
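A label cross-check across the three items can be sketched with set operations: a label is usable only if it appears, exactly, in all three sources. The source names and label format ("S1-001") are hypothetical.

```python
def check_labels(labels_by_source: dict[str, set[str]]) -> tuple[set[str], dict[str, list[str]]]:
    """Cross-check participant labels across data sources.

    Returns (usable labels present in every source,
             {label: [sources where that exact label is missing]}).
    """
    all_labels = set().union(*labels_by_source.values())
    problems: dict[str, list[str]] = {}
    for label in sorted(all_labels):
        missing_from = [src for src, labels in labels_by_source.items() if label not in labels]
        if missing_from:
            problems[label] = missing_from
    usable = all_labels - set(problems)
    return usable, problems


# Hypothetical labels; "S1-03" is a mistyped version of "S1-003".
labels = {
    "demographic_clinical": {"S1-001", "S1-002", "S1-003"},
    "ancestry": {"S1-001", "S1-002", "S1-03"},
    "blood": {"S1-001", "S1-003"},
}
usable, problems = check_labels(labels)
print("Usable:", sorted(usable))
for label, sources in problems.items():
    print(f"{label}: missing from {', '.join(sources)}")
```

Only S1-001 is usable here: S1-002 has no blood label, and the typo "S1-03" on the ancestry questionnaire makes S1-003 look incomplete in two places at once.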
Quality control starts with you, since data are collected through interviews and not through patient "self-reports".
• Choosing eligible patients and controls, interviewing them, and collecting and storing the blood are all under your control.
• In general, interviews yield few missing data; missing values arise, for example, when a patient refuses to answer a certain question.
• When obtaining informed consent from potential participants (cases and controls), you have to make it very clear that the consent covers both the questionnaire and the blood sample. This will minimize later refusals by participants to give blood.
What are we trying to do to help you?
• Data are collected/entered using SurveyGizmo, which does not allow you to skip very important questions (those pertaining to eligibility criteria, patient number, etc.). Moreover, in some instances it will not allow you to enter out-of-range answers, for example for age.
• We will check the data periodically to track patient recruitment and to look for the quality control issues discussed here.
• Quarterly newsletters with updates will be sent to all investigators and coordinators. There will also be monthly Skype calls to discuss progress and any other issues that arise.
• Recruitment or quality control issues will be reported to the PI and the co-PI at the concerned center so that corrective measures can be taken.
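The slides rely on SurveyGizmo itself for required-question and range checks; the sketch below does not use SurveyGizmo and only mimics the same kind of checks in Python, for example when re-checking exported data during the periodic reviews. The field names, codings ("case", "Arab", `acr_ra_diagnosis`), and the upper age limit of 110 are hypothetical, and applying the age and ancestry criteria to controls as well as cases is an assumption.

```python
from typing import Any

REQUIRED = ("patient_number", "group", "age", "ancestry")  # hypothetical field names
MIN_AGE, MAX_AGE = 18, 110  # 18 from the eligibility criteria; 110 is a hypothetical plausibility cap


def validate(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    errors = [f"missing required field: {f}" for f in REQUIRED if entry.get(f) in (None, "")]
    age = entry.get("age")
    if isinstance(age, int) and not MIN_AGE <= age <= MAX_AGE:
        errors.append(f"age out of range: {age}")
    ancestry = entry.get("ancestry")
    if ancestry not in (None, "") and ancestry != "Arab":
        errors.append(f"ineligible ancestry: {ancestry}")
    # RA diagnosed by ACR criteria applies to cases only.
    if entry.get("group") == "case" and entry.get("acr_ra_diagnosis") is not True:
        errors.append("ineligible case: RA not confirmed by ACR criteria")
    return errors


print(validate({"patient_number": "S1-001", "group": "case", "age": 45,
                "ancestry": "Arab", "acr_ra_diagnosis": True}))
print(validate({"patient_number": "", "group": "control", "age": 16, "ancestry": "Arab"}))
```

The first entry passes with no problems; the second is flagged for a missing patient number and an age below 18.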