SSC 2006: Case Study #2: Obstructive Sleep Apnea
Rachel Chu, Shuyu Fan, Kimberly Fernandes, and Jesse Raffa
Department of Statistics, University of British Columbia
Objectives
• To compare the sensitivity and specificity of the Berlin Questionnaire (BQ) when first-night versus second-night RDI outcomes are used as the reference.
• To determine whether an abbreviated BQ can be developed with similar sensitivity and specificity.
• To determine whether the sensitivity and specificity of the BQ depend on gender.
• To ascertain whether a battery of questionnaires in addition to the BQ can improve sensitivity and specificity in women.
Objective #1: First Night vs. Second Night
• The sensitivities and specificities of the BQ are very similar whether the first night, the second night, or either night is used as the reference.
• The first night does not appear to be any less accurate than the second night; if anything, the first night may be better.
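As an illustration of the comparison above, here is a minimal Python sketch of how sensitivity and specificity could be computed against each night's RDI. The data are made up, the RDI > 10 cut-off is the high-risk definition used elsewhere in these slides, and treating "either night" as "RDI > 10 on at least one night" is an assumption, not something the slides state.

```python
def sens_spec(bq_high, rdi, cutoff=10.0):
    """Sensitivity and specificity of a binary test against RDI > cutoff."""
    tp = fn = tn = fp = 0
    for flagged, value in zip(bq_high, rdi):
        if value > cutoff:          # reference standard says "high risk"
            if flagged:
                tp += 1
            else:
                fn += 1
        else:                       # reference standard says "low risk"
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (hypothetical): 1 = BQ classifies the patient as high risk.
bq_high = [1, 1, 0, 1, 0, 0, 1, 0]
rdi_night1 = [25, 12, 15, 4, 3, 8, 30, 11]
rdi_night2 = [22, 8, 18, 6, 2, 12, 28, 9]
# Assumed "either night" rule: use the larger of the two RDI values.
rdi_either = [max(a, b) for a, b in zip(rdi_night1, rdi_night2)]

night1 = sens_spec(bq_high, rdi_night1)
night2 = sens_spec(bq_high, rdi_night2)
either = sens_spec(bq_high, rdi_either)
print(night1, night2, either)
```

The function assumes both risk groups are non-empty; with real data an empty group would need to be handled before dividing.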
ROC Curve Hypothesis Testing
• We assessed the accuracy of the study's questionnaires by comparing the areas under their Receiver Operating Characteristic (ROC) curves.
• The ROC curve plots sensitivity against the false positive rate (1 - specificity) for all possible cut-offs.
• The Area Under the Curve (AUC) is frequently used to summarize the accuracy of questionnaires.
• In our case, the ROC curves are correlated because they are estimated from the same patients.
• We take the nonparametric approach developed by DeLong et al. (1988) for hypothesis tests on the AUCs of two or more correlated ROC curves.
• This method uses generalized U-statistics to estimate the covariance matrix of the AUC estimates.
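The test described above can be sketched in plain Python for the two-curve case. This is an illustrative implementation of the DeLong-style comparison under stated assumptions, not the code used in the study: the function names and the toy scores are hypothetical, and the p-value relies on the usual large-sample normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def _psi(x, y):
    # Kernel of the Mann-Whitney statistic: 1 if x > y, 1/2 for ties, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _placements(scores, labels):
    pos = [s for s, d in zip(scores, labels) if d]
    neg = [s for s, d in zip(scores, labels) if not d]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    # Sample covariance; the mean of each placement vector equals the AUC.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(scores_a, scores_b, labels):
    """Compare two AUCs measured on the same subjects.

    Returns (auc_a, auc_b, z, two-sided p-value).
    """
    v10a, v01a = _placements(scores_a, labels)
    v10b, v01b = _placements(scores_b, labels)
    m, n = len(v10a), len(v01a)          # diseased, non-diseased counts
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of (auc_a - auc_b), including the covariance between curves.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    z = (auc_a - auc_b) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc_a, auc_b, z, p

# Toy data (hypothetical): two questionnaire scores for the same 8 patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [9, 8, 7, 3, 6, 4, 2, 1]
scores_b = [5, 9, 2, 6, 7, 3, 8, 1]
auc_a, auc_b, z, p = delong_test(scores_a, scores_b, labels)
print(auc_a, auc_b, z, p)
```

The sketch fails with a division by zero when the two score vectors produce identical placement values, since the estimated variance of the difference is then zero. With only eight subjects the normal approximation is not trustworthy; the toy data serve only to exercise the arithmetic.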
Objective #2: An Abbreviated BQ
• We gave each question on the BQ equal weight and removed the structure imposed by the questionnaire's categories, effectively making it a questionnaire with a maximum score of 11.
• We used a backward-selection-type algorithm:
• Start by computing the AUC of the ROC curve for the full 11-item questionnaire.
• Test each of the 11 possible 10-item questionnaires against the 11-item questionnaire.
• Eliminate every questionnaire whose AUC is statistically significantly smaller.
• Take the remaining 10-item questionnaire with the largest AUC, and repeat the procedure for all 9-item questionnaires derived from it.
• If every questionnaire of a given size has a significantly smaller AUC, keep the larger questionnaire and proceed to the questionnaires with fewer items.
• Repeat for all questionnaires with a smaller number of items.
• Once the final questionnaire had been selected, it was tested against the full 11-item questionnaire.
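The selection loop above might be sketched as follows. This is one possible reading of the slide, with two loudly stated assumptions: each candidate is compared against the current (parent) questionnaire, and the loop simply stops once every candidate is significantly worse. The `auc` and `worse_than` callables are hypothetical stand-ins; in practice `worse_than` would wrap a correlated-ROC test such as the DeLong test, and the toy versions below use invented numbers.

```python
def backward_select(items, auc, worse_than):
    """Drop one item at a time while some smaller questionnaire survives.

    auc(items) -> float; worse_than(candidate, reference) -> bool.
    Both are supplied by the caller (hypothetical interfaces).
    """
    current = tuple(items)
    while len(current) > 1:
        # All questionnaires obtained by removing exactly one item.
        candidates = [tuple(i for i in current if i != drop) for drop in current]
        # Eliminate candidates with a significantly smaller AUC.
        survivors = [c for c in candidates if not worse_than(c, current)]
        if not survivors:
            break                      # assumption: stop and keep the larger one
        current = max(survivors, key=auc)
    return current

# Toy stand-ins (hypothetical): two informative items, the rest add noise.
INFORMATIVE = {"q1", "q4"}

def toy_auc(items):
    n_inf = len(INFORMATIVE & set(items))
    return 0.5 + 0.1 * n_inf - 0.01 * (len(items) - n_inf)

def toy_worse_than(candidate, reference):
    # Stand-in for a significance test: "worse" if the AUC drops by > 0.05.
    return toy_auc(candidate) < toy_auc(reference) - 0.05

full = ("q1", "q2", "q3", "q4", "q5")
final = backward_select(full, toy_auc, toy_worse_than)
# Last step from the slide: test the final questionnaire against the full one.
final_worse_than_full = toy_worse_than(final, full)
print(final, final_worse_than_full)
```

Passing the AUC and the test as callables keeps the loop independent of how the significance test is actually implemented.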
Objective #4: Battery of Questionnaires
• To combine the data from the questionnaires, we computed the risk score: P(OSA | Questionnaire Responses).
• The risk score has been shown to give the highest ROC curve at every point, i.e. the highest sensitivity at each false positive rate (McIntosh and Pepe, 2002).
• The risk score was computed using logistic regression with each questionnaire as a predictor in the model.
• The fitted values from this model were then used as the composite questionnaire result.
• ROC curves were then constructed, and the AUCs were tested as previously described.
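The steps above can be illustrated with a small, dependency-free Python sketch. It is only a sketch under assumptions: the two questionnaire scores and the outcomes are invented, the model is fitted by plain gradient ascent rather than by whatever software the authors used, and the learning rate and step count are arbitrary choices that happen to suit this toy data.

```python
from math import exp

def sigmoid(t):
    # Numerically stable logistic function.
    return 1 / (1 + exp(-t)) if t >= 0 else exp(t) / (1 + exp(t))

def predict(w, x):
    # w[0] is the intercept; w[1:] are the questionnaire coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(rows, labels, lr=0.1, steps=5000):
    """Maximum-likelihood logistic regression via gradient ascent."""
    n, k = len(rows), len(rows[0])
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            resid = y - predict(w, x)
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def auc(scores, labels):
    # Mann-Whitney estimate of the area under the ROC curve.
    pos = [s for s, d in zip(scores, labels) if d]
    neg = [s for s, d in zip(scores, labels) if not d]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): (questionnaire A score, questionnaire B score).
rows = [(3, 1), (2, 2), (3, 0), (1, 2), (2, 1),
        (2, 1), (1, 0), (0, 1), (1, 1), (0, 0)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 1 = OSA present

weights = fit_logistic(rows, labels)
risk = [predict(weights, x) for x in rows]   # composite questionnaire result
composite_auc = auc(risk, labels)
print(weights, composite_auc)
```

Because the risk scores are fitted and evaluated on the same patients, the resulting AUC is optimistic; the slides do not say whether any correction for this was applied.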
Objective #3: Gender Differences
• Logistic regression was used to model sensitivity and specificity as functions of gender and other covariates.
• Models consisting of subsets of the following variables, in various forms (binary, quartiles, continuous), were considered: Gender, Alcohol, Caffeine, Age, Systolic, Diastolic, BMI, Neck Size.
• Larger models were refined by dropping non-significant variables using the likelihood ratio test.
• For sensitivity:
• BQHigh = Berlin Questionnaire classifies a patient as High Risk
• High Risk = RDI > 10
• Modelled quantity: P(BQHigh | Gender, High Risk, and other variables)
• For specificity:
• BQLow = Berlin Questionnaire classifies a patient as Low Risk
• Low Risk = RDI <= 10
• Modelled quantity: P(BQLow | Gender, Low Risk, and other variables)
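To make the conditional probabilities above concrete, the following Python sketch estimates P(BQHigh | Gender, High Risk) by simple proportions and forms the male-versus-female odds ratio. With gender as the only predictor, a logistic regression reproduces exactly these proportions and this odds ratio; the adjusted models on the slide add further covariates and are not reproduced here. The records and the helper names are hypothetical.

```python
def sensitivity_by_group(records, cutoff=10.0):
    """Estimate P(BQHigh | group, RDI > cutoff) for each group."""
    counts = {}
    for group, rdi, bq_high in records:
        if rdi > cutoff:                       # condition on High Risk
            hits, total = counts.get(group, (0, 0))
            counts[group] = (hits + int(bq_high), total + 1)
    return {g: hits / total for g, (hits, total) in counts.items()}

def odds_ratio(p_a, p_b):
    """Odds of p_a relative to odds of p_b."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

# Toy records (hypothetical): (gender, RDI, BQ classifies as High Risk).
records = [
    ("M", 25, True), ("M", 14, False), ("M", 31, True), ("M", 12, False),
    ("F", 18, True), ("F", 22, True), ("F", 11, False), ("F", 40, True),
    ("M", 4, True), ("F", 6, False),           # low risk, ignored here
]

sens = sensitivity_by_group(records)
or_male_vs_female = odds_ratio(sens["M"], sens["F"])
print(sens, or_male_vs_female)
```

An odds ratio below 1 in this toy example means the questionnaire is less sensitive in men than in women, which is the direction of effect reported in the conclusions. Specificity can be handled the same way by conditioning on RDI <= 10 and counting BQLow.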
Conclusions
• Hypothesis #1:
• The sensitivity (0.59) and specificity (0.43) of the Berlin Questionnaire were quite low compared to the results reported by Netzer et al., 1999 (sensitivity = 0.86, specificity = 0.77).
• Based on sensitivity and specificity, the second-night RDI does not agree more closely with the Berlin Questionnaire than the first-night RDI.
• Hypothesis #2:
• We were able to reduce the original questionnaire to three items (#2, #5, and #10).
• This three-item questionnaire had a larger AUC than the original questionnaire (0.7 vs. 0.5, p-value < 0.01).
Conclusions (continued)
• Hypothesis #3:
• Sensitivity: BMI and neck size are important covariates when modelling sensitivity vs. gender. Being male rather than female reduces the sensitivity of the Berlin Questionnaire in all models, but the gender effect is only significant when neck size is included in the model [95% CI for odds ratio: (4.43e-05, 0.29)].
• Specificity: Gender does not appear to affect the specificity of the Berlin Questionnaire after accounting for the most important covariate, BMI (p-value 0.17).
• Hypothesis #4:
• Adding questionnaires to the Berlin Questionnaire did improve performance as measured by the AUC.
• The AIS Questionnaire was found to be particularly useful (despite AIS being negatively associated with sleep apnea).
• The best composite questionnaire in all patients was the Berlin Questionnaire combined with the AIS (AUC = 0.65).
• In women, the largest composite questionnaire (all questionnaires) did not differ significantly from the Berlin Questionnaire alone (p-value 0.34).
Acknowledgements
We would like to thank:
• Dr. Alison Gibbs
• Dr. Sharon Chung
• Dr. Michael Schulzer
• Dr. John Petkau
• Dr. Harry Joe