I. Evidence-Based Medicine for Diagnosis or Screening With thanks to Dr. Mitch Levine
Outline • Evaluating a diagnostic test • What is the question? • Methodology for a valid study • Results: what are the properties of the test? • How should you apply test results to decision making?
Diagnosis and screening • Similar methodology, clinical differences • Diagnosis • Patients who present with symptom or sign • Need to find underlying disease • Screening • asymptomatic individuals • “Early diagnosis” of disease
Breast papillary lesions • Management of patients with papillary lesion on breast NCB is controversial • Most asymptomatic, identified as an “abnormal, suspicious for malignancy” on screening mammogram
Papillary lesion can be benign or malignant • Some studies: even if NCB looks benign, “too many” are found to be malignant on the excision • Other studies report better accuracy
[Figure] NCB read as usual duct hyperplasia; excision showed papilloma with DCIS. NCB error: misinterpretation or sampling?
[Figure] The core biopsy sampled the papilloma; the excision showed invasive carcinoma adjacent to the papilloma. Core biopsy: sampling error.
Issues • Is the most important issue not to miss any malignant lesion when a papillary lesion is present on NCB? • Then, take all patients to surgery for excision • Is there a tolerable error rate? • Is it also important not to subject patients to unnecessary surgery? • We accept that NCB for other breast lesions has an error rate • We should decide what minimum sensitivity and specificity we can tolerate: set sample size
Issues • What can we do to improve the reading of the NCB? • Increase the number of cores? • Have only expert breast pathologists read breast NCB? • Apply IHC/special stains? • Standardize histological interpretation*: devise a diagnostic algorithm based on • Presence/absence of myoepithelial cells • Amount of proliferative activity of ductal cells • Presence/absence of architectural atypia • Presence/absence of cytologic atypia
How to design the study • P= Population spectrum: Are the participants (or their tissue) representative of the population in whom you want to use the test? • Inclusion criteria: Patients with papillary lesion on breast NCB • Exclusion criteria: Any ADH or DCIS adjacent to the papillary lesion (since these have to go on to excision) • Using population extremes: only acceptable for test development
How to design the study • I= Intervention: this is the test you are investigating • Define the test • Diagnostic algorithm (for interpreting the papillary lesion on the NCB) • Define your cut-points / thresholds • Atypical papilloma or DCIS = positive test, OR • Papilloma with any proliferation of any kind = positive • Papilloma with UDH>10% , or ADH, or DCIS = positive • Papilloma with UDH ≥ 50% or ADH, or DCIS = positive
How to design the study • O= Outcome • In order to know if your test is identifying those with true disease and those without, you need to compare the test to something that is a truthful indicator of whether disease is present or not: the “reference” or “gold” standard • Should be patient relevant • Sometimes it is not perfect, but should be reasonable • We chose surgical excision or, if no excision, follow up by clinical or radiologic means • Surgical excision defined as showing disease if ADH, DCIS or invasive cancer present • Read by 2 pathologists independently with consensus review (adjudication) • Follow up: chart review and/or contact with family doc
1. Breast papillary lesions: The Question • In women with a papillary lesion on NCB, what is the sensitivity and specificity of a histologic diagnostic algorithm for identifying ADH or a worse lesion, as assessed from the surgical excision? (LJ Elavathil) • P = women with a papillary lesion on NCB • I = use of a diagnostic algorithm when reading the core (“histologic diagnostic algorithm”) • O = ADH or worse in the excision specimen • Design: Retrospective
Additional critical issues for study validity: comparing test to reference standard • Independent blind test comparison to a reference standard • The NCB were read and designated as Positive or Negative without the pathologist knowing what the excision or follow up showed (masked or blinded to the outcome) • Independent assessment of the outcome • The pathologists reading the excision specimen did not know the result of the NCB • Rationale: if you know the test is positive, you may read the excision as positive in cases where the interpretation is not clear: makes the test appear better than it is
Test comparison to a reference standard - continued • The reference standard is used in all cases • All or almost all must undergo reference standard • Or at least an adequate random sample if test is positive • Thousands of individuals in screening trials • Rationale: otherwise can overestimate accuracy of the test • All tests are compared to the same reference standard • We had 2 standards (therefore partial verification bias), but >90% of cases had excisions so the bias was somewhat mitigated
Reproducibility / precision of the test • Ability of test to yield same result when applied to same / similar patient • Crucial when expertise is needed to do or interpret the test • Article should tell you how reproducible test results are • reproducibility for NCB: substantial agreement for 3 categories, despite varying levels of experience
Interpreting reproducibility • If reproducibility is mediocre but the test still discriminates well • it is very useful and can probably be applied in your setting, e.g. cervical Pap screening • If reproducibility is high • the test is simple and unambiguous, or • the study interpreters were highly skilled (may or may not do as well in your setting)
Understanding the results • You have decided the study has adhered to the previous criteria pretty well (the validity of the study is good), and so you can believe the findings
Test • Sensitivity = proportion of people with disease who have a +ve test = probability of a +ve test in people with the disease = probability of detecting disease = a / (a+c) • Specificity = proportion of people without disease who have a -ve test = probability of a -ve test in people without the disease = d / (b+d) • (2×2 table cells: a = true positives, b = false positives, c = false negatives, d = true negatives)
Test • Positive predictive value (PPV) = proportion of patients with a +ve test who have the disease = probability that a patient with a +ve test has the disease = a / (a+b) • Negative predictive value (NPV) = proportion of patients with a -ve test who do NOT have the disease = probability that a patient with a -ve test does not have the disease = d / (c+d)
PPV and NPV apply only to patients with the same prevalence as the population in which the values were generated • THEREFORE they are not very useful! • Sensitivity and specificity are not affected by prevalence • Beware of clinical differences! • Prevalence of STI in general practice is low • Prevalence of STI in an STI clinic is high, likely with a greater disease burden too
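The prevalence dependence of PPV can be made concrete with Bayes' rule. This is an illustrative sketch (the sensitivity and specificity values here are made up, not from any study in the deck): the same test gives a very different PPV in a low-prevalence general practice than in a high-prevalence clinic.

```python
# Sketch: PPV from sensitivity, specificity, and prevalence (Bayes' rule).

def ppv(sens, spec, prev):
    tp = sens * prev                 # true-positive fraction of the population
    fp = (1 - spec) * (1 - prev)     # false-positive fraction
    return tp / (tp + fp)

sens, spec = 0.95, 0.90              # illustrative values only
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(sens, spec, prev):.0%}")
```

At 1% prevalence most positives are false positives, so the PPV collapses even though sensitivity and specificity are unchanged.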
NCB study: threshold of >10% proliferation • Sensitivity = 55/58 = 95% (95% CI 86–99) • Specificity = 63/103 = 61% (95% CI 51–71)
Effect of verification bias on sensitivity • We had 151 excisions and 10 clinical follow-up cases. All 10 follow-up cases had a -ve NCB and no disease. But follow-up may not detect disease well. What if all 10 really had disease, i.e. a false-negative test? • Sensitivity = 55/58 = 95%, assuming follow-up status is true • Sensitivity = 55/68 = 81%, assuming all follow-up cases had disease • If verification bias is present, the sensitivity of the test can be overestimated
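The verification-bias arithmetic above is just the same numerator over two different denominators; a small sketch makes the sensitivity swing explicit (variable names are illustrative).

```python
# Sketch: the same 55 detected cases divided by two candidate
# denominators for "truly diseased".

tp = 55
diseased_if_followup_correct = 58       # excision-verified disease only
diseased_if_followup_all_missed = 58 + 10  # worst case: all 10 follow-up cases diseased

print(f"sensitivity, follow-up trusted: {tp / diseased_if_followup_correct:.0%}")
print(f"sensitivity, worst case:        {tp / diseased_if_followup_all_missed:.0%}")
```

The 14-point gap (95% vs 81%) bounds how much the imperfect second reference standard could be inflating the reported sensitivity.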
Likelihood ratio • Positive likelihood ratio = probability of a +ve test in a patient with disease ÷ probability of a +ve test in a patient without disease = sensitivity / (1 − specificity) = TP rate / FP rate • NCB study: +LR = 0.95 / (1 − 0.61) ≈ 2.4 • *Note: if there are more than 2 test result categories, there will be more than 2 LRs
Likelihood ratio • Negative likelihood ratio = probability of a -ve test in a patient with disease ÷ probability of a -ve test in a patient without disease = (1 − sensitivity) / specificity = FN rate / TN rate • NCB study: −LR = (1 − 0.95) / 0.61 = 0.08
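Both likelihood ratios fall directly out of sensitivity and specificity. A minimal sketch, using the NCB study's exact fractions (55/58 and 63/103) rather than the rounded slide values; the function name is an assumption for illustration.

```python
# Sketch: positive and negative likelihood ratios from sens/spec.

def likelihood_ratios(sens, spec):
    pos_lr = sens / (1 - spec)        # TP rate / FP rate
    neg_lr = (1 - sens) / spec        # FN rate / TN rate
    return pos_lr, neg_lr

pos_lr, neg_lr = likelihood_ratios(55 / 58, 63 / 103)
print(f"+LR = {pos_lr:.1f}, -LR = {neg_lr:.2f}")
```

Working from the unrounded fractions gives +LR ≈ 2.4 and −LR ≈ 0.08, matching the slide arithmetic.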
Likelihood ratios • If LR > 1, there is an increase in the likelihood that the patient has the disease • If LR < 1, there is a decrease in the likelihood that the patient has the disease • If LR =1, there is no change in the likelihood that the patient has the disease (test is useless)
Converting LR to post-test probability • Study prevalence: 58/161 = 36% • What if the prevalence of disease in women over age 50 was 70% and a 75-year-old woman had a negative test? • If the patient was a surgical risk you might opt to delay surgery
Usefulness of LR • LR can help fine tune the risk of disease for an individual patient (because each patient can have a different pre-test probability for disease – different from each other and different from the study subjects) • Can help decide on management
How to get post-test probabilities • Mathematical formula: convert the pre-test probability to pre-test odds, multiply by the LR to get post-test odds, convert the post-test odds back to a post-test probability • Use a nomogram • Use an on-line calculator
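The mathematical route above (probability → odds → × LR → odds → probability) is short enough to sketch directly; the function name is an illustrative assumption.

```python
# Sketch: pre-test probability -> odds, multiply by LR, -> post-test probability.

def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Study prevalence 36% with the NCB study's -LR of 0.08:
print(f"study population: {post_test_probability(0.36, 0.08):.1%}")
# Higher-risk patient (pre-test probability 70%) with the same negative test:
print(f"high-risk patient: {post_test_probability(0.70, 0.08):.1%}")
```

A negative test drops the 36% pre-test probability to about 4%, but the same negative test in a 70% pre-test patient still leaves roughly a 16% probability of disease, which is why the pre-test probability matters for management.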
Another use for LR: the Receiver Operating Characteristic (ROC) curve • We dichotomized the test, but we didn't know which cut-off would be “optimal”
[ROC figure for the core biopsy] Each cut-point trades sensitivity against specificity: one cut-point gives the best sensitivity, but at the cost of specificity; another gives the best accuracy. Area under the curve for the study test = 0.93. The further the area is from 0.50 (the diagonal line, where LR = 1), the better the test.
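The area under an ROC curve can be computed with the trapezoid rule over its (1 − specificity, sensitivity) points. A sketch under stated assumptions: the ROC coordinates below are illustrative only, not the study's actual per-threshold values.

```python
# Sketch: AUC by the trapezoid rule over ROC points.

def auc_trapezoid(points):
    """points: (FPR, TPR) pairs sorted by FPR, including (0,0) and (1,1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # trapezoid between adjacent points
    return area

# Illustrative ROC: one point per cut-point of a dichotomized test.
roc = [(0.0, 0.0), (0.1, 0.6), (0.39, 0.95), (1.0, 1.0)]
print(f"AUC = {auc_trapezoid(roc):.2f}")
```

A useless test lies on the diagonal (AUC = 0.50); the further the curve bows toward the top-left corner, the larger the area and the better the test discriminates.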
[Figure] Compare to the accuracy of age and lesion size (5, 10, 20 mm) as predictors.
Can you apply the test results to a patient and guide individual management? • Using LR to generate a post-test probability • Study prevalence: 58/161 = 36% • What if the prevalence of disease in women over age 50 was 70% and a 75-year-old woman had a negative test? • If the patient was a surgical risk you might opt to delay surgery
Things to think about: how would you or a clinician use your test results • Will it move you across a test-treatment threshold?
Making a diagnosis • Generate the diagnostic possibilities (differential diagnosis): pre-test probability of each diagnosis • Do tests, where appropriate, to increase or decrease the likelihood of each diagnosis to arrive at a post test probability
Pre-test probabilities:where do they come from? • Your knowledge base • Personal experience (beware of memory biases!) • Clinical characteristics • The literature
Test thresholds • Test thresholds vary among diseases • If missing disease has dire consequences, you may test even if you feel the probability of disease is low • p63 for sclerosing adenosis vs tubular carcinoma • p16 for serous vs endometrioid carcinoma of endometrium • Her2neu IHC threshold for calling negative: may be uncertain in some cases, may ask for FISH • If treatment is associated with important adverse effects, you may test to confirm your Dx, even if you feel the probability of disease is fairly high • confirmation of clinical and radiologic suspicion of malignancy • In situ Papillary cancer vs p63 breast papilloma • Her2neu IHC 30% threshold for 3+ may be uncertain in some cases; may want to confirm by FISH
Doing more good than harm • Threshold may be influenced by the test • You may more readily test if the test is easy to do, relatively harmless • BUT, tests can mislabel patients, give false sense of security, subject them to unnecessary additional procedures, cost money, confuse the physician • Especially important in screening well individuals and labeling them • HPV testing?
Using test results to assess the quality of your practice • A staff urologist lodges a complaint with the quality assurance committee of the hospital. He has had three patients with negative biopsies of the prostate in the previous year that have subsequently turned out to have prostate cancer.
The urologist believes that either the radiologists who performed the biopsies under ultrasound guidance or the pathologists reading the slides were not competent, and requests that the hospital investigate the problem and take appropriate actions.
Possible actions • Review the 3 biopsies • Go 1 step further
Disease prevalence • During the year that the 3 patients with negative prostate biopsies were later found to have prostate cancer, hospital data reported: • 230 positive biopsies, and all 230 subsequent prostatectomies were positive • 321 negative biopsies, of which 3 were false negatives • 551 biopsies in total; prevalence = (230 + 3)/551 = 233/551 = 42%
Test • Prevalence of disease (pre-test probability) = 233/551 = 42%
Diagnostic tests are not perfect • The prostate biopsies were performed with a biopty gun. From the literature, the procedure has the following properties: • Sensitivity = 91% • Specificity = 100% • Positive LR = sensitivity / (1 − specificity): undefined (infinite) when specificity = 100% • Negative LR = (1 − 0.91) / 1.00 = 0.09
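Putting the hospital's numbers next to the literature values answers the urologist's complaint. A sketch, assuming the literature sensitivity applies and using the hospital's 233 diseased patients and 42% prevalence (variable names are illustrative):

```python
# Sketch: expected false negatives at the literature sensitivity, and the
# post-test probability of cancer after a negative biopsy.

diseased = 233                   # 230 confirmed positives + 3 false negatives
sens, spec = 0.91, 1.00          # literature values for the biopty-gun procedure

expected_fn = diseased * (1 - sens)

neg_lr = (1 - sens) / spec
pre_odds = 0.42 / (1 - 0.42)     # pre-test probability ~42%
post_odds = pre_odds * neg_lr
post_prob = post_odds / (1 + post_odds)

print(f"expected false negatives at literature sensitivity: ~{expected_fn:.0f}")
print(f"post-test probability after a negative biopsy: {post_prob:.0%}")
```

At the literature sensitivity one would expect roughly 21 false negatives among 233 diseased patients; the hospital observed only 3, which suggests the local biopsies performed at least as well as the literature. And even a negative biopsy still leaves a non-zero (~6%) post-test probability of cancer: some missed cancers are inherent to the test, not evidence of incompetence.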