Chantal D. Larose

Evaluating Diagnostic Accuracy of Prostate Cancer Using Bayesian Analysis • Part of an Undergraduate Research course • Chantal D. Larose

Overview • Introduction • Three Ingredients to the Analysis • ROC Curves • Bayesian Analysis • Results • Discussion

Introduction • Prostate cancer is the most common non-skin cancer in America • The risk of being diagnosed with prostate cancer increases with age. While 1 in 10,000 men under 40 years of age are diagnosed, the rate increases to 1 in 15 in men over 60. [Figure: normal prostate vs. prostate cancer]

Introduction • America’s population is aging • In 2000, persons over the age of 65 made up 12.4% of the country’s population. The proportion is expected to increase. That means an increasing proportion of the population is at risk for prostate cancer. • Accurate testing could help thousands of patients.

Three Ingredients to the Analysis • Test - A simple, non-invasive procedure • Blood test to measure Prostate-Specific Antigen (PSA) levels. • Gold standard - Can be complex, expensive, and invasive • Determines the presence or absence of prostate cancer. • Our gold standard is a biopsy. • Covariate - Additional information • May help us increase the accuracy of our prediction. • Our covariate is patient age.

There are two main questions: • How good is the PSA test alone at predicting the presence of prostate cancer? • How good is the PSA test at predicting prostate cancer when combined with information about a patient’s age?

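ROC Curves • A Receiver Operating Characteristic (ROC) curve plots the ‘True Positive Rate’ against the ‘False Positive Rate’ across all possible test cutoffs • True Positive Rate (sensitivity): probability that a patient who has cancer receives a positive test result • False Positive Rate (1 − specificity): probability that a patient who does not have cancer receives a positive test result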

ROC Curves • The area under the ROC curve (AUC) serves as a measure of overall accuracy. • If the AUC equals… • 1, the test is perfect every time • 0.5, it is as good as a coin flip • 0, the diagnosis is the opposite of the test result. • We expect values between 0.5 and 1. [Figure: ROC curve and its AUC]
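
As a rough illustration of how an empirical ROC curve and its AUC can be computed, the sketch below sweeps a cutoff over a handful of made-up PSA values. The data, the function names `roc_points` and `auc`, and the rule "test positive when PSA is at or above the cutoff" are assumptions for illustration only; they are not the study's data or its (parametric) method.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    labels: 1 = gold standard positive (cancer), 0 = gold standard negative.
    A patient tests positive when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical PSA levels and biopsy results (1 = cancer found).
psa = [1.2, 2.5, 3.1, 4.0, 4.4, 5.6, 6.8, 9.3]
biopsy = [0, 0, 0, 1, 0, 1, 1, 1]

print(auc(roc_points(psa, biopsy)))  # 0.9375
```

With these toy numbers the AUC is 0.9375: in 15 of the 16 possible (cancer, no cancer) patient pairs, the patient with cancer has the higher PSA value.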

Bayesian Analysis: Three Key Parts • The Prior Distribution • Knowledge about the target parameter before looking at the data. • One of two general types: • Informative prior: Holds information we know or suspect to be true, based on previous experiments or expert knowledge. • Noninformative prior: Holds very little information. Best when we do not have previous experience or expert knowledge.

Bayesian Analysis: Three Key Parts • The Data • The data used in Bayesian analysis is represented with a likelihood. • The likelihood is combined with the prior distribution.

Bayesian Analysis: Three Key Parts • The Posterior Distribution • After combining the prior information with the data, represented by the likelihood, the result is the posterior distribution. • The posterior represents a natural updating of the prior knowledge based on the information from the data.
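
The prior-to-posterior update can be sketched with a deliberately simple conjugate example. The Beta-Binomial model below is chosen only because its update is one line of arithmetic; it is not the model used in this analysis, and the biopsy counts and prior parameters are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, positives, negatives):
    """Combine a Beta(prior_a, prior_b) prior with binomial data.

    The posterior is again a Beta distribution whose parameters are the
    prior parameters plus the observed counts.
    """
    return prior_a + positives, prior_b + negatives


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 12 positive biopsies out of 40 patients.
positives, negatives = 12, 28

# Noninformative prior: Beta(1, 1) is uniform on [0, 1].
flat_posterior = beta_binomial_update(1, 1, positives, negatives)

# Informative prior (assumed): Beta(2, 18) has mean 0.10.
informed_posterior = beta_binomial_update(2, 18, positives, negatives)

print(flat_posterior, beta_mean(*flat_posterior))
print(informed_posterior, beta_mean(*informed_posterior))
```

Under the flat prior the posterior mean stays close to the observed proportion (12/40), while the informative prior pulls it toward 0.10. This is the sense in which different priors produce different posteriors, and why a noninformative prior lets the data dominate.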

Bayesian vs. Frequentist Analysis • Bayesian analysis allows us to update our initial distribution assumption to account for the observed data. • The analysis also provides us the flexibility of directly analyzing the entire posterior distribution of the target parameter. • Thus, we can look at the mean, median, and other statistics that are useful to the questions we want to answer.

Bayesian vs. Frequentist Analysis • One key difference between Frequentist and Bayesian methods is how population parameters are treated.
• Frequentist Approach: considers population parameters as fixed, unknown constants. It is assumed that all the randomness lies in the data.
• Bayesian Approach: takes the opposite view. The parameters are considered random variables which have their own distribution of possible values. The data is the known information. All resulting error comes from the distribution of the parameters.

Bayesian vs. Frequentist Analysis • When is Bayesian analysis more appropriate? • When a parameter of a distribution is itself a random variable, or when expert knowledge is available • Works well with small data sets, where maximum likelihood methods are not appropriate

Bayesian vs. Frequentist Analysis
Advantages • We are likely to have an idea about the prior distribution. • If expert knowledge is not available, we may use a noninformative prior. • Flexibility in choosing the prior distribution • Works well with small data sets, where maximum likelihood methods are not appropriate
Disadvantages • The “Two Bayesians” Problem: different priors produce different posteriors from the same data. • Addressed by using a noninformative prior.

Forming an ROC Curve • A parametric approach, using the binormal distribution • The curve is a function of parameters a and b. • The values of a and b are also functions of two parameters, Beta and Variance. • These values are determined by a prior assumption and information from the data. [Figure: The trickle-down effect]

Forming an ROC Curve • In other words, there is a trickle-down effect from the data and first assumptions to the ROC curve. • To calculate this trickle-down effect, we use Bayesian statistical analysis. [Figure: The trickle-down effect]
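
The slides do not state the formula for the curve, so the sketch below assumes the standard binormal form, ROC(t) = Φ(a + b·Φ⁻¹(t)) with AUC = Φ(a / √(1 + b²)), where Φ is the standard normal CDF. The values of a and b are hypothetical; in the analysis described here they would be derived from the posterior for Beta and Variance, a step this sketch does not attempt to reproduce.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binormal_roc(t, a, b):
    """True positive rate at false positive rate t, for 0 < t < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))


def binormal_auc(a, b):
    """Closed-form area under the binormal ROC curve."""
    return STD_NORMAL.cdf(a / sqrt(1 + b * b))


# Hypothetical parameter values, for illustration only.
a, b = 1.5, 1.0

for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"FPR={t:.1f}  TPR={binormal_roc(t, a, b):.3f}")
print(f"AUC={binormal_auc(a, b):.3f}")
```

With a = 0 and b = 1 the curve reduces to the diagonal (AUC = 0.5), and larger values of a push the curve toward the upper-left corner.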

Results • Our analysis starts with two noninformative priors • The majority of the information in the posterior will come from the data. • The data is organized into three groups: • All patients • Younger patients only • Older patients only

Results • Each group undergoes two analyses: • one uses only test data to predict the diagnosis, • one uses test and age data combined. • Each analysis produces an ROC curve. All together, six curves are calculated.

Discussion • Each ROC curve had a section which fell below the 50% accuracy line. • There was very little difference between pairs of ROC curves for a single age group • Differences in curves between older and younger patients were more pronounced. • This makes sense. Since younger patients have naturally low PSA levels, it is easier to detect high levels, and elevated levels are more likely due to cancer.

Discussion • Our progress so far has raised more questions. These include: • How would we use the Bayesian technique with an incomplete dataset (i.e. missing information for certain patients)? • Why does each ROC curve dip below the 50% accuracy line? • What effects do other covariates, such as ethnic background, have on the ROC curve?
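
One possible contributor to the dip below the 50% line, offered as an assumption rather than a finding of this study: if the curves follow the standard binormal form ROC(t) = Φ(a + b·Φ⁻¹(t)), then any b ≠ 1 forces the curve to cross the chance diagonal at t* = Φ(a / (1 − b)). The sketch below uses hypothetical values of a and b, and the helper name `diagonal_crossing` is made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binormal_roc(t, a, b):
    """Assumed standard binormal ROC curve, valid for 0 < t < 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))


def diagonal_crossing(a, b):
    """False positive rate where the curve meets the chance diagonal (b != 1)."""
    return STD_NORMAL.cdf(a / (1 - b))


# Hypothetical parameters with b < 1.
a, b = 0.5, 0.5
t_star = diagonal_crossing(a, b)

print(f"crossing at FPR = {t_star:.3f}")
print(f"TPR at FPR 0.50: {binormal_roc(0.50, a, b):.3f}")  # above the diagonal
print(f"TPR at FPR 0.95: {binormal_roc(0.95, a, b):.3f}")  # below the diagonal
```

For b < 1 the dip occurs at high false positive rates, and for b > 1 it occurs at low ones. Whether this accounts for the behavior seen in the six curves would need to be checked against the fitted values of a and b.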
