Analyzing Cancer Data • What type of data—counts or rates? • Before you analyze, look at the data. • If appropriate, check the data for linear trends. Do the data increase or decrease over time? • Is a given rate significantly different from another rate? (For future consideration.)
1. What type of data (counts or rates)? • Which is appropriate for what and why? • Counts for resource capacity planning; • Rates for comparisons—over time or with other jurisdictions or between subgroups (e.g. race, ethnicity, gender, age).
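To make the distinction concrete, here is a minimal sketch of turning a count into a crude rate per 100,000. The function name `rate_per_100k` and the county figures are hypothetical, chosen only for illustration; published cancer rates are often age-adjusted, which this sketch does not attempt.

```python
def rate_per_100k(cases, population):
    """Crude rate: number of cases per 100,000 people."""
    return cases / population * 100_000


# Hypothetical county (made-up numbers): 12 cases among 240,000 residents.
county_cases = 12
county_population = 240_000
county_rate = rate_per_100k(county_cases, county_population)

# The count (12) is what matters for capacity planning;
# the rate is what can be compared with other jurisdictions.
print(f"count = {county_cases}, rate = {county_rate:.1f} per 100,000")
```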
2. Look at the data. Do you see a notable pattern? Example: Not obvious? Look at a picture.
What type of pattern? • Linear? • Unimodal? • Bimodal? • Random (no pattern)?
Is there really a significant linear trend in rates? Find out in Excel. Note that the data must be entered in columns, i.e. with the years in one column and the rates in another column, for Excel’s analysis tools to work.
Is the Analysis ToolPak installed in Excel? • From the menu bar, go to Tools | Add-Ins… . • Is the Analysis ToolPak listed? • If not, use your original Microsoft Office installation CD-ROMs to do a complete install of Excel, and then come back to this step. Otherwise, verify that the checkbox next to Analysis ToolPak is checked. • If it is not checked, check it and click Load. You may still need the installation CD-ROMs at this point; if so, follow the directions on the screen to install the Analysis ToolPak.
Go to Tools | Data Analysis…. • You will get this dialog box: Select Regression and click OK.
The Regression dialog box appears: The cells with the rates go in Input Y Range. The cells with the list of years (1996 – 2000) go in Input X Range. Put the upper left corner cell of the output in Output Range.
Excel computes the regression output: The smaller the significance value (Significance F, which in a regression with a single X variable equals the P-value of the year coefficient), the more significant the trend (i.e. the less likely that the observed trend arose from mere chance variations in the data). A typical cutoff is 5% (0.05). This example shows a highly significant trend since this value is less than 0.01. (Next two slides show enlargements of key statistics.) A negative coefficient means a decreasing trend; a positive coefficient means an increasing trend.
Detail of Regression Coefficient: This number gives the amount by which the cervical cancer rate among white females in New Jersey changes each year. In other words, in each year from 1996 to 2000, the rate dropped by about 0.5 per 100,000.
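For readers without Excel, the same coefficient can be obtained from the ordinary least-squares formulas. The sketch below is only an illustration: `fit_line` is a hypothetical helper, and the five rates are made-up values, not the New Jersey figures from the slide.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [1996, 1997, 1998, 1999, 2000]
rates = [12.0, 11.4, 11.1, 10.3, 10.0]  # made-up rates per 100,000

slope, intercept = fit_line(years, rates)
print(f"change per year: {slope:.2f} per 100,000")
```

The slope is the yearly change in the rate; its sign gives the direction of the trend.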
Beware of Extrapolation! Based on our regression, if the cervical cancer rate per 100,000 white females goes down by about 0.5 per year, and it was about 10 per 100,000 in 2000, what would it be in 2020? Zero. How about in 2025? Minus 2.5 (-2.5), which is impossible for a rate. Moral: Regression estimates are useful only within or very near the range in which the regression was estimated. In this case, the range was 1996 – 2000. Extrapolation outside of this range is not likely to give meaningful results.
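The arithmetic behind this warning can be sketched in a few lines. The rounded values (about 10 per 100,000 in 2000, about 0.5 lower each year) come from the slide; the function name `projected_rate` and the choice of years are assumptions made for illustration.

```python
def projected_rate(year, base_year=2000, base_rate=10.0, slope=-0.5):
    """Straight-line projection of the rate per 100,000 for a given year."""
    return base_rate + slope * (year - base_year)


for year in (2010, 2020, 2030):
    rate = projected_rate(year)
    # A rate cannot be negative, so flag projections the line cannot support.
    note = "" if rate >= 0 else "  <- impossible: rates cannot be negative"
    print(f"{year}: {rate:.1f} per 100,000{note}")
```

The further the year lies from 1996 – 2000, the less the straight line can be trusted, even before it crosses zero.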
If you cannot install the Analysis ToolPak, you can use the built-in Excel function LINEST, but it is much less friendly: • Select a block of cells 5 rows high and 2 columns wide, for instance A7:B11, as shown. • Type in the formula =LINEST(B1:B5,A1:A5,,TRUE) (assuming the data are in the upper left corner of the worksheet, as shown). Here B1:B5 are the cells with the rates, and A1:A5 are the cells with the years. • End the formula with Ctrl-Shift-Enter (instead of the usual Enter). • The output will appear in the selected block of cells:
Unlike Regression, LINEST does not automatically compute significance information. You can get that information from the numbers in cells A10 (the F statistic, 45.5625) and B10 (the residual degrees of freedom, 3). Choose an empty cell, say C10, and type the formula =FDIST(A10,1,B10). The significance (p-value) of the trend will appear in cell C10. (Not shown here.) The slope is in cell A7: negative means a decreasing trend; positive means an increasing trend.
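As a cross-check outside Excel, the significance can be approximated from the two numbers quoted above (45.5625 and 3). The sketch below is not Excel's algorithm. It relies on the fact that with a single predictor the F statistic is the square of a t statistic, so the upper tail of F(1, df) equals the two-sided tail of a Student t with df degrees of freedom, and it integrates the t density numerically with Simpson's rule. The function name `f_upper_tail_one_predictor` is hypothetical.

```python
import math


def f_upper_tail_one_predictor(f_stat, df_resid, steps=20_000):
    """Upper-tail probability of an F(1, df_resid) statistic.

    Only valid when the numerator degrees of freedom equal 1,
    as in a regression of rate on year alone. `steps` must be even.
    """
    t = math.sqrt(f_stat)
    nu = df_resid
    # Normalising constant of the Student t density with nu degrees of freedom
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def pdf(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    # Simpson's rule for the integral of the density over [0, t]
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3
    # Two-sided tail: everything outside (-t, t)
    return 1 - 2 * central


p_value = f_upper_tail_one_predictor(45.5625, 3)
print(f"p = {p_value:.4f}")
```

For these inputs the result is below 0.01, in line with the "highly significant" reading of the Regression output.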
Is a given rate significantly different from another rate? This is a complex area on which we will give you some guidance at a future time. There are, however, things you can do now that do not involve statistical tests….
If the county rates appear to be unreasonably different from state rates, check carefully for errors in your county’s data, but… • bear in mind that if your county has a small relevant population (for instance, there are very few Hispanic males), the rates are more likely to differ widely from state rates because each case has such a large impact. Look at the counts, too! • You can use the graphs to see if there are obvious differences between your county’s rate patterns and the statewide rate patterns; for instance, if a cancer rate is increasing for the state but decreasing in your county or vice versa.
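The effect of a small population on a rate can be illustrated with a short sketch. All numbers below are made up, and `effect_of_one_more_case` is a hypothetical helper: both groups start at the same rate, and one additional case is added to each.

```python
def rate_per_100k(cases, population):
    """Crude rate: number of cases per 100,000 people."""
    return cases / population * 100_000


def effect_of_one_more_case(cases, population):
    """Return the rate before and after adding a single case."""
    return rate_per_100k(cases, population), rate_per_100k(cases + 1, population)


# Made-up groups that both start at 50 per 100,000
groups = {
    "small subgroup (2,000 people)": (1, 2_000),
    "large population (2,000,000 people)": (1_000, 2_000_000),
}

for label, (cases, population) in groups.items():
    before, after = effect_of_one_more_case(cases, population)
    print(f"{label}: {before:.2f} -> {after:.2f} per 100,000")
```

One extra case doubles the small subgroup's rate but barely moves the large population's, which is why the counts should be read alongside the rates.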