Chi-Square Analysis

Chi-Square Analysis Test of Independence

We will now apply the principles of Chi-Square analysis to determine whether two variables are independent of one another. As an example, we will use a study at the University of Texas Southwestern Medical Center that examined the incidence of hepatitis C and the occurrence of tattoos among its patients. Patients were selected from those seeking medical attention for unrelated disorders. In the US each year about 10,000 people die from hepatitis C, a viral infection of the liver, but it can be years after infection before a patient develops symptoms. We will see how Chi-Square analysis can help us evaluate this situation. To learn more, check this URL: http://www.sciencedaily.com/releases/2001/04/010405081407.htm

The data is presented below. Patients were given a blood test for hepatitis C and those with tattoos were asked whether they got the tattoo in a tattoo parlor.

Recall from our work much earlier in the year that when data are presented in tables like this, we can easily compare the proportions of individuals in each category. Here, for instance, we might reason that if the chance of having hepatitis C is independent of tattoo status, then a person’s risk (probability) of having hepatitis C is the same regardless of whether they have a tattoo. The same probability should apply to each category. We will perform a Chi-square test for independence. As with other statistical inference, we begin with a null hypothesis. In tests for independence, this hypothesis will always be a statement that our two variables are independent. Our alternative hypothesis will always be a statement that the two variables are not independent. We must clearly state what the variables are.

Step 1: H0: The tattoo status and hepatitis C status are independent. Ha: The tattoo status and hepatitis C status are not independent.

Step 2: Assumptions: Our data are counts. With a test for independence we need a representative sample if we are to apply our findings to a larger population. While these patients were not an SRS, they were selected to avoid bias and should represent the general population.

We have the same criteria for the expected counts as we had for the goodness-of-fit test: 1. All expected counts must be one or more. 2. No more than 20% of the expected counts may be less than 5. The calculation of expected counts quickly gets complicated when there are several categories. We will use the graphing calculator to help us! We will use Matrix A to hold our data.
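Each expected count is (row total × column total) / grand total. For readers who want to check the calculator's work by another route, here is a minimal Python sketch of that calculation and of the two conditions above. The function names are our own, and the counts are ASSUMED for illustration (rows: tattoo from a parlor, tattoo from elsewhere, no tattoo; columns: hepatitis C, no hepatitis C); they are not taken from the text and should be replaced with the study's actual table.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def conditions_met(expected):
    """True if every expected count is >= 1 and at most 20% of them are below 5."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and small / len(cells) <= 0.20


# ASSUMED illustrative counts, not taken from the text.
observed = [[17, 35],    # tattoo from a parlor
            [8, 53],     # tattoo from elsewhere
            [22, 491]]   # no tattoo

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
print("conditions met:", conditions_met(expected))
```

With these assumed counts, two of the six expected counts fall below 5, so the second condition fails.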

On the TI-83 graphing calculator press <MATRIX>, and on the TI-83 Plus press <2nd> <MATRIX>. From here the instructions are the same: Select <EDIT> <1:[A]>. Your display may look different depending on whether you have old matrices stored, as I do. Change the dimensions of Matrix A to 3 × 2. Ignore the values in the matrix; they are old data that will be replaced when new data is entered. Now enter the data.

Our easiest method of finding all of the expected values is to run the test on the calculator and use the values it calculates and stores in Matrix B. With our data in [A] we now press <STAT> <TESTS> <C: χ²-Test> <Calculate> <ENTER>. We’ll save the test results for later. Now we view [B]. Press <MATRIX> <EDIT> <2:[B]>, and view the matrix of expected counts.

As we check the expected counts, we see that 2 out of 6 are less than 5. That is about 33% of the expected counts, well over the 20% limit, so this is an assumption violation, and a serious one. Don’t throw in the towel, though, at least not yet. Notice that the expected counts are not whole numbers. That is typical; don’t be tempted to round them to whole numbers. If we look at our original categories, we may find a way to meet the condition.

Our totals are not very large for either category of tattoos. If we combine the two, we can increase our expected counts in the combined category. In doing so, we lose some ability to identify the source of the hepatitis C, should there be a connection between tattoos and the hepatitis.

Now we need to adjust our [A] and find a new [B] to check the expected counts.

With new data in [A], run the test again, and examine [B]. This time all expected counts are greater than 5, so we meet the assumption and can continue.
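The same combine-and-recheck step can be sketched in Python. As before, this is only an illustration: the three-row counts are ASSUMED rather than taken from the text, and the helper name is our own.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# ASSUMED illustrative counts (columns: hepatitis C, no hepatitis C).
observed = [[17, 35],    # tattoo from a parlor
            [8, 53],     # tattoo from elsewhere
            [22, 491]]   # no tattoo

# Merge the two tattoo rows by adding their counts cell by cell.
tattoo = [a + b for a, b in zip(observed[0], observed[1])]
combined = [tattoo, observed[2]]

expected = expected_counts(combined)
all_at_least_5 = all(e >= 5 for row in expected for e in row)

print("combined table:", combined)
print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("all expected counts at least 5:", all_at_least_5)
```

Combining rows leaves the grand total and the column totals unchanged; only the row structure is coarser, which is why the expected counts in the merged row grow.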

Step 3: Degrees of freedom are (number of rows minus 1) times (number of columns minus 1). For our combined 2 × 2 table, df = (2 - 1)(2 - 1) = 1.

Steps 4 and 5: Notice that the Chi-square distribution is shaped very differently with only 1 degree of freedom. (It is similarly shaped with 2 degrees of freedom, and then changes completely with 3 df.)
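As a rough cross-check on the calculator, the test statistic (the sum of (observed - expected)²/expected over all cells) and its P-value can be sketched in Python. This is a sketch under stated assumptions, not the calculator's own routine: the counts 25 and 22 are quoted later in the text, while 88 and 491 are ASSUMED for illustration, and no continuity correction is applied. The P-value line uses the fact that, with exactly 1 degree of freedom, P(χ² > x) = erfc(√(x/2)); it does not apply to other degrees of freedom.

```python
import math

# Combined 2 x 2 table. 25 and 22 are quoted in the text;
# 88 and 491 are ASSUMED for illustration.
observed = [[25, 88],     # tattoo:    hepatitis C, no hepatitis C
            [22, 491]]    # no tattoo: hepatitis C, no hepatitis C

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_square += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only for df = 1: upper-tail area of the chi-square distribution.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = {df}, P-value = {p_value:.2e}")
```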

Step 6: Reject H0; a test statistic this extreme will rarely occur by chance alone.

Step 7: We have strong evidence that tattoos and the occurrence of hepatitis C are not independent. Further, if we examine the contribution from each cell, we may be able to see the reason for our positive result.

There is no easy way with the graphing calculator to generate a long list of contributions when we have large tables of data. With a small table, such as ours, the task is not difficult. Enter the observed and expected counts in L1 and L2, respectively. Then in L3 calculate (L1-L2)²/L2, as we did in the goodness-of-fit test. We see that the largest contribution to the test statistic comes from those with tattoos and hepatitis C. We expected about 8 and found 25. Our next largest contribution comes from those with hepatitis C and no tattoos. We expected about 38, but found only 22.
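The same list arithmetic can be sketched in Python, with one list each for the observed counts, the expected counts, and the contributions. The counts 25 and 22 come from the text; 88 and 491 are ASSUMED for illustration, and the labels are our own.

```python
labels = ["tattoo, hepatitis C", "tattoo, no hepatitis C",
          "no tattoo, hepatitis C", "no tattoo, no hepatitis C"]

# 25 and 22 are quoted in the text; 88 and 491 are ASSUMED for illustration.
observed = [25, 88, 22, 491]

row_totals = [observed[0] + observed[1], observed[2] + observed[3]]
col_totals = [observed[0] + observed[2], observed[1] + observed[3]]
n = sum(observed)

# Expected counts, in the same cell order as `observed`.
expected = [r * c / n for r in row_totals for c in col_totals]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Rank the cells from largest to smallest contribution.
ranked = sorted(zip(labels, contributions), key=lambda pair: pair[1], reverse=True)
for label, value in ranked:
    print(f"{label}: {value:.2f}")
```

Under these assumed counts the ranking agrees with the text: the tattoo and hepatitis C cell contributes most, followed by the no tattoo and hepatitis C cell.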

This study gives strong evidence that tattoos and hepatitis C are in some way related. Does this mean that getting tattoos causes hepatitis C? Not necessarily. What this study shows is an association between hepatitis C and tattoos. The scientists conducting this study eliminated many possible lurking variables and concluded that getting tattoos exposed one to a great risk of hepatitis C, more so than IV drug use.

Some further information about the epidemic of hepatitis C that has been tied to tattoo parlors: the virus is spread by (1) needles not sterilized between tattoo customers, (2) containers of ink that became contaminated and were then used on more customers, and (3) the practice of some tattoo artists of using the needles to prick the backs of their own hands to check for sharpness.

THE END
