A series of basic statistics by Tom Lang

5. Tests and Measures of Association

Introduction

Two variables are said to be related when a change in one is accompanied by a change in the other. When the variables are nominal (and sometimes ordinal) they are said to be "associated." When the variables are continuous (and sometimes ordinal) they are said to be "correlated." Here, we will cover association. Correlation is covered in another chapter.

Tests of Association

In Table 1, it is easy to see that Variable A is strongly associated with Variable B: every time A appears, so does B, and every time A is absent, so is B. Variable A is less strongly associated with Variables C and D, because the variables do not always occur together. Finally, Variable A is also strongly, but inversely, associated with Variable E: every time A appears, E does not, and every time A is absent, E appears.
Table 1 The Concept of Association. Variable A is strongly associated with variable B, inversely associated with Variable E, and most weakly associated with Variable D.
Association can be assessed with several tests, which are usually variations of the χ2 (chi-square) test. Here, I discuss three tests of association: the χ2 test of association or independence, the χ2 test of goodness-of-fit, and Fisher's exact test. I also mention the χ2 test of differences, which is not a test of association but a test of differences and is often confused with one.

The χ2 Test of Association or Independence
The χ2 test of association or independence assesses the association among nominal (or sometimes ordinal) variables. The terms "association" and "independence" refer to "different sides of the same coin." Variables that are related are said to be associated, and variables that are not related are said to be independent.
For example, suppose we want to know if serum calcium concentrations (low vs. normal or high) are associated with osteoporosis (present or absent). This question is illustrated by the four cells in the data field of Table 2. If calcium concentrations and osteoporosis were perfectly associated, all 100 women would be represented in either the upper-left or the lower-right cell. On the other hand, if calcium concentrations and osteoporosis were perfectly independent, we would expect to see about 25 of the 100 women represented in each of the four cells. That is, if the association were no better than chance, the combinations of our two variables would be more or less evenly distributed over the four cells.
Table 2 The Percentage of Women Represented in Each Cell That Would be Expected by Chance if Osteoporosis is not Associated with Calcium Concentrations
The χ2 test of association or independence, then, compares the "mix of proportions" we found in our data with the probability of getting that mix by chance. The test results in a P value indicating the probability that we would get the mix we found in our data by chance. In the above example, if all our data were in the upper-left and lower-right cells, the P value would be very low, whereas if the data were more or less equally distributed among the four cells, the P value would be very high, indicating that chance is a much more likely explanation for the mix we got in our data.
Another example of the same principle is shown in Table 3. Here, the table has six cells. Again, if the mix of proportions in our data were the result of chance, we would expect each cell to represent about 16% of our sample. If the mix in our data were about equally distributed across all the cells, the P value would be large, and the variables would be called independent.
Table 3 The χ2 Test of Independence Compares the "Mix of Frequencies" of the Sample to That Expected by Chance.
Association, then, is usually determined only by the P value: a statistically significant P value indicates that the variables are associated, whereas a non-significant P value indicates that the variables are independent. There are also measures of the strength of association, however, such as the phi (φ) coefficient, as described below.
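To make this concrete, here is a minimal sketch in Python using scipy's chi2_contingency function. The counts are hypothetical, invented only to illustrate the calcium-and-osteoporosis example; the phi coefficient computed at the end is the measure of association mentioned above.

```python
# A minimal sketch of the chi-square test of association, using scipy.
# The counts are hypothetical: 100 women cross-classified by calcium
# concentration (rows) and osteoporosis status (columns).
import numpy as np
from scipy.stats import chi2_contingency

#                  osteoporosis present, absent
table = np.array([[35, 15],    # low calcium
                  [10, 40]])   # normal or high calcium

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
print("counts expected under independence:\n", expected)

# For a 2 x 2 table, the phi coefficient measures the strength
# (and direction) of the association.
a, b = table[0]
c, d = table[1]
phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(f"phi = {phi:.2f}")
```

A small P value here would indicate association; a phi near zero would indicate independence.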
The χ2 Test of Goodness-of-Fit

The χ2 test of goodness-of-fit is the same as the χ2 test of association or independence, with one difference. Instead of comparing the mix of proportions we got from our data to chance, it compares it to a known mix of proportions.
For example, suppose we are testing the hypothesis that handedness (whether someone is right- or left-handed) is related to some measure of skill, like throwing a ball. Before we test that hypothesis, however, we want to see if the handedness in our sample is representative of the general population. We know that about 80% of people are right-handed and about 20% are left-handed. If we assume that the proportion of handedness is equal between men and women, the mix of proportions would be like that in Table 4. (The sums of percentages at the end of the rows and at the bottom of the columns are called "marginals" because they are given on the right and bottom "margins," or edges, of the table.)

Table 4 The χ2 Test of Goodness of Fit Compares the "Mix of Frequencies" of the Sample to a Known Mix of Frequencies.
The χ2 test of goodness-of-fit would then assess the probability that the mix of proportions in our data differs by chance from the mix of proportions of handedness in the population.
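As an illustration, here is a minimal sketch using scipy's chisquare function, assuming a hypothetical sample of 200 people compared against the 80/20 population split described above.

```python
# A minimal sketch of the chi-square goodness-of-fit test, using scipy.
# Hypothetical sample: 150 right-handers and 50 left-handers out of 200,
# compared against the known population mix of 80% right / 20% left.
from scipy.stats import chisquare

observed = [150, 50]                  # right-handed, left-handed in our sample
expected = [0.80 * 200, 0.20 * 200]   # counts implied by the population mix

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
# A small P value would suggest that our sample's handedness mix
# differs from the population's 80/20 split.
```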
Fisher's Exact Test
Fisher's exact test is an alternative to the χ2 test that calculates an "exact" P value rather than an approximate one, as other forms of the χ2 test do. The difference is sometimes important because the χ2 test can give P values considerably lower than those of the more accurate Fisher's exact test for the same data, especially when samples are small. Fisher's exact test is often used for small samples, although it can be used for samples of any size.
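The difference can be seen directly by running both tests on the same small table; this sketch uses scipy's fisher_exact and chi2_contingency with made-up counts.

```python
# A minimal sketch comparing Fisher's exact test with the chi-square test
# on the same hypothetical small 2 x 2 table, using scipy.
from scipy.stats import fisher_exact, chi2_contingency

table = [[8, 2],
         [1, 5]]

odds_ratio, p_exact = fisher_exact(table)
chi2, p_approx, dof, _ = chi2_contingency(table, correction=False)

print(f"Fisher's exact P = {p_exact:.4f}")
print(f"chi-square P     = {p_approx:.4f}")
# With counts this small, the approximate chi-square P value
# can be noticeably lower than the exact one.
```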
The χ2 Test of Differences
Despite having a name similar to the tests of association described above, the χ2 test of differences assesses the probability that two proportions differ by chance. For example, we might want to compare the proportion of patients in the treatment group who were disease-free (say, 21%) with the proportion of patients in the control group who were disease-free (say, 44%). Again, the test results in a P value indicating the probability that the 23-percentage-point difference would have occurred by chance. If the probability is low, say, less than 5 times in 100 (that is, P < 0.05), we would probably conclude that the groups were, in fact, different at the end of the study. We would then probably attribute the difference to the treatment.
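A minimal sketch of this comparison with scipy, assuming (hypothetically) 100 patients per group so that the stated percentages translate into whole counts:

```python
# A minimal sketch of comparing two proportions with a chi-square test.
# Hypothetical group sizes of 100 each: 21 of 100 disease-free on
# treatment vs. 44 of 100 disease-free in the control group.
from scipy.stats import chi2_contingency

#         disease-free, not disease-free
table = [[21, 79],   # treatment group
         [44, 56]]   # control group

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
# A P value below 0.05 would suggest the 23-percentage-point
# difference is unlikely to be the result of chance alone.
```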

Measures of Association

Association is usually reported as present or absent, solely on the basis of the P value. However, there are measures of association that indicate the strength of the association. For example, the phi (φ, pronounced "fee") coefficient is a measure of association that ranges from -1 to +1, where +1 is a perfect positive association, zero is no association, and -1 is a perfect inverse association.

Risk, Odds, and Hazards Ratios
Ratios are also measures of association; they are typically used to report risk. The most common in medicine are probably the odds, risk, and hazards ratios. In all three, a value of 1 means that the risk in one group is the same as the risk in the other. A value greater than 1 indicates that the group in the numerator is at greater risk, and a value less than 1 indicates that the group in the denominator is at greater risk.
Risk is simply the frequency with which something occurs. If 3 people of every hundred in a town fall off a bicycle each year, the risk of having a bicycle accident is 3%. If 2 of 100 people trip and fall while walking, the risk of falling is 2%. The risk ratio is just the ratio of the two risks: in this case, the risk of falling off a bicycle divided by the risk of falling while walking is 3/2, or 1.5, meaning that the risk of falling off a bicycle is 1.5 times as great as the risk of falling while walking.
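The same arithmetic, as a short sketch:

```python
# The risk-ratio arithmetic from the text.
risk_bicycle = 3 / 100   # 3 of every 100 people fall off a bicycle each year
risk_walking = 2 / 100   # 2 of every 100 people trip and fall while walking

risk_ratio = risk_bicycle / risk_walking
print(f"risk ratio = {risk_ratio:.1f}")  # 1.5: bicycling carries 1.5 times the risk
```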
A hazard ratio is interpreted the same as a risk ratio. The difference is that a hazard is a measure of risk over time. More precisely, it is the probability that if an event has not occurred in one period, it will occur in the next.
Hazards ratios are found in time-to-event studies with binary outcomes (lived or died; cured or not) and are the output of Cox proportional hazards regression, which is used in "time-to-event" or "time-to-failure" analysis. Importantly, the time to the event from a given starting point is the outcome, not the event itself. For example, the time between hospitalization and death is what we are interested in, not the death itself. However, Cox regression analysis is also commonly used to identify factors associated with death.
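A minimal sketch of such an analysis, assuming the third-party lifelines library and an entirely hypothetical data set of six subjects; a real analysis would, of course, involve far more subjects:

```python
# A minimal sketch of Cox proportional hazards regression, assuming the
# third-party lifelines library. The data set is hypothetical: follow-up
# time, an event indicator, and one predictor (smoking).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time_months": [12, 30, 45, 7, 60, 22],   # time to event or censoring
    "died":        [1,  0,  1,  1, 0,  1],    # 1 = event occurred, 0 = censored
    "smoker":      [1,  0,  1,  1, 0,  0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_months", event_col="died")
cph.print_summary()  # the exp(coef) column is the hazard ratio for smoking
```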
Odds ratios are interpreted the same way risk ratios are, but the two are different. The risk (probability) of drawing a heart from a deck of cards is 13/52 = 1/4 = 25%. The odds, however, are the probability of drawing a heart divided by the probability of not drawing a heart: 13/39 = 1/3 = 33%.
The odds ratio is the odds for one group divided by the odds for another. Table 5 shows the calculations for the odds of having a heart attack among smokers and among nonsmokers, as well as the odds ratio that combines both odds into a single number. The risk of smokers having a heart attack is calculated as the number of smokers having heart attacks divided by the total number of smokers: 14/36 = 0.39. The odds of smokers having heart attacks are the number of smokers with heart attacks divided by the number of smokers who did not have heart attacks: 14/22 = 0.636. The odds of nonsmokers having heart attacks are 5/33, or 0.152. The odds ratio is 0.636/0.152 = 4.2, which means that the odds of smokers having a heart attack are 4.2 times as high as the odds for nonsmokers.
Table 5 Calculating Odds and Odds Ratios
Odds of heart attack among smokers: 14/22 = 0.636
Odds of heart attack among nonsmokers: 5/33 = 0.152
Odds ratio: 0.636/0.152 = 4.2
The odds of smokers having a heart attack are 4.2 times as high as the odds for nonsmokers.
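The arithmetic in Table 5 can be reproduced in a few lines of Python; the counts are those given in the text.

```python
# The Table 5 arithmetic as a short sketch, using the counts from the text.
smokers_mi, smokers_no_mi = 14, 22          # smokers with / without heart attacks
nonsmokers_mi, nonsmokers_no_mi = 5, 33     # nonsmokers with / without heart attacks

odds_smokers = smokers_mi / smokers_no_mi            # 0.636
odds_nonsmokers = nonsmokers_mi / nonsmokers_no_mi   # 0.152
odds_ratio = odds_smokers / odds_nonsmokers          # about 4.2

print(f"odds (smokers)    = {odds_smokers:.3f}")
print(f"odds (nonsmokers) = {odds_nonsmokers:.3f}")
print(f"odds ratio        = {odds_ratio:.1f}")
```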
As with risk and hazards ratios, when the odds ratio is 1, the odds are equal. An odds ratio greater than 1 indicates a harmful effect, and one less than 1, a protective effect.
Odds ratios are hard to understand, but they are the output of logistic regression analysis, which is a particularly useful statistical method.

The Kappa (κ) Statistic

Another common measure of association is the kappa (κ) statistic, which assesses "agreement" among raters for multiple observations of the same subjects. The kappa statistic is often used in evaluating diagnostic tests. It ranges from -1 to +1, where +1 indicates complete agreement and -1 indicates complete disagreement.
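As an illustration, kappa for two hypothetical raters judging the same ten subjects can be computed with scikit-learn's cohen_kappa_score:

```python
# A minimal sketch of the kappa statistic for two raters, assuming
# scikit-learn and hypothetical ratings of the same ten subjects.
from sklearn.metrics import cohen_kappa_score

rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "neg", "pos"]
rater_b = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "neg"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # +1 = complete agreement, 0 = chance-level
```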
