More than two categories

In many cancer clinical trials, the categories of interest will have a natural ordering. For example, the assessment of response may fall into one of the following four categories: complete response, partial response, stable disease and progressive disease. These four categories have a natural ordering with complete response the 'best' result and progressive disease the 'worst.' Two commonly used approaches for analysing such data are the chi-square test for trend and the Mann-Whitney test. We first illustrate the chi-square test for trend by an example.

Example showing the chi-square test for trend

In the CR06 colorectal cancer trial introduced above, patients were assessed for response at twelve weeks and there were actually five categories as shown in Table 9.4.

Response               Control treatment   Experimental treatment   Total
                       (de Gramont)        (raltitrexed)
Complete response              4                     3                 7
Partial response              55                    43                98
Stable disease                94                    86               180
Progressive disease           60                    68               128
Dead at 12 weeks              39                    50                89
Total                        252                   250               502

There is clearly a set of four ordered categories of response here from complete response through to progressive disease. It is not clear how best to take into account those patients who have died by twelve weeks - the time of assessing response. We could include them as a separate category, or we could include them in the category of progressive disease, or we could exclude them from this analysis altogether. We discuss this problem further below in Section 9.5.2. For the purposes of the example only, we shall include these patients as a separate ordered category, 'worse than' progressive disease.

To analyse these data we first need to assign scores to each of the groups. A simple scoring may be one for 'complete response,' two for 'partial response,' three for 'stable disease,' four for 'progressive disease' and five for 'dead at twelve weeks.' This scoring, although arbitrary, reflects the order of the categories, and the question we wish to address is: do patients receiving the experimental treatment generally have higher (or lower) scores than patients receiving the control treatment? A higher score would mean a generally poorer response, while a lower score would mean a generally better response.

Unfortunately, the simplest way of calculating the chi-square test for trend does not show the derivation and nature of the method. To help the calculations we have to define some quantities. We focus on the raltitrexed group and label the five 'response' groups by their score, 1-5. The number of raltitrexed patients in response group i is T_i, thus T_1 is 3; the total across both the raltitrexed and de Gramont groups is n_i, thus n_1 is 7. Further, we let x_i be the score allocated to each group so that, for example, for the complete response group, x_1 = 1. We then calculate the following quantities:

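\[
N = \sum_{i=1}^{5} n_i = n_1 + n_2 + n_3 + n_4 + n_5, \qquad
R = \sum_{i=1}^{5} T_i, \qquad
\bar{p} = \frac{R}{N}, \qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{5} n_i x_i .
\]

The test statistic X^2_trend is given by the following equation:

\[
X^2_{\text{trend}} =
\frac{\left( \sum_{i=1}^{5} T_i x_i - R \bar{x} \right)^2}
     {\bar{p} (1 - \bar{p}) \left( \sum_{i=1}^{5} n_i x_i^2 - N \bar{x}^2 \right)} .
\]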

The statistic X^2_trend is then compared against the chi-square distribution with one degree of freedom. There is one degree of freedom because, in essence, we are fitting a line to the responses in the experimental and control groups and then assessing whether the slope of this line differs between the two groups. Thus we are testing a single quantity, the slope. This example is worked through in more detail in Section 9.4.5.
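As an illustration of the calculation, the following minimal Python sketch applies the usual form of the chi-square test for trend to the counts in Table 9.4, with deaths scored as a fifth category as in the example. The function and variable names are ours, not the book's, and the p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard Normal variable.

```python
import math

# Counts from Table 9.4, ordered from complete response to dead at 12 weeks.
control = [4, 55, 94, 60, 39]        # de Gramont
experimental = [3, 43, 86, 68, 50]   # raltitrexed (the T_i in the text)
scores = [1, 2, 3, 4, 5]             # the x_i; an arbitrary but ordered scoring


def chi_square_trend(t, other, x):
    """Chi-square test for trend for one group's counts t against scores x."""
    n = [a + b for a, b in zip(t, other)]          # n_i: totals per category
    big_n = sum(n)                                 # N
    big_r = sum(t)                                 # R
    p_bar = big_r / big_n
    x_bar = sum(ni * xi for ni, xi in zip(n, x)) / big_n
    numerator = (sum(ti * xi for ti, xi in zip(t, x)) - big_r * x_bar) ** 2
    denominator = p_bar * (1 - p_bar) * (
        sum(ni * xi ** 2 for ni, xi in zip(n, x)) - big_n * x_bar ** 2
    )
    return numerator / denominator


def chi_square_1df_p_value(stat):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


trend_stat = chi_square_trend(experimental, control, scores)
trend_p = chi_square_1df_p_value(trend_stat)
print(f"X2_trend = {trend_stat:.2f}, p = {trend_p:.3f}")
```

With these scores the statistic comes out at roughly 3.75 on one degree of freedom. A different choice of scores, or a different handling of the patients who died, would change the result.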

The Mann-Whitney test

An alternative approach to analysing these data is to use the Mann-Whitney test. This is a more complicated test, which is based on ranks. The Mann-Whitney test is one of the 'non-parametric' tests, so called because it does not assume that the data come from any particular known distribution (such as the Normal distribution). In this test we first consider the two groups (raltitrexed and de Gramont) as one, that is, as a single sample. We then rank the patients in terms of response, assigning the value one to a complete response, two to a partial response, three to stable disease, etc. Then the sum of the ranks for the raltitrexed group is compared to the sum of the ranks for the de Gramont group. If there are generally better responses in the raltitrexed group, then the sum of the ranks for the raltitrexed group should be lower than the sum of the ranks for the de Gramont group. Statistics calculated from the sums of the ranks in the groups allow us to perform a significance test of the null hypothesis that there is no difference in response to these two treatments. For more details the reader is referred to Altman (p. 194) or Bland (p. 223).

The Mann-Whitney test is also often referred to as the Wilcoxon two-sample test; although they have different derivations, they are in fact the same test. The Mann-Whitney test and the chi-square test for trend will generally produce similar results. As a rule of thumb, the chi-square test for trend is preferred if the number of categories is small (say five or fewer), while the Mann-Whitney test is preferred when there is a larger number of categories (six or more).

When using the Mann-Whitney test some care must be taken if there are a large number of tied ranks. Ties become more likely as the number of categories decreases. In this situation the basic Mann-Whitney test needs to be adapted. This is complicated but possible. It should, however, be emphasized that not all computer packages allow for this situation, and those that do not may give incorrect results. The reader should therefore confirm that the package they are using allows for ties.
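To make the effect of ties concrete, the sketch below computes the rank-sum statistic for the Table 9.4 counts in plain Python, with and without a tie correction. It assumes the common large-sample Normal approximation, with tied patients sharing the average (mid) rank of their category and no continuity correction. This is one standard way of adapting the test, not necessarily the one used by any particular package, and all names are illustrative.

```python
import math

# Counts from Table 9.4, ordered from complete response to dead at 12 weeks.
control = [4, 55, 94, 60, 39]        # de Gramont
experimental = [3, 43, 86, 68, 50]   # raltitrexed


def midranks(totals):
    """Average rank shared by all patients in each ordered category."""
    ranks, start = [], 1
    for t in totals:
        ranks.append(start + (t - 1) / 2)   # mean of start .. start + t - 1
        start += t
    return ranks


def rank_sum_test(group_a, group_b, correct_ties=True):
    """Return (rank sum of group_a, z statistic, two-sided p-value)."""
    totals = [a + b for a, b in zip(group_a, group_b)]
    n_all = sum(totals)
    n_a, n_b = sum(group_a), sum(group_b)
    rank_sum_a = sum(c * r for c, r in zip(group_a, midranks(totals)))
    expected = n_a * (n_all + 1) / 2
    # Ties reduce the variance of the rank sum; ignoring them understates z.
    tie_term = 0.0
    if correct_ties:
        tie_term = sum(t ** 3 - t for t in totals) / (n_all * (n_all - 1))
    variance = n_a * n_b / 12 * ((n_all + 1) - tie_term)
    z = (rank_sum_a - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum_a, z, p


rank_sum, z_tied, p_tied = rank_sum_test(experimental, control)
_, z_naive, p_naive = rank_sum_test(experimental, control, correct_ties=False)
print(f"rank sum (raltitrexed) = {rank_sum}")
print(f"with tie correction:    z = {z_tied:.3f}, p = {p_tied:.3f}")
print(f"without tie correction: z = {z_naive:.3f}, p = {p_naive:.3f}")
```

With only five categories and 502 patients almost every rank is tied, so the uncorrected variance is too large and the uncorrected test is conservative. The tie-corrected z, once squared, is close to the chi-square statistic for trend on the same data.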

Estimation when there are more than two categories

There are no natural, easily accessible estimates when there are more than two categories. Thus, in such situations it is common to use slightly different approaches for hypothesis testing (for example, the chi-square test for trend) and for estimation. To obtain estimates, it is common to collapse categories into a binary yes/no response; for example, the categories of complete and partial response may be collapsed into a single 'response' category, while stable disease and progressive disease may be collapsed into a 'no response' category. The methods described above for estimating a single proportion for binary data should then be used to obtain estimates and confidence intervals. It is important to note that for the hypothesis test all the original categories should be retained, as this maximizes the power of the test and retains the maximum information in the analysis.
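The collapsing step can be sketched as follows for the Table 9.4 counts. Two assumptions are ours rather than the text's: patients who died by twelve weeks are counted as non-responders (the text leaves their handling open), and the confidence interval is the simple Normal-approximation interval for a single proportion. The names are illustrative only.

```python
import math

# Counts from Table 9.4 for each treatment arm.
control = {"complete": 4, "partial": 55, "stable": 94,
           "progressive": 60, "dead": 39}           # de Gramont
experimental = {"complete": 3, "partial": 43, "stable": 86,
                "progressive": 68, "dead": 50}      # raltitrexed


def response_rate(counts, z=1.96):
    """Collapse to response / no response; return (rate, lower, upper)."""
    responders = counts["complete"] + counts["partial"]
    n = sum(counts.values())    # assumption: deaths stay in the denominator
    p = responders / n
    se = math.sqrt(p * (1 - p) / n)   # Normal-approximation standard error
    return p, p - z * se, p + z * se


rate_c, low_c, high_c = response_rate(control)
rate_e, low_e, high_e = response_rate(experimental)
print(f"de Gramont:  {rate_c:.1%} (95% CI {low_c:.1%} to {high_c:.1%})")
print(f"raltitrexed: {rate_e:.1%} (95% CI {low_e:.1%} to {high_e:.1%})")
```

Excluding the deaths, or using a different interval method, would give somewhat different estimates. The hypothesis test itself should still be carried out on the original ordered categories.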
