Binary data

Outcome data from a cancer clinical trial will often be in the form of categories. For example, at a given time of assessment, we may observe in an individual patient a complete response, partial response, stable disease or progressive disease. In analysing these data, the questions of central interest are: is there a difference between the experimental and control treatments, and what is the magnitude of any difference? These questions are first addressed for the special case in which there are only two categories, called binary data. Suppose we have conducted a clinical trial in which patients have received either an experimental treatment or a control treatment, and the response to each treatment has been reported as a yes or no result for each patient. Data from such a trial could be reported as in Table 9.2.

In the table, of the b + d patients receiving the experimental treatment, b have responded, and of the a + c patients receiving the control treatment, a have responded. We would like to test the evidence against the null hypothesis that there is no difference between the experimental treatment and the control treatment. We can address this problem in one of two ways. Here we present the more accessible of the two, which displays the link between hypothesis testing and estimation.

Table 9.2 A table displaying whether patients have responded or not to the experimental and control treatments in a randomized trial

Response    Control treatment    Experimental treatment    Total
Yes         a                    b                         a + b
No          c                    d                         c + d
Total       a + c                b + d                     a + b + c + d

The proportion of responders on the experimental treatment is $p_{\text{experimental}} = b/(b + d)$, while the proportion of responders on the control treatment is $p_{\text{control}} = a/(a + c)$. The observed difference between experimental and control is therefore

\[ \text{observed difference} = p_{\text{experimental}} - p_{\text{control}} \]

and this observed difference is the best estimate of the difference between the experimental and control treatments. If the number of patients in each group is not too small, then the standard error (SE) for this observed difference is given by

\[ SE(p_{\text{experimental}} - p_{\text{control}}) = \sqrt{\frac{p_{\text{experimental}}(1 - p_{\text{experimental}})}{b + d} + \frac{p_{\text{control}}(1 - p_{\text{control}})}{a + c}} \]

and a 95 per cent confidence interval around this estimate is given by

\[ (p_{\text{experimental}} - p_{\text{control}}) - 1.96 \times SE(p_{\text{experimental}} - p_{\text{control}}) \quad \text{to} \quad (p_{\text{experimental}} - p_{\text{control}}) + 1.96 \times SE(p_{\text{experimental}} - p_{\text{control}}). \tag{9.4} \]

The value 1.96 comes from considering the standard normal distribution with mean zero and variance 1 (see Section 5.4.1). Provided the trial is not too small, this is the relevant distribution under the null hypothesis for the statistics we calculate here. Of this distribution, 2.5 per cent lies above +1.96 and 2.5 per cent lies below -1.96; combining these probabilities gives a total of 5 per cent. Thus 95 per cent of the probability lies between -1.96 and +1.96, hence the 95 per cent confidence interval. To obtain a confidence interval of a different width, a different multiplier is used in place of 1.96. For example, to obtain a 99 per cent confidence interval we use 2.58, because 99 per cent of the standard normal distribution is contained within (-2.58 to 2.58), and to obtain a 90 per cent confidence interval we use the value 1.645.

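In general, a $100(1 - \alpha)$ per cent confidence interval is given by:

\[ (p_{\text{experimental}} - p_{\text{control}}) - Z_{1-\alpha/2} \times SE(p_{\text{experimental}} - p_{\text{control}}) \quad \text{to} \quad (p_{\text{experimental}} - p_{\text{control}}) + Z_{1-\alpha/2} \times SE(p_{\text{experimental}} - p_{\text{control}}). \tag{9.5} \]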

To help understand this, if we are calculating the 95 per cent confidence interval, then $\alpha = 1 - 0.95 = 0.05$, and $1 - \alpha/2 = 0.975$. Thus, looking up the point below which 97.5 per cent of the standard normal distribution lies gives $Z_{1-\alpha/2} = 1.96$.
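The multipliers quoted above can be checked numerically rather than read from tables. The following is a minimal sketch using Python's standard library; the function name `z_multiplier` is chosen here for illustration and does not come from the text.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Return Z_{1 - alpha/2} for a two-sided confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # inv_cdf gives the point below which the stated probability lies
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: multiplier = {z_multiplier(level):.3f}")
```

Rounded to two decimal places, these agree with the familiar values 1.96 and 2.58, and give roughly 1.645 for a 90 per cent interval.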

To perform a test of the hypothesis that the underlying proportion of responders in the two groups is the same, we construct a similar framework, with interest focusing on $p_{\text{experimental}} - p_{\text{control}}$. However, we have an added consideration in this framework: the basic assumption is that in truth $p_{\text{experimental}}$ and $p_{\text{control}}$ are the same, that is, that the proportion of responders is the same on the experimental and the control treatments. The best single estimate of this proportion, $p$, is calculated by considering the two groups as one. In this instance

\[ p = \frac{a + b}{a + b + c + d} \]

and the standard error of the difference under the null hypothesis is

\[ SE_{\text{null}}(p_{\text{experimental}} - p_{\text{control}}) = \sqrt{p(1 - p)\left(\frac{1}{b + d} + \frac{1}{a + c}\right)}. \]

It should be noted that this SE is not quite the same as the standard error used above to calculate the confidence intervals, because it is calculated on the basis that the null hypothesis (that these two proportions are the same) is true. In contrast, the standard error used when calculating confidence intervals is based on the observed proportions in each group.

The Z-statistic to test the null hypothesis is given by

\[ Z = \frac{p_{\text{experimental}} - p_{\text{control}}}{SE_{\text{null}}(p_{\text{experimental}} - p_{\text{control}})}. \tag{9.8} \]

To assess the significance of this result, the P-value corresponding to this Z-value can be read from tables of the standard Normal distribution. As indicated above, such a table is available online; the website allows the user to enter the observed Z-statistic and produces an exact P-value. An example of the analysis of this type of data is presented below.

Example. In the randomized trial CR06 [2], which compared three chemotherapy regimens for patients with advanced colorectal cancer and in which clinical response of the disease was a secondary outcome measure, the following results were seen for two of the regimens twelve weeks from randomization (Table 9.3). It should be noted that although response was classified in four categories, we have collapsed them into two for the purposes of this example. It should also be noted that a number of patients on each arm died before twelve weeks; these have been included as non-responders in Table 9.3.

Table 9.3 Response by treatment in the CR06 randomized trial

Response    Control treatment    Experimental treatment    Total
            (de Gramont)         (raltitrexed)
Yes         59                   46                        105
No          193                  204                       397
Total       252                  250                       502

Following the methods outlined above,

\[ p_{\text{experimental}} = p_{\text{raltitrexed}} = \frac{46}{46 + 204} = 0.184, \qquad p_{\text{control}} = p_{\text{de Gramont}} = \frac{59}{59 + 193} = 0.234, \]

\[ p_{\text{experimental}} - p_{\text{control}} = 0.184 - 0.234 = -0.050. \]

To calculate confidence intervals we have to calculate

\[ SE(p_{\text{experimental}} - p_{\text{control}}) = \sqrt{\frac{0.184 \times 0.816}{250} + \frac{0.234 \times 0.766}{252}} = 0.0362. \]

A 95 per cent confidence interval for the estimate -0.05 is therefore given by -0.05 - 1.96 × 0.0362 to -0.05 + 1.96 × 0.0362, which gives -0.121 to 0.021.
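The interval calculation can be reproduced in a few lines of code. This is a minimal sketch, not a validated analysis routine; the function and argument names are illustrative, and the counts are those quoted for the CR06 example.

```python
import math


def diff_in_proportions_ci(resp_exp, n_exp, resp_ctl, n_ctl, z=1.96):
    """Difference in response proportions (experimental minus control),
    its standard error, and a confidence interval using multiplier z."""
    p_exp = resp_exp / n_exp
    p_ctl = resp_ctl / n_ctl
    diff = p_exp - p_ctl
    # SE based on the observed proportions in each group
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl)
    return diff, se, diff - z * se, diff + z * se


# CR06: 46 of 250 responders on raltitrexed, 59 of 252 on de Gramont
diff, se, lower, upper = diff_in_proportions_ci(46, 250, 59, 252)
print(f"difference = {diff:.3f}, SE = {se:.4f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

Because the code carries unrounded proportions through the calculation, intermediate values can differ from the hand calculation in the last decimal place, although the interval agrees to three decimal places.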

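To assess whether there is evidence of a statistically significant difference between these two treatments we need to calculate

\[ p = \frac{59 + 46}{502} = 0.209, \qquad SE_{\text{null}}(p_{\text{experimental}} - p_{\text{control}}) = \sqrt{0.209 \times 0.791 \times \left(\frac{1}{250} + \frac{1}{252}\right)} = 0.0363. \]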

It is interesting to note that this SE is only very slightly different from the SE calculated above for the confidence interval. The Z-statistic is given by Z = -0.05/0.0363 = -1.377. The corresponding P-value, obtained from the standard Normal distribution, is 0.169. Note that this is a 2-sided P-value (see below), in that we are considering differences in both directions. This result suggests that, from these binary data, there is no good evidence that the response rates are different in the de Gramont and raltitrexed groups. This conclusion is supported by the 95 per cent confidence interval, which includes the value zero.
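The test can be sketched in code as follows, using the pooled proportion to form the standard error under the null hypothesis. The function name is illustrative, and the 2-sided P-value is computed from the standard Normal distribution in Python's standard library rather than from tables.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(resp_exp, n_exp, resp_ctl, n_ctl):
    """Z-test of equal response proportions, with a 2-sided P-value."""
    p_exp = resp_exp / n_exp
    p_ctl = resp_ctl / n_ctl
    # Pooled estimate: treat the two groups as one, as the null hypothesis implies
    p_pooled = (resp_exp + resp_ctl) / (n_exp + n_ctl)
    se_null = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_exp + 1 / n_ctl))
    z = (p_exp - p_ctl) / se_null
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, se_null, p_two_sided


z, se_null, p_value = two_proportion_z_test(46, 250, 59, 252)
print(f"SE under null = {se_null:.4f}, Z = {z:.3f}, 2-sided P = {p_value:.3f}")
```

With unrounded inputs, Z and the P-value differ slightly in the third decimal place from the hand calculation, which rounds the difference to -0.05 before dividing; the conclusion is unchanged.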

Estimating and testing a single proportion

As well as estimating differences between groups, we may be interested in estimating the proportion in just one group, say the experimental group. This can be done quite simply by dropping the control-group terms (those involving a, c and $p_{\text{control}}$) from the above formulations. In this way estimates and confidence intervals can be obtained. We can also assess whether $p_{\text{experimental}}$ differs from a prespecified proportion of interest.

Example. In the CR06 colorectal cancer trial we may be interested in just the experimental arm (raltitrexed), and wish to assess whether there is good evidence that the response rate for this arm is more than 10 per cent. We note therefore that $p_{\text{experimental}} = 0.184$ and

\[ SE(p_{\text{experimental}}) = \sqrt{\frac{0.184 \times 0.816}{250}} = 0.0245. \]

Thus, the Z-statistic for testing whether the observed value is greater than 0.10 is

\[ Z = \frac{0.184 - 0.10}{0.0245} = 3.43. \]

Consulting the 1-sided normal distribution, the P-value corresponding to this Z-statistic is 0.0003. (We perform a 1-sided test because we are only interested in whether the response rate is greater than 10 per cent.) This result shows that there is good evidence that the response rate in the raltitrexed arm is greater than 10 per cent. If we are also interested in the evidence that this response rate is greater than 15 per cent, we replace 0.10 by 0.15 in the above equation to give Z = (0.184 - 0.15)/0.0245 = 1.39. The corresponding P-value is 0.082, suggesting that we do not have good evidence that the response rate is greater than 15 per cent.
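Both single-proportion tests can be sketched with one small function. As in the worked example, the standard error here is based on the observed proportion; the function name and arguments are illustrative only.

```python
import math
from statistics import NormalDist


def one_sided_test_single_proportion(responders, n, p0):
    """Test whether the true proportion exceeds p0 (1-sided, upper tail)."""
    p_hat = responders / n
    # SE based on the observed proportion, following the worked example
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Raltitrexed arm of CR06: 46 responders out of 250 patients
z10, p10 = one_sided_test_single_proportion(46, 250, 0.10)
z15, p15 = one_sided_test_single_proportion(46, 250, 0.15)
print(f"versus 10%: Z = {z10:.2f}, 1-sided P = {p10:.4f}")
print(f"versus 15%: Z = {z15:.2f}, 1-sided P = {p15:.3f}")
```

Small differences from the hand calculation in the last decimal place arise because the code does not round the standard error to 0.0245.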

Continuity correction for the Z-statistic

When doing tests and calculating P-values, the methods described above use the fact that, when the samples are reasonably large, the Normal distribution is a good approximation of the Binomial distribution, which is the appropriate distribution when we have binary data. However, it should be noted that the number of responders can only take integer values, whereas a normally distributed variable can take any value. For example, the number of responders on the experimental treatment can only take the values 0, 1, 2, 3, etc. To ensure a better correspondence between the Normal distribution and the Binomial distribution, a value of 1/2 is subtracted from each observed frequency. Thus, when calculating the Z-statistic we should, for example, take away 0.5 from the values a, b, c and d. It is advisable to use the continuity correction routinely when comparing two groups (such as in Table 9.2) or when comparing a single proportion against a prespecified value. If it is not used, the Z-statistic tends to be too large, and hence P-values are too small. The difference will not be large when the study being analysed is reasonably large. However, it can have a considerable effect when the study is small. In particular, when one of the values a, b, c or d is very small (less than five, say), an alternative approach, called Fisher's exact test, is more appropriate (see below). It should be noted that we do not adjust confidence intervals in the same way as the test statistic, because we are not calculating probabilities from the tail area of a distribution.
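The corrected statistic is not written out here, so the sketch below assumes one widely used formulation, Yates' correction, in which each absolute difference between an observed and an expected count is reduced by 0.5. For the Z-statistic this is equivalent to shrinking the absolute difference in proportions by half the sum of the reciprocal group sizes. The function name is illustrative, and this may not be exactly the form the authors intend.

```python
import math
from statistics import NormalDist


def corrected_z_test(resp_exp, n_exp, resp_ctl, n_ctl):
    """Two-proportion Z-test with an assumed Yates-type continuity correction."""
    p_exp = resp_exp / n_exp
    p_ctl = resp_ctl / n_ctl
    p_pooled = (resp_exp + resp_ctl) / (n_exp + n_ctl)
    se_null = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_exp + 1 / n_ctl))
    diff = p_exp - p_ctl
    # Shrink |diff| towards zero, but never past zero
    correction = 0.5 * (1 / n_exp + 1 / n_ctl)
    shrunk = max(abs(diff) - correction, 0.0)
    z_corrected = math.copysign(shrunk, diff) / se_null
    z_uncorrected = diff / se_null
    p_corrected = 2 * (1 - NormalDist().cdf(abs(z_corrected)))
    return z_uncorrected, z_corrected, p_corrected


z_raw, z_corr, p_corr = corrected_z_test(46, 250, 59, 252)
print(f"uncorrected Z = {z_raw:.3f}, corrected Z = {z_corr:.3f}, 2-sided P = {p_corr:.3f}")
```

For the CR06 data the corrected Z is smaller in absolute value than the uncorrected one, so the P-value is larger, which is the direction of adjustment described above.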

Chi-square (χ²) test

To test the null hypothesis of no difference between the true proportions responding to the experimental and control treatments, many analysts perform a chi-square test. This is done by comparing the observed value in each cell, a, b, c and d, with its expected value under the null hypothesis, $e_a$, $e_b$, $e_c$ and $e_d$ respectively, and summing the squared differences for each pair, each divided by the corresponding expected value. The calculation of the relevant statistic is:

\[ \chi^2 = \frac{(a - e_a)^2}{e_a} + \frac{(b - e_b)^2}{e_b} + \frac{(c - e_c)^2}{e_c} + \frac{(d - e_d)^2}{e_d}. \]

The expected values $e_a$, $e_b$, $e_c$ and $e_d$ are calculated on the basis that, if the null hypothesis is true and there is really no relationship between response and treatment, then the expected number in each cell is the proportion of patients in that cell's treatment group multiplied by the total number of patients in that cell's response category in the trial as a whole. For example, $e_a = (a + b)(a + c)/(a + b + c + d)$. The χ² statistic obtained through this calculation is compared against a chi-squared distribution with one degree of freedom. There is one degree of freedom because once all the row and column totals (that is, a + b, c + d, a + c and b + d) have been specified, we only have to specify one of the cells, such as a, to calculate all the remaining cells b, c and d. In general, for a table with r rows and c columns the chi-square test has (r - 1)(c - 1) degrees of freedom. In our case we have two rows and two columns and thus (2 - 1)(2 - 1) = 1 degree of freedom.

It can be shown that in most circumstances the χ² statistic is equivalent to the square of the Z-statistic presented above. Thus the two methods will give the same results.
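This equivalence can be checked numerically on the CR06 data. The sketch below computes the chi-square statistic from observed and expected counts, with cells laid out as in Table 9.2, and compares it with the square of the uncorrected Z-statistic based on the pooled standard error. Function names are illustrative, and the P-value for one degree of freedom is obtained from the Normal tail via `math.erfc`.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table: a, b are responders and c, d are
    non-responders on the control and experimental treatments respectively."""
    n = a + b + c + d
    row_totals = (a + b, c + d)   # responders, non-responders
    col_totals = (a + c, b + d)   # control, experimental
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df equals the 2-sided Normal tail at sqrt(chi2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def pooled_z(a, b, c, d):
    """Uncorrected Z-statistic using the SE under the null hypothesis."""
    n_ctl, n_exp = a + c, b + d
    p = (a + b) / (n_ctl + n_exp)
    se_null = math.sqrt(p * (1 - p) * (1 / n_exp + 1 / n_ctl))
    return (b / n_exp - a / n_ctl) / se_null


chi2, p_value = chi_square_2x2(59, 46, 193, 204)
z = pooled_z(59, 46, 193, 204)
print(f"chi-square = {chi2:.4f}, Z squared = {z ** 2:.4f}, P = {p_value:.3f}")
```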

However, the chi-square test provides just a test of the null hypothesis and, unlike the methods presented above, does not directly provide estimates and confidence intervals. Thus it is generally not the preferred approach for binary data. The chi-square test is not a good approximation in certain circumstances, and a rule of thumb is that none of the expected values should be less than five. If any expected value is less than five, then an alternative method of testing the null hypothesis, called Fisher's exact test, should be performed. Finally, as for the test of independent proportions, a continuity correction is needed for small sample sizes.

Fisher's exact test for two proportions

As mentioned above, the chi-square test for two proportions requires that none of the 'expected' values is less than five. Because of the direct correspondence between the chi-square test and the Z-test of two proportions, this condition applies to the Z-test approach presented above as well. The reason is that data such as those in Table 9.2 are essentially discrete, whereas we are using continuous distributions to assess the Z and chi-square statistics. When the condition is not met, Fisher's exact test is recommended. The reader is referred to specialist statistics texts [1] for a description of this test.
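For readers who want a concrete picture before consulting those texts, the sketch below shows the standard hypergeometric formulation of the test: with all row and column totals held fixed, it sums the probabilities of every table that is no more likely than the one observed. This is one common 2-sided convention, not a description taken from this chapter; the function name and the small table are hypothetical and are not trial data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher's exact P-value for a 2x2 table laid out as in Table 9.2."""
    n = a + b + c + d
    responders = a + b      # first row total
    n_control = a + c       # first column total

    def prob(k):
        # Hypergeometric probability of k responders in the control group
        return comb(n_control, k) * comb(n - n_control, responders - k) / comb(n, responders)

    p_observed = prob(a)
    k_min = max(0, responders + n_control - n)
    k_max = min(responders, n_control)
    # Small tolerance guards against floating-point ties
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical small trial: 1 of 8 control patients respond, 6 of 8 experimental
p_value = fisher_exact_two_sided(1, 6, 7, 2)
print(f"Fisher's exact 2-sided P = {p_value:.4f}")
```

In this hypothetical table every expected value is below five, which is the situation in which the exact test is preferred to the Z or chi-square approximations.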
