Interim analyses

Nearly all randomized clinical trials in cancer accrue patients over many months, and often years. During accrual, events on the planned outcome measures will also be observed, so there is both an ethical and a practical obligation to monitor the accumulating data to ensure that there is no large difference in the primary outcome measure between the arms. The ethical obligation is to minimize the number of patients receiving a clearly inferior treatment, while the practical obligation is to conclude the trial as soon as possible. The primary outcome measures are therefore monitored in most trials, and specific statistical analysis procedures have been developed to aid such monitoring.

Example. In a randomized trial comparing CHART radiotherapy with conventional radiotherapy for patients with non-small cell lung cancer, patients were accrued from 1991 to 1995. During this time, information on toxicity, relapses and deaths was becoming available [34]. In particular, the number of patients entered and number of deaths at approximately annual intervals was as shown in Table 9.12. The data were analysed approximately annually in order to assess whether it was appropriate to continue the trial.

The fundamental problem with repeatedly analysing the emerging data from a trial is that we are performing more than one statistical test. For example, if we perform two statistical tests on two independent outcome measures that in truth do not differ between the groups being compared, the probability that at least one of these tests is incorrectly significant at the 0.05 level is:

$$1 - (1 - 0.05)^2 = 1 - 0.9025 = 0.0975$$

Thus the probability that at least one of these two tests is falsely positive is 9.75 per cent, rather than the 5 per cent that applies when we perform just one test. This problem grows as the number of tests increases, as seen in Table 9.13.

It can be seen from the table that the probability of incorrectly claiming significance, when in truth there is no difference, increases quickly as the number of tests increases.
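The calculation behind these probabilities can be sketched in a few lines. This is a minimal illustration that assumes the tests are independent and all use the same significance level; the function name `prob_at_least_one_false_positive` is chosen here for illustration and does not come from the text.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent tests is significant
    at level alpha when in truth there is no difference."""
    # Each test avoids a false positive with probability (1 - alpha);
    # independence lets us multiply those probabilities together.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # The same numbers of tests as listed in Table 9.13.
    for n in (1, 2, 3, 5, 10, 100):
        print(f"{n:>3} tests: {prob_at_least_one_false_positive(n):.3f}")
```

Called with 2 tests, the function reproduces the 0.0975 worked out above.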

Table 9.12 Data from the CHART lung-cancer trial showing the accumulation of deaths over the period of accrual of patients to the trial [34]

Year    Number of patients    Number of deaths
1991    119                   12
1992    256                   78
1993    380                   192
1994    460                   275
1995    563                   379

Table 9.13 Relationship between the number of independent tests performed and the probability of incorrectly claiming at least one significant result at the 0.05 level

Number of independent tests performed    Probability of at least one incorrect significant result
1                                        0.05
2                                        0.0975
3                                        0.143
5                                        0.226
10                                       0.401
100                                      0.994

This situation is analogous to that of performing a number of interim analyses. For example, in the CHART lung cancer trial a total of four analyses were done over the years 1992-1995 on the primary outcome measure of length of survival. The situation is slightly more complicated when monitoring the accumulating results of a randomized trial because the tests being performed are correlated: the data used in the second analysis include the data used in the first analysis, and similarly the data used in the third analysis include the data used in the first and second analyses. Because the probabilities shown in Table 9.13 assume independence, they are conservative estimates of the probability of incorrectly claiming significance at the 5 per cent level when in truth there is no difference; that is, they overstate the probability that applies to correlated interim analyses.
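The effect of this correlation can be illustrated with a small simulation. The sketch below is a deliberate simplification and not the analysis used in the CHART trial: it assumes a normally distributed test statistic, equally sized blocks of new data between analyses, and an unadjusted two-sided 0.05 test (critical value 1.96) at every look. The function name, the number of simulated trials, and the seed are all illustrative choices.

```python
import math
import random


def overall_false_positive_rate(n_looks: int, n_trials: int = 20000,
                                z_crit: float = 1.96, seed: int = 1) -> float:
    """Estimate the chance that at least one of n_looks cumulative analyses
    is significant when in truth there is no difference between the arms."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one new block of data; with no true difference,
            # its standardized contribution is a standard normal draw.
            running_sum += rng.gauss(0.0, 1.0)
            # The statistic uses ALL data so far, so successive looks overlap
            # and are therefore correlated.
            z = running_sum / math.sqrt(look)
            if abs(z) > z_crit:
                hits += 1
                break  # the simulated trial would have claimed significance here
    return hits / n_trials


if __name__ == "__main__":
    for looks in (1, 2, 5):
        print(f"{looks} looks: {overall_false_positive_rate(looks):.3f}")
```

Under these assumptions the estimated rate with several looks should sit above 0.05 but below the corresponding independent-test value in Table 9.13, which is the sense in which the table's figures are conservative.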

Numerous different solutions have been proposed for this problem [35-38]. Although many of these approaches are technically complicated, the general approach can be summarized quite easily. Assume that when the trial was designed, the type I error (significance level) was set at 5 per cent (see Chapter 4). If we plan to do just one analysis and actually perform just one analysis, then the p-value from this analysis is tested in the usual way against 0.05. Thus, if the observed p-value is less than 0.05, we can claim a conventionally significant result.

If the plan is to perform two analyses, one during the course of the trial and the other at the planned end of the trial, then each of the p-values from these analyses needs to be tested against a significance (α) level less than 0.05 to ensure that the overall significance level over the two analyses is 0.05. One way of doing this is to apportion the 0.05 significance (α) level between the two analyses, perhaps with 0.001 at the first analysis and 0.049 at the second. This approach has the benefit that for the second, final and primary analysis, the α level used, 0.049, is very close to 0.05, while we have also catered for an interim analysis. It also has the benefit that at the first (interim) analysis the observed p-value has to be lower than 0.001 to claim a significant result and consider stopping the trial. It should be noted that if the trial were stopped in such a situation then, because of the analysis plan and even though the observed p-value is less than 0.001, we should report only a conventionally significant result, i.e. that we have a result significant at the 0.05 level. It is easy to see that this approach can be extended to any number of interim analyses, with the α level being spread over all the analyses; this is called the α-spending approach [38]. It should be emphasized that the more α is used at the interim analyses, the less is available for use at the final analysis.
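The apportioning described above can be written as a simple decision rule. The sketch below implements only the fixed split of α across planned analyses given in the text (for example 0.001 and 0.049); it is not one of the formal spending functions from the cited literature, and the function name and its arguments are hypothetical.

```python
def first_significant_analysis(p_values, alpha_levels, overall_alpha=0.05):
    """Return the 1-based number of the first analysis whose p-value falls
    below its allotted alpha level, or None if no analysis is significant."""
    if len(p_values) > len(alpha_levels):
        raise ValueError("more analyses performed than were planned")
    # The allotted levels must not add up to more than the overall level
    # (small tolerance for floating-point rounding).
    if sum(alpha_levels) > overall_alpha + 1e-12:
        raise ValueError("allotted alpha levels exceed the overall alpha")
    for number, (p, alpha_k) in enumerate(zip(p_values, alpha_levels), start=1):
        if p < alpha_k:
            return number
    return None


if __name__ == "__main__":
    plan = [0.001, 0.049]  # interim and final alpha levels from the text
    # Interim p-value of 0.01 is not below 0.001, so the trial continues;
    # the final p-value of 0.03 is below 0.049.
    print(first_significant_analysis([0.01, 0.03], plan))
```

Whichever analysis triggers the rule, the result would still be reported as significant at the overall 0.05 level, as noted above.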
