Choice of primary outcome measure

In any one trial several outcome measures may be of interest, and indeed the decision about whether or not to use a treatment may well entail balancing its impact across a number of outcome measures. However, it is generally necessary to focus on one, the primary outcome measure, for the purposes of sample size calculation. As a general rule, outcome measures in trials conducted early in the development of a treatment are usually tumour-based, i.e. they are chosen because they demonstrate measurable treatment impact, tumour shrinkage for example. A good response on these outcome measures, however, may not be of any noticeable benefit to the patient. A characteristic of phase III trials is that the primary outcome measures are those which clearly have the potential to have an impact on the patient, survival time being an obvious example.

Outcome measure               Examples of associated time-to-event measure
Death                         Survival time
Response (tumour shrinkage)   Duration of response
Recurrence of disease         Time to recurrence
Relief of symptoms            Time to relief of symptoms/without symptoms
Quality of life 'scores'      Time to improvement/deterioration in scores
Toxicity                      Time with/without toxicity

In a definitive phase III trial the primary outcome measure would usually be the one which is most important in determining whether or not a treatment would be used outside of the trial, and this would often be survival time. A survival benefit alone may not be a sufficient condition to support the new treatment, but it is often essential to know the size of any benefit before considering other factors. Early randomized trials of a new treatment may use earlier intermediate outcome measures such as relapse or progression [2] which, in this situation, carry three main advantages. Firstly, the events are observed earlier than death, and so patients need not be followed up for so long for the purposes of the trial (though continued follow-up for long-term outcomes will often be valuable). Secondly, there will generally be more relapses or progressions than deaths, and, as we will see later, it is the number of events rather than the number of patients which determines the size and power of a trial. Thirdly, as the majority of patients will receive no further treatment until relapse, the impact of treatment allocation can be assessed without the complicating factor of additional treatments before the event of interest. The disadvantage of such outcome measures is that one often does not know whether a treatment's impact on the intermediate outcome measure, such as progression rate, will translate into differences in survival. An outcome measure can only be termed a surrogate outcome, and used in a definitive trial instead of survival, if there is good evidence that the impact of treatment on the intermediate outcome is predictive of the impact of the treatment on survival. One strict definition of surrogacy [3] is that the impact of treatment on the intermediate outcome entirely explains its impact on the definitive outcome measure. This is rare in practice, and the best one can hope for is a high degree of correlation.
In this respect, it is worth noting that response is a poor surrogate for survival for several cancers. For example, using data from a meta-analysis in advanced colorectal cancer, Buyse et al. [4] demonstrate that although an increase in tumour response rate translates into an increase in overall survival for patients with advanced colorectal cancer, knowledge that a treatment has benefits on tumour response does not allow accurate prediction of the ultimate benefit on survival.

Despite numerous potential surrogate outcomes, and despite the potential for high correlations between the surrogate outcome and survival, there are many difficulties associated with the use of surrogates [5]. Therefore, survival will often be the outcome measure of choice for a definitive randomized trial. One reason for this is that trials are often comparing treatment policies. Suppose, for example, that a new, perhaps rather toxic, adjuvant therapy followed by a standard treatment at relapse produces similar survival to a standard adjuvant therapy with the new treatment reserved for those patients who relapse. In that case a prolongation of relapse-free time may not be sufficient to justify using the new treatment for all patients in the adjuvant setting. For example, the EORTC/MRC trial of immediate versus deferred radiotherapy for low-grade glioma [6] found that immediate radiotherapy improved progression-free survival, but not overall survival, compared with deferred radiotherapy. Proponents of deferred radiotherapy can use this trial to argue that reserving radiotherapy until absolutely necessary has spared many patients the side effects of the treatment without compromising their overall chance of cure.

Whichever outcome measure is chosen, it must be possible to assess the primary outcome measure objectively and consistently on every patient randomized if an unbiased estimate of the treatment effect is to be obtained. If it is really not possible to distinguish a single primary outcome measure, then one should ideally determine the sample size necessary to detect realistic and clinically relevant differences with respect to each outcome measure, and choose as the overall trial sample size the largest number required. This ensures that the trial is adequately powered for all the important outcome measures. An important point to remember, though, is that carrying out multiple significance tests across a number of outcome measures increases the chance of one or more of these being 'significant' purely by chance; this can be compensated for by demanding a more extreme level of statistical significance (see also Section 9.5).
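The approach of sizing a trial for several outcome measures at once can be sketched in a few lines of Python. This is an illustration only, not a method taken from the text: it assumes 1:1 allocation, uses the widely quoted Schoenfeld approximation for the number of events needed by a log-rank test, and applies a Bonferroni correction as one possible way of "demanding a more extreme level of statistical significance". The target hazard ratios and the proportions of patients expected to have an event by the time of analysis are hypothetical values chosen for the example.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate total events for a two-sided log-rank test with 1:1
    allocation (Schoenfeld approximation; an assumption of this sketch)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Hypothetical design inputs: (target hazard ratio, assumed proportion of
# patients who will have had the event by the time of analysis).
outcomes = {
    "survival": (0.80, 0.60),
    "progression-free survival": (0.75, 0.75),
}

# Bonferroni correction: split the overall 5% level across the outcomes.
overall_alpha = 0.05
adjusted_alpha = overall_alpha / len(outcomes)

patients = {}
for name, (hazard_ratio, event_proportion) in outcomes.items():
    events = events_required(hazard_ratio, alpha=adjusted_alpha)
    # Convert events to patients using the assumed event proportion.
    patients[name] = ceil(events / event_proportion)

# The trial is sized by the most demanding outcome measure.
trial_size = max(patients.values())

# Chance of at least one false positive if each of the (assumed independent)
# outcomes were tested at the unadjusted 5% level.
familywise_unadjusted = 1 - (1 - overall_alpha) ** len(outcomes)

for name, n in patients.items():
    print(f"{name}: about {n} patients")
print(f"overall trial size: about {trial_size} patients")
print(f"unadjusted chance of a spurious result: {familywise_unadjusted:.3f}")
```

In this sketch the conversion from events to patients is deliberately crude; a real calculation would account for accrual rate, follow-up time and loss to follow-up.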

The importance of choosing the 'right' outcome measure to power a study is well illustrated with respect to equivalence, or non-inferiority, trials. These trials hope to demonstrate that an experimental treatment, which is not expected to have superior efficacy, brings potential benefits, such as reduced toxicity, from which all patients would benefit and which would outweigh small differences in efficacy. In this situation it is a common mistake simply to assume that the experimental treatment will have little impact on efficacy, and to base sample size on an outcome measure such as toxicity in which large differences are anticipated. Such a trial will be too small to detect important differences in efficacy that might well outweigh even substantial differences in toxicity. It will often be necessary to plan the trial to be big enough to detect the differences in efficacy that would be considered unacceptable whatever the differences in toxicity.
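A rough numerical illustration shows how far apart the two sample sizes can be. The sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test; it is a simplification, since a formal non-inferiority design would normally work with a one-sided margin. All the event rates are hypothetical: a toxicity rate reduced from 40% to 20%, and a 2-year survival rate of 50% for which a fall to 45% would be considered unacceptable.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_experimental, alpha=0.05, power=0.90):
    """Approximate patients per arm to detect a difference between two
    proportions (two-sided test, normal approximation, equal allocation)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = (p_control * (1 - p_control)
                + p_experimental * (1 - p_experimental))
    return ceil(z_total ** 2 * variance / (p_control - p_experimental) ** 2)


# Hypothetical large difference in toxicity: 40% versus 20%.
n_toxicity = patients_per_arm(0.40, 0.20)

# Hypothetical small but unacceptable difference in 2-year survival:
# 50% versus 45%.
n_efficacy = patients_per_arm(0.50, 0.45)

print(f"powered on toxicity: about {n_toxicity} patients per arm")
print(f"powered on efficacy: about {n_efficacy} patients per arm")
```

Under these assumed rates, a trial sized on toxicity alone would recruit only a small fraction of the patients needed to detect the efficacy difference.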
