## Summary scores

An effective way of dealing with multiple scores over time is to derive a summary score for each patient, such as the mean, median, best or worst score. The advantages of this approach are that

♦ analyses are focused,

♦ statistics are valid,

♦ missing data can be accommodated,

♦ the methods are computationally straightforward.

Several summary scores are available. For example, for each patient one could use:

♦ the worst score (the usual method of assessing toxicity),

♦ the best score (for example, the assessment of response),

♦ the last available score.
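The options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summary_scores`, the use of `None` to mark a missed assessment, and the convention that a higher score means worse symptoms are all hypothetical choices, not taken from the text.

```python
from statistics import mean, median


def summary_scores(scores, higher_is_worse=True):
    """Summarize one patient's series of scores.

    `scores` is in time order; None marks a missed assessment
    (an assumed convention for this sketch).
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        return None  # nothing to summarize for this patient
    worst = max(observed) if higher_is_worse else min(observed)
    best = min(observed) if higher_is_worse else max(observed)
    return {
        "worst": worst,
        "best": best,
        "last": observed[-1],  # last available score
        "mean": mean(observed),
        "median": median(observed),
    }


# Hypothetical patient with one missed assessment
patient = [1, 3, None, 2, 0]
result = summary_scores(patient)
```

Because missed assessments are simply dropped before summarizing, each patient still contributes a single value to the analysis, which is the sense in which missing data can be accommodated.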

It should be emphasized that the conclusions may depend on the summary score chosen, so some thought should be given to its choice. One might choose the worst score in a trial evaluating a new and potentially less toxic regimen, and the best score in a trial of palliative treatments where the aim is to reduce baseline symptoms. The mean or median might highlight the frequency of episodes or the duration of symptoms. However, the mean or median can often mask a proportion of patients with severe symptoms, and the worst score takes no account of duration.

The choice of summary measure may be determined by the pattern of severity over time. Thus a relevant summary measure for an increasing or decreasing line might be the regression coefficient, final value, or time to reach a particular value, while for a peaked curve the mean, maximum or time to maximum would be more appropriate [24]. Constructing individual patient plots may clarify which summary measure to use and whether the data need to be transformed to give an approximately normal distribution (for example, by taking the log or cube root) before the summary score is calculated. The choice of summary measure will also depend on the patient population (e.g. lung or testicular cancer), the treatment (adjuvant or palliative) and the trial design (equivalence or difference).

Hollen et al. [20] suggest using slope analysis as a summary. For example, if there are three data points, the slopes between points 1 and 2, 1 and 3, and 2 and 3 are calculated and the median is taken.
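The slope summary just described (the median of all pairwise slopes) might be computed along the following lines. This is a minimal sketch rather than the procedure of Hollen et al.: the function name `median_pairwise_slope`, the use of `None` for a missed assessment, and the example data are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def median_pairwise_slope(times, scores):
    """Median of the slopes between every pair of observed points."""
    # Drop missed assessments (None is an assumed marker)
    points = [(t, s) for t, s in zip(times, scores) if s is not None]
    slopes = [
        (s2 - s1) / (t2 - t1)
        for (t1, s1), (t2, s2) in combinations(points, 2)
        if t2 != t1  # skip pairs recorded at the same time
    ]
    if not slopes:
        return None  # fewer than two usable points
    return median(slopes)


# Three hypothetical data points: the pairwise slopes are 1, 2 and 3
slope = median_pairwise_slope([0, 1, 2], [2, 3, 6])
```

With three points there are three pairwise slopes, as in the text; with more points the number of pairs grows, but the summary remains a single value per patient.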

The choice of summary measure requires a good understanding of the patterns of change likely to be observed. This of course emphasizes the point made in Chapter 6 about forming a clear hypothesis at the start of the trial and, if necessary, conducting a pilot study. For instance, two treatments might both cause nausea and vomiting, but the duration of this side effect may be much longer with one than with the other. Summarizing by comparing the most severe effect experienced by the two groups may show no difference, whereas comparing the duration of the side effect may detect a clear difference.
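The nausea and vomiting example can be made concrete with a small sketch. The data are invented, the assessments are assumed to be equally spaced (say, daily), and the number of assessments at or above a threshold is used as a rough stand-in for duration; none of this comes from a real trial.

```python
def worst_score(scores):
    """Most severe score recorded (higher is assumed to mean worse)."""
    return max(scores)


def duration(scores, threshold=1):
    """Number of assessments with a score at or above `threshold`."""
    return sum(1 for s in scores if s >= threshold)


# Hypothetical daily nausea grades for one patient on each treatment
treatment_a = [0, 3, 0, 0, 0, 0]  # severe but brief
treatment_b = [0, 3, 3, 2, 2, 1]  # equally severe, much longer

same_worst = worst_score(treatment_a) == worst_score(treatment_b)
duration_a = duration(treatment_a)
duration_b = duration(treatment_b)
```

Here the worst score is identical for the two patients, while the duration summary separates them clearly.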

The area under the curve (AUC) is a popular summary measure, and an example is shown for each patient in the right-hand corner of each box in Fig. 9.7. To calculate this value we have to allow for missing intermediate responses by connecting the two neighbouring data points with a straight line (e.g. CHART patient A). For some symptoms the AUC can be used to take account of both the length and quality of survival by plotting the patient's score over time and calculating the area beneath it [25]. This is an interesting method of summarizing data, and the advantage of using AUC with variables such as the performance status is that, if required, death can be scored as the worst category (5).
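One way to compute such an AUC is the trapezoidal rule, which joins neighbouring observed points with straight lines and therefore bridges a missing intermediate assessment automatically. The sketch below assumes this rule; the function name `auc`, the `None` marker for a missed assessment, and the example values are illustrative and are not taken from Fig. 9.7.

```python
def auc(times, scores):
    """Area under a patient's score-time curve by the trapezoidal rule.

    Missed assessments (None) are skipped, so the two neighbouring
    observed points are joined by a straight line.
    """
    points = [(t, s) for t, s in zip(times, scores) if s is not None]
    total = 0.0
    for (t1, s1), (t2, s2) in zip(points, points[1:]):
        total += (t2 - t1) * (s1 + s2) / 2  # area of one trapezium
    return total


# Hypothetical patient, with and without the third assessment
complete = auc([0, 1, 2, 3], [1, 2, 3, 2])
with_gap = auc([0, 1, 2, 3], [1, 2, None, 2])
```

In this example the missed assessment happened to coincide with the patient's peak, so the interpolated AUC is smaller than the complete one.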

All summary measures share two problems: (a) they assume that the precision of the summary measure is the same across patients, which is not strictly true, and (b) they provide no way of explicitly assessing the correlation between repeated measures. The summary measures 'worst score minus pre-treatment score' and AUC may also be influenced by the numerical value assigned to each category of a symptom. A further problem specific to the AUC is that it can be influenced by long time periods between assessments.
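The last two points can be illustrated with a short sketch, again assuming a trapezoidal AUC. The assessment times, the category codings and the patients are all hypothetical and are chosen only to make the effects visible.

```python
def auc(times, scores):
    """Trapezoidal area under the score-time curve (complete data only)."""
    total = 0.0
    for i in range(1, len(times)):
        total += (times[i] - times[i - 1]) * (scores[i] + scores[i - 1]) / 2
    return total


# (1) Same three scores; only the timing of the final assessment differs
regular = auc([0, 4, 8], [1, 3, 1])   # assessments every 4 weeks
delayed = auc([0, 4, 20], [1, 3, 1])  # final assessment 12 weeks late

# (2) Same symptom categories under two hypothetical numerical codings
equal_steps = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
weighted = {"none": 0, "mild": 1, "moderate": 3, "severe": 8}

patient_x = ["moderate", "moderate", "moderate"]
patient_y = ["none", "severe", "none"]
times = [0, 1, 2]

x_equal = auc(times, [equal_steps[c] for c in patient_x])
y_equal = auc(times, [equal_steps[c] for c in patient_y])
x_weighted = auc(times, [weighted[c] for c in patient_x])
y_weighted = auc(times, [weighted[c] for c in patient_y])
```

In the first comparison the delayed final assessment inflates the AUC even though the recorded scores are unchanged. In the second, patient X has the larger AUC under equally spaced codes but patient Y has the larger AUC under the weighted codes, so the ordering of patients depends on the coding.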
