As mentioned above, the appropriate test to perform when considering subgroups is a (single) test for interaction assessing the consistency of the overall effect across the groups, rather than a separate test for significance for each subgroup. Although the basic principles are the same, the exact details of the test vary with the type of data. The basic principles are that the estimate in each subgroup is compared with the overall estimate from the whole trial. The squared difference from this comparison is then weighted by the relative amount of information in the subgroup. The test statistic for the test of interaction is the sum across subgroups of these weighted squared differences. Under the null hypothesis that there is no interaction, i.e. that the effect is consistent across subgroups, this test statistic follows a chi-square distribution with degrees of freedom equal to the number of subgroups minus one (so one degree of freedom when there are two subgroups). Altman gives more details for different types of data.
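The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular paper: it assumes each subgroup supplies an effect estimate on an additive scale (for example a log hazard ratio) together with its variance, takes the weight of a subgroup to be the inverse of that variance, and uses the inverse-variance weighted mean as the overall estimate. The function names and the input values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square X with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2; larger df are reached with the
    # standard recurrence that adds one term per two degrees of freedom.
    if df % 2 == 1:
        p, nu = math.erfc(math.sqrt(half)), 1
    else:
        p, nu = math.exp(-half), 2
    while nu < df:
        p += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return p

def interaction_test(estimates, variances):
    """Return (chi-square, df, p-value) for a test of interaction.

    Assumption: weights are inverse variances and the overall estimate
    is the weighted mean of the subgroup estimates.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Sum of weighted squared differences from the overall estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)

# Hypothetical log hazard ratios and variances for two subgroups.
q, df, p = interaction_test([0.0, -0.3], [0.01, 0.02])
print(f"chi-square = {q:.2f} on {df} df, p = {p:.3f}")
```

With only two subgroups the statistic reduces to the squared difference between the two estimates divided by the sum of their variances, which is a convenient check on the arithmetic.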
Example. The Sarcoma Meta-Analysis Collaboration performed a systematic review and meta-analysis of fourteen randomized trials which compared surgery plus chemotherapy against surgery alone for patients with soft-tissue sarcoma. As part of this meta-analysis, nine factors were examined to assess whether the effect of chemotherapy was similar across the subgroups defined by these factors. The results of these analyses are presented in Fig. 9.5. The figure shows the nine factors considered, ranging from age to use of radiotherapy, and for each factor the results for each subgroup are displayed.
To understand this plot, consider the factor sex and its two subgroups, female and male. The numbers alongside each subgroup give the number of deaths and the number of patients randomized in the chemotherapy and control groups. The horizontal line alongside each subgroup gives the estimate of the hazard ratio (centre of the black square) with 95 per cent (inner ticks) and 99 per cent (outer ticks) confidence intervals around the estimate. The size of the square is proportional to the amount of information in that subgroup, so the larger the square, the more events have occurred in that subgroup. There are similar total numbers of deaths in the female and male subgroups, so the squares are similar in size. The vertical line running through the plot is the unity line, representing a hazard ratio of 1.
[Figure: forest plot. Columns: (no. events/no. entered) for Chemotherapy and No chemotherapy, and Hazard ratio. Row labels include disease site (Extremity, Trunk, Uterus, Others), Histology (Leiomyosarcoma, Liposarcoma, MFH, Synovial, Others) and Extent of resection.]
Fig. 9.5 Subgroup analyses in the soft tissue sarcoma meta-analysis. Reprinted with permission from Elsevier Science (The Lancet, 1997, 350, 1647-54).
From the plot it can be seen that the estimates of the hazard ratio for the female and male groups are approximately 1 and 0.7, respectively. Further, the 99 per cent confidence interval for the male group just touches the HR = 1 line, suggesting that the p-value for the chemotherapy effect in this group is approximately 0.01, while the result for the female group quite clearly provides no evidence of an effect of chemotherapy. Does this mean that chemotherapy is effective in males and not in females? This would be an incorrect conclusion, because the question is not being posed correctly. The question that needs to be posed is: is there evidence that the effect of chemotherapy is different in males and females? We can assess this visually by considering whether the confidence intervals for the effects in the two subgroups overlap markedly and, on inspection, they do.
More formally, we can perform a test for interaction, which simply compares the estimate in each subgroup with the overall estimate (appropriately weighted) and, with two subgroups, produces a chi-square statistic on one degree of freedom. If there is no interaction, the results in the subgroups would differ only randomly from the overall result. For the factor sex, we obtain a chi-square statistic of 3.86, giving an associated p-value of 0.049. For such exploratory analyses, where a number of tests have been performed, this falls well short of the stricter level of significance that multiple testing calls for, and thus we would conclude that there is no good evidence that the effect of chemotherapy is different in men and women. Our best estimate of the effect of chemotherapy in both men and women is therefore the overall estimate of effect seen in the meta-analysis as a whole.
The same conclusion holds for all the factors examined in Fig. 9.5: there is no good evidence that the effect of chemotherapy is larger or smaller in any of the subgroups examined, and thus for all subgroups the best estimate of the effect is the overall estimate. In fact this example illustrates the general point that in cancer trials it is very rare that different effects are found in different subgroups, and this is perhaps why any observed differences are viewed with scepticism.
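The p-value quoted for the factor sex can be reproduced from the chi-square statistic, as in the short sketch below. The comparison with a Bonferroni-style threshold (0.05 divided by the nine factors examined) is an assumption added here purely for illustration; the text does not state which adjustment, if any, was applied.

```python
import math

chi_square = 3.86   # interaction statistic for the factor sex, from the text
n_factors = 9       # number of factors examined in Fig. 9.5

# With one degree of freedom, the upper tail of the chi-square
# distribution equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

# Illustrative assumption: a Bonferroni-style threshold for nine tests.
threshold = 0.05 / n_factors

print(f"p = {p_value:.3f}, adjusted threshold = {threshold:.4f}")
print("below adjusted threshold:", p_value < threshold)
```

On this reckoning the nominal p-value sits just under 0.05 but is roughly nine times larger than the adjusted threshold, which is one way of reading the statement that the result is far from significant.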
For some factors, such as tumour size and age, it is probably more appropriate to perform a test for trend rather than a test for interaction. The reason is that the test for interaction imposes no structure on the subgroups, looking for differences in any direction among all the subgroups considered. This may be appropriate for factors such as disease site and histology, where there is no natural ordering of the subgroups. However, for factors such as tumour size we may reasonably expect a trend across the subgroups, with <5 cm showing the largest (or smallest) effect of chemotherapy, >10 cm showing the smallest (or largest) effect, and 5-10 cm somewhere in between. This ordering leads naturally to a test for trend, which is an extension of the test for interaction. On examining the figure for tumour size, it may superficially appear that there is evidence of an effect of chemotherapy in the 5-10 cm subgroup, but no effect in the <5 cm and >10 cm groups. This clearly does not make much practical sense, and the test for trend gives a p-value of 0.96, suggesting that we have no good evidence that the effect of chemotherapy varies (linearly) according to the size of the tumour. When continuous data are categorized to form subgroups, it is generally good practice to form more than two groups so that a test for trend can be performed, since this will generally give more power to detect differences than categorizing into two groups and performing a test for interaction.
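The contrast between the two tests can be sketched as follows. This is one common formulation, assumed here rather than taken from the source: the trend statistic is the part of the interaction statistic explained by a weighted linear regression of the subgroup estimates on equally spaced scores (0, 1, 2), with inverse-variance weights. All function names and data values are hypothetical.

```python
import math

def trend_test(estimates, variances, scores):
    """Chi-square (1 df) and p-value for a linear trend across ordered subgroups."""
    weights = [1.0 / v for v in variances]
    mean_score = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    sxx = sum(w * (s - mean_score) ** 2 for w, s in zip(weights, scores))
    sxy = sum(w * (s - mean_score) * e
              for w, s, e in zip(weights, scores, estimates))
    chi_square = sxy ** 2 / sxx
    return chi_square, math.erfc(math.sqrt(chi_square / 2.0))  # upper tail, 1 df

def interaction_test_three_groups(estimates, variances):
    """Interaction chi-square (2 df) and p-value for exactly three subgroups."""
    if len(estimates) != 3:
        raise ValueError("this sketch handles exactly three subgroups")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, math.exp(-q / 2.0)  # upper tail of chi-square on 2 df

scores = [0, 1, 2]               # e.g. <5 cm, 5-10 cm, >10 cm
variances = [0.01, 0.01, 0.01]   # hypothetical variances of log hazard ratios

ordered = [-0.1, -0.2, -0.3]     # effect changes steadily with size
middle_only = [0.0, -0.3, 0.0]   # apparent effect only in the middle group

for label, est in (("ordered", ordered), ("middle only", middle_only)):
    t_chi, t_p = trend_test(est, variances, scores)
    i_chi, i_p = interaction_test_three_groups(est, variances)
    print(f"{label}: trend chi2 = {t_chi:.2f} (p = {t_p:.3f}), "
          f"interaction chi2 = {i_chi:.2f} (p = {i_p:.3f})")
```

In the ordered case both statistics take the same value, but the trend test refers it to one degree of freedom rather than two and so gives the smaller p-value. In the second case, where only the middle group appears to differ, the trend statistic is zero, which mirrors the tumour size result described above.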
Example. One approach to reporting such analyses succinctly is given in the report of the ICON2 ovarian cancer trial, which compared carboplatin with the three-drug combination CAP (cyclophosphamide, doxorubicin and cisplatin). In this trial, although no subgroups were pre-specified, seven factors were investigated in an exploratory manner to assess whether the observed effect was consistent within subgroups of these seven factors (Table 9.11).
It should be noted that the test for interaction (and, to a lesser extent, the test for trend) is generally not a very powerful test, not least because we usually do not anticipate large differences between subgroups. Nevertheless, chance positive results can still occur, especially as many subgroups are often examined. In these circumstances scepticism is important, and the reader should ensure they follow the guidelines in Box 9.4 before the urge to report a 'new and exciting' result takes hold.
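How readily chance positive results arise can be gauged with a rough calculation. The sketch below assumes, purely for illustration, that the tests are independent and each is carried out at the 5 per cent level; in practice tests on related factors are correlated, so the figures are indicative only. The function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one 'significant' result when no true interaction
    exists, assuming n_tests independent tests each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# One test, the seven factors of ICON2, the nine factors of Fig. 9.5.
for n_tests in (1, 7, 9):
    print(n_tests, round(prob_any_false_positive(n_tests), 3))
```

Under these assumptions, examining seven or nine factors gives roughly a one in three chance of at least one nominally significant interaction even when the treatment effect is identical in every subgroup.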