## Continuous data

When comparing two groups of continuous observations, the focus is usually the difference in means between the groups. To test the hypothesis that there is no difference between groups, we calculate the following statistic:

$$t = \frac{\bar{X}_e - \bar{X}_c}{SE(\bar{X}_e - \bar{X}_c)},$$

where $\bar{X}_e$ is the mean of the observations in the experimental group and $\bar{X}_c$ is the mean of the observations in the control group. The value $SE(\bar{X}_e - \bar{X}_c)$ is the standard error of the difference between the means. The calculation of this standard error is straightforward, if a little cumbersome, and we have to define a number of terms first. Initially we need an estimate of a pooled variance, $s^2$:

$$s^2 = \frac{(n_e - 1)s_e^2 + (n_c - 1)s_c^2}{n_e + n_c - 2}, \qquad (9.12)$$

where $n_e$ is the number of observations in the experimental group, $n_c$ the number of observations in the control group, $s_e$ the standard deviation of the experimental group and $s_c$ the standard deviation of the control group. The SE of the difference between means is then given by

$$SE(\bar{X}_e - \bar{X}_c) = \sqrt{s^2\left(\frac{1}{n_e} + \frac{1}{n_c}\right)}.$$
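The calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `pooled_t_statistic` and the sample measurements are made up for the example and do not come from the text.

```python
import math
import statistics


def pooled_t_statistic(experimental, control):
    """Return (t, se, df) for a two-sample t-test with pooled variance."""
    n_e, n_c = len(experimental), len(control)
    mean_e = statistics.mean(experimental)
    mean_c = statistics.mean(control)
    # statistics.variance uses the n - 1 denominator, giving s_e^2 and s_c^2.
    var_e = statistics.variance(experimental)
    var_c = statistics.variance(control)

    df = n_e + n_c - 2  # degrees of freedom
    pooled_var = ((n_e - 1) * var_e + (n_c - 1) * var_c) / df
    se = math.sqrt(pooled_var * (1 / n_e + 1 / n_c))
    t = (mean_e - mean_c) / se
    return t, se, df


# Made-up illustrative measurements, not data from the text.
experimental = [5.1, 4.8, 6.0, 5.5, 5.9, 5.2]
control = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
t, se, df = pooled_t_statistic(experimental, control)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
```

In practice a statistics package would normally do this in one call; the sketch only makes the individual terms visible.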

The statistic $t$ obtained from this calculation is compared against a t-distribution (rather than the Normal distribution used above) to obtain a p-value for the test. However, like the chi-square distribution, the t-distribution varies according to the degrees of freedom. In general, the degrees of freedom are given by the total number of observations (across both groups) minus the number of parameters estimated in the numerator of the statistic. Thus the degrees of freedom are given by the number of observations in the experimental group plus the number of observations in the control group minus 2, that is, $n_e + n_c - 2$. The value two comes from the fact that we have estimated two means, one for each group. A 95 per cent confidence interval for the difference between means is

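$$\bar{X}_e - \bar{X}_c - t_{0.975} \times SE(\bar{X}_e - \bar{X}_c) \quad \text{to} \quad \bar{X}_e - \bar{X}_c + t_{0.975} \times SE(\bar{X}_e - \bar{X}_c), \qquad (9.14)$$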

where $t_{0.975}$ is the 97.5th percentile of the t-distribution with the degrees of freedom given as described above. To calculate a 90 per cent confidence interval, $t_{0.975}$ is replaced by $t_{0.95}$, the 95th percentile of the t-distribution. The t-distribution is similar to the standard Normal distribution in that it is symmetrical around zero. The major difference is that, for small sample sizes (small numbers of degrees of freedom), the t-distribution has fatter tails (at either end) than the Normal distribution. For larger sample sizes (large numbers of degrees of freedom) the t- and Normal distributions are very similar. Thus, when comparing groups with reasonable numbers of patients (for example, more than twenty-five patients in each group), the t-statistic calculated above can be regarded as a Z-statistic and compared, as described above, with a Normal distribution.
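To see how the percentile of the t-distribution depends on the degrees of freedom, and how it approaches the Normal value, the following sketch may help. Python's standard library has no t-distribution, so the percentile is found here by numerically integrating the density (Simpson's rule) and bisecting. This is an illustrative approach chosen for self-containment, not the method of any particular package, and all function names and the numbers in the usage lines are assumptions made for the example.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry around zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Percentile for 0.5 <= p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(diff, se, df, level=0.95):
    """Interval diff -/+ t * se, with t the (1 + level) / 2 percentile."""
    t_crit = t_quantile((1 + level) / 2, df)
    return diff - t_crit * se, diff + t_crit * se


# Percentiles shrink towards the Normal value as the degrees of freedom grow.
for df in (4, 10, 50):
    print(df, round(t_quantile(0.975, df), 3))
print("Normal", round(NormalDist().inv_cdf(0.975), 3))

# Hypothetical mean difference and standard error, for illustration only.
print(confidence_interval(diff=1.0, se=0.5, df=10))
print(confidence_interval(diff=1.0, se=0.5, df=10, level=0.90))
```

Passing `level=0.90` switches the multiplier from the 97.5th to the 95th percentile, which matches the replacement described in the text.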

The methodology above assumes that the data being analysed come from an approximately Normal distribution (see Chapter 5). Where there is doubt about this, alternative methods that do not assume any particular distribution for the data are generally applied. These are known as non-parametric tests, and the Mann-Whitney test described in the previous section is one such example. In fact the Mann-Whitney test (or, equivalently, the Wilcoxon two-sample test) is again the appropriate test to use when one wishes to compare data from two groups where the Normality assumption is questionable. This is often the case with very small sample sizes. As in the categorical data example, the individual values from the two groups are combined and ranked, and the sums of the ranks in the two groups are compared. Though widely applicable, such tests are best used when there is clear evidence of lack of Normality, as they are less powerful than the t-test when the Normality assumption holds. The appropriate summary statistic for such data is the median; confidence intervals can be calculated, but this is not straightforward, nor is it routinely offered by many statistical analysis packages.
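The ranking step can be illustrated with a short Python sketch. It combines the two groups, assigns ranks (tied values share the average of their ranks, a common convention assumed here) and returns the two rank sums together with the Mann-Whitney U statistic for the first group. The function names are hypothetical, and no p-value is computed.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sums(group_a, group_b):
    """Return (rank sum of A, rank sum of B, Mann-Whitney U for A)."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    sum_a = sum(ranks[:n_a])
    sum_b = sum(ranks[n_a:])
    # U counts the pairs in which the A value exceeds the B value (ties as 1/2).
    u_a = sum_a - n_a * (n_a + 1) / 2
    return sum_a, sum_b, u_a


# Made-up values for illustration only.
print(rank_sums([1.2, 3.4, 3.4], [3.4, 5.0]))
```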

### More than two groups

Where more than two groups are being compared, the general approach is to carry out a 'global' test which examines whether there is any evidence of substantial differences between the groups (without indicating where these lie). Only if this test is positive are further, pairwise, comparisons carried out to isolate and explain the main differences. This is discussed further below with respect to time-to-event data. For Normally distributed data, the appropriate global test is an extension of the two-sample t-test known as Analysis of Variance, often abbreviated to ANOVA. The non-parametric equivalent is the Kruskal-Wallis test. For details of both, we refer readers to a statistics textbook such as Altman [1].
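As a rough illustration of the global test for Normally distributed data, the sketch below computes the one-way ANOVA F statistic: the between-group mean square divided by the within-group mean square. It follows the standard textbook definition rather than anything specific to this chapter. The function name is hypothetical, the data are made up, and obtaining a p-value would additionally require the F-distribution, which is not shown.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up values for three groups, for illustration only.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

With exactly two groups, the F statistic equals the square of the pooled two-sample t-statistic, which is the sense in which ANOVA extends the t-test.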
