## The theory underlying sample size calculations using the Normal distribution

Suppose we wish to design a trial in which the main outcome measure is the difference in means of a particular variable between a group receiving standard therapy and a group receiving an experimental therapy. We will assume there will be $n$ patients in each group, that the significance level has been set to $\alpha$ (one-sided for simplicity) and the power to $1 - \beta$. We will also assume that the SD of the distributions from which the two means are calculated is approximately the same, $\sigma$, hence the SD of each mean is $\sigma/\sqrt{n}$. This quantity is referred to as the Standard Error of the mean (SE).

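Under the null hypothesis of no difference ($H_0$), the difference in means ($d = m_1 - m_2$) is assumed to follow a Normal distribution with mean zero and standard deviation $\sqrt{2} \times \sigma/\sqrt{n}$, since the variance of the difference between two independent means is $\sigma^2/n + \sigma^2/n = 2\sigma^2/n$.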

Under the alternative hypothesis ($H_1$), the difference $d$ is assumed to follow a Normal distribution with mean $\delta$ and standard deviation $\sqrt{2} \times \sigma/\sqrt{n}$, as under the null hypothesis.

To determine the sample size which satisfies both error constraints, we need to find the critical value, D, such that:

$$\text{under } H_0:\; P(d < D) = 1 - \alpha, \qquad \text{under } H_1:\; P(d > D) = 1 - \beta.$$
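The two constraints can be explored numerically for any candidate group size. The sketch below is illustrative only: the function name `critical_value_and_power` and the inputs (`sigma=10.0`, `delta=5.0`, one-sided `alpha=0.025`) are assumptions chosen for the example, not values from the text. It fixes $D$ from the constraint under $H_0$ and then reports the power that results under $H_1$.

```python
from math import sqrt
from statistics import NormalDist


def critical_value_and_power(n, sigma, delta, alpha):
    """Return (D, power) for n patients per group and a one-sided level alpha.

    sigma is the common SD, delta the difference in means assumed under H1.
    """
    se_diff = sqrt(2) * sigma / sqrt(n)         # SD of d under both hypotheses
    null = NormalDist(mu=0.0, sigma=se_diff)    # distribution of d under H0
    alt = NormalDist(mu=delta, sigma=se_diff)   # distribution of d under H1
    D = null.inv_cdf(1 - alpha)                 # chosen so P(d < D | H0) = 1 - alpha
    power = 1 - alt.cdf(D)                      # P(d > D | H1)
    return D, power


# Hypothetical inputs: as n grows, D shrinks and the power rises.
for n in (50, 84, 85, 120):
    D, power = critical_value_and_power(n, sigma=10.0, delta=5.0, alpha=0.025)
    print(f"n = {n:3d}  D = {D:.3f}  power = {power:.3f}")
```

The required sample size is the smallest $n$ for which the reported power reaches the target $1 - \beta$.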

That is,

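$$\frac{D - 0}{\sqrt{2} \times \sigma/\sqrt{n}} = Z_{1-\alpha}, \quad \text{therefore} \quad D = Z_{1-\alpha} \times \left(\sqrt{2} \times \sigma/\sqrt{n}\right),$$

$$\frac{D - \delta}{\sqrt{2} \times \sigma/\sqrt{n}} = -Z_{1-\beta}, \quad \text{therefore} \quad D = \delta - Z_{1-\beta} \times \left(\sqrt{2} \times \sigma/\sqrt{n}\right).$$

Hence:

$$Z_{1-\alpha} \times \left(\sqrt{2} \times \sigma/\sqrt{n}\right) = \delta - Z_{1-\beta} \times \left(\sqrt{2} \times \sigma/\sqrt{n}\right).$$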

This can be rearranged to give the number of patients, $n$, required in each group to satisfy the defined error rates as

$$n = \frac{2\sigma^2 \left(Z_{1-\alpha} + Z_{1-\beta}\right)^2}{\delta^2}.$$

For a 2-sided test, a slight approximation is involved, but the equation can simply be written as

$$n = \frac{2\sigma^2 \left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{\delta^2}.$$
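As a rough illustration, the calculation can be sketched in a few lines of Python. The sketch assumes the formula $n = 2\sigma^2 (Z_{1-\alpha/2} + Z_{1-\beta})^2 / \delta^2$ for the 2-sided case, with $Z_{1-\alpha}$ in place of $Z_{1-\alpha/2}$ for the 1-sided case. The function name `n_per_group`, its parameters, and the example inputs are hypothetical, and the result is rounded up to a whole number of patients.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha, power, two_sided=True):
    """Approximate number of patients per group for comparing two means.

    sigma: common SD; delta: difference in means to detect;
    alpha: significance level; power: 1 - beta.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z = NormalDist().inv_cdf                      # standard Normal quantile
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n_exact = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return ceil(n_exact)                          # round up to whole patients


# Hypothetical example: SD of 10, difference of 5, 5% level, 90% power.
print(n_per_group(10.0, 5.0, alpha=0.05, power=0.90))                   # 85
print(n_per_group(10.0, 5.0, alpha=0.05, power=0.90, two_sided=False))  # 69
```

Halving the difference to be detected quadruples the required sample size, because $\delta$ enters the formula squared.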

The quantity $(Z_{1-\alpha/2} + Z_{1-\beta})^2$ is fundamental to many sample size calculations, and the most commonly used values are given in Table 5.2.

| Significance level ($\alpha$), 2-sided | Significance level ($\alpha$), 1-sided | Power ($1 - \beta$) = 0.80 | 0.85 | 0.90 | 0.95 |
|---|---|---|---|---|---|
| 0.01 | 0.005 | 11.679 | 13.048 | 14.879 | 17.814 |
| 0.02 | 0.01 | 10.036 | 11.308 | 13.017 | 15.770 |
| 0.05 | 0.025 | 7.849 | 8.978 | 10.507 | 12.995 |
| 0.1 | 0.05 | 6.183 | 7.189 | 8.564 | 10.822 |

In general, in this chapter we present sample size formulae in the simplest terms, which will often mean that slight approximations are involved. These should therefore be used only as a rough guide to sample size, and more complete tables or software should be used for final calculations.
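The values in Table 5.2 can be regenerated from the standard Normal quantile function. The following is a minimal sketch using only the Python standard library; the helper name `z_factor` and the print layout are assumptions made for illustration.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard Normal quantile function


def z_factor(alpha_two_sided, power):
    """Return (Z_{1-alpha/2} + Z_{1-beta})^2 for a 2-sided level and a power."""
    return (z(1 - alpha_two_sided / 2) + z(power)) ** 2


alphas = (0.01, 0.02, 0.05, 0.10)   # 2-sided levels; the 1-sided level is alpha / 2
powers = (0.80, 0.85, 0.90, 0.95)

print("2-sided  1-sided  " + "  ".join(f"{p:>7.2f}" for p in powers))
for a in alphas:
    cells = "  ".join(f"{z_factor(a, p):7.3f}" for p in powers)
    print(f"{a:7.3f}  {a / 2:7.3f}  {cells}")
```

Multiplying any of these values by $2\sigma^2/\delta^2$ gives the approximate number of patients per group.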
