## Sample size tables and software

Many papers, books and statistical packages present tables, nomograms or software for calculating sample sizes for the most common types of data; for example, Machin et al. [20] give sample size formulae and tables for a range of common trial designs and outcome measures. Tables are fine for a quick check: they give a ballpark figure and an immediate impression of the impact of changing some of the 'input factors'. For final calculations, however, software is preferable, particularly for very low or very high event rates, since tables cannot cover every eventuality. The widespread availability of sample size software is on the whole a good thing, but inexperienced users can make potentially serious mistakes. Rather than survey the software available in what is a rapidly moving field, we therefore describe some features to be aware of and some of the most common pitfalls, which are most evident for time-to-event data.

### General potential pitfalls

1. Does your table or software give the total number of patients required (assuming a 2-arm trial) or the number per arm? For example, Freedman [22] provided the first tables for calculating sample sizes when two event-free curves are to be compared using the logrank test; these give the total number of patients required in a 2-arm study. Machin et al. [20] include their own versions of these tables in their book of sample size tables, but give the number of patients required per arm. Without care, you may derive a sample size that is half or double the number of patients actually required.

2. Do you need to enter the anticipated result in each group (under the alternative hypothesis), or the control group result and the anticipated difference? Using the same examples as above, Freedman's tables are cross-tabulated, with the control group event-free rate (p1) defining the rows and the anticipated difference between this and the experimental arm rate, i.e. p1 − p2, defining the columns. Others show p2 across the columns. Mix the two up and, instead of correctly finding that around 500 patients are needed to detect a survival increase from 20 per cent in one group to 30 per cent in the other, you may derive a figure of 85 patients: the number required to detect an absolute increase of 30 per cent, from 20 to 50 per cent.

3. Many packages allow you to specify a withdrawal rate, but it is important to check exactly what is meant by the term. In Section 5.4.4 we distinguish between what we call non-compliance and drop-out, but others may refer to either as withdrawal. In particular, the adjustment for withdrawal made by most sample size packages is in fact the one we describe in Section 5.4.4 for drop-out: it assumes that withdrawn patients cannot be assessed for the outcome measure of interest and simply need to be replaced by an equal number of assessable patients. As noted in Section 5.4.4, patients who withdraw from treatment but remain assessable for the outcome measure of interest require a different adjustment.
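The three general pitfalls can be made concrete with a short sketch. The code below is a minimal illustration, not a reproduction of any particular package or of the published tables. It uses the formula usually attributed to Freedman for the logrank test, assuming proportional hazards (so the hazard ratio is ln(p2)/ln(p1), with p1 and p2 the event-free proportions at a fixed time point), that all patients are followed to that time point, a two-sided 5 per cent significance level, and 80 per cent power by default. The function names `freedman_logrank`, `p2_from_difference` and `inflate_for_dropout` are hypothetical. The sketch reports the total and per-arm numbers explicitly, takes p2 rather than a difference, and applies a drop-out inflation of N/(1 − w), which is our assumption about the adjustment such packages make. Its figures may differ somewhat from published tables.

```python
import math
from statistics import NormalDist


def freedman_logrank(p1, p2, alpha=0.05, power=0.8):
    """Sketch of Freedman's sample size formula for a 2-arm logrank comparison.

    p1, p2: anticipated event-free (e.g. survival) proportions at a fixed time
    point in the control and experimental arms under the alternative hypothesis.
    Returns the total events, total patients and patients per arm.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct proportions in (0, 1)")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    # Hazard ratio, assuming proportional hazards.
    theta = math.log(p2) / math.log(p1)
    events = (z_alpha + z_beta) ** 2 * ((1 + theta) / (1 - theta)) ** 2
    # Convert events to patients using the average probability of an event.
    patients = events / (1 - (p1 + p2) / 2)
    total = math.ceil(patients)
    total += total % 2  # round up to an even number so the two arms are equal
    return {"events": math.ceil(events), "total": total, "per_arm": total // 2}


def p2_from_difference(p1, improvement):
    """Hypothetical helper: experimental-arm rate when a table is indexed by
    the control rate and the anticipated improvement, rather than by p2."""
    return p1 + improvement


def inflate_for_dropout(n, dropout):
    """Inflate n so that about n patients remain assessable, assuming drop-outs
    are unassessable and must be replaced: n / (1 - dropout)."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    # The small tolerance stops floating-point noise from adding a spurious patient.
    return math.ceil(n / (1 - dropout) - 1e-9)


if __name__ == "__main__":
    right = freedman_logrank(0.20, 0.30)  # survival 20% -> 30%
    # 0.30 misread as a difference rather than as p2: survival 20% -> 50%
    wrong = freedman_logrank(0.20, p2_from_difference(0.20, 0.30))
    print("correct:", right)
    print("p2/difference mix-up:", wrong)
    print("after 10% drop-out:", inflate_for_dropout(right["total"], 0.10))
```

Under these assumptions the correct call gives roughly 500 patients in total (about 250 per arm), while the mix-up gives fewer than 100. Returning `total` and `per_arm` side by side is a deliberate choice: it removes any doubt about which of the two a figure refers to.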

### Potential pitfalls specific to time-to-event data

1. There are two pitfalls specific to time-to-event data. Firstly, remember that you will almost invariably be required to enter the event-free rate (e.g. survival rate), not the event rate (e.g. death rate), since the logrank test is based on comparing event-free rates. This too can have a major impact on the estimated sample size, particularly when the control group has a very high or very low event rate. Suppose you wished to determine the sample size needed to detect a 10 per cent improvement in the local recurrence rate, reducing it from 20 to 10 per cent. Entering these figures directly into standard sample size packages will generate a figure of 322 patients in total (90 per cent power, 5 per cent significance level). If instead you enter them correctly, as event-free rates of 80 and 90 per cent, the required number is actually 450.

2. The second thing to be aware of is whether the tables or software give the number of patients required or the number of events. These can be very different. It
