## What size of type II error is acceptable

Here the question is, how certain do you want to be of detecting a specified treatment difference if it really exists? By 'detecting a difference' we mean finding that in the trial analysis, the difference between treatments is associated with a p-value at or below the chosen type I error rate.

This quantity, often denoted by β, is also known as the probability of a false-negative result, that is, the probability of wrongly concluding that there is no difference between treatments. In fact the quantity 1 − β is used more widely. This is the probability of detecting a targeted difference between treatments, if it really exists, and is known more commonly as the power of a study. Clearly one wants this to be high, as close as possible to 100 per cent; as usual, the actual level chosen must take into account the impact that increasing the power has on sample size. This is illustrated in Fig. 5.2. Power of at least 80 per cent, but more usually 90 per cent, is generally considered acceptable, meaning that if there is a true difference between treatments of the size targeted, you would find this to be statistically significant in 80 per cent (90 per cent) of trials.

As for type I errors, the consequences of a wrong decision need to be considered for the specific trial, and should influence the actual level chosen. For example, in an efficacy trial, the consequence of low power is an increased risk of concluding there is no benefit to the experimental treatment, and therefore of remaining with the current best control. However, in a non-inferiority or equivalence trial, in which the experimental treatment is not expected to improve the primary outcome but must be no more than a specified amount worse, the consequence of low power is different. Here there is an increased risk of concluding the experimental treatment is approximately equivalent (if p-values alone are, wrongly, used to judge equivalence), and therefore of recommending the experimental treatment when in fact it is inferior. It is particularly important therefore to retain high power (>90 per cent) in an equivalence or non-inferiority trial.

Fig. 5.2 Number of patients required for different power levels. Note: Example from a 2-arm trial, anticipated 5-year survival rate in group 1 = 50 per cent, in group 2 = 60 per cent, two-sided 5 per cent significance level assumed. [Figure not reproduced. Horizontal axis: power, 80 to 100 per cent; vertical axis: number of patients, 0 to 3000.]
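The trade-off between power and sample size can be explored with a rough calculation. The sketch below uses the example in the note to Fig. 5.2 (50 versus 60 per cent, two-sided 5 per cent significance level) and assumes a simple normal-approximation formula for comparing two proportions with equal allocation. That formula and the function name `patients_per_group` are illustrative assumptions, not necessarily the method used to produce the figure; a calculation based on survival times could give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p1, p2, power=0.90, alpha=0.05):
    """Approximate patients per arm for comparing two proportions.

    Assumes the normal-approximation formula, equal allocation and a
    two-sided test. Illustrative only; not necessarily the method
    behind Fig. 5.2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error
    z_beta = z.inv_cdf(power)           # power = 1 - beta (type II error)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Sample size at several power levels for the 50 vs 60 per cent example
for power in (0.80, 0.85, 0.90, 0.95):
    n = patients_per_group(0.50, 0.60, power=power)
    print(f"power {power:.0%}: {n} per group, {2 * n} in total")
```

Under these assumptions, raising the power from 80 to 90 per cent increases the total from 770 to 1030 patients, and the requirement grows more steeply as the power approaches 100 per cent.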