One-sided or two-sided test

A related issue that causes some confusion is whether one should use a 'one-sided test' (essentially, is A better than B?) or a 'two-sided test' (do A and B differ?).



Fig. 5.1 Number of patients required for decreasing type I error levels. Note: Example from a 2-arm trial, anticipated 5-year survival rate in group … [Plot: number of patients against significance level (type I error rate), 0 to 0.1; a 1-sided curve is labelled.]
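The general shape of the relationship in Fig. 5.1 (more patients needed as the type I error rate is reduced) can be illustrated with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch under stated assumptions, not the calculation behind the figure. It treats 5-year survival as a binary outcome; a real survival trial would more likely be sized on the logrank test and the number of events. The 50% vs 60% survival rates, the 90% power and the helper name `patients_per_arm` are all hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def patients_per_arm(p_control, p_experimental, alpha, power, sided=2):
    """Approximate patients per arm to detect p_control vs p_experimental.

    Hypothetical helper: unpooled normal approximation for two proportions
    (e.g. 5-year survival treated as binary). `sided` is 1 or 2.
    """
    if sided not in (1, 2):
        raise ValueError("sided must be 1 or 2")
    z_alpha = Z.inv_cdf(1 - alpha / sided)  # critical value for the type I error
    z_beta = Z.inv_cdf(power)               # quantile giving the desired power
    variance = p_control * (1 - p_control) + p_experimental * (1 - p_experimental)
    delta = p_experimental - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    # Hypothetical design: 5-year survival 50% (control) vs 60%, 90% power.
    for alpha in (0.01, 0.02, 0.05, 0.10):
        one = patients_per_arm(0.50, 0.60, alpha, 0.90, sided=1)
        two = patients_per_arm(0.50, 0.60, alpha, 0.90, sided=2)
        print(f"alpha={alpha:.2f}  one-sided: {one:4d}  two-sided: {two:4d}")
```

Under these assumptions a two-sided test at the 5% level needs about 515 patients per arm, against about 420 for a one-sided test at the same nominal level. A one-sided design at level α uses the same critical value as a two-sided design at 2α, which is where the saving comes from.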

A two-sided test is intended for use when it is plausible that the treatment difference could favour either treatment. This is often the case for new cancer therapies which, despite being introduced in the hope of improving outcome over standard treatment, may turn out to have a detrimental effect. Designing a trial with the intention of using a two-sided test allows one to interpret any result simply as evidence for or against either treatment.

A one-sided test is intended for use when it is implausible that one treatment (let us say the experimental treatment) could be worse than the control arm with respect to the primary outcome measure. Under these circumstances, if the actual result favours the existing treatment, one would have to attribute this purely to chance, and conclude only that the experimental treatment was no better than the control.

Thus, while many cancer trials are designed, conceptually, with one-sided alternative hypotheses (for example, that the experimental treatment improves 5-year survival by 10 per cent), it is often appropriate to design them with a two-sided test in mind, because an adverse effect of the experimental treatment cannot be ruled out. The results are then straightforward to interpret and, as most trials use two-sided tests, easy to compare with other trials.

A trial designed for a two-sided test allows one to draw one of three conclusions: A is better than B, A is worse than B, or A and B are not substantially different. A one-sided design and test allows one to draw one of only two conclusions: A is better than B, or A is not better than B. This choice may be justifiable; non-inferiority trials are often a case in point, since there it is sufficient to be able to say that A is no worse than B by more than a specified amount. However, one-sided tests are often viewed with suspicion, largely because of their frequent misuse.

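The different sets of conclusions can be made concrete as decision rules on a standardized test statistic. This is a minimal sketch assuming a statistic `z` that is approximately standard normal under the null hypothesis, with positive values favouring the experimental treatment A over the control B. The function name `conclude` and the 5% default level are illustrative choices, not part of the text.

```python
from statistics import NormalDist


def conclude(z, alpha=0.05, sided=2):
    """Return the conclusion a design permits for a test statistic z.

    Assumes z is approximately N(0, 1) under the null hypothesis and that
    z > 0 favours the experimental treatment A over the control B.
    """
    std_normal = NormalDist()
    if sided == 2:
        # Alpha split between both tails: three possible conclusions.
        crit = std_normal.inv_cdf(1 - alpha / 2)
        if z >= crit:
            return "A is better than B"
        if z <= -crit:
            return "A is worse than B"
        return "A and B not shown to differ"
    if sided == 1:
        # All of alpha in the upper tail: only two possible conclusions.
        crit = std_normal.inv_cdf(1 - alpha)
        return "A is better than B" if z >= crit else "A is not better than B"
    raise ValueError("sided must be 1 or 2")


if __name__ == "__main__":
    for z in (2.5, 1.8, -2.5):
        print(z, "| two-sided:", conclude(z), "| one-sided:", conclude(z, sided=1))
```

A statistic of 1.8 passes the one-sided critical value (about 1.645) but not the two-sided one (about 1.960). A statistic of -2.5 is reported as "A is worse than B" by the two-sided rule, whereas the one-sided rule can only fold it into "A is not better than B".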
Suppose a trial was designed with a two-sided alternative hypothesis; the results favour the experimental treatment, but the (two-sided) logrank test on the primary outcome measure gives a p-value of 0.10. Had the trial been designed with a one-sided formulation, the equivalent p-value would be 0.05 (half the two-sided value, because the observed difference lies in the hypothesized direction), which appears to suggest much stronger evidence of benefit from the experimental treatment. If the decision to use a one-sided test is made only after observing a trend favouring the experimental treatment, the analysis is clearly biased. It is the frequency with which this is done (particularly when the two-sided p-value lies between 0.06 and 0.10) that gives the one-sided test a bad name, and causes many trials that quote a one-sided test result to be viewed with suspicion.
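The halving of the p-value can be checked directly. This sketch assumes the logrank result is summarized as an approximately standard-normal statistic `z` (for example, the signed square root of the one-degree-of-freedom logrank chi-square), with positive values favouring the experimental treatment; `p_values` is a hypothetical helper.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one_sided, two_sided) p-values for an approximately N(0, 1) statistic z.

    The one-sided alternative assumed here is that the experimental arm is
    better, i.e. large positive z.
    """
    std_normal = NormalDist()
    one_sided = 1 - std_normal.cdf(z)             # upper tail only
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    return one_sided, two_sided


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.95)  # statistic whose two-sided p-value is 0.10
    for stat in (z, -z):
        one, two = p_values(stat)
        print(f"z={stat:+.3f}  one-sided p={one:.3f}  two-sided p={two:.3f}")
```

For z of about +1.645 the two-sided p-value is 0.100 and the one-sided p-value 0.050. For a statistic of the same size in the other direction, the two-sided p-value is unchanged but the one-sided p-value is 0.950: the halving is only available when the result happens to point in the pre-specified direction, which is exactly the asymmetry exploited when the one-sided test is chosen after seeing the data.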

These arguments alone may in fact be considered justification for using two-sided tests unless the case for a one-sided test is compelling.
