Is the result or conclusion falsely positive?

Authors of 'negative' trials will often spend some time discussing the possible reasons why the results were not as they had hoped, searching for and discussing possible flaws in the design or conduct that may have contributed to the result being falsely negative. Once these have been dismissed, the discussion may move on to the quality of the evidence used to justify the study in the first place. Rather less time is generally spent by the authors of 'positive' trials discussing whether their results might be falsely positive. This is understandable, of course; the trial would not have been done had they not had some reason to believe the intervention would be effective. However, the need for critical review and discussion of such trials, by both authors and 'consumers', is just as great as for negative trials. Greater, in fact, since these are the trials most likely to change practice.

It is therefore important to consider all the key elements of trial design and conduct described in the CONSORT checklist. Specific items which tend to over-estimate a treatment benefit, and therefore require particular attention, include: (a) exclusion of patients who fail to complete protocol treatment (particularly in trials which compare standard treatment with or without an additional treatment component), (b) examination of multiple outcomes, with or without selective reporting of 'positive' outcomes, and (c) early stopping of a trial without adherence to pre-specified stopping rules or guidelines.

Point (a) represents a failure to analyse a trial by intention to treat, and the biases this can cause are described in Section 9.4.1. Point (b) refers to the fact that the probability of a false positive increases with the number of hypothesis tests performed: if a trial is reported as positive on the basis of one positive result among analyses of multiple outcome measures, or assessment times, it deserves particular caution. While both these points are fairly widely understood, point (c) is perhaps less well recognized. Interim analyses are a form of multiple testing. The observed difference between two treatments will fluctuate during the course of a trial, most extremely when total numbers are small, converging towards the 'correct' result as the sample size approaches the target number. The more often you examine the data, the greater the chance of analysing them at a moment when, by chance, an extreme difference is present. Formal stopping rules, as described in Section 9.5, guard against this by demanding extreme p-values to justify stopping, but even so the treatment effect in a trial which stops early will generally be over-estimated. What may be less obvious is that regular examination of the data without formal testing raises the same issues. This can be a particular problem in a single-centre study, where the responsible clinicians know all the patients in the trial and are aware of their treatment allocation and their status. Effectively, such a trial is subject to almost continuous monitoring, and if a formal analysis is carried out because of concern that a difference is emerging, it requires as cautious an interpretation as if regular formal analyses had been carried out.
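The inflation caused by repeated looks is easy to demonstrate by simulation. The sketch below is our illustration, not from the source: the group sizes, the number and spacing of looks, and the choice of a simple t-test are all arbitrary assumptions. It simulates trials in which the two arms are in truth identical, and compares a single final analysis with testing at five interim looks:

```python
# Illustrative simulation (assumed parameters): repeated interim tests on
# accumulating data inflate the false-positive rate above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 5000                   # simulated trials with NO true treatment effect
looks = [20, 40, 60, 80, 100]   # per-arm sample size at each interim analysis

final_only = 0                  # 'positive' trials when testing once, at the end
any_look = 0                    # 'positive' trials when testing at every look
for _ in range(n_sims):
    a = rng.normal(size=looks[-1])   # both arms drawn from the same distribution
    b = rng.normal(size=looks[-1])
    pvals = [stats.ttest_ind(a[:n], b[:n]).pvalue for n in looks]
    final_only += pvals[-1] < 0.05
    any_look += min(pvals) < 0.05

print(f"single final analysis: {final_only / n_sims:.3f}")   # close to 0.05
print(f"five interim analyses: {any_look / n_sims:.3f}")     # roughly 0.14
```

The figure of roughly 14 per cent for five equally spaced looks at a nominal 5 per cent level is a classical result in sequential analysis: each extra look gives chance another opportunity to produce an 'extreme' difference.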

If a trial meets all the standards of CONSORT, is it then safe to assume that it is a true positive? From the trial design, we know that the probability of concluding that a difference exists when in truth the treatments do not differ in efficacy is simply the significance level, α. If the trial has been designed with a two-sided hypothesis, then the probability that it will show a benefit to one of the treatments, when in fact it is no better than the other, is α/2. For many trials this is 0.025, which is reassuringly small. However, another factor to bear in mind is the prevalence of truly effective treatments among those brought to trial. Where this prevalence is high, the proportion of positive results that are true positives will be high; where it is low (more often the case in oncology), the proportion of positive results that are true positives will be low.
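The prevalence argument is simply Bayes' theorem applied to trial results. A minimal worked version, in our own notation with illustrative figures (not taken from the source), writes π for the prevalence of truly effective treatments and 1 − β for the power:

```latex
% Proportion of 'positive' trials that are true positives (Bayes' theorem).
% Symbols: pi = prevalence of truly effective treatments, 1 - beta = power,
% alpha/2 = one-sided false-positive rate. Example figures are illustrative.
\[
\Pr(\text{effective} \mid \text{positive result})
  = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \tfrac{\alpha}{2}\,(1-\pi)}
\]
% With alpha/2 = 0.025 and power 0.8: if pi = 0.5 this gives
% 0.40 / (0.40 + 0.0125), about 0.97; but if pi = 0.1 it falls to
% 0.08 / (0.08 + 0.0225), about 0.78, i.e. roughly one 'positive'
% trial in five would then be a false positive.
```

So even with a reassuringly small α/2, a low prevalence of genuinely effective treatments means that an appreciable fraction of positive trials will be false positives.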

It is impossible to know for certain whether a given positive result is a true positive or a false positive, but considerations of internal and external validity may again provide some guidance.

To illustrate some of these issues, we discuss a trial comparing surgery alone with chemoradiation followed by surgery in patients with operable oesophageal cancer [13], which reported a significant benefit for chemoradiation after stopping early.

The trial had a planned sample size of 190 patients, based on detecting an absolute benefit of 20 per cent (with 80 per cent power). No formal stopping rules were described, but the authors stated: 'Early indications of a clinically relevant difference between treatments suggested that an interim analysis should be undertaken. The trial was closed six years after it began because a statistically significant difference between the groups was found.' At trial closure, 113 patients had been randomized. Much of the discussion of this trial has concerned the much poorer than expected results in the surgery-alone group and the possibility of imbalance in the initial staging of the patients (only post-operative staging was reported). Indeed, reviewing the Walsh trial against the CONSORT checklist reveals several points that are not adequately described, including the method and timing of randomization.

It is perhaps the statement about monitoring that has attracted least attention, yet it is a serious cause for concern. In a single-centre trial such as this, it is perhaps inevitable that a form of almost subconscious continuous monitoring goes on, and understandable that, in the apparent absence of formal stopping rules, concern about a possible difference should lead to a formal analysis. This could have been avoided had an independent data monitoring and ethics committee been convened to review the data regularly (see Section 8.10), or even to review the data at the point of concern and advise on a course of action. The authors, in the absence of independent advice, may have felt ethically bound to close the trial. Certainly, having done so, it was important that the results were published. What the reader can do is consider how convincing the reported p-value of 0.01 is in the context of what was effectively continuous monitoring. Although this trial has been cited by some as evidence of the benefit of pre-operative chemoradiotherapy, others felt the need for confirmatory trials.
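To get a feel for how much effectively continuous monitoring weakens a nominal p-value of 0.01, the sketch below simulates null trials of the same planned size with a test after every few patients. All parameters here (the common event rate, the spacing and timing of looks, the choice of test) are our assumptions for illustration, not taken from the trial report:

```python
# Hedged sketch (assumed parameters, not from the Walsh trial report):
# under the null, how often does near-continuous monitoring of a
# 190-patient trial reach a nominal p < 0.01 before completion?
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims = 2000
max_per_arm = 95        # 190 patients in total, the planned sample size
p_event = 0.3           # assumed common event rate; no true difference

hits = 0
for _ in range(n_sims):
    a = rng.random(max_per_arm) < p_event
    b = rng.random(max_per_arm) < p_event
    for n in range(20, max_per_arm + 1, 5):   # a look every 10 patients
        table = [[a[:n].sum(), n - a[:n].sum()],
                 [b[:n].sum(), n - b[:n].sum()]]
        _, p, _, _ = stats.chi2_contingency(table)
        if p < 0.01:
            hits += 1
            break

print(f"P(nominal p < 0.01 at some look | no true effect): {hits / n_sims:.3f}")
```

Whatever the exact figure under any particular set of assumptions, it is several times the nominal 0.01, which is the sense in which a p-value reached under effectively continuous monitoring overstates the strength of the evidence.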
