According to a recent critique, no published comparative study of newer antidepressants has yet enrolled enough patients to have the power to reliably detect differences between two effective treatments.32 One possible exception is the NIMH-sponsored Sequenced Treatment Alternatives to Relieve Depression (STAR*D) project, which will enroll 5000 patients in a comparative treatment trial.34 Unfortunately, owing to the cost and resources required to conduct studies of sufficient size, the average RCT evaluating antidepressant effects is woefully underpowered. For example, in a recent review of 186 RCTs examining the efficacy and tolerability of amitriptyline in comparison with other antidepressants, the average treatment group comprised 40 patients.35 In an analysis of pivotal studies (i.e., well-designed, well-controlled studies on which the FDA bases decisions about the efficacy of NCEs) for seven newer antidepressants, only 65 to 75 patients were included per study arm.32 Thus, the average study comparing two effective antidepressants would have less than 20% power to find a real, albeit modest (i.e., 10%), difference in response rates. Put another way, at 20% power the likelihood of a false-negative finding (i.e., a type II error) is 80%, four times the chance of observing a statistically significant difference.
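The power figures above can be checked with a rough calculation. The following is a minimal sketch, not the method used in the cited analyses: it assumes a two-sided z-test at alpha = 0.05 with the unpooled normal approximation, and it assumes response rates of 50% and 60% (the text gives only the 10% difference, not the underlying rates). The function name `power_two_proportions` is illustrative.

```python
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two response rates.

    Uses the unpooled normal approximation; exact or continuity-corrected
    methods give somewhat different numbers.
    """
    z = NormalDist()
    # Standard error of the difference between the two observed rates.
    se = (p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Assumed response rates: 50% vs 60% (a 10-point difference).
    for n in (40, 70, 400):
        power = power_two_proportions(0.50, 0.60, n)
        print(f"n per arm = {n:3d}: power ~ {power:.2f}, "
              f"type II error ~ {1 - power:.2f}")
```

Under these assumptions, arms of 40 to 70 patients give power in the neighborhood of 15 to 25%, while roughly 400 patients per arm are needed to reach the conventional 80%. Different assumed response rates or test procedures would shift these numbers somewhat.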

Specific treatment effects appear to have declined in recent decades. This may reflect selection biases that differ from those at work a generation ago. The sample size, the number of centers, the number of treatment arms, the dosing strategy (e.g., flexible versus fixed dosing), and different expectation biases all potentially influence results. For example, in the 1960s, more trials evaluated hospitalized patients, who are generally less responsive to placebo and who appear to have a more robust response to antidepressants.32 Beyond the issue of inpatient/outpatient status, older studies were more likely to enroll patients with BPAD, psychosis, and recurrent melancholic subtypes of depression. In addition, the efficacy of antidepressant interventions was less well understood then (which may have lowered the expectations of the patient or clinician), and fewer potential participants had ever received an effective course of pharmacotherapy.

Contemporary trials, on the other hand, may be enrolling a somewhat different population: highly selected ambulatory patients who are often recruited through the mass media. These subjects may be less severely depressed and are rarely treatment naive.32,36 Attempts to lessen these problems by restricting enrollment to patients with relatively high levels of pretreatment severity have often, in fact, accentuated them by inadvertently inflating entry depression scores.36 Many clinical trials use entry criteria based in part on a minimum score on the same instrument used to evaluate efficacy. Investigators may be motivated, consciously or not, to increase baseline scores slightly in order to enter subjects into the trial. Such scores may then decrease by that same amount once the subject is entered, contributing to what appears to be a placebo effect unless the analysis accounts for it.32
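The baseline-inflation mechanism can be illustrated with a toy simulation. This is a sketch under entirely hypothetical assumptions: true severity scores are drawn uniformly from 10 to 30, the entry threshold is 20, raters nudge any score falling up to 3 points short of the threshold up to the threshold, and nothing truly changes after entry. None of these numbers comes from the cited studies.

```python
import random


def apparent_improvement(n_screened=2000, threshold=20, max_inflation=3, seed=0):
    """Mean drop in score after entry when no subject truly changes.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n_screened):
        true_score = rng.randint(10, 30)
        shortfall = threshold - true_score
        if 0 < shortfall <= max_inflation:
            baseline = threshold      # inflated just enough to qualify
        else:
            baseline = true_score     # rated accurately
        if baseline >= threshold:     # subject is enrolled
            # Follow-up is rated without entry pressure, so it equals the
            # true score; any drop is purely an artifact of inflation.
            drops.append(baseline - true_score)
    return sum(drops) / len(drops)


if __name__ == "__main__":
    print("no inflation:  ", apparent_improvement(max_inflation=0))
    print("with inflation:", round(apparent_improvement(), 2))
```

Without inflation the mean change is zero; with it, the enrolled sample shows an average "improvement" in every arm even though no subject has changed, which is how the artifact can pass for a placebo response.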

Another factor influencing the apparent effectiveness of antidepressants is the so-called 'file-drawer effect': the bias introduced by the tendency to publish positive but not negative studies. This bias is most evident when reviews of published studies are compared with reports based on the data sets submitted to the FDA for regulatory review.32 For example, on the basis of studies conducted for the registration of new antidepressants from fluoxetine to citalopram, the effects of antidepressants (relative to placebo) appear to be only about half as large as the published literature suggests once the unpublished studies are taken into account.
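A small simulation shows why selective publication inflates apparent effects when individual trials are underpowered. It is a sketch under assumed values, not a reanalysis of the FDA data: the placebo response rate (35%), true drug-placebo difference (10 percentage points), arm size (70), and the rule that only trials significantly favoring the drug are "published" are all hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_file_drawer(true_diff=0.10, placebo_rate=0.35, n_per_arm=70,
                         n_trials=2000, alpha=0.05, seed=1):
    """Return (mean effect over all trials, mean effect over 'published' ones).

    A trial counts as published only if the drug beats placebo at the
    two-sided alpha level. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    drug_rate = placebo_rate + true_diff
    all_diffs, published = [], []
    for _ in range(n_trials):
        # Observed response rates in each arm of one simulated trial.
        drug = sum(rng.random() < drug_rate for _ in range(n_per_arm)) / n_per_arm
        plc = sum(rng.random() < placebo_rate for _ in range(n_per_arm)) / n_per_arm
        diff = drug - plc
        se = (drug * (1 - drug) / n_per_arm + plc * (1 - plc) / n_per_arm) ** 0.5
        all_diffs.append(diff)
        if se > 0 and diff / se > z_crit:
            published.append(diff)
    published_mean = mean(published) if published else float("nan")
    return mean(all_diffs), published_mean


if __name__ == "__main__":
    overall, pub = simulate_file_drawer()
    print(f"mean effect, all trials:       {overall:.3f}")
    print(f"mean effect, published trials: {pub:.3f}")
```

Because only the trials that happened to observe a large difference clear the significance threshold, the published subset overstates the true effect, and the size of the overstatement depends on the assumed parameters.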

