Wednesday, March 25, 2009

Challenges in statistical review of clinical trials

The new 2009 ASA Biopharm newsletter is out, and the cover article is important, from my point of view as an industry statistician, not for the advice it gives to statistical reviewers but for the glimpse it offers into the mindset of a statistical reviewer at the FDA. Especially interesting is the treatment of suspiciously similar results within a trial as a warning sign for potential fraud or misrepresentation of the underlying data.

Friday, March 6, 2009

I was wrong: SAS does have decent random number generation

Here and in other places I've been dissing SAS's random number generation capabilities as unsuitable. Well, it turns out that I'm at least half wrong. If you use RANUNI or the others in the RANxxx series of random number functions, you still get the old generator with its period of 2^32-1. SAS's implementation is top of the line for this class of generators, but the class is unsuitable for anything but the simplest of tasks (or teaching demonstrations). Serious modern clinical trial simulation requires a period of at least 2^64.
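For concreteness, here's the kind of legacy call I mean (the seed value is arbitrary; the first call seeds the stream, and later calls continue it):

data legacy;
  do i = 1 to 5;
    u = ranuni(20090306); /* arbitrary positive seed; 0 would seed from the system clock */
    output;
  end;
run;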

Enter SAS's RAND function. RAND takes a string argument that identifies the distribution (e.g. uniform or normal), followed by enough numerical parameters to identify the particular member of that class (e.g. normal takes either 0 or 2 numerical parameters: 0 parameters gives you an N(0,1) distribution, and 2 parameters specify the mean and standard deviation). The special thing about RAND is that it is based on the Mersenne twister algorithm, which has a period of 2^19937-1 and very good "randomness" properties.
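Here's a minimal sketch of how I'd use it, seeding the Mersenne twister stream with CALL STREAMINIT (the seed and parameter values below are arbitrary):

data sim;
  call streaminit(20090306);     /* arbitrary seed, for reproducibility */
  do i = 1 to 5;
    u = rand('uniform');         /* U(0,1) */
    z = rand('normal');          /* 0 parameters: N(0,1) */
    x = rand('normal', 100, 15); /* 2 parameters: mean 100, standard deviation 15 */
    output;
  end;
run;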

So I hereby recant my criticism of SAS's PRNG capabilities.

Simplicity is no substitute for correctness, but simplicity has an important role

The test of a good procedure is how well it works, not how well it is understood. -- John Tukey
Perhaps I'm abusing Tukey's quote here, because I'm speaking of situations where the theory behind the less understood methodology is fairly well established, or at least fairly obvious to the statistician from existing theory. I'm also, in some cases, substituting "how correct it is" for "how well it works."

John Cook wrote a little the other day on this quote, and I wanted to follow up a bit more. I've run into many situations where a better-understood method was preferred over one that would have, for example, cut the sample size of a clinical trial or made better use of the data that was collected. The sponsor simply wanted to go with the method taught in a first-year statistics course because it was easier to understand. The results were often analysis plans that were less powerful, covered up important issues, or were simply wrong (i.e., an exact answer to the wrong question). It's a delicate balance, especially for someone trained in theoretical statistics who is working with a scientist or clinician in a very applied setting.

Here's how I resolve the issue. I think the simpler methods are great for storytelling. I appreciate Andrew Gelman's tweaks to the simpler methods (and his useful discussion of Tukey as well!), and I think basic graphing and estimation methods serve a useful purpose for presentation and for first-order approximations in data analysis. But in most practical cases they should not be the last word.

On a related note, I'm sure most statisticians know by now that they will have the "sexiest job" of the 2010s. The key will be how well we communicate our results. And here is where judicious use of the simpler methods (and creative data visualization) will make the greatest contributions.