Wednesday, September 14, 2011

Help! We need statistical leadership now! Part I: know your study

It’s time for statisticians to stand up and speak. This is a time when most scientific papers are “probably wrong,” and many of the reasons given are statistical in nature. A recent paper in Nature Neuroscience noted a major statistical error in a disturbingly large number of papers. And a recent interview in Science with Deborah Zarin, director of ClinicalTrials.gov, revealed the very disturbing fact that many principal investigators and study statisticians do not understand their own trial designs or the conclusions that can be drawn from them.

Recent efforts to handle these problems have been concerned primarily with financial conflicts of interest. Indeed, disclosure of financial conflicts of interest has improved the reporting of results. However, there are other sources of error that we have to consider.

A statistician responsible for a study has to be able to explain its design and state what conclusions can be drawn from it. I would prefer that we dig into this problem a little deeper, determine why it is occurring, and fix it. I have a few hypotheses:

  • We are putting junior statisticians in positions of responsibility before they are experienced enough
  • Classical statistics fills most of our education but is insufficient for current clinical trial needs such as adaptive designs, modern dose-finding, and the comparison of interactions
  • The demand for statistical services is so high, and the supply so low, that statisticians are spread too thin and simply don’t have the time to put in the sophisticated thought these studies require
  • Statisticians feel hamstrung by the need to explain everything to their non-statistical colleagues and lack the common language, the time, or the focus to do so effectively

I’ve certainly encountered all of these situations.

Tuesday, September 13, 2011

The statistical significance of interactions

Nature Neuroscience recently pointed out a statistical error that has occurred over and over in science journals: researchers conclude that two effects differ because one is statistically significant and the other is not, rather than directly testing the difference between them (that is, the interaction). Ben Goldacre explains the error in a little detail, and gives his cynical interpretation. Of course, I’ll apply Hanlon’s razor to the situation (unlike Ben), but I do want to explain the impact of these errors.
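To make the error concrete, here is a minimal sketch in R (simulated data; the names and numbers are mine, not from any of the papers in question):

    set.seed(1)
    n <- 20
    d <- data.frame(
      group = factor(rep(c("A", "B"), each = 2 * n)),
      treat = factor(rep(rep(c("control", "drug"), each = n), 2))
    )
    # simulated response: the true treatment effect is the SAME in both groups
    d$y <- ifelse(d$treat == "drug", 1, 0) + rnorm(4 * n, sd = 1.5)

    # the flawed approach: a separate test per group, then comparing verdicts
    # ("significant in A, not in B, so the groups differ")
    t.test(y ~ treat, data = d, subset = group == "A")
    t.test(y ~ treat, data = d, subset = group == "B")

    # the correct approach: test the treatment-by-group interaction directly
    summary(lm(y ~ treat * group, data = d))

With samples this size, one group’s test can easily cross the 0.05 line while the other’s does not, even though the true effects are identical; only the interaction term answers the question actually being asked.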

It’s easy to forget when you’re working with numbers that you are trying to explain what’s going on in the world. In biostatistics, you try to explain what’s going on in the human body. If you’re studying drugs, statistical errors affect people’s lives.

Where these studies are published is also important. Nature Neuroscience is a widely read journal, and a large number of articles in this publication commit the error. I wonder how many articles in therapeutic-area journals make this mistake. These are the journals that affect day-to-day medical practice, and, if the Nature Neuroscience error rate holds, this is disturbing indeed.

Honestly, when I read these linked articles, I was dismayed but not surprised. We statisticians often give confusing advice on how to test for differences, and we probably overplay the meaning of “statistically significant.”

We statisticians have to assume a leadership position here in the design, analysis, and interpretation of studies.

Monday, September 12, 2011

R to Word, revisited

In a previous post (a long time ago) I discussed a way to get an R data frame into a Word table. The code in that entry was essentially a brute-force way of wrapping R data in RTF code, but that RTF code was the bare minimum. There was no optimization of widths, or borders, or anything like that.
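For flavor, the brute-force idea looked roughly like this (a from-memory sketch, not the original post’s code; the cell widths are arbitrary):

    # wrap each data frame row in minimal RTF table markup:
    # \trowd starts a row, \cellx sets right cell boundaries,
    # \intbl ... \cell fills each cell, \row ends the row
    df <- head(mtcars[, 1:3])
    rows <- apply(df, 1, function(r)
      paste0("\\trowd",
             paste0("\\cellx", seq_along(r) * 1440, collapse = ""),
             paste0("\\intbl ", r, "\\cell", collapse = ""),
             "\\row"))
    writeLines(c("{\\rtf1\\ansi", rows, "}"), "table.rtf")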
There are several other ways I can think of:
  • Writing to CSV, then opening in MS Excel, then copying/pasting into Word (or even saving in another format)
  • Going through HTML to get to Word (for example, the R2HTML package)
  • Using a commercial solution such as Inference for R (has this been updated since 2009?!)
  • Using Statconn, which seems broken in later versions of Windows and is not cross-platform in any case
  • Going through LaTeX to RTF
I’ve started to look at the last option a bit more, mainly because LaTeX2RTF has become a more powerful program. Here’s my current workflow from R to RTF:
  • Use cat statements and the latex command from the Hmisc package or the xtable package (both downloadable from CRAN) to create a LaTeX document containing the table (see the sketch after this list).
  • Call the l2r command included with the LaTeX2RTF installation. (This requires a bit of setup; see below.)
  • Open up the result in Word and do any further manipulation.
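For the first step, here is a minimal sketch using xtable (the file name is illustrative):

    library(xtable)

    # write a complete LaTeX document containing the table; cat() supplies
    # the preamble and print.xtable() emits the table environment
    sink("routput.tex")
    cat("\\documentclass{article}\n\\begin{document}\n")
    print(xtable(head(mtcars), caption = "An example table"))
    cat("\\end{document}\n")
    sink()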
The advantage of this approach is that you can tune the appearance of the table from R rather than by post-processing the RTF file.

The main disadvantage is that you have to do a lot of editing of the l2r.bat file (if you are on Windows) to point it to the correct directories for MiKTeX, ImageMagick, and Ghostscript. (And you have to make sure everything is installed correctly as well.) It’s rather tricky because you have to include quotes in the right places. However, that setup only has to happen once.

The second disadvantage is that if you use the system() command to call l2r.bat, you have to use a strange syntax, such as system(paste('"c:\\Program Files\\latex2rtf\\l2r"',"c:\\path\\to\\routput.tex")). I suppose you could wrap this up in a neat function, as sketched below. At any rate, I think the effort is worth it, because the output is rather nice.
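A minimal sketch of such a wrapper, assuming the default LaTeX2RTF install location (adjust the path for your machine):

    # hypothetical helper: convert a LaTeX file to RTF with LaTeX2RTF;
    # the default l2r path is an assumption and will vary by installation
    tex2rtf <- function(texfile,
                        l2r = '"c:\\Program Files\\latex2rtf\\l2r"') {
      system(paste(l2r, texfile))
    }

    tex2rtf("c:\\path\\to\\routput.tex")  # writes routput.rtf beside the .tex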
I must admit, though, that this is one area where SAS blows R away. With the exception of integration with LaTeX, and some semi-successful integration with Excel through the Statconn project, R’s ability to produce nicely formatted reports is seriously outpaced by SAS ODS.