Sunday, June 27, 2010

Reproducible research - further options

This is mostly just a list of further reproducible research options, as a follow-up to a previous entry. I still don't like these quite as much as R/Sweave, but they might do in a variety of situations.

  • Inference for R - connects R with Microsoft Office 2003 or later. I evaluated this a couple of years ago and I think there's a lot to like about it. It is very Weave-like, with the slight disadvantage that it really prefers the data to be coupled tightly with the report. However, I think it is just as easy to keep the two decoupled by not using Inference's data features, which is advantageous when you want to regenerate the report after the data is updated. Another disadvantage is that I didn't see a way to quickly redo a report, as you can with Sweave/LaTeX by creating a batch or shell script file (perhaps this is possible with Inference). An advantage is that you can also connect to Excel and PowerPoint. If you absolutely require Office 2003 or later, Inference for R is worth a look. It is, however, not free.
  • R2wd (link is to a very nice introduction) - a nice package a bit like R2HTML, except that it writes to a Word file. (SciViews has something similar, I think.) This is unlike many of the other options I've written about, because everything must be generated from R code. It is also a bit rough around the edges (for example, you cannot just write wdBody(summary(lm(y~x,data=foo)))). I think some of the packages it depends on, such as statconn, also allow connections to Excel and other applications, if that is needed.
  • There are similar solutions that allow connection to OpenOffice or Google Docs, some of which can be found in the comments section of the previous link.
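
For comparison, here is roughly what I mean by redoing a Sweave/LaTeX report from a shell script. This is only a minimal sketch under some assumptions: the file name report.Rnw, the function names run and rebuild_report, and the DRY_RUN switch are all made up for illustration, and it assumes R and pdflatex are on the PATH and that the script is run from the directory holding the .Rnw file.

```shell
#!/bin/sh
# Sketch: regenerate a Sweave report after the data changes.
# Hypothetical names: report.Rnw, run, rebuild_report, DRY_RUN.

# Run a command, or just print it when DRY_RUN=1 (handy for checking the steps).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Weave the .Rnw file into .tex, then compile the .tex to PDF.
# pdflatex is run twice so that cross-references settle.
rebuild_report() {
    rnw="${1:-report.Rnw}"
    base="${rnw%.Rnw}"
    run R CMD Sweave "$rnw" &&
    run pdflatex "$base.tex" &&
    run pdflatex "$base.tex"
}

# Only rebuild when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
    rebuild_report "$1"
fi
```

Saved as, say, rebuild.sh, this would be invoked as `sh rebuild.sh report.Rnw` whenever the data is updated. Because the R code lives in the .Rnw source, nothing in the finished report has to be touched by hand.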

The solutions that connect R with Word are very useful for businesses that rely on the Office platform. The solutions that connect to OpenOffice are useful for those who rely on the OpenOffice platform, or who need to exchange documents with Microsoft Office users but do not want to purchase Office themselves. However, for reproducible research in the way I'm describing, these solutions are not ideal, because they allow the display version to be edited easily, which would make it difficult to update the report if there is new data. Perhaps if there were a way to make the document "comment-only" (i.e. no one could edit the document but could only add comments), this would be a workable solution. So far, it's possible to manually set a protection flag on a Word file that allows redlining but not source editing, but my Windows skills are not quite sufficient to make that happen from, for example, a batch file or other script.

Exchanging with Google Docs is a different beast. Google Docs allows easy collaboration without having to send emails with attachments. I think that this idea will catch on, and once IT personnel are satisfied with security, this approach (whether it's Google's system, Microsoft's attempt at catching up, or someone else's) will become the primary way of editing small documents that require heavy collaboration. Again, I'm not clear whether it's possible to share a Google document in a comment-only mode, which I think would be required for this to work in a reproducible research context, but I think this technology will be very useful.

Posted via email from Randomjohn's posterous