Friday, October 17, 2008

Using R in consulting: playing nice with Microsoft Word

As I use R more in consulting, I'm finding the need to make the quick transition from R to Microsoft products (usually Word) more serious. (I'm using a Windows platform, but I'm sure the challenges on the Mac would be similar.) I simply don't have time to do the text manipulations necessary to convert text to Word tables, for example. There are a few solutions I've tried:

  1. RWeave and LaTeX2RTF - a bit clunky and producing disastrous results. I really like RWeave, but it's the LaTeX to RTF conversion that's the weak link in the chain here. It simply doesn't work well enough for consulting use. Also, it takes too long to set up when you just want to copy a couple of tables over to Word.
  2. odfWeave - I'm interested in this solution, but there are a couple of issues. Microsoft support of open document format (ODF) isn't quite there yet, I have to install OpenOffice writer as an intermediary. And I haven't quite gotten odfWeave to work yet, as it requires a command line zip utility and wzcline (the command line interface to WinZip required as a separate download) doesn't seem to work very well yet. Maybe some of the other command lines would work better, but there was a warning as of this writing that the zip utility recommended in the odfWeave help files has a security flaw.
  3. HTML - I haven't tried the full R2HTML package, but it still seems that this is too much for what I need. It's probably possible to use this to output a basic html file which can then be read into Word, but as of right now that requires too much setup for what I'm doing.
So, basically, I've come up with my own lightweight solution, the my.html.table function which takes a matrix, data.frame, or array and outputs the html code for a table in the console or a file. Here's the function:


# function: my.html.table
# purpose: convert a matrix, array, or table to an html table
# inputs:
# file - NULL (default) to output to screen, or file name to output to file
# header - F (default) to use for table header, T to use (ignored if
# colnames is F)
# rownames - F (default) to leave out the rownames attribute of the table, T to
# include it in the first column
# colnames - F (default) to leave out the colnames attribute of the table, T to
# include it in the top row (affects header)
# ... - additional parameters are passed to format, affecting the body of the
# table only (not the headers or row names)
# interactions: header is used only if colnames is T
# output: html text (output to a file if file is not NULL)
# tip: use this function, and copy/paste to Excel, and then copy/paste the Excel
# table to easily create a table in MS Word
# note: for more functionality, use the R2HTML package, as this function is
# intended for lightweight use only
my.html.table <- function(tab,file=NULL,header=F,rownames=F,colnames=F,...) {
if (!is.null(file)) sink(file)
nr <- nrow(tab)
cat("<table>\n")
if (header) {
th.tag <- "<th>"
th.tag.end <- "</th>"
} else {
th.tag <- "<td>"
th.tag.end <- "</td>"
}
if (colnames) {
cat("<tr>",th.tag,sep='')
if (rownames) {
cat(th.tag.end,th.tag,sep='')
}
cat(colnames(tab),sep=paste(th.tag.end,th.tag,sep=''))
cat(th.tag.end,"</tr>\n",sep='')
}
for (i in 1:nr) {
cat("<tr><td>")
if (rownames) {
cat(rownames(tab)[i],'</td><td>')
}
cat(format(tab[i,],...),sep='</td><td>')
cat("</td></tr>\n")
}
cat("</table>\n")
if (!is.null(file)) sink()
}


And here's an example:


a<-c(0.5290,0.5680,0.6030,0.6380,0.7050,0.7000,0.7090) b<-c(0.0158,0.0157,0.0155,0.0152,0.0144,0.0145,0.0144) c<-c(87.1350,108.5070,128.6900,149.6190,170.7140,190.6750,211.9240)
foo <- rbind(a,b,c,d,e,f,g)
colnames(foo) <- paste("Col",1:7)
rownames(foo) <- c("Lbl 1","Lbl 2","Lbl 3","Lbl 4","Lbl 5","Lbl 6","Lbl 7") my.html.table(foo)
my.html.table(foo,rownames=T,colnames=T,header=T,digits=2)
my.html.table(foo,width=5)


To easily create a Word table from a matrix, data.frame, or array, just use my.html.table with suitable parameters (including anything that can be passed to format), copy the html from the console, paste into Excel, select and copy from Excel, and paste into Word. It's a little clunky, and maybe through the use of R's clipboard connection I can cut out a step or two, but it's a vast improvement over manual formatting and the bloated solutions listed above.

If you have a better way, feel free to comment.

By the way, using the right-click menu on graphs to copy as a metafile works wonderfully.

If reports are to be updated regularly using changing/updated data, this of course is the wrong solution, and one of the literate programming/weave solutions or appropriately programmed R2HTML is much better.