Thursday, October 30, 2008

Stats-based healthcare

Op-Ed Contributors - How to Take American Health Care From Worst to First - NYTimes.com

I agree with the sentiment of the article. We could stand to bring more empirical evidence into health care and to use our information technology to improve care. However, I don't agree that it will be easy. The reason we have more data on our fantasy baseball third baseman than on our own health is simply that baseball data is easier and less expensive to collect.

The effort that Beane, Gingrich, and Kerry describe will be quite hard, as many problems with electronic medical records still need to be sorted out. But I think it is a necessary effort and will be worth the investment.

Wednesday, October 29, 2008

One major is not included here

10 most popular majors and what they pay - CNN.com
Statistics, of course, wasn't included. However, I can tell you that, depending on what brand of statistics you study (e.g., industrial statistics, biostatistics, econometrics), the law of supply and demand is on the side of the job seeker. The downside is that a bachelor's degree usually isn't enough; the (very) good salaries start with a master's degree.

Friday, October 17, 2008

Another solution to the R to Word table problem

Last time I used an HTML solution. This time, I create an RTF file:


# function: my.rtf.table
# purpose: convert a matrix, data.frame, or array into an RTF table
# output: text for RTF, possibly written to a file
# inputs:
# tab - a table, data.frame, or array (needs rownames and colnames)
# outfile - name of file (or console if NULL, which is the default)
# rtffile - if TRUE (default) then add {\rtf1 to the beginning and } to the end,
# making a full RTF file; if FALSE then leave these off
# header - if TRUE (default) then bold the table header
# cellwidth - width of each column in twips (1440 twips = 1 inch); earlier
# versions used 1 twip per column and relied on \trautofit1 to widen them
# ... - passed to format for the table body only
# tips: output to tempfile and use WordInsertFile(...) from the svViews
# package to easily convert a table to Microsoft Word
my.rtf.table <- function(tab, outfile = NULL, rtffile = TRUE, header = TRUE,
                         cellwidth = 1440, ...) {
  if (!is.null(outfile)) {
    sink(outfile)
    on.exit(sink())  # restore console output even if an error occurs
  }
  tab.nrow <- nrow(tab)
  tab.ncol <- ncol(tab)
  # row definition: one \cellx (right cell boundary, in twips) for the
  # row-name column plus one per data column; as.integer avoids "1e+05"
  row.def <- function() {
    cat("\\trowd\\trautofit1\\intbl\n")
    for (j in seq_len(tab.ncol + 1)) {
      cat("\\cellx", as.integer(j * cellwidth), "\n", sep = "")
    }
  }
  # close a row: repeat the row definition, then end the row
  row.end <- function() {
    cat("{\n")
    row.def()
    cat("\\row }\n")
  }
  if (rtffile) cat("{\\rtf1\n")  # begin RTF document
  # header row: an empty corner cell, then the column names
  row.def()
  cat("{\n")
  cat(" \\cell\n")
  for (i in seq_len(tab.ncol)) {
    if (header) {
      cat("\\b ", colnames(tab)[i], "\\b0\\cell \n", sep = "")
    } else {
      cat(colnames(tab)[i], "\\cell \n", sep = "")
    }
  }
  cat("}\n")
  row.end()
  # table body: the row name, then each formatted cell
  for (k in seq_len(tab.nrow)) {
    row.def()
    cat("{\n")
    cat(rownames(tab)[k], "\\cell\n", sep = "")
    for (i in seq_len(tab.ncol)) {
      cat(format(tab[k, i], ...), "\\cell \n", sep = "")
    }
    cat("}\n")
    row.end()
  }
  if (rtffile) cat("}\n")  # end the RTF document
  invisible(NULL)
}
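One caveat about the function above: it writes row names, column names, and cell contents into the RTF verbatim. RTF treats the backslash and the curly braces as control characters, so a label such as {a} would corrupt the table. Here is a minimal sketch of an escaping helper; rtf.escape is a hypothetical name (not part of the original code or of any package), and it does not attempt to handle non-ASCII characters.

```r
# Hypothetical helper (not part of the original my.rtf.table): escape the
# characters RTF treats as control characters (backslash, "{" and "}")
# by putting a backslash in front of each one.
rtf.escape <- function(x) {
  # perl = TRUE so the backslash inside the character class is unambiguous;
  # the replacement is a literal backslash followed by the matched character
  gsub("([\\\\{}])", "\\\\\\1", as.character(x), perl = TRUE)
}

rtf.escape("a{b}c")  # the string a\{b\}c
```

Wrapping colnames(tab)[i], rownames(tab)[k], and format(tab[k,i],...) in rtf.escape() inside my.rtf.table should be enough for ordinary labels.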


The function itself only writes the RTF. To insert the result into Word automatically, you'll need the svViews package (part of the SciViews series of packages):




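library(svViews)
# tempfile(fileext = ...) avoids the stray space that paste(tempfile(), ".rtf") adds
myfile <- tempfile(fileext = ".rtf")
# tab: the matrix, data.frame, or table to convert
my.rtf.table(tab, outfile = myfile)
# use svViews commands to set up Word, if needed
WordInsertFile(myfile)
unlink(myfile)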


Basically, the code above creates a temporary file to hold the RTF document, writes the RTF code for the table into it, and then inserts the file into Word, where the table can be manipulated as desired. Unfortunately, because SciViews is Windows-only, the automation part of this process is Windows-only too, but the file creation is not. Since myfile is just a string holding the file name, an R function could be written to run a series of AppleScript commands on the Macintosh.
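Here is a rough sketch of the Mac side. Instead of AppleScript, it uses the macOS open command with -a, which hands a file to a named application. The function name open.in.word.mac and the application name "Microsoft Word" are assumptions, and the run = FALSE switch only returns the command arguments without running anything.

```r
# Hypothetical helper (not from the original post): open an RTF file in
# Microsoft Word on macOS with the `open -a` shell command instead of
# AppleScript. Assumes the application is registered as "Microsoft Word".
open.in.word.mac <- function(path, app = "Microsoft Word", run = TRUE) {
  # quote both arguments so spaces in names survive the shell
  args <- c("-a", shQuote(app), shQuote(path))
  if (run) {
    status <- system2("open", args)
    if (status != 0) {
      warning("'open' exited with status ", status)
    }
  }
  # return the arguments (invisibly) so they can be inspected
  invisible(args)
}
```

After my.rtf.table(tab, outfile = myfile), calling open.in.word.mac(myfile) should open the table in Word. Unlike WordInsertFile, this opens the file as its own document rather than inserting it into the current one, so don't unlink the file until Word has read it.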

Using R in consulting: playing nice with Microsoft Word

As I use R more in consulting, I'm finding it more important to move quickly from R to Microsoft products (usually Word). (I'm using a Windows platform, but I'm sure the challenges on the Mac would be similar.) I simply don't have time for the text manipulation needed to turn R output into Word tables, for example. Here are a few solutions I've tried:

  1. RWeave and LaTeX2RTF - a bit clunky, and the results were disastrous. I really like RWeave, but the LaTeX-to-RTF conversion is the weak link in the chain here; it simply doesn't work well enough for consulting use. It also takes too long to set up when you just want to copy a couple of tables over to Word.
  2. odfWeave - I'm interested in this solution, but there are a couple of issues. Microsoft's support for the OpenDocument Format (ODF) isn't quite there yet, so I have to install OpenOffice Writer as an intermediary. And I haven't gotten odfWeave to work yet: it requires a command-line zip utility, and wzcline (the command-line interface to WinZip, a separate download) doesn't seem to work very well. Other command-line zip utilities might work better, but as of this writing there is a warning that the zip utility recommended in the odfWeave help files has a security flaw.
  3. HTML - I haven't tried the full R2HTML package, but it seems like too much for what I need. It's probably possible to use it to output a basic HTML file that can then be read into Word, but right now that requires too much setup for what I'm doing.

So I've come up with my own lightweight solution, the my.html.table function, which takes a matrix, data.frame, or array and outputs the HTML code for a table to the console or a file. Here's the function:


# function: my.html.table
# purpose: convert a matrix, data.frame, array, or table to an HTML table
# inputs:
# tab - the matrix, data.frame, array, or table to convert
# file - NULL (default) to output to screen, or file name to output to file
# header - F (default) to use <td> for the table header, T to use <th>
# (ignored if colnames is F)
# rownames - F (default) to leave out the rownames attribute of the table, T to
# include it in the first column
# colnames - F (default) to leave out the colnames attribute of the table, T to
# include it in the top row (affects header)
# ... - additional parameters are passed to format, affecting the body of the
# table only (not the headers or row names)
# interactions: header is used only if colnames is T
# output: HTML text (output to a file if file is not NULL)
# tip: use this function, copy/paste the output into Excel, and then copy/paste
# the Excel table to easily create a table in MS Word
# note: for more functionality, use the R2HTML package, as this function is
# intended for lightweight use only
my.html.table <- function(tab, file = NULL, header = FALSE, rownames = FALSE,
                          colnames = FALSE, ...) {
  if (!is.null(file)) {
    sink(file)
    on.exit(sink())  # restore console output even if an error occurs
  }
  nr <- nrow(tab)
  cat("<table>\n")
  if (header) {
    th.tag <- "<th>"
    th.tag.end <- "</th>"
  } else {
    th.tag <- "<td>"
    th.tag.end <- "</td>"
  }
  if (colnames) {
    cat("<tr>", th.tag, sep = "")
    if (rownames) {
      # empty corner cell above the row names
      cat(th.tag.end, th.tag, sep = "")
    }
    cat(colnames(tab), sep = paste(th.tag.end, th.tag, sep = ""))
    cat(th.tag.end, "</tr>\n", sep = "")
  }
  for (i in seq_len(nr)) {
    cat("<tr><td>")
    if (rownames) {
      cat(rownames(tab)[i], "</td><td>", sep = "")
    }
    # unlist() so that a one-row data.frame becomes a plain character vector
    cat(unlist(format(tab[i, ], ...)), sep = "</td><td>")
    cat("</td></tr>\n")
  }
  cat("</table>\n")
}


And here's an example:


a <- c(0.5290, 0.5680, 0.6030, 0.6380, 0.7050, 0.7000, 0.7090)
b <- c(0.0158, 0.0157, 0.0155, 0.0152, 0.0144, 0.0145, 0.0144)
c <- c(87.1350, 108.5070, 128.6900, 149.6190, 170.7140, 190.6750, 211.9240)
foo <- rbind(a, b, c)
colnames(foo) <- paste("Col", 1:7)
rownames(foo) <- c("Lbl 1", "Lbl 2", "Lbl 3")
my.html.table(foo)
my.html.table(foo, rownames = TRUE, colnames = TRUE, header = TRUE, digits = 2)
my.html.table(foo, width = 5)
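The loop in my.html.table writes one row at a time with cat. For comparison, here is a sketch that builds all the body rows at once with format, apply, and paste0. The name html.body.rows is hypothetical, and unlike my.html.table it formats the whole matrix together (so digits and width apply across all cells, not row by row) and always puts the row names in the first cell.

```r
# Hypothetical helper (not the post's my.html.table): build the HTML for all
# body rows at once instead of looping with cat(). The row name goes in the
# first cell of each row; '...' is passed to format().
html.body.rows <- function(tab, ...) {
  m <- as.matrix(tab)
  # format() the whole matrix at once; the result keeps the matrix shape
  cells <- format(m, ...)
  # join each row's cells with the separator between <td> elements
  inner <- apply(cells, 1, paste, collapse = "</td><td>")
  unname(paste0("<tr><td>", rownames(m), "</td><td>", inner, "</td></tr>"))
}
```

For example, cat("<table>", html.body.rows(foo, digits = 2), "</table>", sep = "\n") prints a complete table without a header row.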


To easily create a Word table from a matrix, data.frame, or array, just use my.html.table with suitable parameters (including anything that can be passed to format), copy the HTML from the console, paste it into Excel, select and copy the table in Excel, and paste it into Word. It's a little clunky, and maybe R's clipboard connection could cut out a step or two, but it's a vast improvement over manual formatting and over the bloated solutions listed above.
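On the clipboard idea: capture.output can collect everything my.html.table prints, and on Windows utils::writeClipboard can put those lines on the clipboard, which would remove the copy-from-console step. A minimal sketch, assuming the Windows build of R; the name copy.output is hypothetical, and on other platforms it only returns the captured lines.

```r
# Hypothetical helper (not from the original post): evaluate an expression
# that prints with cat(), capture the printed lines, and on Windows copy
# them to the clipboard. Returns the captured lines invisibly.
copy.output <- function(expr) {
  lines <- capture.output(expr)  # forces 'expr' while output is captured
  if (.Platform$OS.type == "windows") {
    utils::writeClipboard(lines)  # writeClipboard exists only on Windows
  }
  invisible(lines)
}
```

With that, copy.output(my.html.table(foo, rownames = TRUE, colnames = TRUE)) followed by a paste into Excel should replace the manual copy from the console.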

If you have a better way, feel free to comment.

By the way, using the right-click menu on graphs to copy as a metafile works wonderfully.

If reports must be regenerated regularly as the data change, this is of course the wrong solution; one of the literate programming (weave) solutions, or an appropriately programmed R2HTML setup, is much better.

Friday, October 10, 2008

Liberal vs. conservative visualization

Memeorandum Colors: Visualizing Political Bias with Greasemonkey - Waxy.org

This is more visualization-related (still very important, but not exactly what I do), and it's about political bias rather than statistics, but it's so cool I had to link to it.