Thursday, February 28, 2013

Bad statistics in high impact journals

Better Journals… Worse Statistics? : Neuroskeptic

In the linked blog entry, Neuroskeptic notes that high impact journals often report fewer statistical details than other journals. The research reported in these journals is often heavily amended, if not outright contradicted, by later research. I don't think this is nefarious, though, nor is the work worthless. The kind of work reported in Science and Nature, for instance, generates interest and, therefore, more scrutiny (funding, studies, theses, etc.).

But, as with all research, including the statistical details might direct subsequent research on these topics a bit better.

Wednesday, February 20, 2013

The burst of the Big Data bubble, and do we need the hype, anyway?

So, now I'm seeing some buzz on Twitter that Big Data disillusionment is setting in. Frankly, I've been wondering when this would happen. Of course, the next stage involves making strategic investments in Big Data resources and quietly putting those resources to effective use, at least if Big Data follows the path of technologies such as neural networks, Java, etc. As the theory goes, all surviving technologies follow a pattern of hype, disillusionment, and then quiet acceptance.

Did we really need this period of hype? I can understand why companies hype up a technology to maintain interest while their offerings mature, and overhyping usually leads to disillusionment, but I wonder if there is a different path. R, Python, and some other open projects seem to have flattened the hype hill and the disillusionment valley, probably because the larger number of people hacking on the internals generates its own interest and maturity mechanism.

Anyway, I look forward to the maturing of big data at least until the privacy concerns generate widespread panic.

Friday, February 15, 2013

Sloppy journalism with interactive graphics is still sloppy journalism

The Guardian recently discussed the "declining linguistic standards" in State of the Union addresses. I thought this was an interesting exercise, but something seemed wrong about the article, and it turns out this is one case where the data do not really speak for themselves. There's a lot of interpretation and understanding behind cultural trends in the use of the English language in America, as well as the evolution of the presidents' intentions behind the address. There are a few important points:

  • The author correctly points out that Woodrow Wilson essentially changed the format of the address through precedent from written document to speech. Right after Wilson's first speech there is a huge drop in the "education level" (hang on for a discussion of this terminology) of these addresses. As I recall, Wilson is the only American president with a Ph.D.
  • The index used, Flesch-Kincaid (FK), is questionable. Good on The Guardian for using a single measure for all speeches, but I have to wonder if it is wise to use the same measure for speeches and written addresses. Furthermore, FK is very sensitive to the placement of punctuation (it weights sentence length heavily). For instance, as a friend pointed out, one of Wilson's speeches has an FK grade level of over 17, but if you replace one of the semicolons in the speech with a period, the FK grade drops to 12. A listener cannot hear the difference between a semicolon and a period, so this subtlety is lost in speech format, giving FK an extremely high uncertainty (this same friend calls FK "utterly useless" for speeches).
  • The audience of the SOTU address has changed. Though the address is a constitutional duty of the president, delivering it as a speech is not, and it only has to be delivered to Congress. However, most modern addresses have been televised speeches, and they have to be understood by a wider and less politically savvy audience.
  • Cultural trends in the use of spoken and written English in America involve shorter sentences over time in general.
  • In this case, a more sophisticated natural language processing analysis might reveal some interesting trends. For instance, how do wartime speeches compare to times of peace? Are there any natural categories of speeches that fall out? What are the outliers? How does this compare to polls?
In short, we have some interesting data that need heavy qualification and critical analysis, but they are just presented on a page and capped with a headline that gives an overly simplistic interpretation.
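
To make the punctuation sensitivity concrete, here is a rough sketch of the Flesch-Kincaid grade level calculation, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-run heuristic of my own, so the absolute grades are only approximate, and the sample sentence is invented for illustration; it is not taken from any actual address. The point is only that swapping one semicolon for a period leaves the words untouched but halves the average sentence length.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption of this sketch): one syllable per run of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text):
    # Sentences are split on ., ! and ? only; semicolons do not end a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Invented sample sentence, not a quotation from any speech.
long_sentence = ("We have made peace with our neighbors; we must now make peace "
                 "among ourselves and rebuild what the long war has broken.")
# Same words, but the first semicolon becomes a period.
split_sentence = long_sentence.replace(";", ".", 1)

print("with semicolon:", round(fk_grade(long_sentence), 2))
print("with period:   ", round(fk_grade(split_sentence), 2))
```

The syllable term is identical in both versions, so the whole drop in grade level comes from the sentence-length term, which is exactly the part a listener cannot hear.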

Monday, February 11, 2013

Operational details can be pesky

Recently, I was working with a team to finalize a clinical trial protocol. I raised some concerns about their strategic matters, and my concerns were dismissed as "operational details."

The thing about those pesky operational details is that, if something doesn't work due to an operational detail, you might have to modify your strategy. And if enough of these pesky operational details get in the way, you may have to rethink your strategy altogether.