Realizations in Biostatistics: Web polls in blog entries

Sunday, August 19, 2007

Web polls in blog entries - I don't trust them

I distrust web polls. While there are more trustworthy sources of polling, such as Surveymonkey, these web surveying sites have to be backed up with essentially the same type of operational techniques found in standard paper surveys. The web polls I distrust are the ones that bloggers put in their blog entries to poll their readers on their thoughts on certain issues. Sometimes they will even follow up with an entry saying "this isn't a scientific poll, but here are the results."

A small step up from these are web surveys, such as John Mack's First Ever Pharma Blogsphere Survey®™©. They have a lot of the same problems as the simple web poll, and few of the controls necessary to ensure valid results. So I'll discuss simple one-off web polls and web surveys together.

Most of the problems and biases with these web polls aren't statistical; rather, they are operational. The data from these polls are so bad that no amount of statistics can rescue them. It's better not to even bring statistics into the equation here. The following are the operational biases I consider unavoidable and insurmountable:

Most web polls do little to stop one person from voting multiple times. Most services now use cookies or IP addresses to block repeat votes from one person, but these controls are imperfect at best. Changing an IP address is easy (just go to a different Starbucks), and cookies are easily deleted.
Wording questions in surveys is a tricky proposition, and millions of billable hours are spent agonizing over the wording. (Perhaps 75% of that is going a bit too far, but you get the point.) Very little time is generally spent wording the question in a web poll. The end result is that readers may not be answering the question the blogger meant to ask.
Forget random sampling, matching cases, identifying demographic information, or any of the classical statistical controls that are intended to isolate noise and false signal from true signal. Web poll samples are "People who happen to find the blog entry and care enough to click on a web poll." At best, the readers who feel strongly about an issue are the ones likely to click, while people who feel less strongly (but might lean a certain way) will probably just glaze over.
Answers to web polls will typically be immediate reactions to the blog post, rather than thoughtful, considered answers. Internet life is fast-paced, and readers (in general) simply don't have the time to thoughtfully answer a web poll.
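
The self-selection point above can be illustrated with a toy simulation. This is only a sketch under made-up assumptions, not a model of any real blog: the function name `simulate_poll`, the 60% true "yes" share, and the rule that a reader's chance of clicking equals his or her strength of feeling are all hypothetical, chosen so that the "no" readers happen to feel more strongly.

```python
import random


def simulate_poll(n_readers=100_000, p_yes=0.6, seed=0):
    """Return (true_yes_share, poll_yes_share, n_votes) for a toy readership."""
    rng = random.Random(seed)
    true_yes = poll_yes = votes = 0
    for _ in range(n_readers):
        says_yes = rng.random() < p_yes
        # Assumption: "no" readers feel more strongly, on average, than "yes" readers.
        strength = rng.uniform(0.0, 0.5) if says_yes else rng.uniform(0.0, 1.0)
        true_yes += says_yes
        # Assumption: the chance of bothering to click equals strength of feeling.
        if rng.random() < strength:
            votes += 1
            poll_yes += says_yes
    return true_yes / n_readers, poll_yes / votes, votes


if __name__ == "__main__":
    true_share, poll_share, votes = simulate_poll()
    print(f"true 'yes' share among all readers: {true_share:.3f}")
    print(f"'yes' share among poll clickers:    {poll_share:.3f} ({votes} votes)")
```

Under these assumptions the poll should report roughly 0.6 × 0.25 / (0.6 × 0.25 + 0.4 × 0.5) ≈ 43% "yes" even though about 60% of readers would say "yes" if asked. Raising `n_readers` only makes the wrong answer more precise, which is the sense in which no amount of statistics rescues the data.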

Web polls and surveys might be useful for gauging whether readers are interested in a particular topic posted by the blogger, and so they do have a use in guiding the future material in a blog. But beyond that, I can't trust them.

Next step: an analysis of the John Mack/Peter Rost kerfuffle.