Wednesday, October 31, 2012

Willful statistical illiteracy

The fine folks over at Simply Statistics have a very good educational article about the difference between the probability of winning an election and vote share. The article was prompted by a controversial column over at Politico criticizing Nate Silver and his election forecasts.

Twitter responses are even worse. Conservative filmmaker John Ziegler calls Nate Silver a “hyper-partisan fraud” who is “not an expert on polls.”


Glenn Thrush mentions a “conservative 538:”


And it’s not hard to find other examples.

I’ve run into this reaction a bit, especially when it comes to politics. There is a large group of people who will dismiss any evidence that goes against their beliefs. I guess the punditry wasn’t so dismissive of Silver in 2010.

At any rate, I’m making a recommendation I rarely make: read this Politico article and its comments (ignore the “conservatives aren’t bright” nonsense, which is the same kind of dismissiveness coming from the left).

And let’s thank Nate Silver, RealClearPolitics, and all the honest pollsters who try to shine some data on this election.

Monday, October 29, 2012

The most valuable thing about my little stat blog network project

So, I decided to construct the link graph from blogrolls, and finally settled on a manual process. The best part of this project has really been discovering for myself all the great content out there!

Monday, October 22, 2012

SNA class proposal

I’ve been taking several classes through Coursera (nothing against the other platforms; I took two of the original three classes via Stanford and just stuck with the platform). The latest one is Social Network Analysis, which has a programming project. Here is the proposal I posted:

Ok, I've been thinking about the programming project idea some, and at first I was thinking of analyzing the statistics blogging community, mostly because I belong to it and I wanted to see what would come out of it. The analysis below can be done for any sort of community. I've developed this idea a little further and wanted to record it here for two reasons. First, I simply need to write it down to get it out of my head, and in a way that the public can understand. Second, I'd like feedback.

As it turns out, I took the NLP class in the spring and think there's some overlap that can be exploited. (This comes up nicely in the Mining the Social Web and Programming Collective Intelligence books.) There are measures of content similarity, such as cosine similarity, which are simple to compute and work reasonably well for judging how similar two pieces of content are. Content can then be clustered based on similarity. So, then, I have the following questions:

  • What are the communities, and do they relate to clusters of content similarity?
  • If so, who are the "brokers" between different communities, and what do they blog about? There are a couple of aggregators, such as StatBlogs and R-Bloggers, that I imagine glue together several communities (that's their purpose and value), but I suspect a few other blogs combine aggregation with commentary as well. Original content generators, like mine, will probably be on the edges.
  • Is it better to threshold edges based on a number of mentions, or use an edge weight based on the number of mentions?
  • If I have time, I may try to do some sort of topic or named entity extraction, and get an automated way of seeing what these different communities are talking about.
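For anyone who hasn't run into it, cosine similarity treats each post as a vector of term counts and divides the dot product of two vectors by the product of their lengths. With non-negative counts the result runs from 0 (no shared terms) to 1 (identical term proportions). Below is a minimal sketch, assuming posts are plain strings and using a deliberately crude whitespace tokenizer with no stop-word removal or TF-IDF weighting. The clustering step is just one simple option (single-linkage grouping at a fixed, hypothetical similarity threshold), not necessarily what the project would use, and all function names and posts are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts (crude: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty post is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cluster_by_threshold(docs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: link any two posts whose similarity is at least
    `threshold`, then return the connected groups (via union-find)."""
    vectors = {name: term_vector(text) for name, text in docs.items()}
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        # Follow parent pointers to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(docs)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(vectors[first], vectors[second]) >= threshold:
                parent[find(first)] = find(second)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy posts, made up for illustration.
posts = {
    "a": "r packages for bayesian models",
    "b": "bayesian models in r",
    "c": "election polls and forecasts",
}
print(cosine_similarity(term_vector(posts["a"]), term_vector(posts["b"])))
print(cluster_by_threshold(posts, threshold=0.5))
```

With these toy posts, `a` and `b` share three terms (similarity of about 0.67) and land in one cluster, while `c` shares nothing with either and stays on its own. On real blog posts, a better tokenizer and TF-IDF weighting would likely matter a great deal.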

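The question of thresholding versus weighting edges can be made concrete with a small sketch. It assumes mentions have already been extracted as (citing blog, cited blog) pairs and treats links as undirected. The blog names, the data, and helpers such as `mention_counts` are hypothetical. Option 1 keeps an unweighted edge only if it has at least `min_mentions` mentions; option 2 keeps every edge and weights it by its count. Degree and weighted degree (strength) are shown only as a quick way to compare the two graphs.

```python
from collections import Counter

# Hypothetical mention data: one (citing_blog, cited_blog) pair per link found.
Mention = tuple[str, str]


def mention_counts(mentions: list[Mention]) -> Counter:
    """Collapse repeated mentions into undirected edge counts, dropping self-links."""
    counts: Counter = Counter()
    for src, dst in mentions:
        if src != dst:
            counts[frozenset((src, dst))] += 1
    return counts


def thresholded_edges(counts: Counter, min_mentions: int) -> set[frozenset]:
    """Option 1: an unweighted edge survives only with >= min_mentions mentions."""
    return {edge for edge, n in counts.items() if n >= min_mentions}


def weighted_edges(counts: Counter) -> dict[frozenset, int]:
    """Option 2: keep every edge, weighted by its number of mentions."""
    return dict(counts)


def degree(edges: set[frozenset]) -> Counter:
    """Number of distinct neighbours per blog in an unweighted edge set."""
    deg: Counter = Counter()
    for edge in edges:
        for node in edge:
            deg[node] += 1
    return deg


def strength(weights: dict[frozenset, int]) -> Counter:
    """Weighted degree: total mentions on all edges touching each blog."""
    s: Counter = Counter()
    for edge, w in weights.items():
        for node in edge:
            s[node] += w
    return s


mentions = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "C"),
            ("C", "D"), ("D", "C"), ("D", "E")]
counts = mention_counts(mentions)
print(degree(thresholded_edges(counts, min_mentions=2)))
print(strength(weighted_edges(counts)))
```

In this toy data, a threshold of 2 keeps only A-B and C-D, so the single A-C mention that tied the two pairs together disappears, and E drops out entirely. The weighted version keeps that bridge, which seems relevant to the question about brokers, at the cost of carrying many weak, one-off links.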
Saturday, October 20, 2012

Nate Silver on The Daily Show

Watch it!

There’s an interesting conversation about how the campaigns use analytics in get-out-the-vote efforts. It doesn’t go into much depth, but I think this is an important aspect of campaigns that will come into public view in the next couple of election cycles.

Of course, you can find his blog at