Saturday, January 14, 2012

Faster reading through math

Let’s face it: there is a lot of content on the web, and one thing I hate is reading halfway through an article and realizing that the title and first paragraph indicate little about the rest of it. In effect, I sample the quick content first (usually after following a link) and end up disappointed.

My strategy now is to use automatic summaries, which are a lot more accessible than they used to be. The underlying algorithm was published by H. P. Luhn in 1958 (!) and is described in books such as Mining the Social Web by Matthew Russell (where a Python implementation is given). With a little work, you can create a program that scrapes text from a blog, produces short and long summaries, links to the original post, and packages it all up in a neat HTML page.

Or you can use the cute interface in Safari, if you care to switch.
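For the curious, here is a minimal sketch of the frequency-based idea behind Luhn-style summarization. It is my own simplified illustration, not the implementation from the book: the stopword list, the function names, and the parameters (num_keywords, max_gap, num_sentences) are all assumptions chosen for the example. The idea is to treat the most frequent non-stopwords as "significant", score each sentence by its densest cluster of significant words, and keep the top-scoring sentences in their original order.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use a much larger one).
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def score_sentence(words, significant, max_gap=4):
    # Positions of significant words within the sentence.
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            # Still in the same cluster: at most max_gap other words in between.
            count += 1
        else:
            # Close the cluster: (significant words)^2 / cluster span.
            best = max(best, count * count / (prev - start + 1))
            start = pos
            count = 1
        prev = pos
    return max(best, count * count / (prev - start + 1))

def summarize(text, num_sentences=2, num_keywords=10):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for ws in tokens for w in ws if w not in STOPWORDS)
    significant = {w for w, _ in freq.most_common(num_keywords)}
    scores = [score_sentence(ws, significant) for ws in tokens]
    # Pick the highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:num_sentences])]
```

Calling summarize(post_text, num_sentences=3) on the scraped text of a post would give a short summary, and a larger num_sentences a longer one. The scraping and HTML packaging are left out here.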

Wednesday, January 4, 2012

Competing in data mining competitions

I’m competing in several data mining competitions over at Kaggle. So far, I haven’t really done well, but I am learning a lot. Here’s what I’m getting out of it:

  • Variety in applying statistical techniques to real-world problems
  • Clarifying for myself what the bias-variance tradeoff really means
  • Trying new techniques, such as those I got out of the free online machine learning class
  • Humility

If you’re into statistics, you should try it! Kaggle isn’t the only competition forum in town, but it’s a good one. (Tunedit has one competition in classification of biomedical papers, and KDNuggets regularly announces contests from various sites.)