
Monday, March 5, 2012

Why I hate p-values (statistical leadership, Part II)

One statistical tool is the ubiquitous p-value. If it’s less than 0.05, your hypothesis must be true, right? Think again.

Ok, so I don’t hate p-values, but I do hate the way that we abuse them. And here’s where we need statistical leadership to go back and critique these p-values before we get too excited.

P-values can make or break venture capital deals, product approval for drugs, or senior management approval for a new design of deck lid. In that way, we place a little too much trust in them. Here’s where we abuse them:

  • The magical 0.05: if we get a p-value of 0.051, we lose, and if we get 0.049, we win! Never mind that the same experiment run under the same conditions can easily produce both of these results. (The difference between statistically significant and not significant is not itself significant.)
  • The misinterpretation: the p-value is not the probability that the null hypothesis is true. Rather, it is the long-run relative frequency with which similar experiments run under the same conditions would produce a test statistic at least as extreme as the one you observed, if the null hypothesis were true. Got that? Well, no matter how small your p-value is, I can take a wimpy version of your treatment and get a smaller p-value, just by increasing the sample size to whatever I need. P-values depend on effect size, effect variance, and sample size.
  • The gaming of the p-value: in clinical trials it’s possible to make your p-value smaller by restricting your subject entry criteria to whatever brings out the treatment effect the most. This is not usually a problem, as long as you keep in mind that the rarefied world of early-phase clinical studies is different from the real world.
  • The unethical gaming of the p-value: this comes from retrospectively tweaking your subject population. That may be acceptable if you present it not as a real result but as hypothesis-generating information for designing further studies; you cannot, however, expect any scientific validity from tweaking a study, its population, or its analysis after the results are in.
  • Covariate madness: covariates tend to decrease the p-value by partitioning the variation in drug effect. That’s great if you want to identify segments of your patient population. But if you do covariate selection and then report your p-value from the final model, you have a biased p-value.
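The sample-size point above is easy to demonstrate. Here is a minimal Python sketch (my own illustration, not from any particular trial) of a two-sided one-sample z-test: the effect stays the same wimpy 0.05 standard deviations, but the p-value collapses as n grows.

```python
import math

def z_test_p_value(effect, sigma, n):
    """Two-sided p-value for a one-sample z-test of H0: mean = 0,
    given observed mean `effect`, known sd `sigma`, and sample size n."""
    z = effect * math.sqrt(n) / sigma
    # two-sided tail area of the standard normal, via the complementary
    # error function: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (0.05 sd) at increasing sample sizes:
for n in (25, 400, 10000):
    print(n, z_test_p_value(0.05, 1.0, n))
# p ≈ 0.80 at n=25, ≈ 0.32 at n=400, and well below 0.0001 at n=10000
```

Nothing about the treatment changed between the three lines; only the sample size did.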

Statisticians need to stay on top of these issues and advocate for the proper interpretation of p-values. Don’t leave it up to someone with an incomplete understanding of these tools.
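The covariate-selection bias in the last bullet can be sketched with a null simulation. This is a deliberate simplification of mine: instead of fitting regressions, I treat each candidate model's test statistic as an independent standard normal draw under the null, then report the smallest p-value among the candidates, which is exactly what reporting the p-value from a selected final model amounts to.

```python
import math
import random

random.seed(0)

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

K = 10          # candidate covariate sets tried (assumed independent here)
TRIALS = 2000   # simulated "studies", all with no true effect

false_pos = 0
for _ in range(TRIALS):
    # pick the best-looking model and report its p-value
    best_p = min(two_sided_p(random.gauss(0, 1)) for _ in range(K))
    if best_p < 0.05:
        false_pos += 1

print(false_pos / TRIALS)
# roughly 1 - 0.95**10 ≈ 0.40 of null studies "significant", not 0.05
```

Real candidate models are correlated, so the inflation is usually milder than this independent-draws sketch, but the direction of the bias is the same.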

Monday, December 19, 2011

A statistician’s view on Stanford’s public machine learning course

This past fall, I took Stanford’s class on machine learning. Overall, it was a terrific experience, and I’d like to share a few thoughts on it:

  • A lot of participants were concerned that it was a watered-down version of Stanford’s CS 229. And, in fact, the course was more limited in scope and more applied than the official Stanford class. However, I found this to be a strength. Because I was already familiar with the methods covered at the beginning (linear and multiple regression, logistic regression), I could focus on the machine learning perspective the class brought to those methods. That paid off in later sections, where the methods were less familiar to me.
  • The embedded review questions and the end-of-section review questions were very well done, with a randomization algorithm that made it impossible to simply guess until everything was right.
  • Programming exercises were done in Octave, an open-source Matlab-like programming environment. I really enjoyed this programming, because it meant I essentially programmed the regression and logistic regression algorithms by hand, with the exception of a numerical optimization routine. I got a huge confidence boost when I managed to get the backpropagation algorithm for neural networks correct. The emphasis in these exercises was on implementation: you could first code the algorithms using “slow” explicit loops (for loops, for instance), but then really needed to vectorize them using the principles of linear algebra. For instance, there was an algorithm for a recommender system that would take hours if coded with for loops, but ran in minutes using a vectorized implementation. (This is because the implicit loops of vectorization run inside optimized linear algebra routines.) In statistics, we don’t always worry much about implementation details, but in machine learning, implementation matters because these algorithms often need to run in real time.
  • The class encouraged me to look at the Kaggle competitions. I’m not doing terribly well in them, but now at least I’m hacking on some data myself and learning a lot in the process.
  • The structure of the public class helps a lot compared to, for example, the iTunes U version of the class. And now that I’m looking at the CS 229 lectures on iTunes U, I’m understanding them much better.
  • Kudos to Stanford for taking the lead on this effort. This is the next logical progression of distance education, and takes a lot of effort and time.
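The vectorization point above can be illustrated with the linear regression hypothesis h(x) = θᵀx over a whole dataset. The class exercises were in Octave; this Python/NumPy sketch (the names and dimensions are my own) contrasts the explicit-loop version with a single matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))   # e.g. 1000 samples, 50 features
theta = rng.standard_normal(50)       # parameter vector

def predict_loops(X, theta):
    """'Slow' version: one prediction at a time, one term at a time."""
    preds = []
    for row in X:
        total = 0.0
        for x_j, t_j in zip(row, theta):
            total += x_j * t_j
        preds.append(total)
    return np.array(preds)

def predict_vectorized(X, theta):
    """Vectorized version: one matrix-vector product, run by optimized
    linear algebra routines instead of interpreted Python loops."""
    return X @ theta

# Both compute the same predictions; the vectorized one is far faster
# as the data grows.
assert np.allclose(predict_loops(X, theta), predict_vectorized(X, theta))
```

The same rewrite (replacing nested loops with matrix products) is what turned the hours-long recommender-system implementation into a minutes-long one.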

I also took the databases class, which was even more structured with a mid-term and final exam. This was a bit of a stretch for me, but learning about data storage and retrieval is a good complement to statistics and machine learning. I’ve coded a few complex SQL queries in my life, but this class really took my understanding of both XML-based and relational database systems to the next level.

Stanford is offering the machine learning class again, along with a gaggle of other classes. I recommend you check them out. (Find a list, for example, at the bottom of the Probabilistic Graphical Models course page.) (Note: Stanford does not offer official credit for these classes.)

Saturday, April 12, 2008

This is why I keep my internal emails clean

Every once in a while I will express frustration about a client in an email, but I don't use profanity in emails and chats (these things are logged, too). Why?

Well, while my emails are unlikely to get plastered all over the interwebs, I do realize they can reach a wider audience than intended. Don't forget, these things are discoverable, too. That means the lawyers can get 'em.