Sunday, November 4, 2012

Sometimes, saving CPU time is worth it for small data jobs

There appears to be a conventional wisdom, one that I myself have espoused on several occasions, that for “most” statistical computing jobs developer time is more precious than CPU time. (I put “most” in quotes because some people work in environments where Big Data or large jobs are the norm, or they develop high-performance computing libraries, and they have to squeeze every last bit of performance out of the CPU.)

However, sometimes it can be worth it to save a few extra minutes on small jobs, especially if they are run over and over. At one point today, I had an algorithm that I had written inefficiently using Python’s built-in lists. I decided to stop the job and rewrite it using the NumPy library, which took me an extra half hour. At first, I thought the time was wasted, but I have ended up running the code several times for various reasons. Those saved minutes have now, a couple of hours later, added up to more time than I spent rewriting.
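
The trade-off comes down to simple break-even arithmetic, which can be sketched in a few lines of Python. The half-hour rewrite cost is the figure from above. The per-run saving of 4 minutes is a made-up number for illustration only, since I did not time the runs carefully, and the function names are hypothetical.

```python
import math


def break_even_runs(rewrite_minutes, minutes_saved_per_run):
    """Smallest number of runs after which the rewrite has paid for itself."""
    if minutes_saved_per_run <= 0:
        raise ValueError("the rewrite must save time on each run")
    return math.ceil(rewrite_minutes / minutes_saved_per_run)


def net_minutes_saved(rewrite_minutes, minutes_saved_per_run, runs):
    """Total time saved across all runs, minus the one-time rewrite cost."""
    return runs * minutes_saved_per_run - rewrite_minutes


rewrite_minutes = 30        # the half hour spent rewriting
minutes_saved_per_run = 4   # ASSUMPTION: illustrative value, not a measurement

runs_needed = break_even_runs(rewrite_minutes, minutes_saved_per_run)
print("Runs needed to break even:", runs_needed)
print("Net minutes saved at that point:",
      net_minutes_saved(rewrite_minutes, minutes_saved_per_run, runs_needed))
```

Under those assumed numbers, the rewrite pays for itself by the eighth run. The smaller the saving per run, the more reruns it takes, so the rewrite only makes sense when the job really is going to be run repeatedly.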