why is it so hard to avoid premature optimizations?
I am amused each time I learn about a completely unnecessary "optimization" that has quietly snuck into one of my programs. These changes happen automatically while programs are being composed, often before they have been run, and almost always independent of profiling. They make programs longer, more difficult to read and sometimes even slower. So why do I write such programs? It is because I, like most programmers, am shockingly bad at intuiting the location of real bottlenecks.
To illustrate this I am going to use as an example a program written by someone else. I am using this program not because I feel it contains particularly egregious unnecessary optimizations 1. Rather, I think it is an interesting example because of just how far off I was in understanding what makes it fast.I have been spending a fair amount of time with Clojure lately. When approaching a new language, I will often start by solving some problems I am already familiar with so that I can focus on learning the language's idioms 2. I had solved some Project Euler problems and also wrote a solution to the number combo problem that does not make use of local mutable state. Then I remembered the neat solution jcl had posted in a Hacker News comment thread and started thinking about how I could implement this algorithm 3 in Clojure.I had initially assumed the memoization and float math were important so I started out trying to decide a) which of Clojure's reference types I was going to use for the tuple cache and b) how I was going to validate that the floating point arithmetic was correct 4. As it turned out, neither are actually necessary for a fast solution 5.First off, if one eliminates memoization and just repeats the recursive calls as needed, the solution is just as fast 6! While the main function is still called considerably more times, most of these hit the base case (i.e. 1-tuples containing single values) and return quickly. Not memoizing also eliminates a few lines for the cache check and lets you return the calculated solution set directly instead of having to first write it to the cache.Switching to exact arithmetic from float math was even more illuminating. Not surprisingly, simply using bignums and arbitrary-precision fractions for the division steps works out to be around 25x slower than the original. However, the vast majority of the division operations do not produce intermediate results that eventually yield integer solutions 7. In fact, we can actually ignore every division operation that does not have an integer result and still generate all of the answers! This eliminates the need for the (rather expensive) arbitrary-precision fractions and makes the solution 28x *faster* than the original that used floats and memoization 8.The real reason the float version is so much faster is not that using exact arithmetic is too slow for calculating the intermediate values we actually need. Rather, it is because the penalty for calculating all of the dead-end intermediate values is so much smaller. If one can avoid that penalty some other way the answer will be just as fast.The point of this discussion 9 is that I often give up on trying to write clear code too quickly. The final implementation, aside from being the fastest, is also the shortest and the easiest to read. While those three do not always coincide, I seldom find myself in need of profiling a program because it runs unacceptably slow, but often experience difficulty reading code that does not closely enough reflect the problem it is solving. Rather than automatically asking myself "is this fast enough?", I need to cultivate the instinct to ask "is this clear enough?".footnotes:
- It was certainly simple enough, fast enough, and far easier to read than my solution to the same problem.↩
- If you're not going to learn to write idiomatic code, why bother learning a new language?↩
- Of course, the primary reason jcl's program is fast is because it uses a good algorithm. By orienting around the set of values that can be computed rather than the set of equations that can be evaluated it performs orders of magnitude fewer calculations.↩
- That is, it's possible--though probably unlikely--that one of the solutions generated only looks correct because of loss of information in the floating point operations. One could, for instance, go through the generated solutions and evaluate equivalent integer expressions with exact arithmetic to verify that they give the desired results.↩
- Although I had initially worked through this series of solutions in Clojure, I am going to show them in Python because jcl's original solution is Python and using a different language does not add anything to the discussion that I would like to focus on.↩
- The average over 5 runs of original.py and no-memo.py were within 1% of each other on my machine.↩
- This is of course also true for intermediate values resulting from float divisions in the original.↩
- You might think that I am no longer comparing apples to apples, but bear with me a moment.↩
- Not, as I may have lead you to believe, that one ought to spend obscene amounts of time polishing simple programs.↩