Saturday, December 10, 2011

Premature optimisation is the root of all misquotes

One annoying phrase that crops up again and again is that well known Knuth quote: "premature optimisation is the root of all evil". It comes originally from Knuth's Structured Programming with go to Statements. Now, given the very large number of people who quote this axiom, I must ask: hands up those who have actually read the original paper? Well, that is a poor show of hands.

The majority of times I encounter this quote, it is in one of the following contexts:
  • "its running ok with our set of test users so lets not worry about revisiting any of the code to see if we can make improvements unless it breaks, as management have got a bucket of features to implement"
  • "we don't want to think about how we use data and the GC is so good it all sort of magically just works no matter how we write stuff (or so Brian Goetz assures us)"
  • or, more usually, "I have no idea what I am talking about, but I read this cool quote from some guy called Knuth that everyone goes on about on some tech blog, so it must be right".
So let us examine what Knuth is really talking about in this context. The paper is about the much maligned goto statement: whether certain classes of programmes can be written without it, and further whether some programming solutions can be efficient without using it.

The first problem he presents in the paper to illustrate this is:
"Let's suppose that we want to search a table A[1] . . . A[m] of distinct values, in order to find where a given value x appears; if x is not present in the table, we want to insert it as an additional entry. Let's suppose further that there is another array B, where B[,] equals the number of times we have searched for the value A[i]."

His initial suggested solution using goto is:

for i := 1 step 1 until m do
    if A[i] = x then go to found fi;
not found: i := m+1; m := i; A[i] := x; B[i] := 0;
found: B[i] := B[i]+1;

The suggested solution without goto is:

A[m+1] := x; i := 1;
while A[i] != x do i := i+1;
if i > m then m := i; B[i] := 1;
else B[i] := B[i]+1 fi;
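
For anyone who finds the Algol-style notation hard on the eyes, here is a minimal C sketch of the same sentinel technique. The names table, counts and search_and_count are mine, not Knuth's; the arrays are fixed-size and zero-indexed purely for illustration, and overflow checking is omitted.

#include <stdio.h>

#define CAPACITY 100

static int table[CAPACITY + 1];   /* A: the values, with one spare slot for the sentinel */
static int counts[CAPACITY + 1];  /* B: how often each value has been searched for */
static int m = 0;                 /* number of entries currently in the table */

/* Search for x; insert it if absent. Either way, record the search. */
static void search_and_count(int x)
{
    table[m] = x;                 /* plant x as a sentinel just past the last entry */
    int i = 0;
    while (table[i] != x)         /* no bounds test needed: the sentinel stops us */
        i++;
    if (i == m) {                 /* we hit the sentinel, so x was not present */
        m = i + 1;                /* keep the sentinel slot as a real entry */
        counts[i] = 1;
    } else {
        counts[i]++;
    }
}

int main(void)
{
    int queries[] = { 7, 3, 7, 7, 9 };
    for (int k = 0; k < 5; k++)
        search_and_count(queries[k]);
    for (int i = 0; i < m; i++)
        printf("%d searched %d times\n", table[i], counts[i]);
    return 0;
}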

"So what," I hear you say, "goto has been and gone." However, Knuth then goes on to discuss the performance aspects of this fragment.

"If we assume that programs are translated by a typical "90 % efficient compiler" with bounds-checking suppressed, the corresponding run-time figures are about 14n + 5 and l l n + 21. .... we save about 21%, so the elimination of the go to has also eliminated some of the running time." He expands this solution; introducing loop doubling, discusses whether values can be accessed directly in hardware registers, and covers eliding range checking for subscripts and redundant tests in inner loops. 

Hmm... now how does this fit with our initial statement, given that Knuth seems to be talking about the relative performance of small elements like loop structures?

If we read the rest of the paper, it should gradually dawn on us that Knuth is actually making the point that goto is not always needed to improve efficiency (as was commonly held), and that in the cases where it is more efficient (as he outlines in later sections) it is better applied only in critical areas, so as not to break up a more structured coding approach.

Additionally, a large part of the early-optimisation discussion rests on the empirical assertion that most of the time in non-IO-bound programmes is spent in around 3% of the code, so why bother with the other 97%, particularly when the programme is not re-used? A question to ask yourself is: does my app fit this distribution, is the time more evenly spread, or does it exhibit some other Pareto-type distribution? A crude way of checking is sketched below.
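
Whether your own code fits that distribution is something a profiler (or even crude manual timing) can tell you. Purely as a hedged sketch, here is the sort of back-of-the-envelope measurement one might do in C; hot_inner_loop and everything_else are hypothetical stand-ins for your application's real code, and a real profiler will give a far better picture.

#include <stdio.h>
#include <time.h>

static volatile long sink;  /* stops the compiler optimising the work away */

/* Hypothetical stand-ins for "the critical 3%" and "the other 97%". */
static void hot_inner_loop(void)  { for (int j = 0; j < 1000; j++) sink += j; }
static void everything_else(void) { for (int j = 0; j < 1000000; j++) sink += j; }

int main(void)
{
    clock_t t0 = clock();
    for (int i = 0; i < 100000; i++)
        hot_inner_loop();
    clock_t t1 = clock();
    everything_else();
    clock_t t2 = clock();

    double hot   = (double)(t1 - t0) / CLOCKS_PER_SEC;
    double other = (double)(t2 - t1) / CLOCKS_PER_SEC;
    printf("hot path: %.3fs (%.0f%% of total)\n",
           hot, 100.0 * hot / (hot + other));
    printf("the rest: %.3fs\n", other);
    return 0;
}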

This paper is almost entirely rooted in the hot topic of the day: the readability of programmes without gotos, and how structured programming should be adopted as an alternative. Note: structured programming in this sense is mostly about the use of for loops and while loops rather than interleaved goto statements, which is perhaps not quite what you were expecting.

This obviously does not carry the same meaning today (apart from in languages that still have goto), and yet it has become a sort of folklore idiom in modern development, applied irrespective of the language or the optimisations being suggested.

Too often programmers use the short form of the statement as an excuse to do no testing or optimising, to avoid thinking about what algorithms, data structures and allocation patterns they are coding for, or to write things in a way that ignores many well understood optimisations, even in code that is known in advance to be executed a massively disproportionate number of times.

While code is often pumped out to fit the current need (and we are all guilty of that, especially on large projects), we should strive to make our initial code as efficient as possible (especially when we know it is going to be heavily used) and, where we cannot, to revisit it at the first opportune moment. The more inefficiencies we leave in the code, the slower and more memory hungry our programme tends to become. Starting out with this in mind usually means more upfront thought and, in my experience, less rather than more code: the most efficient code is the lines that do not exist, and the best way to deal with memory garbage is not to create it in the first place.
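
As a small illustration of "the best garbage is the garbage you never create", here is a hedged C sketch contrasting a per-iteration allocation in a hot loop with a single buffer hoisted out of it; the names and sizes are mine and error handling is omitted. The same principle applies to object allocation in garbage-collected languages.

#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 256
#define ITERATIONS  1000000

/* Hypothetical per-record work; the details don't matter for the point. */
static void process(char *buf) { memset(buf, 0, RECORD_SIZE); }

/* Wasteful: one allocation and one free per iteration of a hot loop. */
static void per_iteration_alloc(void)
{
    for (int i = 0; i < ITERATIONS; i++) {
        char *buf = malloc(RECORD_SIZE);
        process(buf);
        free(buf);
    }
}

/* Better: the buffer is allocated once and reused, so the loop creates
 * no garbage at all. */
static void reuse_buffer(void)
{
    char *buf = malloc(RECORD_SIZE);
    for (int i = 0; i < ITERATIONS; i++)
        process(buf);
    free(buf);
}

int main(void)
{
    per_iteration_alloc();
    reuse_buffer();
    return 0;
}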

Indeed, Knuth returns to the concept of efficiency later in the discussion:
"I'm saying that a similar thing should be considered standard practice for all but the simplest software programs: A programmer should create a program P which is readily understood and well-documented, and then he should optimize it into a program Q which is very efficient. Program Q may contain go to statements and other low-level features, but the transformation from P to Q should be accomplished by completely reliable and well- documented "mechanical" operations.
At this point many readers will say, "But he should only write P, and an optimizing compiler will produce Q." To this I say, "No, the optimizing compiler would have to be so complicated (much more so than anything we have now) that it will in fact be unreliable."

That really does not fit with the current usage of our initial quote. Knuth is actively advocating in this paper that optimisations should be applied to all but the most trivial programs. So how do we resolve this apparent contradiction with what is normally understood by the original statement?

If we examine the conclusion of the paper, we can see that Knuth was writing in a rather different context from the one most programmers work in today:

"In our previous discussion we concluded that premature emphasis on efficiency is a big mistake which may well be the source of most programming complexity and grief. We should ordinarily keep efficiency considerations in the background when we formulate our programs. We need to be sub- consciously aware of the data processing tools available to us, but we should strive most of all for a program that is easy to understand and almost sure to work. (Most programs are probably only run once; and I suppose in such cases we needn't be too fussy about even the structure, much less the efficiency, as long as we are happy with the answers)"

Further than that, the entire paper is about the use of goto in a structured programming environment and whether any efficiencies from it are worth the trade-off in structure and readability it introduces. So if we fast forward from 1974 to now, the next time someone utters this idiom it might be worth considering whether they are right and you really do need to re-evaluate your use of goto, or, more likely, whether it is a case of:

goto no_idea_what_they_are_talking_about;

Taking the paper as a whole, Knuth seems to suggest that optimisation should be done on programs that are run more than once, that optimising a program is the next logical step after debugging, and that we should aim for as much efficiency as we can achieve.

A better quote that sums up the paper is "Runtime analysis and optimisation should be standard practice for all but the simplest programmes (and you don't often need to use goto)".






