Do not read just code, learn algorithms.

Sunday, April 4, 2004

Joseph Cooney wonders which sourcecode a programmer should read to become a better programmer. I'd say: none. At least, not code for which you don't have the design documents or algorithm descriptions. A lot of code is very bad, and just reading code is pretty useless. The reason is that code is the end phase of programming software. What's far more important is the algorithm, or set of algorithms, the code has to represent. Only with those can you learn something, because you can then see the start (algorithm) and the end (code) of a transition every developer has to make many times. Only with the algorithms in hand can you check whether the code you're reading is good code or not: if it doesn't describe / represent the algorithm(s) it has to represent, the code is buggy and bad, and should be rewritten. You can't know that if you read sourcecode without the algorithm descriptions.

Some developers like to read other people's code to learn 'new tricks'. If a developer says that to you, you immediately know the developer doesn't understand what software engineering is all about. Software engineering and programming aren't about 'tricks'. They're about algorithms and their implementations in sourcecode. The more 'tricks' are used to implement an algorithm, the less maintainable and the harder to understand the implementation becomes. An algorithm can be described in normal text, so the sourcecode should be as simple as the algorithm description. Yes, this often results in dull code, which is not the stuff the developer looking for tricks will get excited about. He/she will probably describe the code as 'not that good', because no trick is used.
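
To make the contrast concrete, here is a minimal sketch in Python (chosen only for brevity; the function names are made up for this illustration). Take the description "to swap two values, remember the first, overwrite it with the second, then overwrite the second with the remembered value". The first function follows that text line by line. The second uses the well-known XOR 'trick', which gives the same result for integers but no longer reads like the description.

```python
def swap_plain(values, i, j):
    """Swap values[i] and values[j] exactly as the description says."""
    remembered = values[i]   # remember the first
    values[i] = values[j]    # overwrite it with the second
    values[j] = remembered   # overwrite the second with the remembered value


def swap_xor_trick(values, i, j):
    """Same result for integers, using the XOR 'trick'."""
    if i == j:
        return  # without this guard the trick would zero the element
    values[i] ^= values[j]
    values[j] ^= values[i]
    values[i] ^= values[j]


numbers = [3, 8, 5]
swap_plain(numbers, 0, 2)
print(numbers)  # [5, 8, 3]
swap_xor_trick(numbers, 0, 2)
print(numbers)  # [3, 8, 5]
```

The 'trick' version needs an extra guard and only works for integers, which is the sort of hidden cost that is hard to judge without the description next to the code.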

If you want to learn something, read the book 'Algorithms' by Robert Sedgewick. Very old (1988), but very good and still very relevant today. The code is in C, but what's important is to learn to write these algorithms in VB.NET or C#, so you'll learn them back to front and can make better decisions next time in your own code. The last time I read it was in 1991, and I opened it again last week: what a wealth of information.

Looking at other people's code just to learn things is, in my humble opinion, more or less a waste of time. Look at the theory behind the code; you'll learn much more from that. "Why is it constructed that way?" is far more important than "How is it constructed?". And trust me, the Why is not picked up from sourcecode, but from the algorithms the code has to represent.

14 Comments

Hi Frans - thanks for your thoughts. I know of a couple of .NET developers who have been breaking out the old C books recently to look at algorithms (comparing the performance of different approaches, I think), but I'm not sure that is going to help me. Maybe I have too narrow a view of what algorithms are, but I'm not exactly called on to implement my own sorting functions on a day-to-day basis, put it that way. I'm more interested in looking at mundane stuff. How do they comment their code? Variable names? What sort of patterns are they using? Do they use interfaces much? How well factored is it? Are their types mutable? How big are their types? I guess I would probably prefer to read design documents in addition to code, but these are rarely available.

I agree with you re: "tricks". I want "as simple as possible, but no simpler".

JosephCooney - Sunday, April 4, 2004 2:20:00 PM

Couldn't agree more with Frans. Programming is about "why" in the first place and "how" in the last.

René van den Berg - Sunday, April 4, 2004 3:30:00 PM

Glad to see you post a bit more regularly. I often find myself agreeing with your POV (particularly about OS).

Thanks.

Johnny Hall - Sunday, April 4, 2004 3:45:00 PM

"How do they comment their code? Variable names? What sort of patterns are they using? Do they use interfaces much? How well factored is it? Are their types mutable? How big are their types? I guess I would probably prefer to read design documents in addition to code, but these are rarely available. "

Ok, but how are you going to judge the quality of code by simply looking at it? Say you see a lot of interfaces, is that good or bad? You can't say, because the reason interfaces are used is not formulated in the interface code.

Say you see some big classes. Is that bad? You can't say. Perhaps breaking up the classes is semantically wrong, but you can only understand that if you know what kind of design the classes have to represent.

I used the example of an algorithm because most code written is algorithmic code, which includes data structures. For example, the book by Sedgewick is not solely about sorting algorithms (they are just a small part). It contains, for example, algorithms which are parallelizable, all kinds of tree algorithms etc., which can be handy today in an OO environment as well :). When you see code implementing such an algorithm, is it then good code? You can't say, unless you have the algorithm description too :)

Frans Bouma - Sunday, April 4, 2004 4:24:00 PM

That may have been true 10-20 years ago. Today, with so many APIs and specs around, I believe the value of pure algorithms as you describe them is close to nothing. I have learned much more about the Win32 API, OO techniques etc. from reading Delphi's VCL source code than from any book I have read on the same subjects. Too bad that MS is still keeping the BCL source code closed.

Panos Theofanopoulos - Sunday, April 4, 2004 6:05:00 PM

@ Panos: so you believe you're not developing software by applying all kinds of algorithms? Read books solely about algorithms, and you'll know what I'm talking about.

I can describe any system solely with algorithms. These do not depend on a language, a paradigm or any other hype- or time-dependent aspect. Which means I can implement them in, for example, C, VB, C# or Haskell. The algorithms stay the same; only the implementation differs per language.

@ Steven:

So after reading and analysing a lot of code you have a lot of data. Is that data also information? Can you learn something from it? I say: not a single bit. I'd go even further: because sourcecode has to (!) describe the algorithm, the sourcecode is irrelevant. WHEN (not IF) we are able to check if an algorithm is implemented correctly in a given language, the code is just there to make the algorithm executable, and for no other purpose.

Just by looking at code you therefore learn nothing: you don't learn the algorithm (because you don't know if it is implemented correctly unless you test that, and you can't, because you don't have the algorithm descriptions) and you don't learn techniques for transforming an algorithm into code which describes the algorithm.

"Code may in one sense be the end product of the development process, but it really is, in the end, the only part the really matters, because all secondary materials can be old, irrelevant or just wrong."

Code is really the only part that is irrelevant, because you can reproduce it from its source: the algorithms. Not the other way around. It's not a chicken-and-egg problem; you know for certain what was there first: the algorithm. The sourcecode is just the representation (you hope!) of the algorithm. Therefore if you don't have the algorithm description anymore, the code's value is void.

Frans Bouma - Sunday, April 4, 2004 6:57:00 PM

there is another classic book called 'Algorithms + Data Structures = Programs' by Wirth..

had that one in my undergrad days.. a good one..

SBC - Sunday, April 4, 2004 9:20:00 PM

I normally agree with most of what you have to say Frans, but this time I disagree. You can learn a lot looking at good code.

Things like how people who love coding translate a general pattern into a language, for example how the Singleton pattern gets implemented in C#, Java, etc. Or much more complicated things, like how to put together a lot of patterns to support a design goal such as extensibility.

I learned a lot from looking at open source Java projects (like James, Jakarta etc.) a few years ago. It started to expose me to server design ideas, the idea of contexts etc., meaning I can now, if the situation warrants it, write my own servers rather than relying on things like ASP.NET, COM+ etc.

Also, I am convinced that most of the time you can quickly get a feel for whether code is good or bad, if you understand what it is trying to do.

Alex James - Monday, April 5, 2004 10:55:00 AM

I started translating data mining algorithms to code a few years ago and I was very surprised at how difficult it was to reproduce elegant code from an elegant algorithmic blueprint.

My production code did not really look like the algorithm I was reproducing. I realized that while I could write commercial everyday code, thinking and coding in an algorithmic fashion was a completely different story.

I am now beta testing my translation of the Apriori market basket analysis algorithm to a C# component and I feel a lot better about my ability to create working models of algorithms.

Has translating algorithms to code made me a better developer? In my opinion, YES.

Is there more to life than algorithms?

Look at Google, take a peek under the hood, what do you see?

Kingsley Tagbo - Monday, April 5, 2004 3:42:00 PM
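
As an illustration of what such a translation can look like, here is a minimal sketch of the textbook Apriori idea for finding frequent itemsets (the core of market basket analysis): count single items, then repeatedly join frequent itemsets into larger candidates, prune candidates that have an infrequent subset, and count again. It is written in Python for brevity, not in C#. The function name, the `min_support` parameter (an absolute transaction count) and the sample baskets are assumptions made for this example, not taken from the component mentioned above.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at
    least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and keep only the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    frequent = count({frozenset([item]) for item in items})
    result = dict(frequent)
    size = 2
    while frequent:
        # Join step: merge frequent (size-1)-itemsets into size-itemsets.
        keys = list(frequent)
        candidates = {a | b for a in keys for b in keys if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        size += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
for itemset, n in apriori(baskets, 2).items():
    print(sorted(itemset), n)
```

The join and prune steps map almost one to one onto the prose description, which is what makes it possible to check the code against the algorithm.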

Frans: I will give you an example of a pure algorithm (away from APIs and specs). It's a "problem" that I faced a few months ago.

I wanted a "special" sort algorithm to replace the standard Array.Sort (I wanted to inline the compare part of it). First I decompiled .NET's version (it's the standard recursive quicksort algorithm, the same one you find in any of the books you mention). Then I looked elsewhere by googling: no results, just some non-recursive versions of the same. Then I searched the VCL and after that the stdlib source code, where I found (crt/src/qsort.c) a non-recursive algorithm which had special optimizations to limit the number of comparisons and which switches to insertion sort for segments smaller than a certain size.

Bottom line: if you want to learn general principles, read an algorithms book.

If you want production-quality algorithms, look at the source code of some library or product. At least it is optimized and debugged <g>

Panos Theofanopoulos - Monday, April 5, 2004 4:14:00 PM
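
For readers who want to see the shape of the routine described above, here is a minimal sketch, assuming a plain in-place sort of comparable items: a quicksort that uses an explicit stack instead of recursion, writes the comparisons inline, and hands small segments to insertion sort. It is written in Python for brevity and is not the code from qsort.c or the VCL. The `CUTOFF` value and the function names are assumptions for this example, and the comparison-limiting optimizations mentioned above are left out.

```python
CUTOFF = 8  # assumed threshold; real libraries tune this value


def insertion_sort(items, lo, hi):
    """Sort items[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        current = items[i]
        j = i - 1
        while j >= lo and items[j] > current:  # comparison written inline
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def hybrid_quicksort(items):
    """Non-recursive quicksort; small segments go to insertion sort."""
    stack = [(0, len(items) - 1)]  # explicit stack replaces recursion
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= CUTOFF:
            insertion_sort(items, lo, hi)
            continue
        pivot = items[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:  # partition around the pivot value
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Push the larger part first so the smaller one is handled next;
        # this keeps the stack depth logarithmic.
        if j - lo < hi - i:
            stack.append((i, hi))
            stack.append((lo, j))
        else:
            stack.append((lo, j))
            stack.append((i, hi))


data = [5, 2, 9, 1, 5, 6, 0, 3, 8, 7, 4, 2]
hybrid_quicksort(data)
print(data)
```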

Panos: so you rebuilt the algorithm from the code you read. Now, how can you be sure the algorithm you've rebuilt from that code is correct?

It might describe what the code does, but it's not necessarily the same as what you are looking for as an algorithm or what you think it should do.

Your example is nice: (disclaimer: this is meant to be a general remark, please do not see it as a personal attack, because that is NOT my intention) with a thorough understanding of the quicksort algorithm, you can adjust the ALGORITHM to what you need and then implement it. This saves a tremendous amount of time, and you know it's correct because you can prove it is correct. :)

Frans Bouma - Monday, April 5, 2004 4:27:00 PM

No doubt there are too many in our profession who don't understand the fundamentals well, and only look at and copy existing code, without really knowing how it works and why it does what it does. Yet saying that you can reproduce the code from the algorithm is like saying you can reproduce the musical performance from the score -- true, but as a student of music you want to be intimate with both.

For a programmer, reading the code of good programs has some parallels with reading classic books for the writer. Grammar and punctuation are key fundamentals, necessary but insufficient for comprehending and producing great writing.

Steven E. Newton - Tuesday, April 6, 2004 5:38:00 PM

What if you have Robert Sedgewick's book on your shelf AND you look at other people's code?

Don't you think that you can learn from both sources? I have learnt many things from looking at people's code that I haven't read about in books. Could it be that you CAN learn the WHY from source code by spending a little time trying to understand it?

Chris Garty - Thursday, April 8, 2004 3:11:00 AM

Learning from code means one is unable to, or does not have the capability to, understand and learn the algorithm(s). Then, to implement it, one looks for code written by others. That's the reason for the popularity of sites like codeguru and codeproject.

Mustafa Ahmad - Thursday, April 29, 2004 11:21:00 AM
