Today I encountered a real-life problem. My wife got some fresh pecan. After we cracked the pecan and took the large pieces, there are some smaller pieces of pecan and shell fragments left in the bowl. My wife asked kids to separate the pecan from the shell. There were three proposals:
1) Pick the pecan out. The trouble is that we have more pecan pieces than shell pieces. The time for this approach is proportional to number of pecan pieces.
2) Pick the shell out. It is not clear whether the time would be small because we still need to scan (with our eye) among pecans to find shell. In additional, it is hard to be sure that we are free of shells at the end.
3) Pour them in water and hope one is heavier than water and the other is lighter. This idea was never tested.
Then I proposed the 4th approach: grab a few pieces in my palm and apply 1) or 2), which ever way is faster. This works out really well.
The key is that after I scan the pecan and shells in my palm I never scan the same pieces again. The fast algorithms, whether searching a word in a string, or sorting a list, or finding the quickest time when traveling from point A to point B, are all by finding a smart way to minimize the operations on the same data.
After my previous post about the stack space, it appears that there is perception from the feedback that recursion is bad and we should avoid deep recursion. After writing a compiler, I know that the modern computer and compiler are complex enough and one cannot automatically assume that a hand crafted code would out-perform the compiler optimization. The only way is to do some prototype to find out.
So why recursive code may not perform as well? Compilers place frames on a stack. In additional to arguments and local variables, compiles also need to place frame and program pointers on the frame, resulting in overheads.
So why hand-crafted code may not performance as well? The stack used by a compiler is a simpler data structure and can grow and shrink cleanly. To replace recursion with out own stack, our stack is allocated in the heap that is far more complicated to manage. There could be overhead as well if the compiler needs to mark objects for garbage collection. Compiler also needs to worry about the memory fragmentation.
Then there is additional complexity: CPUs have registers and multiple levels of cache. Register access is a few times faster than in-CPU cache access and is a few 10s times than on-board memory access. So it is up to the OS and compiler to maximize the use of register and in-CPU cache.
For my particular problem, I did an experiment to rewrite my c# version of recursive code with a loop and stack approach. So here are the outcomes of the two approaches:
Recursive call Loop and Stack Lines of code for the algorithm 17 46 Speed Baseline 3% faster Readability Clean Far more complex
So at the end, I was able to achieve 3% better performance with other drawbacks. My message is never assuming your sophisticated approach would automatically work out better than a simpler approach with a modern computer and compiler. Gage carefully before committing to a more complex approach.
Supposing we are running a recursive algorithm on a very large data set that requires, say, 1 million recursive calls. Normally, one would solve such a large problem by converting recursion to a loop and a stack, but what if we do not want to or cannot rewrite the algorithm?
Python has the sys.setrecursionlimit(int) method to set the number of recursions. However, this is only part of the story; the program can still run our of stack space. C# does not have a equivalent method.
Fortunately, both C# and Python have option to set the stack space when creating a thread. In C#, there is an overloaded constructor for the Thread class that accepts a parameter for the stack size:
Thread t = new Thread(work, stackSize);
In Python, we can set the stack size by calling:
We can then run our work under a new thread with increased stack size.