Unconventional Ways To Avoid SQL Recursion

Wednesday, March 10, 2004

Ken Robertson asks, “Which is better recursion or temp tables?” I would suggest that you don't have to do either. Given Ken's case (calculating the number of nodes in a tree structure for displaying image counts), it is likely that this query is going to run quite often, which means that you don't want to lock up your server to preform recursive operations every time a node is requested. One option is to cache the data somewhere, which is a pretty good option in some cases, but in this case, we want real time results. An unconventional way to avoid this breaks normalization rules slightly, but probably offers the best overall solution compared to some of the more complex ways you can do this (not to mention that it is a heck of a lot easier to code than the alternatives). Add a "path" column to each node. Now, when you run your queries, you can do something like this:

SELECT Count (*) from [Node] inner join [Item] ... Where [Node].[Path] Like (@BasePath+'%')

This could drastically improve the performance of any recursive operations. The only thing you have to make sure to do is update paths if the node is moved:

UPDATE [Node] Set [Path] = @NewPath+RIGHT([Path], len([Path])-len(@OldPath)) WHERE [Path] Like (@OldPath+'%')

Of course, moves are little slower than just changing a FK, but assuming moves don't happen with insanely high frequency, this should be perfectly reasonable (considering the time you save on selects). If you can wait till Yukon, it should have support for recursive queries built in (SQL3 syntax I assume?)...

2 Comments

Just so happens that I spent all last week searching all over the net and bookstores for what people were doing about recursions, to solve a problem I am working on, and found precious few theories.

I offer the following list as what I have come across so far in case it may be of help to anyone:

* AdjacentLists (ParentID, ChildID)

+ can GetParent, GetChildren

- but requires recursion for GetAnscestors,

+ easy to Move, Copy a node.

GetDescendants, GetIndent

* Celko's NestedSets

- cannot GetParent, GetChildren (is this correct? if not, some code samples would be nice to see)

+ can GetParent, GetChildren, GetDepth

- hard as hell to Move, Copy. Really. And is it portable to different databases that don't have stored procedures (mysql for one).

* Farrady(sp?) Fractals variation on Celko's...

+ Looks fantastic since it would solve Celko's update mess...but

- Heavy reliance on non ODBC stored procedures...and

- I saw a couple of people asking about math overflow problems...might just be a 'theory' not practical in usage.

* Materialized Lists (which was proposed at top of this page)

+ can do all GetParent, GetChildren, Anscestors, depth, nice. simple. relies on standard SQL.

* There was one other person who came across with a means of mapping every Vertex to every parent vertex - + Fast

- Impossible to know who is a child, or a descendant.

The only problem with all the above except (MAYBE!) AdjacentLists is how to keep them synched from a PocketPC to a server. Anybody, who can show conclusively that a Tree (or Graph) can be synched between a pocketPC and Server, allowing nodes to be moved around on both, and then synched up...

Sky - Saturday, April 10, 2004 11:34:00 PM

Hello!

I've been working on a hybrid of the Celko tree for quite some time, where I include the best from the adjacency model, the celko model and allows for frequent inserts.

As I read through his latest book called "Trees and Hierarchies in SQL for Smarties" I realized, that another person already had the same idea and presented it to Joe Celko (who included it in his book).

Basically you spread the distance between the nodes with a fixed interval, allowing for frequent inserts without having to reorganize (or affect) the entire tree for each insert operation.

Part of the logic that handles the insertion makes sure there's enough spread distance in the parent node before the insertion - if not, it initiates a recursive operation that walks the ancestor nodes until enough spread distance is available, then reorganizes the descending nodes (phew, what a long sentence).

I agree that the nested sets model relies more on stored procedures, but it's the best tree I've seen so far for most uses of hierarchical lists on the web.

The fact alone that you have a complete set of XPath axes (Ancestor, AncestorOrSelf, DescendantOrSelf etc.) available from the single SQL operation is extremely flexible (and fast).

If you're interested in learning more about this, don't hesistate to e-mail me at andersb[at]proactive.dk (please note the e-mail obfuscation).

With regards

Anders Borum / Denmark

Anders Borum - Tuesday, June 1, 2004 6:39:00 PM

Comments have been disabled for this content.