Storing/Retrieving Hierarchies In SQL Server Database


Hierarchies are one of the most important part of many applications. It can be in shape of categories, family tree, organization charts etc. Maintaining hierarchies in data source like XML is not a big deal but when it comes to the database like SQL Server which is consist of flat table in terms of storing data, then it become a challenge.

To overcome this (which I am sure most of us did ), we use recursion. Implementing recursion is very easy and can be applied on both back-end (by using cursors) and front-end. But yet recursive operation can be a great mess if hierarchy grow larger. Lots of repetitive quires and n number of database connection can make your application way slow.

So, the idea of the moment is to use the hierarchical data without using recursion. while searching for the solution I came across “Storing Hierarchical Data in a Database” by Gijs Van Tulder. In this article, the Dutch guy mention two method for the processing of Hierarchal data. One is the same recursion which we have discussed above and the other one is ‘modified preorder tree traversal' algorithm.

Modified preorder tree traversal :

To make you understand better let me quote what he mention in his article.

We'll start by laying out our tree in a horizontal way. Start at the root node (‘Food'), and write a 1 to its left. Follow the tree to ‘Fruit' and write a 2 next to it. In this way, you walk (traverse) along the edges of the tree while writing a number on the left and right side of each node. The last number is written at the right side of the ‘Food' node. In this image, you can see the whole numbered tree, and a few arrows to indicate the numbering order.

sitepoint_numbering

 

We'll call these numbers left and right (e.g. the left value of ‘Food' is 1, the right value is 18). As you can see, these numbers indicate the relationship between each node. Because ‘Red' has the numbers 3 and 6, it is a descendant of the 1-18 ‘Food' node. In the same way, we can say that all nodes with left values greater than 2 and right values less than 11, are descendants of 2-11 ‘Fruit'. The tree structure is now stored in the left and right values. This method of walking around the tree and counting nodes is called the ‘modified preorder tree traversal' algorithm.

Sources : http://www.sitepoint.com/article/hierarchical-data-database/2/

Note : Let me keep myself honest, the above two paragraphs and image is by Gijs Van Tulder so He deserve the credit of this content. I am just sharing the T/SQL implementation of this algorithm.

I again strongly recommend you to read the article completely  before going onward.

Schema :

Ok to start, lets create a following table

tblComments

ID is of course the primary key, where as comments is the simple text. Left and Right are the indexes of the child records where as level will show us at what level this node belongs to. so that we can set the tab index and indentation where as Entity Id is to group the nodes because in this table hundred different nodes set. So in short, Entity Id is to separate the different node set in a single table.

Following is the script of the table

   1: CREATE TABLE [dbo].[tblComments](
   2:     [Id] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
   3:     [Comments] [nvarchar](2000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
   4:     [Left] [int] NULL,
   5:     [right] [int] NULL,
   6:     [Level] [int] NULL,
   7:     [EntityId] [int] NULL,
   8:  CONSTRAINT [PK_tblComments] PRIMARY KEY CLUSTERED 
   9: (
  10:     [Id] ASC
  11: )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
  12: ) ON [PRIMARY]

 

Add Record:

Now to add the record in this table, lets have a look at the following procedure

   1: CREATE PROCEDURE [dbo].[InsertComments]
   2:     @Comments nvarchar(2000),
   3:     @EntityId int,
   4:     @ParentId int = null
   5: AS
   6: BEGIN
   7:     Declare @Left int
   8:     Declare @Right int
   9:     Declare @Level int
  10:     declare @RightParent int 
  11:     declare @rightParentOrig int
  12:     declare @leftParentOrig int
  13:     
  14:     set @RightParent = null
  15:     
  16:     if @ParentId is null 
  17:     begin
  18:         set @Left = 1
  19:         set @Right = 2
  20:     end
  21:     else
  22:     begin
  23:         select @rightParentOrig = [right],@leftParentOrig =[left] , @Right = [right] + 1,@Left=[right],@Level=[Level]+1 from tblComments where id = @ParentId
  24:         set @RightParent =  @Right  + 1
  25:     end
  26:     
  27:     UPDATE tblcomments 
  28:     set 
  29:     [right] = [right] + 2,
  30:     [left] = [left] + 2
  31:     where 
  32:     [left] >=  @Left and entityId = @EntityId
  33:  
  34:     UPDATE tblcomments 
  35:     SET 
  36:     [right] = [right] + 2
  37:     WHERE [left] < @leftParentOrig and [right] > @rightParentOrig 
  38:  
  39:     INSERT INTO [tblComments]([Comments],[Left],[right],[Level],[EntityId])
  40:      VALUES
  41:            (@Comments,@Left,@Right,@Level,@EntityId)
  42:         
  43:     if @RightParent is not null 
  44:         update tblComments set [right] = @RightParent where id = @ParentId
  45:         
  46: END

Ok, now let me define the code line by line

First 15 : Well, these lines are quit easy to understand, we have declared some variables which we will use in next lines.
Line no 16-24 : We are checking if the ParentId is null, then of course the node is the parent it self, that is why setting left to 1 and right to 2. But when the parent is specified, we need to set the right and left value according to the parent (+1 of the parent) and also need to +1 the level.
Line no 27-37 : Consider a situation, where you have a tree completed and you want to add a node in the middle of the tree. These are the lines for that. Here, we are Creating space of the node by increasing the left and right values of the sibling nodes.
Line no 39: This is the simple insert statement, where we are adding our record to the table with calculated right and left.
Line no 43: In this statement, we are updating the right side of the parent node.

Now Add some records using the above procedures, and do the select as following

Select Records:
   1: select * from tblComments where EntityId = 1
   2: order by [left] ,[right]
   3: -- 2nd option
   4: select * from tblComments where [left] >= 1 and [right] <= 10 and EntityId = 1
   5: order by [left] ,[right]

Both the above query will return the following result set.

 

tblcommentsresult

Now you can see the first record i.e id=30 is the parent record. So, the left value is 1 and the right is 10 and the level is 1. where as when you see the second record, it is the child of the first record that is why level is 2 and left is +1 greater then the parent and because all the records up to id=35 are it’s child so it has right index +1 greater then the last child.

Now if you want to see the child of id=32 you can do as follows

   1: declare @left int,
   2: @right int, 
   3: @EntityId int
   4:  
   5: select @left= [left],  @right= [right],@EntityId =  EntityId from tblComments where Id = 32
   6:  
   7: select * from tblComments where [left] >= @left and [right] <= @right and EntityId = @EntityId
   8: order by [left] ,[right]

In this way, you can exactly see the child nodes of each node.

Delete Records:

Now, lets see how can we remove the record from this table as hierarchy is depend upon the left and right index.

   1: CREATE PROCEDURE [dbo].[DeleteComments]
   2:     @CommentId int
   3: AS
   4: BEGIN
   5: Declare @Left int
   6: Declare @Right int
   7: declare @EntityId int
   8:  
   9: select @Left = [left], @Right = [right], @EntityId = [EntityId] from tblComments 
  10: where id = @CommentId
  11:  
  12:  
  13: delete from tblComments 
  14: where id = @CommentId
  15:  
  16: update tblComments 
  17: set [left] = [left] - 2,
  18: [right] = [right] - 2
  19: where [left] >= @Left and entityId = @EntityId
  20:  
  21:  
  22: UPDATE tblcomments 
  23:     SET 
  24:     [right] = [right] - 2
  25:     WHERE [left] < @Left and [right] > @Right
  26:     
  27: END

In Line no 9-10 : we are storing the left, right and entity of the record that will be deleted.
In Line no 13-14 : delete the record from table.
In Line no 16- end: updating the right and left values of the parent and sibling nodes by –2.

Conclusion:

This way, we can easily achieve the task of Hierarchal data without recursion. One more time, with full respect I like to give all the credit to Gijs Van Tulder. I have just made small modification in this to make it work according to me to.

Published Monday, March 16, 2009 7:45 PM by aghausman12
Filed under: ,

Comments

# re: Storing/Retrieving Hierarchies In SQL Server Database

Wednesday, August 5, 2009 4:47 AM by satalaj

Really nice article

# re: Storing/Retrieving Hierarchies In SQL Server Database

Thursday, August 13, 2009 3:02 AM by Keef

Cool stuff, thank you!

Just only one thing.. I think that in the Add procedure, the line no 37 should have an "entityId = @EntityId" in order to avoid updating all nodes from different trees.

# re: Storing/Retrieving Hierarchies In SQL Server Database

Saturday, August 29, 2009 7:49 AM by Robert

Thanks for the article, I'm trying to implement this for a project I'm doing right now. The link has been passed around sqlservercentral.com. Thanks!!

# re: Storing/Retrieving Hierarchies In SQL Server Database

Wednesday, October 14, 2009 11:34 PM by Yogi

very nice article. i adopt it for genealogy system. thank you very much :)

Leave a Comment

(required) 
(required) 
(optional)
(required)