Getting the Topmost Hierarchical Parent in T-SQL

Wednesday, May 14, 2014

Introduction

It is normal in databases to have hierarchical tables, that is, tables that are related with themselves, forming a parent-child relation. For example, consider this:

The parent_id column points to the parent record, which, in some cases, will not exist.

So, imagine we have a number of records, such as:

        1: INSERT INTO dbo.list (id, parent_id) VALUES (1, NULL)

       2: INSERT INTO dbo.list (id, parent_id) VALUES (2, 1)

       3: INSERT INTO dbo.list (id, parent_id) VALUES (3, 2)

       4: INSERT INTO dbo.list (id, parent_id) VALUES (4, 3)

How can we find the id of the topmost parent? In this case, it will always be 1, of course.

In SQL Server, we have two options:

A Common Table Expression (CTE);
A recursive function.

Let’s see how to implement each.

Common Table Expression Approach

We need to write a CTE that starts with some record and goes all the way up until it finds the parent. Let’s wrap it in a nice scalar function:

       1: CREATE FUNCTION dbo.GetTopmostParentCTE

       2: (

       3:     @id INT

       4: ) 

       5: RETURNS INT

       6: AS

       7:     BEGIN

       8:         DECLARE @parentId INT

       9:  

      10:         ;WITH cte AS 

      11:         (

      12:             SELECT a.id, a.parent_id

      13:             FROM dbo.list AS a 

      14:             WHERE a.id = @id

      15:             UNION ALL

      16:             SELECT b.id, b.parent_id 

      17:             FROM dbo.list AS b

      18:             INNER JOIN cte AS c

      19:             ON c.parent_id = b.id

      20:         )

      21:  

      22:         SELECT TOP 1 @parentId = id

      23:         FROM cte

      24:         WHERE parent_id IS NULL

      25:  

      26:         RETURN @parentid

      27:     END

      28: GO

I won’t explain here how CTEs work, they have been around for quite some time, and there are several posts how there for that.

Recursive Function Approach

The other approach is using a recursive function. The gotcha here is that when we create a function, it is compiled, and if it has a reference to itself – which doesn’t exist first – it will fail. Therefore, we need to first create a dummy function and then change it to do what we want:

       1: CREATE FUNCTION dbo.GetTopmostParent

       2: (

       3:     @id INT

       4: )

       5: RETURNS INT

       6: AS

       7: BEGIN

       8:     RETURN

       9:     (

      10:         SELECT 0

      11:     )

      12: END

      13: GO

      14:  

      15: ALTER FUNCTION dbo.GetTopmostParent

      16: (

      17:     @id INT

      18: )

      19: RETURNS INT

      20: AS

      21: BEGIN

      22:     RETURN

      23:     (

      24:         SELECT CASE WHEN parent_id IS NULL THEN id ELSE dbo.GetTopmostParent(parent_id) END

      25:         FROM dbo.list

      26:         WHERE id = @id

      27:     )

      28: END

      29: GO

Conclusion

You can get results from the two functions by running the following T-SQL queries:

       1: SELECT dbo.GetTopmostParent(4)

       2: SELECT dbo.GetTopmostParentCTE(4)

Interesting, both execution plans are exactly the same:

I can’t really recommend one over the other, since from my tests, both took the same amount of time (you will need far more records than the ones from my sample to tell that).

So, any thoughts from database gurus out there?

2 Comments

Hi, allthough the plans are the same, I can see much less reads and duration when using the profiler when using the recursive function. So it seems to me the recursive function performs better. BTW: the TOP 1 is not needed in the recursive function.

Hugo - Wednesday, May 14, 2014 7:14:37 AM

Hi, Hugo!

Thanks, I had that same impression, but I'm not an expert on databases.

Keep coming by!

Ricardo Peres - Wednesday, May 14, 2014 9:25:05 AM