in

ASP.NET Weblogs

Tim Walters .NET Blog

Microsoft .NET, along with XML, XSL, XSD, SQL and other cool stuff

Hierarchical XML - importing into SQL Server

Loading XML into a SQL Server 2000 database is now rather easy thanks to the inbuilt XML features (such as OPENXML) and the freely downloadable SQLXML add-on.

It becomes a little more complicated when loading in a hierarchy unless you have some way to uniquely identify each element, or you're comfortable getting deep into the Updategram structure supported by SQLXML (be ready to add lots of "at-identity" attribute to your XML).

At first glance it would seem impossible to import an infinite-depth hierarchy, as you would need your SQL statements to be recursive.

I pondered over this very problem as I had a need to import information like this, and now I am sharing my findings.

Firstly lets start with some simple XML:

<orgTree>
    <Employee FirstName="Andrew" LastName="Fuller" Title="Vice President, Sales">
        <Employee FirstName="Nancy" LastName="Davolio" Title="Sales Representative" />
        <Employee FirstName="Janet" LastName="Leverling" Title="Sales Representative" />
        <Employee FirstName="Margaret" LastName="Peacock" Title="Sales Representative" />
        <Employee FirstName="Steven" LastName="Buchanan" Title="Sales Manager">
            <Employee FirstName="Michael" LastName="Suyama" Title="Sales Representative" />
            <Employee FirstName="Robert" LastName="King" Title="Sales Representative" />
            <Employee FirstName="Anne" LastName="Dodsworth" Title="Sales Representative" />
        </Employee>
        <Employee FirstName="Laura" LastName="Callahan" Title="Inside Sales Coordinator" />
    </Employee>
</orgTree>

If you're interested in how I got that XML from the Northwind database, please see a related article "Hierarchical XML - exporting from SQL Server" (link coming soon).

As we can see from the XML, it could theoretically continue in depth forever. How do we process that!?

Take a look at the following SQL:

USE [tempdb]
if exists (select * from dbo.sysobjects where id = object_id(N'[Employees]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [Employees]
GO
CREATE TABLE [Employees] (
    [EmployeeID] [int] IDENTITY (1, 1) NOT NULL ,
    [ReportsTo] [int] NULL,
    [LastName] [nvarchar] (20) NOT NULL ,
    [FirstName] [nvarchar] (10) NOT NULL ,
    [Title] [nvarchar] (30) NULL ,
    CONSTRAINT [PK_Employees] PRIMARY KEY  CLUSTERED 
    (
        [EmployeeID]
    ),
    CONSTRAINT [FK_Employees_Employees] FOREIGN KEY 
    (
        [ReportsTo]
    ) REFERENCES [Employees] (
        [EmployeeID]
    )
)
GO
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[ImportEmployeeHierarchyXml]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[ImportEmployeeHierarchyXml]
GO

CREATE PROC dbo.ImportEmployeeHierarchyXml
(
    @Xml text,
    @Debug bit = 0
)
AS
DECLARE 
    @xmlDoc int,
    @intCacheID int,
    @intDbID int,
    @intLoopCount int,
    @intMaxLoopCount int
DECLARE @tblCache TABLE
(
    ID int NOT NULL, -- XML ID
    ParentID int NULL, -- Parent XML ID
    DbID int NULL, -- Database ID (once inserted)
    ParentDbID int NULL, -- Parent Database ID (once parent is inserted)
    -- ** all 'content' fields to follow
    FirstName varchar(10),
    LastName varchar(20),
    Title varchar(30)
)
-- initial value
SET @intLoopCount = 0
-- safety net to prevent a dead loop, remove it once we're sure everything is OK
SET @intMaxLoopCount = 10000
-- get a XML DOM handle from the XML text
EXEC sp_xml_prepareDocument @xmlDoc OUTPUT, @Xml
-- put into Cache to allow 'Identity Matching'
INSERT INTO @tblCache
(
    -- required fields for matching
    ID,
    ParentID,
    -- ** data fields for inserting
    FirstName,
    LastName,
    Title
)
SELECT
    -- required fields for matching
    ID,
    CASE
        -- root node(s)
        -- ** Element name required
        WHEN [ParentID] IS NULL OR [ParentID] = 0 OR ParentNodeName <> 'Employee' THEN NULL
        -- other node(s)
        ELSE [ParentID]
    END AS [NewParentID],
    -- ** data fields for inserting
    FirstName,
    LastName,
    Title
FROM
    -- ** Element name required
    OPENXML( @xmlDoc, '//Employee', 8 )
    WITH
    (
        -- required fields for matching
        [ID] int '@mp:id',
        [ParentID] int '@mp:parentid',
        [ParentNodeName] varchar(50) '@mp:parentlocalname',
        -- ** data fields for inserting
        [FirstName] varchar(10) '@FirstName',
        [LastName] varchar(20) '@LastName',
        [Title] varchar(30) '@Title'
    )
IF @Debug = 1
BEGIN
    SELECT * FROM @tblCache
END
-- get first item
SELECT TOP 1 @intCacheID = ID 
FROM @tblCache 
WHERE 
    DbID IS NULL 
    AND
    (
        -- root node
        ParentID IS NULL
        OR
        -- node who has a parent in the DB
        ParentDbID IS NOT NULL
    )
WHILE @intCacheID IS NOT NULL AND @intLoopCount < @intMaxLoopCount
BEGIN
    -- add to database table
    INSERT INTO Employees
    (
        -- ** name of 'ParentID' field
        ReportsTo,
        -- ** data fields for inserting
        FirstName,
        LastName,
        Title
    )
    SELECT TOP 1
        ParentDbID,
        -- ** data fields for inserting
        FirstName,
        LastName,
        Title
    FROM 
        @tblCache
    WHERE
        ID = @intCacheID
    -- get new database ID
    SET @intDbID = SCOPE_IDENTITY()
    -- update DbID
    UPDATE @tblCache
    SET DbID = @intDbID
    WHERE ID = @intCacheID
    -- update ParentDbID
    UPDATE @tblCache
    SET ParentDbID = @intDbID
    WHERE ParentID = @intCacheID
    -- set to NULL first incase following returns no rows
    SET @intCacheID = NULL
    -- get next item
    SELECT TOP 1 @intCacheID = ID 
    FROM @tblCache 
    WHERE 
        DbID IS NULL 
        AND
        (
            -- root node
            ParentID IS NULL
            OR
            -- node who has a parent in the DB
            ParentDbID IS NOT NULL
        )
    
    -- increase the loop counter
    SET @intLoopCount = @intLoopCount + 1
END
IF @Debug = 1
BEGIN
    SELECT * FROM @tblCache
    SELECT * FROM Employees
END
        
EXEC sp_xml_removeDocument @xmlDoc

The above code does a few things:

  1. Because this is a sample, it switches to the "tempdb" database (this database is emptied each time SQL Server starts up).
  2. Creates a new table called "Employees", with a sub-set of the fields in the Northwind.dbo.Employees.
  3. Creates a new stored procedure called "ImportEmployeeHierarchyXml" which does the following:
    1. Declares an in-memory table to cache the XML, note the comment starting with "-- **" as this is where you'll need to change it to customise it for your own setup.
    2. Sets some initial loop checking variables, this is just a safety check to prevent a dead loop, currently it will loop a maximum of 10,000 times.
    3. Get a handle to an XML DOM object for the XML text passed in.
    4. Load the XML into the cache table variable using OPENXML.
    5. Grabs the ID of the first cache item to add.
    6. Executes a while loop to insert each item, update it's cache entry with database ID's, and grab the ID for the next item.

Most of the code is pretty straight-forward and easy to follow, but some things might be worth explaining further:

  • The OPENXML statement provides some extra meta-property attributes which I have used, one of particular note is "@mp:parentid" which will return either NULL (if the current node is the root), 0 (if the parent node is the root node), or an integer that allows us to uniquely identify the parent. The case statement allows us to set base level node(s) in our tree to have a parent id of NULL.
  • The stored procedure takes an optional @Debug parameter, which if set to 1 will output the before/after view of the Cache table, and the after view of the Employees table to help you debug.
  • Anywhere you see a comment that starts with two *'s (-- ** comment) it's something you'll need to update if you use this for your own purposes.
  • This methodology assumes that your table uses an IDENTITY column to uniquely identify each row, if you use something different you'll need to make a few changes.

The next step is to try it out, here's a sample script you can run:

USE [tempdb]
TRUNCATE TABLE Employees
EXEC ImportEmployeeHierarchyXml '<orgTree>
    <Employee FirstName="Andrew" LastName="Fuller" Title="Vice President, Sales">
        <Employee FirstName="Nancy" LastName="Davolio" Title="Sales Representative" />
        <Employee FirstName="Janet" LastName="Leverling" Title="Sales Representative" />
        <Employee FirstName="Margaret" LastName="Peacock" Title="Sales Representative" />
        <Employee FirstName="Steven" LastName="Buchanan" Title="Sales Manager">
            <Employee FirstName="Michael" LastName="Suyama" Title="Sales Representative" />
            <Employee FirstName="Robert" LastName="King" Title="Sales Representative" />
            <Employee FirstName="Anne" LastName="Dodsworth" Title="Sales Representative" />
        </Employee>
        <Employee FirstName="Laura" LastName="Callahan" Title="Inside Sales Coordinator" />
    </Employee>
</orgTree>'
SELECT * FROM Employees

Once again we're using the tempdb. First we make sure the Employees table is empty, then we execute our stored procedure, then select all the rows to see if it was all added as we expected. At this point your results should be rather satisfying!

So, here I've demonstrated how it can be done, all that's left is for your to play with it and see if it suits your needs.

I'm keen to hear any feedback you have, good or bad. I'll update this article as required if you spot anything that needs changing (I spotted a few errors myself already, but I'll wait till someone else points them out first! :-) How many can you find?).

A challenge: If you're keen to extend this, try designing something to handle an XML document with different types of nodes, e.g. an XML representation of a file system that has <Folder> and <File> nodes, each with their own attributes.

Published Feb 25 2004, 03:00 PM by XMLEvangelist
Filed under:

Comments

 

TrackBack said:

February 24, 2004 9:20 AM
 

Stefano Demiliani said:

Interesting article, really very good!
February 25, 2004 3:57 AM
 

alex said:

Waiting for your "Hierarchical XML - EXPORTING from SQL Server" article...
July 7, 2004 5:11 PM

Leave a Comment

(required)  
(optional)
(required)  
Add