Theory and Practice of Database and Data Analysis (3) – Exhaustive Search by Traversing Table Relationships
In this post, I will discuss about building a script to find all the relevant data starting from a table name and primary key value. In the previous post, I have discussed how to a list of tables that references a table. From a parent table, we just need to get a list of child tables. We then loop through each child table to select records using the parent primary key value. If the child tables themselves have child tables, we repeat the same approach by recursively getting the grand children.
I often need to remote-desktop into clients system where I only have read privilege; I can neither create stored procedures nor execute them. So I am going to present a script that requires only read privilege.
In transact-SQL (T-SQL) batch, looping through records can be implemented using cursor or temp table. I used the later approach as temp table can grow so that I can also use it for recursion as well.
Recursion is a harder problem here. Recursion can be done in most of programming languages by calling procedures recursively. However, that is not possible here since I am restricted to a batch. In computer science, recursion can be implemented with a loop with a stack. In many programming languages, when a caller calls a callee, the program would save the current local data as well as the returning location of the caller into the stack. The area in the stack used by each call is called a stack frame. Once the callee returns, the program will restore the local data from the stack and resume from the previous location.
In my script, I use temp table #ref_contraints to accomplish both looping and recursion. referential constraints from the top table are added to the table as they are discovered and removed from the table when they are consumed (that is, no longer needed). I used the depth-first traversal in my script. With this structure, I can change to breath-first traversal with minimum efforts.
The #keyvalues temp table contains the primary key values for the table that I have already traversed. I can get the records from the child tables by joining to the key values in this table so that I only need to query each child table once for each foreign key relationship.
In order to make the script simpler, I eliminate the schema and assume all tables are under the schema “dbo”. This works with our database, and works with databases in Microsoft Dynamics CRM 2011.
I have also assumed that all primary keys contain only one integer column. This is true with out database. This is also true with Microsoft Dynamics CRM 2011 except you need to change integer id to guid.
I added comments the script so that one would know where the code would correspond to looping and procedure calling if the code is written in a language like VB.NET or C#. The comments also indicate the insertion points if additional code is needed.
In future posts of this series, I will discuss how to capture human intelligence to make search even more powerful.