One of my websites (forever unfinished, although we set May 1st as the day that we're “done“) is a comic book website that a graphic designer friend of mine and I work on in our spare time. This was the first site that I did end-to-end in ASP.NET and C# and, of course, there's a lot of things I would do differently now, a year and a half later. Basicly, the site is a database of characters, comic books and appearances (and it's a great excuse to give to our wives/girlfriends about why a couple of guys who are ostensibly adults need to go to a comic book convention). Since this was something we were doing to improve our repective skills and try new things, I made a very simple web service that allows you do to a “Back to Bacon” type of search of our database where you give it two characters from comic books and the database tries to connect them (ie, you say “A” and “B”, and the database says “A” appeared with “C” who appeared with “D” who appeared with “B” and there's your connection). The meat of the web service is the stored procedure that drives the whole thing, and it accepts two strings, each one being a character name and starts the search, and that's what I want to look at today. What you'll need to know is that there is a table called “appearances“ that contains information about characters, “issues“ that contains information about specific issues of comics and “appearance_xref“ which resolves the two and logs which characters appeared in which comics. I inherited the database schema from another programmer, so I didn't name those tables.
CREATE PROCEDURE spMarvel_CharacterSeperation @character_one varchar(100), @character_two varchar(100)
AS
set nocount on
declare @MAX_DEPTH int
select @MAX_DEPTH = 6
declare @appearance1 int
declare @appearance2 int
select @appearance1 = appearance_id from appearances where name = @character_one
select @appearance2 = appearance_id from appearances where name = @character_two
declare @intDegrees int
declare @blnDone bit
declare @strIssueName varchar(300)
declare @intFirewallCount int
declare @strName1 varchar(100)
declare @strName2 varchar(100)
declare @intIDToKeep int
if (@appearance1 is null)
begin
RAISERROR('Character 1 not found',16,1)
return (-1)
end
if (@appearance2 is null)
begin
RAISERROR('Character 2 not found',16,1)
return (-1)
end
create table #tbTempDegrees
(
appearance_one int,
appearance_two int,
name_one varchar(100),
name_two varchar(100),
issue_id int,
issue_name varchar(300),
dos int,
intState int -- 0 = just added, 1 = searching, 2 = searched
)
-- begin search
insert into #tbTempDegrees (appearance_two, appearance_one, issue_id, issue_name, dos, intState)
select distinct(x2.appearance_id), x1.appearance_id, Min(x2.issue_id), null, 1, 0
from appearance_xref x2 join
appearance_xref x1 on (x2.issue_id = x1.issue_id)
where x1.appearance_id = @appearance1 and x2.appearance_id <> @appearance1
group by x2.appearance_id, x1.appearance_id
select @intDegrees = 1
select @blnDone = 0
select @intFirewallCount = 0
while @blnDone = 0
begin
-- firewall code
select @intFirewallCount = @intFirewallCount + 1
if (@intFirewallCount > 1000)
begin
drop table #tbTempDegrees
RAISERROR('Infinite Loop encountered', 16, 1)
return (-1)
end
-- check to see if we're done with the search
if (select count(*) from #tbTempDegrees where appearance_two = @appearance2) > 0
select @blnDone = 1
select @intDegrees = @intDegrees + 1
if @intDegrees > @MAX_DEPTH
begin
drop table #tbTempDegrees
RAISERROR('Searched too far', 16, 1)
return (-1)
end
-- if we're not done with the search, add the next level
if @blnDone = 0
begin
update #tbTempDegrees set intState = 2 where intState = 1
update #tbTempDegrees set intState = 1 where intState = 0
insert into #tbTempDegrees (appearance_two, appearance_one, issue_id, issue_name, dos, intState)
select distinct(x2.appearance_id), x1.appearance_id, Min(x2.issue_id), null, @intDegrees, 0
from appearance_xref x2 join
appearance_xref x1 on (x2.issue_id = x1.issue_id)
where x1.appearance_id in (select appearance_two from #tbTempDegrees where intState = 1)
and x2.appearance_id not in (select appearance_two from #tbTempDegrees where intState = 1)
and x2.appearance_id not in (select appearance_one from #tbTempDegrees where intState > 0)
group by x2.appearance_id, x1.appearance_id
end
end
-- now we should be done, and the connection should have been made ...
-- so we delete the extraneous data
select @intDegrees = MAX(dos) from #tbTempDegrees
delete from #tbTempDegrees
where dos = @intDegrees and appearance_two <> @appearance2
-- check for multiple paths
if (select count(*) from #tbTempDegrees where appearance_two = @appearance2 and dos = @intDegrees) > 1
begin
select @intIDToKeep = MIN(appearance_one) from #tbTempDegrees where appearance_two = @appearance2 and dos = @intDegrees
delete from #tbTempDegrees where appearance_one <> @intIDToKeep and appearance_two = @appearance2 and dos = @intDegrees
end
select @strIssueName = (series + ' #' + cast(issue as varchar(10))) from issue where issue_id = (select MAX(issue_id) from #tbTempDegrees where dos = @intDegrees)
select @strName1 = name from appearances where appearance_id = (select MAX(appearance_one) from #tbTempDegrees where dos = @intDegrees)
select @strName2 = name from appearances where appearance_id = (select MAX(appearance_two) from #tbTempDegrees where dos = @intDegrees)
update #tbTempDegrees
set issue_name = @strIssueName, name_one = @strName1, name_two = @strName2
where dos = @intDegrees
while @intDegrees > 1
begin
-- firewall code
select @intFirewallCount = @intFirewallCount + 1
if (@intFirewallCount > 1000)
begin
drop table #tbTempDegrees
RAISERROR('Infinite Loop encountered', 16, 1)
return (-1)
end
select @intDegrees = @intDegrees - 1
delete from #tbTempDegrees
where dos = @intDegrees and appearance_two not in (select appearance_one from #tbTempDegrees where dos = (@intDegrees + 1))
-- check for multiple paths
if (select count(*) from #tbTempDegrees where appearance_two in (select appearance_one from #tbTempDegrees where dos = (@intDegrees + 1)) and dos = @intDegrees) > 1
begin
select @intIDToKeep = MAX(appearance_one) from #tbTempDegrees where appearance_two in (select appearance_one from #tbTempDegrees where dos = (@intDegrees + 1)) and dos = @intDegrees
delete from #tbTempDegrees where appearance_one <> @intIDToKeep and appearance_two in (select appearance_one from #tbTempDegrees where dos = (@intDegrees + 1)) and dos = @intDegrees
end
select @strIssueName = (series + ' #' + cast(issue as varchar(10))) from issue where issue_id = (select MAX(issue_id) from #tbTempDegrees where dos = @intDegrees)
select @strName1 = name from appearances where appearance_id = (select MAX(appearance_one) from #tbTempDegrees where dos = @intDegrees)
select @strName2 = name from appearances where appearance_id = (select MAX(appearance_two) from #tbTempDegrees where dos = @intDegrees)
update #tbTempDegrees
set issue_name = @strIssueName, name_one = @strName1, name_two = @strName2
where dos = @intDegrees
end
-- clean up
select distinct name_one, appearance_one, name_two, appearance_two, issue_id, issue_name, dos
from #tbTempDegrees
order by dos
drop table #tbTempDegrees
set nocount off
Let's go through all that code, shall we? The first thing we do is to declare all our local variables. I know that in Code Complete, they say that you should declare them all when you need them, and in my C# I do. I really, really do. But for some reason, the coding practices I use for T-SQL are a LOT different from what I do in C#. I have a variable in there called @MAX_DEGREES which I just use to make sure the database doesn't blow itself up trying to connect two characters that haven't yet been connected in our database. You'll also notice a variable called @intFirewallCount which I use in any stored procedure that I write that uses loops or cursors. If you see that variable, you know I wrote the proc. Every time I go through a loop, I increment that counter and if it ever gets to some high number the proc aborts. After you've written a few infinite loops on your SQL Server, you start to err on the side of caution (at my last job in PA, our dev SQL server had remote query timeout set to infinite ... which can be good or bad but in my case was always bad). We validate our parameters and then declare a temp table that we'll use for everything in here and insert our first rows of data.
Now, what we do is to look at the “current data set” and find all the characters in it. The first pass through the loop we've got one character to relate to. Then we find everyone that appeared in a comic with any of those characters and add them to our temp table with a status of “pending”. If we find our “target character” we're done and we exit that part of the loop. If not, we mark our “current” data set as “searched” and the “pending” data set as “current” and go through again. Either we find our target or we search too far and abort.
Assuming we find our target, we clear out everything in the last data set except for 1 row that contains our target and build the correct information about that particular issue that they appeared in and the characters' names. We start working our way back through the table (we saved information about which pass through the loop each row was on, and this is why) repeating this process of deleting all the rows except for 1 and looking up the issue data and names.
Now we've got everything we need so we return it to the web server and clean up the table. This isn't the best T-SQL that I could have written, but the point is that its an example of looping and taking what is basically a recursive algorithm and unrolling it into an iterative one. It also shows what a defensive programmer I am in T-SQL whereas I'm usually not so defensive in C#. This procedure is relatively slow due to the large number if “in” and “not in” clauses that I needed. I didn't know how else to do that part of the search. The algorithm is basically a brute force one, as most recursive algorithms are, and no attempt is made to cache searches anywhere.
I'm sure that there are things that I could have done better ... and I'd love to hear about them. I really like T-SQL, and my posts about it seem to generate some good discussion and so hopefully someone will have some thoughts as to what I could have done differently. I'm totally self taught with SQL Server and so some of the things that I do wind up being “second rate” techniques that I wouldn't mind getting rid of. On the other hand, I know a lot of developers that don't think to do so much work on the database and wind up generating huge numbers of calls to the SQL Server to accomplish the same task that I'd do in one, and so maybe this will get them thinking about ways that they could do more on the database.
As an aside about doing websites on my own for fun ... the graphic designers that I work with are good friends of mine and sometimes getting together to discuss things about our sites or do to things related to our sites (comic book conventions, football games, training camp) is a great reason to just get together. We would probably go to them anyway, but having a site related to that gives us extra incentive and that's really helped since I moved away. They both live in Pennsylvania and I live in New York City now, so sometimes we only see each other for site-related events.