Tagging Implementation with LINQ
I decided to implement tagging on my video library. My
first solution was to use one table for the tags. Containing
the videoid and tagname. When I loaded in 1,000,000 test
tags this became slow to generate my tag cloud data. My new
solution was to split up the tags into two tables. One
containing distinct tags and the other joining videos to
them. I will also show some things I came by when extracting
my tag data.
Tables
My tables
for tagging are as follows:
Tags = {
ID,
TagName
}
VideoTags = {
ID,
TagID,
VideoID
}
Inserting New Tags:
This is the only
task that takes a little more work using this solution. When
adding a new tag to a video, we first must check to see if
the tag ecists in the Tags table, if it doesnt we need to
add it, otherwise we just need its ID. I created a method to
aid in adding Tags for me:
public long InsertTag(GeekTubeDataContext db) {
Tag tmp = db.Tags.SingleOrDefault(t => t.TagName ==
this.TagName);
long tagID = 0;
if (tmp != null) {
tagID = tmp.ID;
} else {
db.Tags.InsertOnSubmit(this);
db.SubmitChanges();
tagID = this.ID;
}
return tagID;
}
Used
like so:
Tag newTag = new Tag() { TagName =
"testtag" };
int tagID = newTag.InsertTag(db);
Adding tags to videos:
We have two cases adding a new video with tags and updating
a videos tags.
public void
InsertVideoWithTags(GeekTubeDataContext db, List<int>
tags) {
foreach (var tag in tags) {
this.VideoTags.Add(new VideoTag { TagID = tag });
}
db.Videos.InsertOnSubmit(this);
db.SubmitChanges();
}
This inserts our videos and assigns the tags to it...
The next method Deletes a videos current tags and adds the
new ones:
public void
UpdateVideoTags(GeekTubeDataContext db, List<int>
tags) {
db.VideoTags.DeleteAllOnSubmit(this.VideoTags);
foreach (var tag in tags) {
this.VideoTags.Add(new VideoTag { TagID = tag });
}
db.SubmitChanges();
}
You would use it like this:
Video
vid = new Video();
vid.Title = "foo";
vid.InsertVideoWithTags(db,
TAGSIDLIST);
-or-
Video vid =
Video.GetVideo(db, ID);
vid.UpdateVideoTags(db,
TAGSIDLIST);
Getting tag cloud data:
Getting the tag
cloud data this way prooved fairly quick and was much easier
than with one table. My original LINQ query was this:
// Get top 20 most popular tags ordered by name...
var query = (from tag in db.Tags
join videotag in db.VideoTags on tag.ID equals
videotag.TagID into g
orderby g.Count() descending
select new TagCloudItem {
Name = tag.TagName,
Count =
g.Count()
}).Take(20).OrderBy(p=>p.Name);
return query;
This worked
perfect but after looking at the SQL it was generated
thought I could tidy it up a little so came up with this:
// Get top 20 most popular tags ordered by name...
var query = (from tag in db.Tags
join videotag in db.VideoTags on tag.ID equals
videotag.TagID into g
select
new { Name = tag.TagName, Count = g.Count() } into item
orderby item.Count descending
select new TagCloudItem {
Name = item.Name,
Count =
item.Count
}).Take(20).OrderBy(item => item.Name);
return query;
After looking at the SQL the
secound query generated and comparing the two by viewing the
Execution plan I found the secound query was twice as fast
as it had less nested selects. Qutie happy with that myself.
TagCloud Control:
I created a
simple tagcloud control to display the data for me. I will
not go into this in this post though.
Final:
I Think this example works well. One thing I am not 100%
happy about is the way I implement my data access. I think
this could be done a better way while still keeping it
seperated from the UI layer. There are still a few things
not shown here like extracting tags from frontend but that
is because I have not finished that one yet :)..... Anyway
hope noone fell asleep.
Thanks
Stefan
Update 20071222:
I have updated the
tagcloud data generation query to the blow. I found it
easier to read and it has no performance loss.
select new TagCloudItem {
Name = tag.TagName,
Count = (from videotag in db.VideoTags
where videotag.TagID == tag.ID
select videotag).Count()
}).OrderByDescending(tag => tag.Count).Take(20).OrderBy(tag => tag.Name);
Update 20071222:
I now added a Created column to the VideoTags table. This
can now allow me to get they say top 20 most recent tags.
The query would now be so:
var query = (from tag in db.Tags
select new TagCloudItem {
Name = tag.TagName,
Count = (from videotag in
db.VideoTags
where videotag.TagID
== tag.ID
select
videotag).Count(),
LastUsed = (from videotag in
db.VideoTags
where
videotag.TagID == tag.ID
orderby
videotag.Created descending
select
videotag.Created).Take(1).SingleOrDefault()
}).OrderByDescending(tag =>
tag.LastUsed).Take(20).OrderBy(tag => tag.Name);
What
we do now is get the most recent time a tag was assigned to
a video. We then order by date descending and take 20. Then
order by name and can display in our tagcloud.... Fun fun