Tagging Implementation with LINQ

Hello all,


I decided to implement tagging on my video library. My first solution was to use one table for the tags. Containing the videoid and tagname. When I loaded in 1,000,000 test tags this became slow to generate my tag cloud data. My new solution was to split up the tags into two tables. One containing distinct tags and the other joining videos to them. I will also show some things I came by when extracting my tag data.

Tables

My tables for tagging are as follows:

Tags = {
    ID,   
    TagName
 }

VideoTags = {
    ID,
    TagID,
    VideoID
}


Inserting New Tags:

This is the only task that takes a little more work using this solution. When adding a new tag to a video, we first must check to see if the tag ecists in the Tags table, if it doesnt we need to add it, otherwise we just need its ID. I created a method to aid in adding Tags for me:

  public long InsertTag(GeekTubeDataContext db) {

            Tag tmp = db.Tags.SingleOrDefault(t => t.TagName == this.TagName);
            long tagID = 0;

            if (tmp != null) {
                tagID = tmp.ID;
            } else {
                db.Tags.InsertOnSubmit(this);
                db.SubmitChanges();
                tagID = this.ID;
            }

            return tagID;
        }

Used like so:

Tag newTag = new Tag() { TagName = "testtag" };
int tagID = newTag.InsertTag(db);



Adding tags to videos:

We have two cases adding a new video with tags and updating a videos tags.

public void InsertVideoWithTags(GeekTubeDataContext db, List<int> tags) {
                       
            foreach (var tag in tags) {
                this.VideoTags.Add(new VideoTag { TagID = tag });
            }

            db.Videos.InsertOnSubmit(this);
            db.SubmitChanges();
           
        }

 
This inserts our videos and assigns the tags to it... The next method Deletes a videos current tags and adds the new ones:

public void UpdateVideoTags(GeekTubeDataContext db, List<int> tags) {

            db.VideoTags.DeleteAllOnSubmit(this.VideoTags);
           
            foreach (var tag in tags) {
                this.VideoTags.Add(new VideoTag { TagID = tag });
            }

            db.SubmitChanges();

        }

You would use it like this:

Video vid = new Video();
vid.Title = "foo";
vid.InsertVideoWithTags(db, TAGSIDLIST);

-or-

Video vid = Video.GetVideo(db, ID);
vid.UpdateVideoTags(db, TAGSIDLIST);
      

Getting tag cloud data:

Getting the tag cloud data this way prooved fairly quick and was much easier than with one table. My original LINQ query was this:

            // Get top 20 most popular tags ordered by name...
            var query = (from tag in db.Tags
                        join videotag in db.VideoTags on tag.ID equals videotag.TagID into g    
                        orderby g.Count() descending
                        select new TagCloudItem {
                            Name = tag.TagName,
                            Count = g.Count()
                        }).Take(20).OrderBy(p=>p.Name);
                       
            return query;

This worked perfect but after looking at the SQL it was generated thought I could tidy it up a little so came up with this:

            // Get top 20 most popular tags ordered by name...
            var query = (from tag in db.Tags
                         join videotag in db.VideoTags on tag.ID equals videotag.TagID into g
                         select new { Name = tag.TagName, Count = g.Count() } into item
                         orderby item.Count descending
                         select new TagCloudItem {
                             Name = item.Name,
                             Count = item.Count
                         }).Take(20).OrderBy(item => item.Name);

            return query;

After looking at the SQL the secound query generated and comparing the two by viewing the Execution plan I found the secound query was twice as fast as it had less nested selects. Qutie happy with that myself.

TagCloud Control:

I created a simple tagcloud control to display the data for me. I will not go into this in this post though.


Final:

I Think this example works well. One thing I am not 100% happy about is the way I implement my data access. I think this could be done a better way while still keeping it seperated from the UI layer. There are still a few things not shown here like extracting tags from frontend but that is because I have not finished that one yet :)..... Anyway hope noone fell asleep.

 

Thanks

Stefan 


Update 20071222:

I have updated the tagcloud data generation query to the blow. I found it easier to read and it has no performance loss.

 var query = (from tag in db.Tags                        
                         select new TagCloudItem {
                             Name = tag.TagName,
                             Count = (from videotag in db.VideoTags
                                      where videotag.TagID == tag.ID
                                      select videotag).Count()
                         }).OrderByDescending(tag => tag.Count).Take(20).OrderBy(tag => tag.Name);     


Update 20071222:

I now added a Created column to the VideoTags table. This can now allow me to get they say top 20 most recent tags. The query would now be so:

var query = (from tag in db.Tags                        
                         select new TagCloudItem {
                             Name = tag.TagName,
                             Count = (from videotag in db.VideoTags
                                      where videotag.TagID == tag.ID
                                      select videotag).Count(),
                             LastUsed = (from videotag in db.VideoTags
                                         where videotag.TagID == tag.ID
                                         orderby videotag.Created descending
                                         select videotag.Created).Take(1).SingleOrDefault()
                         }).OrderByDescending(tag => tag.LastUsed).Take(20).OrderBy(tag => tag.Name);

What we do now is get the most recent time a tag was assigned to a video. We then order by date descending and take 20. Then order by name and can display in our tagcloud.... Fun fun

 

3 Comments

  • Thanks for this helpful article.
    This is exactly what i was trying to achieve but in vain.
    Please, please please can u post the vb equivalent of the above example.
    Thanks in advance.

  • Hello,
    i found this article really helpful, thanks for such a good example. it is exactly what i was looking for, with a minor difference that I'm using photos instead of videos.
    A humble request please can u post a VB version of the code, because i dont know the c# language.
    I tried to re-write the code in VB but got stuck with this line
    "t =&gt; t.TagName == this.TagName".
    Another query is where to write these functions and procedures, in code behind of a page or in its own class.
    please guide me.
    thanks in advance.
    Moiz.

  • Moiz,

    (t => t.TagName == this.TagName) in VB would become something like Function(t) t.TagName = me.TagName, as to where to write them. I am creating partial classes for my LINQ classes. So for Tag I create a partial class and add my helper methods in there. If you need any more help just shoot me an email with what you have done and I will be more than happy to help out.

    stefan.sedich@gmail.com


    Cheers

    Stefan

Comments have been disabled for this content.