Producing a FullText search query that accounts for RANK and the freshness of content.

Recently at AnandTech, we decided to rewrite our search mechanisms to take date into account, while performing Full Text queries against Microsoft SQL Server's full text functionality.   By default, when you use CONTAINSTABLE, the search service will return a set of results with a RANK value associated with each row.  The RANK value is based on occurrence, proximity etc.  The problem comes when you order by the RANK value, and have time sensitive content.   What we've been toying with is combining RANK with the Date of the content.  The reason for this is to try and give the user results that are not only relevant but are also "fresh".   We went about this by writing a CASE statement into our query which essentially produces a weight value based on how old it is.  In the example query below, you can see how this works.   Based on how many weeks old the content is, we assign it a weight and that is combined with the RANK value that the search service returns.   Obviously we're still fairly RANK heavy, but date is taken into account.  You can adjust how RANK/Date heavy you want the results to be by changing how much you divide each RANK by at the end of each WHEN clause. 

Let me know if you've done something similar, or any thoughts on this technique.

SELECT DISTINCT TOP 21 d.idocumentid,d.vchdocumenttitle, d.dtinsertdate,
CASE
 WHEN datediff(wk, d.dtinsertdate, getdate()) between  0 and 4  THEN (allpages.rank/1.0)
 WHEN datediff(wk, d.dtinsertdate, getdate()) between  5 and 8  THEN (allpages.rank/1.2)
 WHEN datediff(wk, d.dtinsertdate, getdate()) between  9 and 12 THEN (allpages.rank/1.4)
 WHEN datediff(wk, d.dtinsertdate, getdate()) between 13 and 16 THEN (allpages.rank/1.6)
 WHEN datediff(wk, d.dtinsertdate, getdate()) between 17 and 20 THEN (allpages.rank/1.8)
 WHEN datediff(wk, d.dtinsertdate, getdate()) between 21 and 24 THEN (allpages.rank/2.0)
ELSE
 (allpages.rank/3.0)
END AS WeightedRank
FROM documents d WITH (NOLOCK)
JOIN documentpages p WITH (NOLOCK) on d.idocumentid = p.idocumentid
JOIN CONTAINSTABLE(documentpages,txpage,' "intel" AND "centrino"',2000) allpages
ON p.irowid = allpages.[Key]
GROUP BY d.idocumentid, d.vchdocumenttitle, d.dtinsertdate,allpages.rank
ORDER BY WeightedRank DESC, d.dtinsertdate DESC

 

 

2 Comments

  • If you really want to keep this dynamic, add a lookup table to account for the weights for different day periods. Instead of the huge CASE, do this:



    SELECT ...

    allpages.rank * (SELECT TOP 1 weight FROM timespan_weight WHERE days > DATEDIFF(day,d.dtinsertdate,getdate())

    ...



    The table would look like the following:



    CREATE TABLE timespan_weight (days INT,weight DECIMAL(4,2))

  • I think it works great.



    I did pretty much exactly what you did to keep more current stuff "more relevant".

Comments have been disabled for this content.