In the last post in this series, we discussed how to recommend which items a user should check out next using similarity algorithms, including the Pearson coefficient, Euclidean distance, and Manhattan distance, among others. This time, let’s take another approach using the Slope One family of algorithms.

Before we get started, let’s get caught up to where we are today:

## Slope One

Just as before, when we were using the other algorithms, the general assumptions for using these in collaborative filtering were the following:

- Site has many registered users
- Site offers many items for sale
- Users have rated items based upon their likes and dislikes
- We want to be able to better inform our users of items they may wish to purchase based upon their preferences.

Now let’s get to Slope One, a family of algorithms introduced by Daniel Lemire and Anna Maclachlan in the paper entitled “Slope One Predictors for Online Rating-Based Collaborative Filtering”. The paper describes three schemes: the Slope One scheme, the Weighted Slope One scheme, and the Bi-Polar Slope One scheme.

The Slope One scheme takes into account both information from other users who rated the same item and from the other items rated by the same user. Part of its strength also comes from what it leaves out: only ratings by users who have rated some item in common with the given user, and only ratings of items the given user has actually entered, contribute to the prediction.

The Weighted Slope One scheme additionally takes into account how many ratings each deviation is based on. For example, suppose we want to predict a user’s rating of item A given that the user has rated both B and C. If 100 times more people rated the pair A and B than rated the pair A and C, then the deviation derived from B is likely to be a far better predictor than the one from C, so it should be weighted accordingly.
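To make the weighting concrete, here is a small sketch in Python (purely illustrative; the function name, counts, and deviations are my own made-up values, not from the paper) of the weighted prediction described above:

```python
# Weighted Slope One: predict a rating for item A from the user's
# ratings of B and C, weighting each average deviation by how many
# users rated that item pair.
def weighted_slope_one(user_ratings, deviations):
    """user_ratings: {item: rating}; deviations: {item: (count, avg_dev)}
    where avg_dev is the mean of (rating_A - rating_item) over the
    count users who rated both A and the item."""
    num = sum(count * (dev + user_ratings[item])
              for item, (count, dev) in deviations.items())
    den = sum(count for count, _ in deviations.values())
    return num / den

# The user rated B and C; 200 users rated the pair (A, B), only 2
# rated (A, C), so B dominates the weighted average.
user = {"B": 0.5, "C": 0.9}
devs = {"B": (200, 0.1), "C": (2, -0.4)}
print(weighted_slope_one(user, devs))  # ~0.599, close to B's 0.6
```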

The Bi-Polar Slope One scheme differs from the previous schemes in that it splits the prediction into two parts: predictions from items the user liked, and predictions from items the user disliked. Splitting likes and dislikes effectively doubles the number of users; however, in terms of items, only deviations between two liked items or between two disliked items are taken into account. Likewise, in terms of users, only deviations from pairs of users who rated both items A and B and share a like or dislike of item A are used to predict the rating for item B.

In this implementation, I’m going to tackle the Weighted Slope One, using the Haskell implementation by Bryan O’Sullivan as my source of inspiration. I find going between Haskell and F# rather easy in some respects, as they share many of the same functional data structure constructs, although the Haskell implementations tend to have much richer functions. When programming in F#, I write immutable, functional code by default, dropping to imperative code when I have to, such as when interacting with the .NET base class libraries, and to object-oriented code to better serve my APIs to other languages.

In order to get full parity, I need the following utility functions:

```fsharp
#light

module Option =
    /// Based upon a function returns defaultArg or some value
    let some (n:'b) (f:'a -> 'b) : 'a option -> 'b =
        function
        | None -> n
        | Some x -> f x

module Map =
    /// Inserts into the Map with the given function if the key already exists, else adds
    let insertWith (f:'a -> 'a -> 'a) (k:'tkey) (v:'a) (m:Map<'tkey, 'a>) =
        match Map.tryfind k m with
        | None -> Map.add k v m
        | Some x ->
            let res = f v x
            Map.add k res m

    /// Adds or deletes from a map based on a function
    let alter (f:'a option -> 'a option) (k:'k) (m:Map<'k,'a>) : Map<'k,'a> =
        match Map.tryfind k m with
        | None ->
            match f None with
            | None -> m
            | Some x -> Map.add k x m
        | Some v as vo ->
            match f vo with
            | Some x -> Map.add k x m
            | None -> Map.remove k m

    /// Map without the key
    let map' (f:'a -> 'b) : Map<'k, 'a> -> Map<'k, 'b> =
        Map.mapi (fun _ v -> f v)

    /// Whether the map contains the given key
    let containsKey (k:'k) (m:Map<'k,'a>) = m.ContainsKey k
```

Now comes the actual implementation of our Weighted Slope One algorithm. Bryan has a great explanation of how it works here. Once you’ve read that, let’s line up the F# code required for the task at hand. First, we need to define the types of data we’re using. Each Rating is a map from an item to a floating-point rating. The SlopeOne matrix is a map from an item to a map from another item to a tuple of an integer count and a floating-point deviation.

```fsharp
module SlopeOne =
    type Rating<'a> = Map<'a, float>
    type SlopeOne<'a> = Map<'a, Map<'a, (int * float)>>
```

In order to calculate this, we need to define the update function, which fills in our matrix with rating deviations, as follows:

```fsharp
let addT (a, b) (c, d) = (a + c, b + d)

let update (s:SlopeOne<'a>) : Rating<'a> seq -> SlopeOne<'a> =
    let norm (a, b) = (a, b / float a)
    let update' s rm =
        let rs = Map.to_seq rm
        let prod = seq { for (a, m) in rs do
                             for (b, n) in rs -> ((a, b), (1, m - n)) }
        let insert m (a, b) v =
            let foo = Some << Option.some (Map.add b v Map.empty) (Map.insertWith addT b v)
            Map.alter foo a m
        Seq.fold (fun m (k, v) -> insert m k v) s prod
    Map.map' (Map.map' norm) << Seq.fold update' s
```

What this function does is fold over the incoming sequence of ratings, iterating to calculate the differences between each rating and every other rating that user has made. We then scale each of the differentials by the number of times we’ve seen that item pair.
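The heart of update is taking the cross product of a user’s ratings with themselves. Here is a quick Python sketch of just that one step (the function name is my own; this mirrors what the F# prod sequence expression produces for a single user):

```python
# For a single user's ratings, form every (a, b) item pair with a
# count of 1 and the rating difference m - n. Folding these pairs
# over all users, summing counts and differences, builds the matrix.
def pair_diffs(ratings):
    return {(a, b): (1, m - n)
            for a, m in ratings.items()
            for b, n in ratings.items()}

diffs = pair_diffs({"squid": 1.0, "octopus": 0.2})
print(diffs[("squid", "octopus")])  # a count of 1 and difference 0.8
```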

Now that we have a way to build up our matrix, it’s time to actually make a prediction. To do this, we pass in the matrix we calculated before, along with the ratings a user has made. We then iterate over every item the user has rated, for each item that has been rated by anyone, accumulating a total rating and inserting it into the map. We also need to filter out items the user has already rated. Lastly, we normalize our ratings and keep those greater than zero, which gives us our predictive result.

```fsharp
let predict (matrixIn:SlopeOne<'a>) (userRatings:Rating<'a>) =
    let insert oldM (item1, foundRating, userRating) =
        let (count, normRating) = foundRating
        let totalRating = (float count) * (normRating + userRating)
        Map.insertWith addT item1 (count, totalRating) oldM
    let freqM =
        Seq.fold insert Map.empty
            (seq { for (item1, innerMap) in Map.to_seq matrixIn do
                       if not (Map.containsKey item1 userRatings) then
                           for (userItem, userRating) in Map.to_seq userRatings do
                               if item1 <> userItem then
                                   if Map.containsKey userItem innerMap then
                                       let foundRating = Map.find userItem innerMap
                                       yield (item1, foundRating, userRating) })
    let normM = Map.map' (fun (count, totalRating) -> totalRating / (float count)) freqM
    Map.filter (fun _ normRating -> normRating > 0.) normM
```

We can now define the data to feed to our functions. In this case, let’s take the same matrix as defined in Bryan’s post. With this, we build the matrix by calling SlopeOne.update with an empty map and our user data, and then ask for a prediction given a single rating of 0.4 for the squid.

```fsharp
let userData =
    List.map Map.of_list
        [ [("squid", 1.0); ("cuttlefish", 0.5); ("octopus", 0.2)];
          [("squid", 1.0); ("octopus", 0.5); ("nautilus", 0.2)];
          [("squid", 0.2); ("octopus", 1.0); ("cuttlefish", 0.4); ("nautilus", 0.4)];
          [("cuttlefish", 0.9); ("octopus", 0.4); ("nautilus", 0.5)] ]

let prediction =
    SlopeOne.predict (SlopeOne.update Map.empty userData) (Map.of_list [("squid", 0.4)])
```

We can then run this code through F# interactive to display our results:

```fsharp
> prediction;;
val it : Map<string,float> =
  seq [{Key = "cuttlefish"; Value = 0.25;};
       {Key = "nautilus"; Value = 0.1;};
       {Key = "octopus"; Value = 0.2333333333;}]
```
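As a sanity check, the same two steps can be ported to Python (the naming is my own and this is purely illustrative, not the F# we use in this series); run against the same data, it reproduces the predictions above:

```python
# A compact Weighted Slope One port used to cross-check the F# results.
def update(ratings_seq):
    matrix = {}
    for ratings in ratings_seq:
        for a, m in ratings.items():
            for b, n in ratings.items():
                count, total = matrix.setdefault(a, {}).get(b, (0, 0.0))
                matrix[a][b] = (count + 1, total + (m - n))
    # Normalize: keep the count, average the summed differences.
    return {a: {b: (c, t / c) for b, (c, t) in inner.items()}
            for a, inner in matrix.items()}

def predict(matrix, user_ratings):
    freq = {}
    for item, inner in matrix.items():
        if item in user_ratings:     # skip items the user already rated
            continue
        for user_item, user_rating in user_ratings.items():
            if item != user_item and user_item in inner:
                count, dev = inner[user_item]
                c, t = freq.get(item, (0, 0.0))
                freq[item] = (c + count, t + count * (dev + user_rating))
    return {item: t / c for item, (c, t) in freq.items() if t / c > 0.0}

user_data = [
    {"squid": 1.0, "cuttlefish": 0.5, "octopus": 0.2},
    {"squid": 1.0, "octopus": 0.5, "nautilus": 0.2},
    {"squid": 0.2, "octopus": 1.0, "cuttlefish": 0.4, "nautilus": 0.4},
    {"cuttlefish": 0.9, "octopus": 0.4, "nautilus": 0.5},
]
print(predict(update(user_data), {"squid": 0.4}))
# cuttlefish 0.25, nautilus 0.1, octopus ~0.2333, matching the F# output
```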

How accurate is the Slope One family of algorithms? According to the paper, which evaluates them on the EachMovie and MovieLens datasets, they are on par with the Pearson coefficient while being less memory intensive.

## Conclusion

Where can we go from here? Because we’re using immutable data structures, we could look for ways to run this code in parallel without much effort. The beauty of functional programming is that without mutable data structures and shared state, opportunities arise for parallelization optimizations.

Overall, the Slope One family of algorithms is easy to understand, and once you’ve read the paper thoroughly, you may find it easier than the others we’ve discussed, such as the Pearson coefficient, the Jaccard coefficient, and so on. There are still some items left in this series, such as calculating how similar items are to one another, among other topics.
