Building Recommendation System Using F# – Part 2
In the first part, we implemented our first F# function towards building a recommendation system, which could tell us relative closeness of two users based on their ratings given to the various music bands. During implementation, I highlighted some of the core F# programming concepts like pattern matching, options, modules etc.. In this part, we are going to concentrate on following three things:
- Setting up test data : We are going to define a small statically defined collection of items which would act as our source of test data
- Function of computing closest neighbors : A new function which would take a given username as input and then output list of users sorted on the basis of their relative closeness to the input user name. This newly added function is going to call Manhattan function for computing distance between the users.
- F# interactive for testing our code.
Note : The source code(reference) and examples listed in this post is taken from the book : A Programmers Guide to data mining.
Setting Up Test Data
The matrix that we defined in part-1 of this series, can be easily represented using Dictionary data structure. Dictionary data structure maintains a collection of key-value pair where all the keys have to be distinct. In our case, we can represent user name as key and list of their ratings to individual music bands as values. If C# the dictionary signature would have looked something like this:
Dictionary<string,IList<Ratings>>
Ratings is a F# record type entity that we have declared in part-1 of this series. We will see how we can represent our test data in key-value dictionary pair format using F#:
let data = [("Angelica",[{Name="Blues Traveler"; Rating= 3.5};{Name="Broken Bells"; Rating= 2.0}; {Name="Norah Jones";Rating=4.5};{Name="Phoenix";Rating=5.0}; {Name="Slightly Stoopid";Rating=1.5};{Name="The Strokes";Rating=2.5}; {Name="Vampire Weekend";Rating=2.0}]); ("Bill",[{Name="Blues Traveler";Rating=2.0};{Name="Broken Bells";Rating=3.5}; {Name= "Deadmau5";Rating= 4.0};{Name="Phoenix";Rating=2.0}; {Name="Slightly Stoopid";Rating=3.5};{Name="Vampire Weekend";Rating= 3.0}]); ("Chan",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 1.0}; {Name= "Deadmau5";Rating= 1.0};{Name= "Norah Jones";Rating= 3.0}; {Name= "Phoenix";Rating=5.0};{Name= "Slightly Stoopid";Rating= 1.0}]); ("Dan",[{Name="Blues Traveler";Rating= 3.0};{Name= "Broken Bells";Rating= 4.0}; {Name="Deadmau5";Rating= 4.5};{Name= "Phoenix";Rating= 3.0}; {Name="Slightly Stoopid";Rating= 4.5};{Name= "The Strokes";Rating= 4.0}; {Name="Vampire Weekend";Rating= 2.0}]); ("Hailey",[{Name="Broken Bells";Rating= 4.0};{Name= "Deadmau5";Rating= 1.0}; {Name="Norah Jones";Rating= 4.0};{Name= "The Strokes";Rating= 4.0}; {Name="Vampire Weekend";Rating= 1.0}]); ("Jordyn",[{Name="Broken Bells";Rating= 4.5};{Name= "Deadmau5";Rating= 4.0}; {Name= "Norah Jones";Rating= 5.0}; {Name="Phoenix";Rating= 5.0};{Name= "Slightly Stoopid";Rating= 4.5}; {Name="The Strokes";Rating= 4.0};{Name= "Vampire Weekend";Rating= 4.0}]); ("Sam",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 2.0}; {Name="Norah Jones";Rating= 3.0};{Name= "Phoenix";Rating= 5.0}; {Name="Slightly Stoopid";Rating= 4.0};{Name= "The Strokes";Rating= 5.0}]); ("Veronica",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 2.0}; {Name="Norah Jones";Rating= 3.0};{Name= "Phoenix";Rating= 5.0}; {Name="Slightly Stoopid";Rating= 4.0};{Name= "The Strokes";Rating= 5.0}]); ] |> dict let Users = data.Keys |> Seq.toList
The “data” variable represents our test data. Notice that we haven’t provided any explicit type definition for our “data” variable. F# has automatically deduced its type on the basis of the expression on the right hand side of the expression. If we hover over it, then we can see that its type is :
val data : Collections.Generics.IDictionary<string, Ratings list>
In F#, lists are represented by open & closed square brackets([]). Items inside the list are separated by semi-colon[;]. As we can see in the above code, we have created a list(containing ratings data), which is then pipe-lined into F# dict function, which converts the incoming list into a dictionary. The F# dict function, accepts a collection of tuple and it then takes the first value of tuple as key and maps the second value of tuple as its value. If the above case, the list is containing tuples of UserName and their corresponding ratings for every music band. For e.g.
("Angelica",[{Name="Blues Traveler"; Rating= 3.5}; {Name="Broken Bells"; Rating= 2.0}; {Name="Norah Jones";Rating=4.5}; {Name="Phoenix";Rating=5.0}; {Name="Slightly Stoopid";Rating=1.5}; {Name="The Strokes";Rating=2.5}; {Name="Vampire Weekend";Rating=2.0}])
Above defined code is a tuple of type : (string * Ratings list). Notice that F# automatically deduces the type of records as “Ratings list” based on the fields used i.e. Name and Rating. Finally, after we have created our UserName indexed collection of test data, we extract all the user names into a separate variable called “Users” by accessing all the keys of the dictionary.
Getting Closest Neighbors
Now that we have our Manhattan function and test data ready, we are now in a position to calculate relative closeness of users against a given user based on their ratings for each of the music band. For clarity, I will present the code of Manhattan function along with the code for getting closest neighbor.
let Manhattan(ratings1:Ratings list, ratings2:Ratings list) = let mutable distance = 0.0 for item in ratings1 do let otherRating = ratings2 |> List.tryFind (fun t -> t.Name = item.Name) match otherRating with | Some(v) -> distance <- distance + Math.Abs(item.Rating - v.Rating) | None -> () distance let ComputeNearestNeighbour(userName, users) = let mutable distance = [] for user in users |> List.filter (fun x -> x <> userName) do let dist = Manhattan(data.[user], data.[userName]) distance <- distance @ [(dist,user)] distance <- distance |> List.sortBy (fun (x,y) -> x) distance
Lets discuss in a step by step manner as what the “ComputeNearestNeighbour” function is doing.
- It accepts the name of user(username) for whom we would like to calculate list of nearest neighbors. “users” parameter corresponds to list of all the usernames in the test data.
- It then declares a mutable list called dictionary.
- In the for loop, for every user other than the input “userName” we do the following
- Calculate Manhattan distance between then input user and the current iterating user
- Create a tuple containing the calculate distance and name of the iterating user i.e. (dist, user) and then append it to the mutable “distance” list created above.
- Finally, we sort the mutable list on the basis of Manhattan distance value contained in every tuple value in the list.
- Lastly return the “distance” list
Notice that how well the various F# concepts like pipelining, pattern matching, list comprehensions are playing so well with each other in making the code so readable.
F# Interactive
We have written two fairly simple F# functions, created test data but till now we haven’t verified that whether the code works or not. Normally in C#, we will have to either write some unit test code or invoke the functions from main method, in order to verify its correctness. In F#, we can make use of “F# Interactive” to test our code as an when we are writing it. To see F# interactive in action, just copy paste and highlight the code below in F# source file(In the first part, we started by creating an F# source file called Recommend.fs) and then right and select “Execute In Interactive”.
open System type Ratings = {Name: string; Rating: float} let data = [("Angelica",[{Name="Blues Traveler"; Rating= 3.5};{Name="Broken Bells"; Rating= 2.0}; {Name="Norah Jones";Rating=4.5};{Name="Phoenix";Rating=5.0}; {Name="Slightly Stoopid";Rating=1.5};{Name="The Strokes";Rating=2.5}; {Name="Vampire Weekend";Rating=2.0}]); ("Bill",[{Name="Blues Traveler";Rating=2.0};{Name="Broken Bells";Rating=3.5}; {Name= "Deadmau5";Rating= 4.0};{Name="Phoenix";Rating=2.0}; {Name="Slightly Stoopid";Rating=3.5};{Name="Vampire Weekend";Rating= 3.0}]); ("Chan",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 1.0}; {Name= "Deadmau5";Rating= 1.0};{Name= "Norah Jones";Rating= 3.0}; {Name= "Phoenix";Rating=5.0};{Name= "Slightly Stoopid";Rating= 1.0}]); ("Dan",[{Name="Blues Traveler";Rating= 3.0};{Name= "Broken Bells";Rating= 4.0}; {Name="Deadmau5";Rating= 4.5};{Name= "Phoenix";Rating= 3.0}; {Name="Slightly Stoopid";Rating= 4.5};{Name= "The Strokes";Rating= 4.0}; {Name="Vampire Weekend";Rating= 2.0}]); ("Hailey",[{Name="Broken Bells";Rating= 4.0};{Name= "Deadmau5";Rating= 1.0}; {Name="Norah Jones";Rating= 4.0};{Name= "The Strokes";Rating= 4.0}; {Name="Vampire Weekend";Rating= 1.0}]); ("Jordyn",[{Name="Broken Bells";Rating= 4.5};{Name= "Deadmau5";Rating= 4.0}; {Name= "Norah Jones";Rating= 5.0}; {Name="Phoenix";Rating= 5.0};{Name= "Slightly Stoopid";Rating= 4.5}; {Name="The Strokes";Rating= 4.0};{Name= "Vampire Weekend";Rating= 4.0}]); ("Sam",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 2.0}; {Name="Norah Jones";Rating= 3.0};{Name= "Phoenix";Rating= 5.0}; {Name="Slightly Stoopid";Rating= 4.0};{Name= "The Strokes";Rating= 5.0}]); ("Veronica",[{Name="Blues Traveler";Rating= 5.0};{Name= "Broken Bells";Rating= 2.0}; {Name="Norah Jones";Rating= 3.0};{Name= "Phoenix";Rating= 5.0}; {Name="Slightly Stoopid";Rating= 4.0};{Name= "The Strokes";Rating= 5.0}]); ] |> dict let Users = data.Keys |> Seq.toList let Manhattan(ratings1:Ratings list, ratings2:Ratings list) = let mutable distance = 0.0 for item in ratings1 do let otherRating = ratings2 |> List.tryFind (fun t -> t.Name = item.Name) match otherRating with | Some(v) -> distance <- distance + Math.Abs(item.Rating - v.Rating) | None -> () distance let ComputeNearestNeighbour(userName, users) = let mutable distance = [] for user in users |> List.filter (fun x -> x <> userName) do let dist = Manhattan(data.[user], data.[userName]) distance <- distance @ [(dist,user)] distance <- distance |> List.sortBy (fun (x,y) -> x) distance
In the bottom pane of the Visual Studio, you can find the F# Interactive window. If everything goes right, then F# would be able to infer the types defined in our code and would present it in the interactive window as given below:
type Ratings = {Name: string; Rating: float;} val data : System.Collections.Generic.IDictionary<string,Ratings list> val Users : string list = ["Angelica"; "Bill"; "Chan"; "Dan"; "Hailey"; "Jordyn"; "Sam"; "Veronica"] val Manhattan : ratings1:Ratings list * ratings2:Ratings list -> float val ComputeNearestNeighbour : userName:string * users:string list -> (float * string) list
As we can see that it has quiet neatly identified all the types and listed down their signatures. Also notice that, F# has also printed the content of “Users” collection. Now its time to call the “ComputeNearestNeighbor” function for any one of our user. Lets say call it for “Hailey” :
> ComputeNearestNeighbour("Hailey", Users);; val it : (float * string) list = [(4.0, "Chan"); (4.0, "Sam"); (4.0, "Veronica"); (4.5, "Dan"); (5.0, "Angelica"); (5.5, "Bill"); (7.5, "Jordyn")]
In F# interactive, you end the function call statement or basically any statement by using two consecutive semi-colons[;;]. As we can see, except “Hailey”, all other users are listed in increasing order of their relative distance from “Hailey”. The distance value is calculated using the “Manhattan” function. Also notice that in ComputeNearestNeighbour function, we haven’t provided explicit signature to our mutable “distance” variable.
In this part, we have covered the core of building a simplified recommendation system. In the next part, we will provide an extra method which will extract content from the output of “ComputeNearestNeighbour” function and it will present it as recommendation of “music bands” for a given user.