Going Functional : Symbol Graph Implementation in F#
In the previous post, we have seen the implementation of undirected graph data structure in F#. In this post, we will make use of the Graph data structure to implement the Symbol Graph data structure. You can read more about Symbol Graph data structure here. Given below is the java based implementation of Symbol Graph data structure. The implementation if taken from Algorithms 4th Edition by Robert Sedgewick and Kevin Wayne.
public class SymbolGraph {
private STst; // string -> index
private String[] keys; // index -> string
private Graph G;
public SymbolGraph(String filename, String delimiter) {
st = new ST();
In in = new In(filename);
// while (in.hasNextLine()) {
while (!in.isEmpty()) {
String[] a = in.readLine().split(delimiter);
for (int i = 0; i < a.length; i++) {
if (!st.contains(a[i]))
st.put(a[i], st.size());
}
}
// inverted index to get string keys in an aray
keys = new String[st.size()];
for (String name : st.keys()) {
keys[st.get(name)] = name;
}
G = new Graph(st.size());
in = new In(filename);
while (in.hasNextLine()) {
String[] a = in.readLine().split(delimiter);
int v = st.get(a[0]);
for (int i = 1; i < a.length; i++) {
int w = st.get(a[i]);
G.addEdge(v, w);
}
}
}
public boolean contains(String s) {
return st.contains(s);
}
public int index(String s) {
return st.get(s);
}
public String name(int v) {
return keys[v];
}
public Graph G() {
return G;
}
}
One of the common use of Symbol Graph data structure is in identifying the Degree Of Separation in nodes. For e.g. given a graph of movies & actors wherein movie names and actors are vertices and there are edges between movie name and all of the actors who have acted in the movie. You can read more about this here : http://algs4.cs.princeton.edu/41graph/. Degree Of Separation is defined as given two nodes(say two actors), then if these two actors are connected then what is the degree of separation between them. Degree of separation uses Symbol graph data structure to answer this question. Given below is the Java based implementation of the code.
public class DegreesOfSeparation {
// this class cannot be instantiated
private DegreesOfSeparation() { }
public static void main(String[] args) {
String filename = args[0];
String delimiter = args[1];
String source = args[2];
// StdOut.println("Source: " + source);
SymbolGraph sg = new SymbolGraph(filename, delimiter);
Graph G = sg.G();
if (!sg.contains(source)) {
StdOut.println(source + " not in database.");
return;
}
int s = sg.index(source);
BreadthFirstPaths bfs = new BreadthFirstPaths(G, s);
while (!StdIn.isEmpty()) {
String sink = StdIn.readLine();
if (sg.contains(sink)) {
int t = sg.index(sink);
if (bfs.hasPathTo(t)) {
for (int v : bfs.pathTo(t)) {
StdOut.println(" " + sg.name(v));
}
}
else {
StdOut.println("Not connected");
}
}
else {
StdOut.println(" Not in database.");
}
}
}
}
Degree of separation uses Symbol Graph & BreadthFirstPath data structure. Next we will see the F# based implementation of the above two code snippets combined.
open Graph
open System.IO
type SymbolGraph (path:string, sep:char) =
let mutable map = Map.empty
let mutable keys : string array = Array.zeroCreate map.Count
let mutable graph = new Graph(map.Count)
let rec tokenize (tempMap:Map) (items:string list) =
match items with
| [] -> map <- tempMap
| hd::tl when map.ContainsKey hd -> tokenize tempMap tl
| hd::tl ->
let tmp = tempMap.Add(hd, tempMap.Count)
tokenize tmp tl
do
use txtReader = new StreamReader(path)
let rec initializeMap (tmpReader:TextReader) =
match tmpReader.Peek() >= 0 with
| false -> ()
| true ->
tmpReader.ReadLine().Split [|sep|] |> Array.toList |> tokenize map
initializeMap tmpReader
initializeMap txtReader |> ignore
keys <- Array.zeroCreate map.Count
map |> Map.iter (fun x y -> keys.[map.[x]] <- x)
graph <- new Graph(map.Count)
use reader = new StreamReader(path)
let rec createGraph (tempMap:Map) (reader:TextReader) =
match reader.Peek() >= 0 with
| false -> ()
| true ->
let items = reader.ReadLine().Split [|sep|]
let vertex = tempMap.[items.[0]]
for i in [1..items.Length-1] do
let v = tempMap.[items.[i]]
graph.AddEdge(vertex, v)
createGraph map reader |> ignore
member x.Graph with get() = graph
member x.Contains (s:string) = map.ContainsKey s
member x.Index (s:string) = map.Item s
member x.Name (v:int) = keys.[v]
let DegreeOfSeparation =
let sg = SymbolGraph("..\Dev\FSharp\DataStructure\DataStructure\movies.csv", '/')
let graph = sg.Graph
let index = sg.Index "Gray, Ian (I)"
let bfs = BreadthFirstPath(sg.Graph, index)
let target = sg.Index "Thompson, Jack (I)"
let printPath =
match bfs.Path(target) with
| None -> printfn "Not connected"
| Some v ->
for i in v do
printfn " %A" (sg.Name i)
printPath |> ignore
Lets break down the F# code snippet and try to understand the various functional concepts used in the above code.
type SymbolGraph (path:string, sep:char)
The above line declares a class called SymbolGraph whose primary constructor takes two parameters are input namely : path of the file and the separator.
let mutable map = Map.empty<string, int> let mutable keys : string array = Array.zeroCreate map.Count let mutable graph = new Graph(map.Count)
Next we have declared three mutable data structures. Their usage will be self-explanatory based on their usage in the remaining code.
let rec tokenize (tempMap:Map) (items:string list) =
match items with
| [] -> map <- tempMap
| hd::tl when map.ContainsKey hd -> tokenize tempMap tl
| hd::tl ->
let tmp = tempMap.Add(hd, tempMap.Count)
tokenize tmp tl
Tokenize is a helper function which takes as input a string with records separated by separator and then breaks it down and stores in in key-value pair format in map data structure. In F# by default list, dictionary data structures are immutable i.e. if in a collection any value is added or removed, then internally F# creates a new collection. That’s why in the above code, every time I am adding a new value, I am storing the returned collection in temporary variable and passing it back as input parameter to the function. Once I am done processing the record, I am assigning temporary collection back to our main map data structure.
use txtReader = new StreamReader(path)
let rec initializeMap (tmpReader:TextReader) =
match tmpReader.Peek() >= 0 with
| false -> ()
| true ->
tmpReader.ReadLine().Split [|sep|] |> Array.toList |> tokenize map
initializeMap tmpReader
initializeMap txtReader |> ignore
keys <- Array.zeroCreate map.Count
map |> Map.iter (fun x y -> keys.[map.[x]] <- x)
graph <- new Graph(map.Count)
use reader = new StreamReader(path)
let rec createGraph (tempMap:Map) (reader:TextReader) =
match reader.Peek() >= 0 with
| false -> ()
| true ->
let items = reader.ReadLine().Split [|sep|]
let vertex = tempMap.[items.[0]]
for i in [1..items.Length-1] do
let v = tempMap.[items.[i]]
graph.AddEdge(vertex, v)
createGraph map reader |> ignore
In the above code, we are first creating a StreamReader instance for reading lines from the file. Next we have defined a recursive function initializeMap, which read a line from file and call the tokenize function on that line. Using the map data structure, I have initialized the keys data structure. Next create a graph instance and then using the map data structure, I am initializing the graph.
let DegreeOfSeparation =Next is the DegreeOfSeparation code implementation. It uses the SymbolGraph type and using the BreadthFirstPath traversal technique, print the path from source node to target node.
let sg = SymbolGraph("..\Dev\FSharp\DataStructure\DataStructure\movies_new.csv", '/')
let graph = sg.Graph
let index = sg.Index "Gray, Ian (I)"
let bfs = BreadthFirstPath(sg.Graph, index)
let target = sg.Index "Thompson, Jack (I)"
let printPath =
match bfs.Path(target) with
| None -> printfn "Not connected"
| Some v ->
for i in v do
printfn " %A" (sg.Name i)
printPath |> ignore
Once again to know more about the Symbol Graph data structure and DegreeOfSeparation concept here : http://algs4.cs.princeton.edu/41graph/.