Discover Open Source: NSketch (sketch-based algorithms)

While browsing SourceForge, I noticed quite a few interesting projects buried deep in its listings. For my own reference, and for the few readers of this blog, I have decided to cover them here briefly.

The first project is NSketch, written in C# by Joannès Vermorel.

From the website:

"The NSketch library provides implementations of most common sketch-based algorithms (histograms, frequent items, bloom filter ...). The library is written in C# for .Net 2.0 and released under LGPL.
 
A sketch is a compact yet approximate representation of some data. Intuitively, if exactness is not a requirement, approximation can provide a huge performance gain against a limited error.
 
The version 0.1 of NSketch (previously named 'DataStreams') includes histograms (naive, sechap, exponential), frequent item selection (lossy counting), bloom filters, fast generic hash function."
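To make the "compact yet approximate" idea concrete, here is a minimal Bloom filter sketch in Python. It is not NSketch's code or API (NSketch is written in C#). The class name, the default sizes, and the use of seeded SHA-256 as the hash family are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and defaults are hypothetical."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a seed byte
        # to the item before hashing (assumes num_hashes < 256).
        data = str(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for word in ["histogram", "sketch", "stream"]:
    seen.add(word)

print("sketch" in seen)  # an added item is always reported as present
```

The filter above uses 128 bytes no matter how many items are added. The price is that a lookup for an item that was never added can occasionally return True, and that error rate grows as the filter fills up.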

Read more and download from:
http://datastreams.sourceforge.net/
https://sourceforge.net/projects/datastreams/

Ref:
Approximate Frequency Counts over Data Streams
http://infolab.stanford.edu/~manku/papers/02vldb-freq.pdf (pdf)

Bloom Filter
http://en.wikipedia.org/wiki/Bloom_filter
http://blogs.msdn.com/devdev/archive/2005/08/17/452827.aspx
Network Applications of Bloom Filters (http://citeseer.ist.psu.edu/broder02network.html)
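
For the frequent-items side, the lossy counting algorithm from the Manku and Motwani paper listed above can be sketched roughly as follows. This is my own reading of the paper in Python, not NSketch's implementation. The function names and the sample stream are made up for illustration.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate counts; each may undercount by at most epsilon * n."""
    width = math.ceil(1 / epsilon)  # bucket width
    entries = {}                    # item -> [count, max possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        # At each bucket boundary, drop entries that are too rare to matter.
        if n % width == 0:
            stale = [k for k, (f, d) in entries.items() if f + d <= bucket]
            for key in stale:
                del entries[key]
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Items whose stored count is at least (support - epsilon) * n."""
    threshold = (support - epsilon) * n
    return {item: f for item, (f, d) in entries.items() if f >= threshold}


# Hypothetical stream: two heavy items followed by 20 one-off items.
stream = ["a"] * 50 + ["b"] * 30 + [str(i) for i in range(20)]
entries, n = lossy_count(stream, epsilon=0.1)
print(frequent_items(entries, n, support=0.2, epsilon=0.1))
```

On this sample stream the one-off items are pruned at the bucket boundaries, so only the two heavy items remain in memory at the end.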

Personal Experience: None
