Bloom Filters
There are several examples of this algorithm in varuous languages such as perl and python but I couldn't find any in a .NET language so I created the following implementation as a starting point
public class BloomFilter
{
private BitArray _bits;
private SHA1 sha1 = new SHA1Managed();
private ushort[] GetIndexes(string key)
{
// TODO: make this call multiple hash functions or salt SHA1
byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(key));
// SHA1 generates a 20byte hash (160 bits) so treat those
// 20 bytes as 10 16 bit integers.
ushort[] idxs = new ushort[10];
for (int i=0,j=0; i<20; i+=2,j++)
{
idxs[j] = (ushort)(hash[i]<<8);
idxs[j] |= hash[i+1];
idxs[j] = (ushort)(idxs[j] % _bits.Count);
}
return idxs;
}
public BloomFilter(int size)
{
_bits = new BitArray(size);
}
public void Add(string key)
{
foreach (int idx in GetIndexes(key))
_bits.Set(idx, true);
}
public bool Contains(string key)
{
// check that each bit of the hashes
// are set in the filter
foreach (int idx in GetIndexes(key))
if (!_bits.Get(idx))
return false;
return true;
}
}
I know I'm kind of cheating by using only one SHA1 hash but hey, this is just the first shot at it.
Right off hand there seem to be hundreds of places that this could be used to speed up performance. Email opt out lists, spell checkers, mmorpg games with lots of inventory items.
"My mind is aglow with whirling, transient nodes of thought careening thru a cosmic vapor of invention." - Hedley Lamarr in Blazing Saddles. :-)
Check out these links for more info on the algorithm:
http://www.flipcode.com/tutorials/tut_bloomfilter.shtml
http://www.perl.com/lpt/a/2004/04/08/bloom_filters.html