Bloom Filters

A friend of mine saw an article on slashdot about the LOAF system. LOAF uses an algorithm called a "Bloom Filter", named for it's inventor Burton H. Bloom, that allows you to share your contact list in a social network kind of way. It's pretty dang cool by itself. But uppon further investigation of the algorithm, the bloom filter is just as interesting.

There are several examples of this algorithm in varuous languages such as perl and python but I couldn't find any in a .NET language so I created the following implementation as a starting point

    public class BloomFilter
    {
        private BitArray _bits;
        private SHA1 sha1 = new SHA1Managed();

        private ushort[] GetIndexes(string key)
        {
            // TODO: make this call multiple hash functions or salt SHA1
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(key));

            // SHA1 generates a 20byte hash (160 bits) so treat those
            // 20 bytes as 10 16 bit integers.
            ushort[] idxs = new ushort[10];
            for (int i=0,j=0; i<20; i+=2,j++)
            {
                idxs[j] = (ushort)(hash[i]<<8);
                idxs[j] |= hash[i+1];
                idxs[j] = (ushort)(idxs[j] % _bits.Count);
            }
            return idxs;
        }

        public BloomFilter(int size)
        {
            _bits = new BitArray(size);
        }

        public void Add(string key)
        {
            foreach (int idx in GetIndexes(key))
                _bits.Set(idx, true);
        }

        public bool Contains(string key)
        {
            // check that each bit of the hashes
            // are set in the filter
            foreach (int idx in GetIndexes(key))
                if (!_bits.Get(idx))
                    return false;
            return true;
        }
    }

I know I'm kind of cheating by using only one SHA1 hash but hey, this is just the first shot at it.

Right off hand there seem to be hundreds of places that this could be used to speed up performance. Email opt out lists, spell checkers,  mmorpg games with lots of inventory items.

"My mind is aglow with whirling, transient nodes of thought careening thru a cosmic vapor of invention." - Hedley Lamarr in Blazing Saddles. :-)

Check out these links for more info on the algorithm:
http://www.flipcode.com/tutorials/tut_bloomfilter.shtml
http://www.perl.com/lpt/a/2004/04/08/bloom_filters.html

No Comments