Ely Porat gave a Google engEdu talk, "The Bloom filter", with a good survey of Bloom filters and variants on Bloom filters.
Bloom filters let you quickly determine if an item is a member of a set with no false negatives but some false positives. They work well for very large data sets since they require about 10 bits per item for a 1% false positive rate and offer constant time inserts and lookups.
Ely's talk provides some useful motivation at the beginning, then has a nice, light introduction to Bloom filters starting at 8:40. The discussion of variants that support deletion or compression is interesting but, if you find yourself dragging, make sure to at least jump to 30:40 for the cute discussion of Cukoo Hashing and the Cukoo Bloom Filter.
Please see also the "Bloom filter" Wikipedia page which contains a nice summary of Bloom filters and many of the variants described in Ely's talk.
Subscribe to:
Post Comments (Atom)
1 comment:
I'm transcoding this now to my scalecast podcast:
http://scalecast.wordpress.com/
.... so you can watch it on ipods.
Post a Comment