[Solved]: Why do Bloom filters work?

Problem Detail: Let’s say I am using Bloom filters to create a function to check if a word exists in a document or not. If I pick a hash function to fill out a bit bucket for all words in my document. Then if for a given number of words, wouldn’t the whole bit bucket be all 1s? If so then checking for any word will return true? What am I missing here?

Asked By : user220201

Answered By : Wandering Logic

Given that you want to insert $n$ words into the Bloom filter, and you want a false positive probability of $p$, the wikipedia page on Bloom filters gives the following formulas for choosing $m$, the number of bits in your table and $k$, the number of hash functions that you are going to use. They give $ m = – frac{n ln p}{(ln 2)^2} $ and $$ k = frac{m}{n}ln 2=-frac{ln p}{ln 2}=-lg_2p, $$ so you should choose $$ m=frac{nk}{ln 2}. $$ That actually works out quite nicely. You are going to get a table with about half the bits set and half cleared, so the entropy per bit is going to be maximal, and the probability of a false positive is going to be $0.5^k$.
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/12834

Leave a Reply