Asked By : CodyBugstein
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/11029
Answered By : Mario Cervera
- Keys ${0,12,24,36,…}$ will be hashed to bucket $0$.
- Keys ${3,15,27,39,…}$ will be hashed to bucket $3$.
- Keys ${6,18,30,42,…}$ will be hashed to bucket $6$.
- Keys ${9,21,33,45,…}$ will be hashed to bucket $9$.
If $K$ is uniformly distributed (i.e., every key in $K$ is equally likely to occur), then the choice of $m$ is not so critical. But, what happens if $K$ is not uniformly distributed? Imagine that the keys that are most likely to occur are the multiples of $3$. In this case, all of the buckets that are not multiples of $3$ will be empty with high probability (which is really bad in terms of hash table performance). This situation is more common that it may seem. Imagine, for instance, that you are keeping track of objects based on where they are stored in memory. If your computer’s word size is four bytes, then you will be hashing keys that are multiples of $4$. Needless to say that choosing $m$ to be a multiple of $4$ would be a terrible choice: you would have $3m/4$ buckets completely empty, and all of your keys colliding in the remaining $m/4$ buckets. In general:
Every key in $K$ that shares a common factor with the number of buckets $m$ will be hashed to a bucket that is a multiple of this factor.
Therefore, to minimize collisions, it is important to reduce the number of common factors between $m$ and the elements of $K$. How can this be achieved? By choosing $m$ to be a number that has very few factors: a prime number.