Key Points in this Paper
- Trades off hash area size against rejection time.
- Allowable error rate: false positives only, never false negatives.
- Good for applications that reject most of the items tested.
- The algorithm is very simple and easy to implement.
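The size/error trade-off in the points above can be made concrete with the standard sizing formulas (a later closed form, not stated in the paper itself): for n expected items and target false-positive rate p, the optimal bit count is m = -n·ln(p)/(ln 2)^2 and the optimal hash count is k = (m/n)·ln 2. A small sketch:

```python
import math

def bloom_parameters(n_items, fp_rate):
    """Standard Bloom filter sizing: optimal bit count m and hash count k
    for n expected items at target false-positive rate p.
    (These closed forms postdate the paper; shown here for reference.)"""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k
```

For example, one million items at a 1% error rate works out to roughly 9.6 bits per item (about 1.2 MB) and k = 7 hash functions.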
|
Paper Details
My Notes
Page 423: Two methods
The paper lists two methods, but only Method 2
is commonly used.
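Method 2 is the structure now simply called a Bloom filter: a single m-bit array, with each item setting (and later testing) k bit positions. A minimal sketch, assuming a salted-SHA-256 scheme for the k hashes (an illustrative choice, not the paper's):

```python
import hashlib

class BloomFilter:
    """Minimal sketch of the paper's Method 2: one m-bit array, k hashes."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # May return a false positive; never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

An added item is always reported present; an absent item is usually, but not always, reported absent.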
Possible Improvements and Open Questions
Double Hashing
Instead of using k independent hash functions, double
hashing is commonly used. See this
post and the
paper
behind it ("Less Hashing, Same Performance: Building a
Better Bloom Filter").
Hashing Function
All hash functions I could find in real
implementations are from the Murmur family, e.g.,
guava (murmur3),
libbloom (murmur2).
Num of Entries
All probability analyses are based on the number of
inserted entries, i.e., the filter's maximum capacity. I
once saw an application estimate the number of
entries incorrectly (it was unbounded in that use
case), which caused very bad performance later.
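How badly a wrong capacity estimate hurts follows from the standard approximation for the false-positive rate after n insertions into an m-bit filter with k hashes, p ≈ (1 - e^(-kn/m))^k. The numbers below (a filter sized for one million items at 1%, then overfilled tenfold) are my own illustration, not from the notes above:

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false-positive rate after n insertions
    into an m-bit Bloom filter with k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k
```

With m = 9,585,059 and k = 7 (sized for n = 1,000,000 at 1%), the rate stays near 0.01 at the design capacity but climbs past 99% once 10,000,000 items have been inserted, at which point the filter rejects almost nothing.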