Key Points in this Paper
- Trades off hash area size against rejection time.
- Allowable error rate: false positives only, never false negatives.
- Good for applications that reject most of the items tested.
- The algorithm is very simple and easy to implement.
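The size/error trade-off in the points above can be made concrete with the standard sizing formulas (a later closed form, not stated in the paper itself): for n expected items and target false-positive rate p, the optimal bit count is m = -n·ln(p)/(ln 2)^2 and the optimal hash count is k = (m/n)·ln 2. A small sketch:

```python
import math

def bloom_parameters(n_items, fp_rate):
    """Standard Bloom filter sizing: optimal bit count m and hash count k
    for n expected items at target false-positive rate p.
    (These closed forms postdate the paper; shown here for reference.)"""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k
```

For example, one million items at a 1% error rate works out to roughly 9.6 bits per item (about 1.2 MB) and k = 7 hash functions.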
|
Paper Details
My Notes
Page 423: Two methods
The paper lists two methods, but only Method 2
is commonly used.
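Method 2 is the structure now simply called a Bloom filter: a single m-bit array, with each item setting (and later testing) k bit positions. A minimal sketch, assuming a salted-SHA-256 scheme for the k hashes (an illustrative choice, not the paper's):

```python
import hashlib

class BloomFilter:
    """Minimal sketch of the paper's Method 2: one m-bit array, k hashes."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # May return a false positive; never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

An added item is always reported present; an absent item is usually, but not always, reported absent.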
Possible Improvements and Open Questions
Double Hashing
Instead of using k independent hash functions, double
hashing is commonly used. See this
post and the
paper
behind it ("Less Hashing, Same Performance: Building a
Better Bloom Filter").
Hashing Function
All hash functions I could find in real
implementations are from the Murmur family, e.g.,
guava (murmur3),
libbloom (murmur2).
Num of Entries
All probability analyses are based on the number of
inserted entries, i.e., the filter's maximum capacity. I
once saw an application estimate the number of
entries incorrectly (it was unbounded in that use
case), which caused very bad performance later.
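How badly a wrong capacity estimate hurts follows from the standard approximation for the false-positive rate after n insertions into an m-bit filter with k hashes, p ≈ (1 - e^(-kn/m))^k. The numbers below (a filter sized for one million items at 1%, then overfilled tenfold) are my own illustration, not from the notes above:

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false-positive rate after n insertions
    into an m-bit Bloom filter with k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k
```

With m = 9,585,059 and k = 7 (sized for n = 1,000,000 at 1%), the rate stays near 0.01 at the design capacity but climbs past 99% once 10,000,000 items have been inserted, at which point the filter rejects almost nothing.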