Buy Instant PDF Access
Available.
Instant access upon order completion.
Share
Efficient Update Control of Bloom Filter Replicas in Large Scale Distributed Systems
Yifeng Zhu, Hong Jiang
Copyright: © 2010
|Pages: 23
DOI: 10.4018/978-1-60566-661-7.ch034
Abstract
This chapter discusses the false rates of Bloom filters in a distributed environment. A Bloom filter (BF) is a space-efficient data structure to support probabilistic membership query. In distributed systems, a Bloom filter is often used to summarize local services or objects and this Bloom filter is replicated to remote hosts. This allows remote hosts to perform fast membership query without contacting the original host. However, when the services or objects are changed, the remote Bloom replica may become stale. This chapter analyzes the impact of staleness on the false positive and false negative for membership queries on a Bloom filter replica. An efficient update control mechanism is then proposed based on the analytical results to minimize the updating overhead. This chapter validates the analytical models and the update control mechanism through simulation experiments.
TopIntroduction To Bloom Filters
A standard Bloom filter (BF) (Bloom, 1970) is a lossy but space-efficient data structure to support membership queries within a constant delay. As shown in Figure 1, a BF includes k independent random hash functions and a vector B of a length of m bits. It is assumed that the BF represents a finite set S = {x1, x2,…,xn} of n elements from a universe
. The hash functions hi(x), 1 ≤ i ≤ k, map the universe U to the bit address space [1,m], shown as follows,
H(
x) = {
hi(
x) | 1 ≤
hi(
x) ≤
m for 1 ≤
i ≤
k}
(1)Definition 1. For all
x ∈
U,
B[
H(
x)] ≡ {
B[
hi(
x)] | 1 ≤
i ≤
k}.
Figure 1. A Bloom filter with a bit vector of m bits, and k independent hash functions. When an element x is added into the set represented, all bits indexed by those hash functions are set to 1.
This notation facilitates the description of operations on the subset of
addressed by the hash functions. For example, B[H(x)] = 1 represents the condition in which all the bits in B at the positions of h1(x),…, and hk(x) are 1. “Setting B[H(x)]” means that the bits at these positions in B are set to 1.
Representing the set S using a BF B is fast and simple. Initially, all the bits in B are set to 0. Then for each x ∈ S, an operation of setting B[H(x)] is performed. Given an element x, to check whether
is in
, one only needs to test whether B[H(x)] = 1. If no, then x is not a member of S; If yes, x is conjectured to be in S. Figure 1 shows the results after the element x is inserted into the Bloom filter.
A standard BF has two well-known properties that are described by the following two theorems.
Theorem 1.Zero false negative
For ∀
x ∈
U, if ∃
i,
B[
hi(
x)] ≠ 1, then
![978-1-60566-661-7.ch034.m05](https://igiprodst.blob.core.windows.net:443/source-content/9781605666617_498/978-1-60566-661-7.ch034.m05.png?sv=2015-12-11&sr=c&sig=XBOlwpq%2FGtbZUZI0pkFd4DJorkYQiYtV2o6dVpdc90s%3D&se=2019-11-15T11%3A04%3A16Z&sp=r)
.
For a static set S whose elements are not dynamically deleted, the bit vector indexed by those hash functions always never returns a false negative. The proof is easy and is not given in this chapter.
Key Terms in this Chapter
Distributed Membership Query: Membership query is one fundamental function that reports where the target data, resource, or service is located. The membership query can be performed by a centralized server or by a group of distributed server. The latter approach has a stronger scalability and is referred as distributed memory query.
False Positive: A false positive happens when an element is not a member of the set that a Bloom filter represents but the Bloom filter mistakenly reports it is. The probability of false positives can be very slow when the Bloom filter is appropriately designed.
Bloom Filter Array: A Bloom filter array, consisted of multiple Bloom filters, represents multiple sets. It is a space-efficient data structure to evaluate whether an element is within these sets and which set this element belongs to if yes.
False Negative: A false negative happens when an element is a member of the set that a Bloom filter represents but the Bloom filter mistakenly reports it is not. A standard Bloom filter has no false negatives. However, in a distributed system, a Bloom filter replica can generate false negatives when the replica is not timely updated.
Bloom Filter Update Protocol: When the set that a Bloom filter represents is changed over time, the corresponding Bloom filter replica becomes out-dated. In order to reduce the probability that the Bloom filter replica reports the membership incorrectly, the replica needs to be updated frequently. The Bloom filter update protocol determines when a Bloom filter replica needs to be updated.
Bloom Filter: A Bloom filter is a space-efficient data structure that supports membership queries. It consists of a bit array and all bits are initially set to 0. It uses a fix number of predefined independent hash functions. For each element, all hashed bits are set to 1. To check whether an element belongs to the set represented by a Bloom filter, one simply checks all bits pointed by the hash functions are 1. If not, the element is not in the set. If yes, the element is consider as a member.
Bloom Filter Replica: A Bloom filter replica is a replication of a Bloom filter. In a distributed environment, the original and replicated Bloom filters are typically stored on different servers for improved performance and fault tolerance. A Bloom filter replica will generate both false positives and false negatives.