Can someone explain how the Count Sketch Algorithm works? I still can't figure out how hashes are used, for example. I have a hard time understanding this paper.
This streaming algorithm instantiates the following framework.
Find a randomized streaming algorithm whose output (as a random variable) has the desired expectation but usually high variance (i.e., noise).
To reduce the variance/noise, run many independent copies in parallel and combine their outputs.
Usually 1 is more interesting than 2. This algorithm's 2 actually is somewhat nonstandard, but I'm going to talk about 1 only.
Suppose we're processing the input
a b c a b a .
With three counters, there's no need to hash.
a: 3, b: 2, c: 1
Let's suppose however that we have just one. There are eight possible functions h : {a, b, c} -> {+1, -1}
. Here is a table of the outcomes.
h |
abc | X = counter
----+--------------
+++ | +3 +2 +1 = 6
++- | +3 +2 -1 = 4
+-- | +3 -2 -1 = 0
+-+ | +3 -2 +1 = 2
--+ | -3 -2 +1 = -4
--- | -3 -2 -1 = -6
-+- | -3 +2 -1 = -2
-++ | -3 +2 +1 = 0
Now we can calculate expectations
(6 + 4 + 0 + 2) - (-4 + -6 + -2 + 0)
E[h(a) X] = ------------------------------------ = 24/8 = 3
8
(6 + 4 + -2 + 0) - (0 + 2 + -4 + -6)
E[h(b) X] = ------------------------------------ = 16/8 = 2
8
(6 + 2 + -4 + 0) - (4 + 0 + -6 + -2)
E[h(c) X] = ------------------------------------ = 8/8 = 1 .
8
What's going on here? For a
, say, we can decompose X = Y + Z
, where Y
is the change in the sum for the a
s, and Z
is the sum for the non-a
s. By the linearity of expectation, we have
E[h(a) X] = E[h(a) Y] + E[h(a) Z] .
E[h(a) Y]
is a sum with a term for each occurrence of a
that is h(a)^2 = 1
, so E[h(a) Y]
is the number of occurrences of a
. The other term E[h(a) Z]
is zero; even given h(a)
, each other hash value is equally likely to be plus or minus one and so contributes zero in expectation.
In fact, the hash function doesn't need to be uniform random, and good thing: there would be no way to store it. It suffices for the hash function to be pairwise independent (any two particular hash values are independent). For our simple example, a random choice of the following four functions suffices.
abc
+++
+--
-+-
--+
I'll leave the new calculations to you.