Can someone explain in simple terms how the reduce function, with its arguments reduceAdd, reduceSum, and reduceRemove, works in crossfilter?
Remember that map-reduce reduces a dataset by the keys of a particular dimension. For example, let's use a crossfilter instance with these records:
[
{ name: "Gates", age: 57, worth: 72000000000, gender: "m" },
{ name: "Buffet", age: 59, worth: 58000000000, gender: "m" },
{ name: "Winfrey", age: 83, worth: 2900000000, gender: "f" },
{ name: "Bloomberg", age: 71, worth: 31000000000, gender: "m" },
{ name: "Walton", age: 64, worth: 33000000000, gender: "f" },
]
and dimensions name, age, worth, and gender. We will reduce the gender dimension using the reduce method.
First we define the reduceAdd, reduceRemove, and reduceInitial callback methods.
reduceInitial returns an object with the form of the reduced object and the initial values. It takes no parameters.
function reduceInitial() {
  return {
    worth: 0,
    count: 0
  };
}
reduceAdd defines what happens when a record is being 'filtered into' the reduced object for a particular key. The first parameter is a transient instance of the reduced object. The second parameter is the current record. The method returns the augmented transient reduced object.
function reduceAdd(p, v) {
  p.worth = p.worth + v.worth;
  p.count = p.count + 1;
  return p;
}
reduceRemove does the opposite of reduceAdd (at least in this example). It takes the same parameters as reduceAdd. It is needed because group reductions are updated incrementally as records are filtered, and sometimes records need to be removed from a previously computed group reduction.
function reduceRemove(p, v) {
  p.worth = p.worth - v.worth;
  p.count = p.count - 1;
  return p;
}
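To see why both callbacks are needed, here is a minimal plain-JavaScript sketch (no crossfilter involved) that applies the three callbacks by hand to the "f" group. The `group` variable and the "age under 70" filter are illustrative assumptions; crossfilter does this bookkeeping internally when a filter on another dimension changes.

```javascript
// The three callbacks, written compactly.
function reduceInitial() { return { worth: 0, count: 0 }; }
function reduceAdd(p, v) { p.worth += v.worth; p.count += 1; return p; }
function reduceRemove(p, v) { p.worth -= v.worth; p.count -= 1; return p; }

const winfrey = { name: "Winfrey", age: 83, worth: 2900000000, gender: "f" };
const walton = { name: "Walton", age: 64, worth: 33000000000, gender: "f" };

// Start from the initial value, then add each record whose key is "f".
let group = reduceInitial();
group = reduceAdd(group, winfrey);
group = reduceAdd(group, walton);
console.log(group); // { worth: 35900000000, count: 2 }

// Assumed scenario: a filter on the age dimension (age under 70) now
// excludes Winfrey. Rather than recomputing the group from scratch,
// only that one record is removed from the running reduction.
group = reduceRemove(group, winfrey);
console.log(group); // { worth: 33000000000, count: 1 }
```

This is the reason reduceRemove has to be the exact inverse of reduceAdd for values like sums and counts: otherwise the running value drifts as filters are applied and cleared.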
Invoking the reduce method would look like this:
mycf.dimensions.gender.group().reduce(reduceAdd, reduceRemove, reduceInitial)
To take a peek at the reduced values, use the all method. To see the top n values, use the top(n) method.
mycf.dimensions.gender.group().reduce(reduceAdd, reduceRemove, reduceInitial).all()
The returned array would (should) look like:
[
  { key: "f", value: { worth: 35900000000, count: 2 } },
  { key: "m", value: { worth: 161000000000, count: 3 } },
]
The goal of reducing a dataset is to derive a new dataset by first grouping records by common keys, then reducing each of those groupings into a single value per key. In this case we grouped by gender and reduced the worth of each grouping by adding up the values of the records that share the same key.
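The group-then-reduce idea can be sketched in plain JavaScript without crossfilter at all. `groupReduce` below is a hypothetical helper written only for illustration (it is not part of the crossfilter API, and it ignores filtering and removal entirely); it just shows how the callbacks produce one value per key.

```javascript
const records = [
  { name: "Gates", age: 57, worth: 72000000000, gender: "m" },
  { name: "Buffet", age: 59, worth: 58000000000, gender: "m" },
  { name: "Winfrey", age: 83, worth: 2900000000, gender: "f" },
  { name: "Bloomberg", age: 71, worth: 31000000000, gender: "m" },
  { name: "Walton", age: 64, worth: 33000000000, gender: "f" },
];

function reduceInitial() { return { worth: 0, count: 0 }; }
function reduceAdd(p, v) { p.worth += v.worth; p.count += 1; return p; }

// Hypothetical helper: group records by keyFn(record), then fold each
// group with the add/initial callbacks.
function groupReduce(data, keyFn, add, initial) {
  const groups = new Map();
  for (const record of data) {
    const key = keyFn(record);
    if (!groups.has(key)) groups.set(key, initial());
    groups.set(key, add(groups.get(key), record));
  }
  // Return { key, value } pairs sorted by key in ascending order.
  return [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ key, value }));
}

const byGender = groupReduce(records, (d) => d.gender, reduceAdd, reduceInitial);
console.log(byGender);
// [ { key: 'f', value: { worth: 35900000000, count: 2 } },
//   { key: 'm', value: { worth: 161000000000, count: 3 } } ]
```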
The other reduceX methods are convenience methods for the reduce method.
For this example, reduceSum would be the most appropriate replacement.
mycf.dimensions.gender.group().reduceSum(function(d) {
  return d.worth;
});
Invoking all on the returned grouping would (should) look like:
[
  { key: "f", value: 35900000000 },
  { key: "m", value: 161000000000 },
]
reduceCount will count records:
mycf.dimensions.gender.group().reduceCount();
Invoking all on the returned grouping would (should) look like:
[
  { key: "f", value: 2 },
  { key: "m", value: 3 },
]
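As a rough mental model (a sketch, not crossfilter's actual source), reduceSum and reduceCount behave like reduce called with the callback triples below. The names `sumCallbacks`, `countCallbacks`, and `fold` are hypothetical and exist only for this illustration.

```javascript
// Hypothetical: callbacks that behave like reduceSum(accessor).
function sumCallbacks(accessor) {
  return {
    add: (p, v) => p + accessor(v),
    remove: (p, v) => p - accessor(v),
    initial: () => 0,
  };
}

// Hypothetical: callbacks that behave like reduceCount().
function countCallbacks() {
  return {
    add: (p) => p + 1,
    remove: (p) => p - 1,
    initial: () => 0,
  };
}

// The records with gender "m" from the example data.
const men = [
  { name: "Gates", age: 57, worth: 72000000000, gender: "m" },
  { name: "Buffet", age: 59, worth: 58000000000, gender: "m" },
  { name: "Bloomberg", age: 71, worth: 31000000000, gender: "m" },
];

// Fold the records of a single group using a callback triple.
function fold(data, { add, initial }) {
  return data.reduce((p, v) => add(p, v), initial());
}

const sum = sumCallbacks((d) => d.worth);
const count = countCallbacks();

const totalWorth = fold(men, sum);
const headCount = fold(men, count);
console.log(totalWorth, headCount); // 161000000000 3

// Filtering Bloomberg out of the group applies the remove callbacks.
const worthWithoutBloomberg = sum.remove(totalWorth, men[2]);
const countWithoutBloomberg = count.remove(headCount, men[2]);
console.log(worthWithoutBloomberg, countWithoutBloomberg); // 130000000000 2
```

Because the reduced value here is a plain number rather than an object, the result of all has a number under value instead of { worth, count }.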
Hope this helps :)
Source: https://github.com/square/crossfilter/wiki/API-Reference