Should I worry about this custom reduce?


#1

Hello,

I created a thread about rereduce but haven’t received any replies, so I thought I should post the full source instead.

I have this custom reduce, which sorts the values by the createdAt field and then chooses one value at random.

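```javascript
function (key, values, rereduce) {

    // Returns a random integer between min and max, inclusive.
    function getRandomInt(min, max) {
        return Math.floor(Math.random() * (max - min + 1)) + min;
    }

    // Sorts the array in place, ascending by the given field.
    function sortByKey(array, field) {
        return array.sort(function(a, b) {
            var x = a[field]; var y = b[field];
            return ((x < y) ? -1 : ((x > y) ? 1 : 0));
        });
    }

    // values are emitted map values on the first pass, or the results of
    // earlier reduce calls when rereduce is true.
    sortByKey(values, 'createdAt');

    // Pick one value at random, so the result can differ between calls.
    return values[getRandomInt(0, values.length - 1)];

}
```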

This has been working fine, but I can’t be sure whether I can simply ignore the ‘rereduce’ parameter.

Is this custom view safe to use?


#2

I think it’s probably okay to ignore rereduce. I believe that’s there if you need to use recursion.

Your view function is curious, though. Why sort them if you’re just going to pick one at random? Also, have you considered using N1QL instead of a Map/Reduce function?


#3

Hi @matthew.groves

Thank you for the answer!

Yeah, that’s not actually how it should work. I found that out after posting it :slight_smile:

I first tried to use N1QL, but ran into an issue. We have about 48 million documents, and the N1QL index stopped building at around 38%. We need to investigate that when we get a chance.


#4

Do you know if there are more docs on ‘rereduce’? I still can’t get my head around it, even after reading the docs. The official docs have a few examples, but they don’t clearly explain ‘why’ and ‘when’ we need to use it.


#5

Re-reduce is used when the range you’re trying to reduce has to be composed from underlying reductions. Say, for instance, your key is a compound key [YYYY, MM, DD]. If you just query the whole thing reduced with, say, a _count, you’ll get the aggregate count of everything in that view.
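A map function producing that compound [YYYY, MM, DD] key might look like the minimal sketch below. It assumes `createdAt` holds an ISO-8601 timestamp and derives the key in UTC. `emit` is normally supplied by the view engine, so a stub that just collects rows is defined here only to make the sketch runnable outside Couchbase; the sample documents and ids are hypothetical.

```javascript
// Stand-in for the view engine's emit(), which only exists inside Couchbase;
// here it collects rows so the sketch can run in Node.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function emitting a compound [YYYY, MM, DD] key.
// Assumes doc.createdAt is an ISO-8601 date string.
function map(doc, meta) {
  if (!doc.createdAt) return; // skip documents without a timestamp
  var d = new Date(doc.createdAt);
  emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()], null);
}

// Hypothetical sample documents.
map({ createdAt: "2016-09-14T10:00:00Z" }, { id: "doc1" });
map({ createdAt: "2016-10-02T08:30:00Z" }, { id: "doc2" });
map({ title: "no timestamp" }, { id: "doc3" });

console.log(JSON.stringify(rows.map(function (r) { return r.key; })));
// → [[2016,9,14],[2016,10,2]]
```

Note that `getUTCMonth()` is zero-based, hence the `+ 1`.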

Now, if you were to query with [2016,9] to [2016,10], the view engine will walk the B+Tree to the beginning of that range and re-reduce the stored aggregation on that node against the others in the range. For the _count, that simply means walking across the B+Tree and adding up the stored _counts on its interior nodes.
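The range walk described above can be illustrated with a toy model. This is a sketch under simplifying assumptions (a flat list of leaf nodes, no interior levels), not how Couchbase actually lays out its index: each node keeps a stored reduction, fully covered nodes reuse it, partially covered nodes are reduced from just their matching rows, and the partial results are then combined in a re-reduce pass. `countReduce`, `compareKeys`, `rangeCount`, and the sample keys are all hypothetical, and the range uses explicit day-level keys to sidestep array-collation details.

```javascript
// Hand-written equivalent of _count: count rows on the first pass,
// add up earlier counts on a re-reduce pass.
function countReduce(key, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (t, n) { return t + n; }, 0);
  }
  return values.length;
}

// Element-wise comparison of numeric array keys such as [YYYY, MM, DD].
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length; // a shorter prefix sorts first
}

function inRange(key, start, end) {
  return compareKeys(key, start) >= 0 && compareKeys(key, end) <= 0;
}

// Hypothetical sorted keys split into leaf nodes. The keys stand in for the
// row values, since a count does not look at them.
var leaves = [
  [[2016, 8, 30], [2016, 9, 2], [2016, 9, 15]],
  [[2016, 9, 20], [2016, 9, 28], [2016, 10, 3]],
  [[2016, 10, 18], [2016, 11, 1], [2016, 11, 7]]
].map(function (keys) {
  // Each node stores the reduction of all its rows.
  return { keys: keys, stored: countReduce(null, keys, false) };
});

function rangeCount(start, end) {
  var partials = leaves.map(function (node) {
    var hits = node.keys.filter(function (k) { return inRange(k, start, end); });
    // Fully covered node: reuse the stored reduction.
    // Partially covered node: reduce only the rows inside the range.
    return hits.length === node.keys.length
      ? node.stored
      : countReduce(null, hits, false);
  });
  // Combine the per-node results with a re-reduce pass.
  return countReduce(null, partials, true);
}

console.log(rangeCount([2016, 9, 1], [2016, 10, 31])); // → 6
```

Here the partials are 2, 3 and 1. Had `countReduce` ignored `rereduce`, the last step would have returned `partials.length` (3) instead of 6, which is why a count-style reduce has to check the flag.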

You will need re-reduce any time the view engine reduces over a different range, or if you add data to the range in question.

See also:
http://docs.couchbase.com/developer/dev-guide-3.0/reduce-rereduce.html
http://docs.couchbase.com/couchbase-manual-2.0/#handling-rereduce

Note that whatever you write here needs to be a pure function. The output should always be the same for the same inputs and not subject to (or cause) any side effects.
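As a minimal sketch of what a pure reduce could look like for this use case, the function below deterministically keeps the value with the latest `createdAt` instead of a random one. It assumes `createdAt` values are ISO-8601 strings in one consistent format, so string comparison matches time order. Because its output has the same shape as its inputs, the same code serves the first pass and re-reduce, and reducing everything at once gives the same answer as reducing chunks and then re-reducing. This changes the original behavior (there is no random pick any more); `latestReduce` and the sample values are hypothetical.

```javascript
// Deterministic reduce: keep the value with the latest createdAt.
// Assumes createdAt is an ISO-8601 string in a consistent format.
// On ties, the first value seen wins.
function latestReduce(key, values, rereduce) {
  var best = null;
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (best === null || v.createdAt > best.createdAt) {
      best = v;
    }
  }
  // Same shape as the inputs, so rereduce needs no special branch.
  return best;
}

// Hypothetical values.
var values = [
  { id: "a", createdAt: "2016-09-01T00:00:00Z" },
  { id: "b", createdAt: "2016-09-20T00:00:00Z" },
  { id: "c", createdAt: "2016-09-05T00:00:00Z" },
  { id: "d", createdAt: "2016-09-11T00:00:00Z" }
];

// Reducing everything at once...
var direct = latestReduce(null, values, false);
// ...matches reducing two chunks and then re-reducing their results.
var viaRereduce = latestReduce(null, [
  latestReduce(null, values.slice(0, 2), false),
  latestReduce(null, values.slice(2), false)
], true);

console.log(direct.id, viaRereduce.id); // → b b
```

With the random version, `direct` and `viaRereduce` could disagree, and so could two runs over the same range.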


#6

Hello @ingenthr

Thank you for the insight. I think I need to go through a few more exercises and examples myself to understand it.

In my case, my reduce returns a random value from an array. Is there a problem with it since it is not a pure function?


#7

It’s technically not correct that way, but I don’t see any harm in how you’ve used it at the moment. The main problem, based on what I’ve read, is that I believe it will need to be re-run each time. That’ll be asynchronous with respect to the query unless you use stale=false, but that’s okay.
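For reference, `stale` is passed as a query parameter on the view request. The sketch below builds such a request URL with Node’s built-in `URL` class. It assumes the view REST endpoint on port 8092, and the host, bucket, design document, and view names are hypothetical placeholders; range keys are JSON-encoded, as the view API expects.

```javascript
// Builds a view query URL. Host, bucket, design doc and view names are
// hypothetical; 8092 is assumed to be the view REST API port.
function buildViewQuery(opts) {
  var url = new URL(
    "http://" + opts.host + ":8092/" + encodeURIComponent(opts.bucket) +
    "/_design/" + encodeURIComponent(opts.ddoc) +
    "/_view/" + encodeURIComponent(opts.view)
  );
  // Keys are sent as JSON; URLSearchParams handles the percent-encoding.
  url.searchParams.set("startkey", JSON.stringify(opts.startkey));
  url.searchParams.set("endkey", JSON.stringify(opts.endkey));
  // stale=false brings the index up to date before answering. The default,
  // update_after, answers from the current index and updates it afterwards.
  url.searchParams.set("stale", "false");
  return url.toString();
}

var q = buildViewQuery({
  host: "localhost", bucket: "mybucket", ddoc: "reports", view: "by_date",
  startkey: [2016, 9, 1], endkey: [2016, 10, 31]
});
console.log(q);
```

Using `stale=false` trades query latency for freshness, which matters less for a temporary API like this one.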


#8

Thank you again @ingenthr

I see that it re-indexes frequently, but it’s used by a temporary API. I think I’m okay then :slight_smile: