viewQuery using reduce with group_level

giles · February 9, 2018, 3:28pm

I’ve got a view on my bucket with the built-in _count reduce function. If I set group_level to 1 I get all of the unique keys, together with a count of docs with each key, which is great. However, there is no metadata with a total_rows count. This would be really useful to find out how many unique keys there are without having to get all of the keys.

If I remove group_level and set “reduce” to false, the query does return total_rows (I can even set limit to 0 so that I don’t actually get any of the values, which returns a very quick result). However, this is the total_rows of all docs output from the map function, not just the unique keys.

I can see two possible solutions to this:

1. Read all of the unique keys (group_level 1) and call rows.length. As there could be several hundred thousand unique keys, I really don’t want to do this just to find out how many there are, mostly because it’s slow with a lot of keys.
2. Use a custom reduce function, similar to the _count function, but just returning 1 for the value on the first pass, then counting the results on the rereduce pass (I don’t care about the count of each unique value, I just want to know how many there are).

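function(key, values, rereduce) {
  var retval;
  var i;

  if (!rereduce) {
    // First pass: return 1 regardless of how many rows were passed in.
    retval = 1;
  } else {
    // Rereduce pass: sum the intermediate results.
    retval = 0;
    for (i = 0; i < values.length; i++) {
      retval += values[i];
    }
  }

  return retval;
}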

However, this does not work! My current data contains 32,628 unique keys, but the reduce from this function (group_level = 0) returns 2,620.

What is wrong with this reduce function? Or is there another solution to this problem? It doesn’t seem to me like an unusual use case…?

Thanks,
Giles

brett19 · February 9, 2018, 4:49pm

Hey @giles,

I don’t believe that the view executor is actually capable of returning the number of rows as part of its processing, because the rows get post-processed to such a high degree. I’ll poke someone who knows a bit more about this to try to confirm that that is the case.

In terms of the reduce function that you have written, I think there may be a slight misunderstanding of how rereduce works. The reduce function is invoked on a subset of the total rows each time, generating an intermediate set of numbers. A subset of those intermediate results is then passed back to the function as a rereduce so that they can be processed further. Each time this occurs, the function takes a set of rows and generates a single result. This goes on for as long as needed to reduce the set down to a set of numbers matching the group level. In practice, this means that it is not possible to ‘control’ the result set at the level that you are looking for (being able to return the number of rows in the first entry), as the reductions do not have any concept of their location within the final result set.
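The behaviour described above can be illustrated with a small local simulation. This is only a sketch: `simulateReduce` and its fixed `chunkSize` are hypothetical stand-ins for the view engine, which chooses its own chunk boundaries. The point it shows is that a reduce which returns 1 on the first pass ends up counting chunks of rows, not unique keys.

```javascript
// The reduce function from the original post, same behaviour.
function countAttempt(key, values, rereduce) {
  if (!rereduce) {
    return 1; // one per reduce call, however many rows were in the chunk
  }
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical stand-in for the engine: reduce fixed-size chunks of map
// rows, then rereduce the intermediate results into a single value.
function simulateReduce(rows, chunkSize, reduceFn) {
  var partials = [];
  for (var start = 0; start < rows.length; start += chunkSize) {
    var chunk = rows.slice(start, start + chunkSize);
    var keys = chunk.map(function (row) { return [row.key, row.id]; });
    var values = chunk.map(function (row) { return row.value; });
    partials.push(reduceFn(keys, values, false));
  }
  return reduceFn(null, partials, true);
}

// Six map rows with three unique keys.
var names = ["Brett", "Brett", "Brett", "Giles", "Giles", "Joe"];
var rows = names.map(function (name, i) {
  return { key: name, id: "doc" + i, value: null };
});

var withChunksOfFour = simulateReduce(rows, 4, countAttempt); // 2 chunks
var withChunksOfOne = simulateReduce(rows, 1, countAttempt);  // 6 chunks
console.log(withChunksOfFour, withChunksOfOne); // prints: 2 6
```

Neither result is 3, the number of unique keys. The output depends only on how the rows happen to be split, which would be consistent with a result such as 2,620 that matches neither the key count nor the document count.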

The only way that I can imagine being able to solve this would be to use two separate view calls. One to get the result counts, and then another call to get the actual resulting rows.

Cheers, Brett

giles · February 9, 2018, 5:11pm

Thanks for the reply Brett. I don’t really understand how the rereduce works but I get that I can’t use it in the way that I was attempting.

“The only way that I can imagine being able to solve this would be to use two separate view calls. One to get the result counts, and then another call to get the actual resulting rows.”

Can you expand on this please as I don’t understand. I currently have a view that gets the result counts (I think). This produces either a count of total docs in the map, or a count of all the docs for each key at the group level I request. However, I don’t see how I can use this to get the number of unique keys, and I don’t see how a separate view call helps. What am I missing?

Thanks,
Giles

brett19 · February 9, 2018, 5:47pm

Hey @giles,

Turns out that I may have misunderstood what you were trying to achieve here. I think the way to go about getting the count of unique keys would involve grouping based on the key itself (to reduce that level), but I’m not actually sure. Let me pull in an expert.

Cheers, Brett

giles · February 9, 2018, 6:08pm

Hi Brett,

I think I am doing grouping based on the key itself, unless I’m misunderstanding.

Thanks for talking to your expert. Here’s some more detail on what I’m trying to achieve so that it’s clear…

Assume you have the following docs:

{ "name": "Giles", "report": 1 }
{ "name": "Giles", "report": 2 }
{ "name": "Brett", "report": 14 }
{ "name": "Brett", "report": 16 }
{ "name": "Brett", "report": 7 }
{ "name": "Joe", "report": 3 }

I would like to make a query that returns the number of unique names. I don’t really care how many reports each name has. So in this case I would like to return “3”.

With the following view:

map: function(doc, meta) { if (doc.name) emit(doc.name, null); }
reduce: _count

I can query this and get a total number of docs with a name by setting group_level to 0 (or just don’t specify group or group_level which does the same thing). So that would return “6”. I can also set group_level to 1, which would return the following:

[ { key: "Giles", value: 2 }, { key: "Brett", value: 3 }, { key: "Joe", value: 1 } ]

Alternatively, I can set reduce to false and limit to 0, which will return no rows, but the query metadata would contain total_rows = 6.

However, as far as I know, there is no way to use this view (or even a different view) to just return “3”. The only way that I can see is to use group_level = 1 to return an array with all the unique names then do rows.length. This is fine for a small number of unique names, but if there were a couple of hundred thousand, it begins to get a little slow and I would also worry that there might be other resource issues as this number increases. If the query just returned metadata with the total_rows count (as it does with reduce set to false) this isn’t an issue.
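The three numbers discussed above (6, the per-name counts, and 3) can be reproduced with a small local model of the view. This is a sketch under assumptions: `mapNames` and `countGrouped` are hypothetical helpers that imitate the map function and `_count` at group_level 1, and plain string comparison stands in for the engine's key collation.

```javascript
var docs = [
  { name: "Giles", report: 1 },
  { name: "Giles", report: 2 },
  { name: "Brett", report: 14 },
  { name: "Brett", report: 16 },
  { name: "Brett", report: 7 },
  { name: "Joe", report: 3 }
];

// Imitates the map function: one row per doc that has a name, sorted by key.
function mapNames(inputDocs) {
  var rows = [];
  inputDocs.forEach(function (doc) {
    if (doc.name) {
      rows.push({ key: doc.name, value: null });
    }
  });
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Imitates _count with group_level=1: one output row per distinct key.
function countGrouped(sortedRows) {
  var grouped = [];
  sortedRows.forEach(function (row) {
    var last = grouped[grouped.length - 1];
    if (last && last.key === row.key) {
      last.value += 1;
    } else {
      grouped.push({ key: row.key, value: 1 });
    }
  });
  return grouped;
}

var allRows = mapNames(docs);
var grouped = countGrouped(allRows);

console.log(allRows.length); // 6: what total_rows reports with reduce=false
console.log(grouped);        // Brett: 3, Giles: 2, Joe: 1
console.log(grouped.length); // 3: the number of unique names
```

The last line is the rows.length workaround: the count of unique names only becomes available after every grouped row has been produced.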

Hope this is all clear.

Thanks,
Giles

socketman2016 · April 9, 2019, 5:37am

@giles Did you find a solution?

giles · April 18, 2019, 8:46am

Hi @socketman2016,

As far as I know, it isn’t possible to do this. The only solution I found was as described in my previous comment - use group_level = 1, then rows.length. To cope with a lot of rows, you could do this repeatedly using limit and skip. I don’t need to do this at the moment though.

@brett19 was going to ask an expert whether there is any way to do this. Brett, did you find out anything?

Thanks,
Giles

suraj · April 23, 2019, 6:16am

Hi @giles,

Sorry for the late reply. Your solution of using group_level=1 and then counting the number of rows is the only way to do it in Views. However, if the number of distinct users is very large, the reduction would be a very inefficient operation. Note that using limit and skip won’t help in your case: the view engine still needs to process all the docs, so the query will not get any faster.

Conversely, even if the number of documents is very large, it will work well as long as the number of distinct users is small.

Thanks,
Suraj Naik,
Views, Couchbase

giles · April 23, 2019, 9:07am

Thanks for the confirmation @suraj. Is it possible that the view engine might be improved in the future to allow this, or is there no way to do it?

Regarding the efficiency of limit and skip, I thought the views were calculated in advance, which is why it takes a long time to make a query when the view is created but is very quick after that. If so, then I don’t really understand why the view engine would need to process all docs to return a query with skip. If it’s precalculated, can’t a query with (e.g.) skip=100 just skip over 100 precalculated rows without processing the docs again? Or have I misunderstood the way views work?

suraj · April 23, 2019, 10:14am

Hey @giles,

The way the View engine works is that even though you specify skip = 100, the call still reads through the first 100 elements and then discards them, so there is no gain in speed or performance.
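A toy model of this behaviour shows why paging with limit and skip does not make the count cheaper. It is a sketch only: `queryPage` is a hypothetical stand-in for one view query and simply counts how many rows it has to walk over, which is an assumption about cost and not a description of the engine's internals.

```javascript
// Hypothetical model of one query: rows before `skip` are still visited,
// they are just not returned.
function queryPage(groupedRows, skip, limit) {
  var visited = 0;
  var page = [];
  for (var i = 0; i < groupedRows.length && page.length < limit; i++) {
    visited++;
    if (i >= skip) {
      page.push(groupedRows[i]);
    }
  }
  return { rows: page, visited: visited };
}

// Counts distinct keys by paging through the grouped rows.
function countDistinctByPaging(groupedRows, pageSize) {
  var distinct = 0;
  var totalVisited = 0;
  var skip = 0;
  while (true) {
    var result = queryPage(groupedRows, skip, pageSize);
    totalVisited += result.visited;
    distinct += result.rows.length;
    if (result.rows.length < pageSize) {
      break; // a short page means the end of the result set
    }
    skip += pageSize;
  }
  return { distinct: distinct, totalVisited: totalVisited };
}

// Ten grouped rows, read three at a time.
var tenRows = [];
for (var n = 0; n < 10; n++) {
  tenRows.push({ key: "user" + n, value: 1 });
}
var paged = countDistinctByPaging(tenRows, 3);
console.log(paged.distinct, paged.totalVisited); // prints: 10 28
```

In this model the paged count walks 28 rows to count 10, because each later page re-reads everything before it. Paging keeps each response small, but the total work goes up, not down.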

Thank you,
Suraj